DETAILED ACTION
This action is in response to the claims filed 05/25/2022 for application 16/230,909. Claims 1-3, 10-12, and 20 have been amended. Claims 1-20 are currently pending. 

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 05/25/2022 has been entered.
 
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 USC 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1,
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
identifying a multimodal dataset of a data item, the multimodal dataset comprising a username and user profile data on a network site;
generating multimodal vectors in different modalities from the multimodal dataset using different machine learning schemes;
generating a first data modality from the user profile data;
forming modality mixture candidates from the first data modality and second data modality;
combining the modality mixture candidates multiplicatively and identifying top performing modalities as output data
generating a second data modality from the user profile data
generating a classification of the data item … 
select informative vectors of the multimodal vectors; 
selecting display content corresponding to the classification of the data item;
generating … message from the multimodal dataset;
	These limitations each recite a mental process of deciding, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the following additional elements:
	a neural network
storing the classification of the data item
electronic message
overlaying the display content on the … message 
publishing the … message that includes the display content on a network site.
multimodal classification system implemented by one or more processors of a machine
multimodal selection engine implemented by the one or more processors of the machine
storage device of the machine
Using a neural network to perform the classification, generally recited, is only indicating a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)). 
Storing the results produced by an abstract idea is insignificant extra-solution activity that does not meaningfully limit the claim (see MPEP 2106.05(g)).  
The “electronic message” element merely generally links the use of the judicial exception to a particular technological environment or field of use (see MPEP 2106.05(h)).
Regarding “publishing the message that includes the display content on a network site” and “overlaying the display content on the message”, transmitting data to a network site and displaying it is insignificant extra-solution activity (see MPEP 2106.04(d) and 2106.05(g)).
The additional elements “multimodal classification system implemented by one or more processors of a machine”, “multimodal selection engine implemented by the one or more processors of the machine”, and “storage device of the machine” are recited at a high level of generality (i.e., as a generic processor performing a generic computer function of generating an index) such that they amount to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)).
Thus, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. 
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, either alone or in combination.
With regard to “using a neural network”, specifying a particular technological environment in which to apply the judicial exception does not provide an inventive concept (see MPEP 2106.05(h)).
The limitation of storing the results of the abstract idea is identified by MPEP 2106.05(d)(II)(iv), “storing and retrieving information in memory”, as well-understood, routine, and conventional activity and thus cannot provide an inventive concept.
Further, limiting the “generating a classification” step to being performed by a neural network only indicates a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)). Additionally, transmitting a message and displaying it is the well-understood, routine, and conventional activity of receiving or transmitting data over a network (see MPEP 2106.05(d)(II)(i)) and thus cannot provide an inventive concept. With regard to “electronic message”, specifying a particular technological environment in which to apply the judicial exception does not add an inventive concept or provide significantly more than the abstract idea (see MPEP 2106.05(h)). As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “multimodal classification system implemented by one or more processors of a machine”, “multimodal selection engine implemented by the one or more processors of the machine”, and “storage device of the machine”, used to perform steps of the claimed process, amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.
The claim is not patent eligible.

Regarding claim 2,
Claim 2 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
… multiplicatively combining the multimodal vectors to select the informative vectors, wherein multiplicatively combining the multimodal vectors nulls non-informative vectors of the multimodal vectors.
	This limitation is a mental process, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.

Regarding claim 3,
Claim 3 incorporates the rejection of claim 2.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
wherein the first model generator includes a Long Short-Term Memory (LSTM) generator, and the second model generator includes a Deep Neural Network (DNN) generator.
	This limitation merely adds further specifics to the judicial exception identified in the rejection of claim 1 above. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.

Regarding claim 4,
Claim 4 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
… generating candidate mixtures by additively combining the multimodal vectors.
	This limitation is a mental process, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.

Regarding claim 5,
Claim 5 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
the selected informative vectors include one or more of the generated candidate mixtures.
	This limitation is a mental process, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.


Regarding claim 6,
Claim 6 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim does not recite any additional abstract idea.
Step 2A Prong 2: The claim recites the additional element of
the data item is a user of a network site and the multimodal dataset comprises different types of user data of the user.
Using the multimodal selection method on multimodal network site user data, generally recited, only indicates a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)). Thus, the additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, because limiting the multimodal selection method to multimodal network site user data only indicates a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)), and thus does not add an inventive concept or provide significantly more than the abstract idea.
The claim is not patent eligible.

Regarding claim 7,
Claim 7 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim does not recite any additional abstract idea.
Step 2A Prong 2: The claim recites the additional element of
the machine learning schemes include one or more of: a convolutional neural network, a recurrent neural network, a bidirectional recurrent neural network, a fully connected neural network
Using a neural network to generate the multimodal vectors, generally recited, is only indicating a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)). Thus, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. 

Step 2B:  The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, because limiting the “generating multimodal vectors” step to being performed by a neural network is only indicating a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)), and thus does not add an inventive concept or provide significantly more to the abstract idea.
The claim is not patent eligible.
	
Regarding claim 8,
Claim 8 incorporates the rejection of claim 1.
Further, claim 8 recites only further specifics of the judicial exception recited in claim 1.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the following additional elements:
network site
client device
The network site and client device merely generally link the use of the judicial exception to a particular technological environment or field of use.
Thus, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B: These additional elements are not sufficient to amount to significantly more than the judicial exception, or to provide an inventive concept, for the reasons given in Step 2A Prong 2 (see MPEP § 2106.05.I.A.).
With regard to the network site and client device, specifying a particular technological environment in which to apply the judicial exception does not provide an inventive concept (see MPEP 2106.05(h)).
The claim is not patent eligible.

Regarding claim 9,
Claim 9 incorporates the rejection of claim 8.
Further, claim 9 recites only further specifics of the judicial exception recited in claim 8, and does not recite any further additional elements.
Therefore, this claim is not patent eligible for the reasons set forth in claim 8 above.  

Regarding claim 10,
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
identifying a multimodal dataset of a data item, the multimodal dataset comprising a username and user profile data on a network site;
generating multimodal vectors in different modalities from the multimodal dataset using different machine learning schemes;
generating a first data modality from the user profile data;
forming modality mixture candidates from the first data modality and second data modality;
combining the modality mixture candidates multiplicatively and identifying top performing modalities as output data
generating a second data modality from the user profile data
generating a classification of the data item … 
select informative vectors of the multimodal vectors; 
selecting display content corresponding to the classification of the data item;
generating … message from the multimodal dataset;
	These limitations each recite a mental process of deciding, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the following additional elements:
	one or more processors of a machine
	memory
a neural network
storing the classification of the data item
electronic message
overlaying the display content on the … message 
publishing the … message that includes the display content on a network site.
multimodal classification system implemented by one or more processors of a machine
multimodal selection engine implemented by the one or more processors of the machine
storage device of the machine
Using a neural network to perform the classification, generally recited, is only indicating a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)). 
Storing the results produced by an abstract idea is insignificant extra-solution activity that does not meaningfully limit the claim (see MPEP 2106.05(g)).  
The “electronic message” element merely generally links the use of the judicial exception to a particular technological environment or field of use (see MPEP 2106.05(h)).
Regarding “publishing the message that includes the display content on a network site” and “overlaying the display content on the message”, transmitting data to a network site and displaying it is insignificant extra-solution activity (see MPEP 2106.04(d) and 2106.05(g)).
The additional elements “one or more processors of a machine”, “memory”, “multimodal classification system implemented by one or more processors of a machine”, “multimodal selection engine implemented by the one or more processors of the machine”, and “storage device of the machine” are recited at a high level of generality (i.e., as a generic processor performing a generic computer function of generating an index) such that they amount to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)).
Thus, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. 
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, either alone or in combination.
With regard to “using a neural network”, specifying a particular technological environment in which to apply the judicial exception does not provide an inventive concept (see MPEP 2106.05(h)).
The limitation of storing the results of the abstract idea is identified by MPEP 2106.05(d)(II)(iv), “storing and retrieving information in memory”, as well-understood, routine, and conventional activity and thus cannot provide an inventive concept.
Further, limiting the “generating a classification” step to being performed by a neural network only indicates a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)). Additionally, transmitting a message and displaying it is the well-understood, routine, and conventional activity of receiving or transmitting data over a network (see MPEP 2106.05(d)(II)(i)) and thus cannot provide an inventive concept. With regard to “electronic message”, specifying a particular technological environment in which to apply the judicial exception does not add an inventive concept or provide significantly more than the abstract idea (see MPEP 2106.05(h)). As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “one or more processors of a machine”, “memory”, “multimodal classification system implemented by one or more processors of a machine”, “multimodal selection engine implemented by the one or more processors of the machine”, and “storage device of the machine”, used to perform steps of the claimed process, amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.
The claim is not patent eligible.

Regarding claim 11,
Claim 11 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
… multiplicatively combining the multimodal vectors to select the informative vectors, wherein multiplicatively combining the multimodal vectors nulls non-informative vectors of the multimodal vectors
	This limitation is a mental process, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.

Regarding claim 12,
Claim 12 incorporates the rejection of claim 11.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
wherein the first model generator includes a Long Short-Term Memory (LSTM) generator, and the second model generator includes a Deep Neural Network (DNN) generator.
	This limitation merely adds further specifics to the judicial exception identified in the rejection of claim 10 above. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.

Regarding claim 13,
Claim 13 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
… generating candidate mixtures by additively combining the multimodal vectors.
	This limitation is a mental process, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.


Regarding claim 14,
Claim 14 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
the selected informative vectors include one or more of the generated candidate mixtures.
	This limitation is a mental process, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.

Regarding claim 15,
Claim 15 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim does not recite any additional abstract idea.
Step 2A Prong 2: The claim recites the additional element of
the data item is a user of a network site and the multimodal dataset comprises different types of user data of the user.
Using the multimodal selection method on multimodal network site user data, generally recited, only indicates a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)). Thus, the additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, because limiting the multimodal selection method to multimodal network site user data only indicates a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)), and thus does not add an inventive concept or provide significantly more than the abstract idea.
The claim is not patent eligible.

Regarding claim 16,
Claim 16 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
the machine learning schemes include one or more of: a convolutional neural network, a recurrent neural network, a bidirectional recurrent neural network, a fully connected neural network.
	This limitation is a mental process, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.    

Regarding claim 17,
Claim 17 incorporates the rejection of claim 10.
Further, claim 17 recites only further specifics of the judicial exception recited in claim 10.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the following additional elements:
network site
client device
The network site and client device merely generally link the use of the judicial exception to a particular technological environment or field of use.
Thus, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B: These additional elements are not sufficient to amount to significantly more than the judicial exception, or to provide an inventive concept, for the reasons given in Step 2A Prong 2 (see MPEP § 2106.05.I.A.).
With regard to the network site and client device, specifying a particular technological environment in which to apply the judicial exception does not provide an inventive concept (see MPEP 2106.05(h)).
The claim is not patent eligible.

Regarding claim 18,
Claim 18 incorporates the rejection of claim 17.
Further, claim 18 recites only further specifics of the judicial exception recited in claim 17, and does not recite any further additional elements.
Therefore, this claim is not patent eligible for the reasons set forth in claim 17 above.

Regarding claim 19,
Claim 19 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
… multiplicatively combining the multimodal vectors to select the informative vectors.
	This limitation is a mental process, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.

Regarding claim 20,
Step 1: The claim recites a non-transitory machine-readable storage device (a manufacture), one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
identifying a multimodal dataset of a data item, the multimodal dataset comprising a username and user profile data on a network site;
generating multimodal vectors in different modalities from the multimodal dataset using different machine learning schemes;
generating a first data modality from the user profile data;
forming modality mixture candidates from the first data modality and second data modality;
combining the modality mixture candidates multiplicatively and identifying top performing modalities as output data
generating a second data modality from the user profile data
generating a classification of the data item … 
select informative vectors of the multimodal vectors; 
selecting display content corresponding to the classification of the data item;
generating … message from the multimodal dataset;
	These limitations each recite a mental process of deciding, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the following additional elements:
	a non-transitory machine-readable storage device	
a neural network
storing the classification of the data item
electronic message
overlaying the display content on the … message 
publishing the … message that includes the display content on a network site.
multimodal classification system implemented by one or more processors of a machine
multimodal selection engine implemented by the one or more processors of the machine
storage device of the machine
Using a neural network to perform the classification, generally recited, is only indicating a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)). 
Storing the results produced by an abstract idea is insignificant extra-solution activity that does not meaningfully limit the claim (see MPEP 2106.05(g)).  
The “electronic message” element merely generally links the use of the judicial exception to a particular technological environment or field of use (see MPEP 2106.05(h)).
Regarding “publishing the message that includes the display content on a network site” and “overlaying the display content on the message”, transmitting data to a network site and displaying it is insignificant extra-solution activity (see MPEP 2106.04(d) and 2106.05(g)).
The additional elements “a non-transitory machine-readable storage device”, “multimodal classification system implemented by one or more processors of a machine”, “multimodal selection engine implemented by the one or more processors of the machine”, and “storage device of the machine” are recited at a high level of generality (i.e., as a generic processor performing a generic computer function of generating an index) such that they amount to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)).
Thus, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. 
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, either alone or in combination.
With regard to “using a neural network”, specifying a particular technological environment in which to apply the judicial exception does not provide an inventive concept (see MPEP 2106.05(h)).
The limitation of storing the results of the abstract idea is identified by MPEP 2106.05(d)(II)(iv), “storing and retrieving information in memory”, as well-understood, routine, and conventional activity and thus cannot provide an inventive concept.
Further, limiting the “generating a classification” step to being performed by a neural network only indicates a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)). Additionally, transmitting a message and displaying it is the well-understood, routine, and conventional activity of receiving or transmitting data over a network (see MPEP 2106.05(d)(II)(i)) and thus cannot provide an inventive concept. With regard to “electronic message”, specifying a particular technological environment in which to apply the judicial exception does not add an inventive concept or provide significantly more than the abstract idea (see MPEP 2106.05(h)). As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “a non-transitory machine-readable storage device”, “multimodal classification system implemented by one or more processors of a machine”, “multimodal selection engine implemented by the one or more processors of the machine”, and “storage device of the machine”, used to perform steps of the claimed process, amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.
The claim is not patent eligible.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 2, 4-11, and 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao (“Heterogeneous Feature Selection With Multi-Modal Deep Neural Networks and Sparse Group LASSO”, hereinafter "Zhao") in view of Laliberte ("US 20160359987 A1", hereinafter "Laliberte") and further in view of Long et al. ("Fully Convolutional Networks for Semantic Segmentation", hereinafter "Long") and further in view of Dutta (US 10783167 B1, hereinafter "Dutta").

Regarding claim 1, 
Zhao teaches a method comprising:
identifying a multimodal dataset of a data item; (Zhao, pg. 1939, fig. 2; col. 1, section III.A, para. 1, “we assign heterogeneous sub-networks to different modalities”; para.2, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities.”; pg. 1943, col. 2, para. 1, “The NUS-WIDE-Object dataset3 consists of 30000 images from Flickr. Text description tags are attached to every image by the authors of the photos. These 30 thousands images are classified into 31 classes”).
generating, using a multimodal classification system implemented by one or more processors of a machine (See pg. 1936, bottom left col, “We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research”), multimodal vectors in different modalities from the multimodal dataset using different machine learning schemes; (Zhao, pg. 1939, col. 1, para. 5 – pg. 1939, col. 2, para. 1, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities. When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector.”).
forming, at an additive layer of a multimodal selection engine, modality mixture candidates from the first data modality and the second data modality; 
(“Specifically, to avoid the interference across modalities, we connect each sub-network to the objective function layer with part of the nodes in this layer. In terms of implementation, we can pre-train and fine-tune different subnetworks separately. The only connection across modalities is the same concept prior (i.e. label information). This auxiliary layer is used only for fine-tuning all the networks and it is discarded once all the networks are well trained. In this way, we fine-tune the whole sub-networks to yield high-level abstract feature representations for the classification task. After a series of non-linear transformations, these abstract features are able to express complex patterns. With this additional auxiliary layer in the fine-tuning stage, we combine the concept prior with deep generative learning. Meanwhile, we obtain the refined feature representations from the top layer of each branch sub-network, on a group-by-group basis. These new feature representations are concatenated as the input of the feature selection component.” [pg. 1940, right col, bottom para; Examiner is interpreting the auxiliary layer to be equivalent to “an additive layer” as it concatenates features from other layers of the network.])
wherein the multimodal classification system comprises the modal generator and the multimodal selection engine, the multimodal selection engine comprising the additive layer (“With this additional auxiliary layer in the fine-tuning stage, we combine the concept prior with deep generative learning. Meanwhile, we obtain the refined feature representations from the top layer of each branch sub-network, on a group-by-group basis. These new feature representations are concatenated as the input of the feature selection component.” [pg. 1940, right col, bottom para])
generating, using the multimodal selection engine implemented by the one or more processors of the machine, a classification of the data item based on the output data from a neural network of the multimodal classification system trained to select informative vectors of the multimodal vectors; (Zhao, pg. 1937, col. 2, para. 1, “We applied our method to three real world datasets with several irrelevant noisy feature groups mixed for image classification tasks. Experimental results show that this framework can discover the relevant feature groups effectively and achieves better classification accuracies compared with several baseline approaches for heterogenous feature selection.”; Zhao, pg. 1939, col. 1, para. 2, “In the proposed framework, we utilize these multi-modal networks and the sparse group lasso jointly to select the feature groups that are relevant to classification tasks.”; Zhao, pg. 1939, col. 1, para. 5 – pg. 1939, col. 2, para. 1, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities. When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector. Then the Feature Selection Component takes this concatenation as its input and derives an optimal solution of the weight vector. According to this weight vector, the most relevant feature groups with respect to the current task are picked out. Finally, we use these selected features in the final recognition task.”; Zhao, pg. 1945, col. 1, para. 5, “It can be observed that our method can effectively filter the noisy feature groups that are deemed to be irrelevant to the final classification task. 
All the random noise and the noisy original feature groups are weighted zero in all the three datasets. It endows only those feature groups that are relevant and informative to the classification task with a proper value.”). 
	However, Zhao does not explicitly teach the multimodal dataset comprising a username and a user profile data on a network site;
	generating, at a first model generator, a first data modality from the username; generating, at a second model generator, a second data modality from the user profile data.
	Laliberte teaches the multimodal dataset comprising a username and a user profile data on a network site (“To view page 260 in the present example, the user selects the profile function 206 as well as the touch-selectable icon 218 of the sharing platform of interest. In response, the interface opens page 260 which is directed to the user's profile for the selected platform, as indicated by the corresponding static platform icon 262, and which displays a user profile data window 264, as well as a touch-sensitive button 266 to update selected profile credentials/parameters, and a touch-sensitive button 268 to have the user add another profile. In this example, the displayed and updatable profile includes a username and password 270 of the user for this sharing platform (e.g. usable in enabling the sharing and integration system to post content on the user's behalf, and optionally track a visibility thereof once posted), as well as different preset posting preferences such as a preferred brand type 272 (showing a preference for luxury brands over other selectable types such as family, entertainment, local, dining, etc., to name a few examples), a preferred branding level for this sharing platform 274 (showing a low level selection) and a condition for applying these preferences 276 (showing a preference that the user be asked to confirm before posting).” [¶0072])
	generating, at a first model generator, a first data modality from the username (“In this example, the displayed and updatable profile includes a username and password 270 of the user for this sharing platform (e.g. usable in enabling the sharing and integration system to post content on the user's behalf, and optionally track a visibility thereof once posted), as well as different preset posting preferences such as a preferred brand type 272 (showing a preference for luxury brands over other selectable types such as family, entertainment, local, dining, etc., to name a few examples), a preferred branding level for this sharing platform 274 (showing a low level selection) and a condition for applying these preferences 276 (showing a preference that the user be asked to confirm before posting).” [¶0072]; see also [¶0039], which discloses sharing personal content: “For instance, it is currently commonplace for individuals to post or share personal content (e.g. text, images, pictures, videos, etc.) via one or more sharing platforms, be they social media platforms such as Facebook™ Twitter™, Pinterest™, Instagram™, etc.”);
generating, at a second model generator, a second data modality from the user profile data (“To view page 260 in the present example, the user selects the profile function 206 as well as the touch-selectable icon 218 of the sharing platform of interest. In response, the interface opens page 260 which is directed to the user's profile for the selected platform, as indicated by the corresponding static platform icon 262, and which displays a user profile data window 264, as well as a touch-sensitive button 266 to update selected profile credentials/parameters, and a touch-sensitive button 268 to have the user add another profile. In this example, the displayed and updatable profile includes a username and password 270 of the user for this sharing platform (e.g. usable in enabling the sharing and integration system to post content on the user's behalf, and optionally track a visibility thereof once posted), as well as different preset posting preferences such as a preferred brand type 272 (showing a preference for luxury brands over other selectable types such as family, entertainment, local, dining, etc., to name a few examples), a preferred branding level for this sharing platform 274 (showing a low level selection) and a condition for applying these preferences 276 (showing a preference that the user be asked to confirm before posting).” [¶0072]; see also [¶0039], which discloses sharing personal content: “For instance, it is currently commonplace for individuals to post or share personal content (e.g. text, images, pictures, videos, etc.) via one or more sharing platforms, be they social media platforms such as Facebook™ Twitter™, Pinterest™, Instagram™, etc.”);
	Zhao teaches a heterogeneous feature selection method with multi-modal neural networks. Laliberte teaches a user content sharing system for social media networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Zhao’s teachings by substituting the modality data of Zhao with the user modality data taught by Laliberte. Using multi-modal data is well known in the field of machine learning, and thus one would have been motivated to make this modification in order to yield predictable results.
	Zhao/Laliberte fails to explicitly teach:
combining, at a multiplicative layer of the multimodal selection engine, the modality mixture candidates multiplicatively and identifying top performing modalities as output data, 
wherein the multimodal classification system comprises the modal generator and the multimodal selection engine, the multimodal selection engine comprising the additive layer and the multiplicative layer;
Long teaches combining, at a multiplicative layer of the multimodal selection engine, the modality mixture candidates multiplicatively and identifying top performing modalities as output data (“Our DAG nets learn to combine coarse, high layer information with fine, low layer information. Pooling and prediction layers are shown as grids that reveal relative spatial coarseness, while intermediate layers are shown as vertical lines. First row (FCN-32s): Our singlestream net, described in Section 4.1, upsamples stride 32 predictions back to pixels in a single step. Second row (FCN-16s): Combining predictions from both the final layer and the pool4 layer, at stride 16, lets our net predict finer details, while retaining high-level semantic information. Third row (FCN-8s): Additional predictions from pool3, at stride 8, provide further precision.” [pg. 3435, Figure 3 caption]; see further pg. 3433, top left para, which discloses multiplicative operations: “a matrix multiplication for convolution or average pooling, a spatial max for max pooling, or an elementwise nonlinearity for an activation function, and so on for other types of layers.”),
wherein the multimodal classification system comprises the modal generator and the multimodal selection engine, the multimodal selection engine comprising the multiplicative layer (See Figure 3, pg. 3435; Our DAG nets learn to combine coarse, high layer information with fine, low layer information. Pooling and prediction layers are shown as grids that reveal relative spatial coarseness, while intermediate layers are shown as vertical lines. First row (FCN-32s): Our singlestream net, described in Section 4.1, upsamples stride 32 predictions back to pixels in a single step. Second row (FCN-16s): Combining predictions from both the final layer and the pool4 layer, at stride 16, lets our net predict finer details, while retaining high-level semantic information. Third row (FCN-8s): Additional predictions from pool3, at stride 8, provide further precision.)
	Zhao teaches a heterogeneous feature selection method with multi-modal neural networks. Laliberte teaches a user content sharing system for social media networks. Long teaches a method using CNNs for semantic segmentation. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Zhao/Laliberte to implement the multiplicative layer and outputting top performing features as taught by Long. One would have been motivated to make this modification in order to produce more accurate outputs with efficient inference (Long, Abstract).
Zhao/Laliberte/Long fails to explicitly teach storing, in a storage device of the machine, the classification of the data item; selecting display content corresponding to the classification of the data item; generating an electronic message from the multimodal dataset; 
overlaying the display content on the electronic message; and 
publishing the electronic message that includes the display content on the network site. 
Dutta, however, teaches storing, in a storage device of the machine, the classification of the data item; (Dutta, fig. 7: MEMORY 718, CLASSIFICATION DATA 102; Dutta, col. 20, ln. 67 – col. 21, ln. 2, “The classification module may be configured to generate classification data 102 or modify existing classification data 102 based on user interaction data 120”; Dutta, col. 5, ln. 64-66, “The classification data 102(1) and item data 106 may be stored in association with one or more classification servers 108 or other types of computing devices.”).
selecting display content corresponding to the classification of the data item; (Dutta, col. 6, ln. 4-9, “the user interface 112 may present at least a portion of the classification labels 104 on a display or other type of output device. As user input is received, selecting or otherwise interacting with one or more of the classification labels 104, additional classification labels 104 may be displayed.”; Dutta, col. 6, ln. 17-20, “In some implementations, selection of one or more of the classification labels 104 may also cause item data 106 for the items associated with the selected classification label(s) 104 to be displayed.”).
generating an electronic message (Dutta, Fig. 1A:106) from the multimodal dataset; (Dutta, Fig. 1A; col. 5, ln. 26-33; “FIG. 1A depicts … Classification data 102 may include a plurality of classification labels 104 which may be applied to items. For example, classification labels 104 may include alphanumeric descriptors, images, or other types of data that may be used to differentiate particular types of items from other types of items.”; col. 6, ln. 17-22; “selection of one or more of the classification labels 104 may also cause item data 106 for the items associated with the selected classification label(s) 104 to be displayed. For example, selection of the “Running” label may cause item data 106 associated with different types of running shoes to be displayed in the user interface 112.”).
overlaying the display content (Dutta, Fig. 1A:104) on the electronic message (Dutta, Fig. 1A:106); and publishing the electronic message (Dutta, Fig. 1A:106) that includes the display content (Dutta, Fig. 1A:104) on the network site (Dutta, Fig. 1A:114). (Dutta, col. 6, ln. 17-21, “selection of one or more of the classification labels 104 may also cause item data 106 for the items associated with the selected classification label(s) 104 to be displayed.”; col. 5, ln. 62-64; “item data 106 indicative of characteristics of particular items may include an indication of one or more of the classification labels 104.”). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao/Laliberte/Long by storing a classification of a data item, selecting display content, generating an electronic message, overlaying the display content, and publishing the electronic message as taught by Dutta. The motivation to do so is that the selected content can be presented on a display over a network site. (“the user interface 112 may present at least a portion of the classification labels 104 on a display or other type of output device. As user input is received, selecting or otherwise interacting with one or more of the classification labels 104, additional classification labels 104 may be displayed.” (Dutta, Fig. 1A; col. 6 ln. 4-9)).

Regarding claim 2,
The Zhao/Laliberte/Long/Dutta combination teaches the method of claim 1, (and thus the rejection of claim 1 is incorporated). Zhao further teaches wherein generating the classification using the neural network comprises multiplicatively combining the multimodal vectors to select the informative vectors. (Zhao, pg. 1939, col. 1, para. 4, “The Feature Selection Component aims to find the optimal weights for all the feature groups by solving the optimization problem with sparse group lasso. As a result, the features with small weights are dropped out.”; Zhao, pg. 1939, col. 1, para. 5, “When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector. Then the Feature Selection Component takes this concatenation as its input and derives an optimal solution of the weight vector. According to this weight vector, the most relevant feature groups with respect to the current task are picked out.”), wherein multiplicatively combining the multimodal vectors nulls non-informative vectors of the multimodal vectors. (Zhao, pg. 1941, col. 1, para. 3, “Not only some feature groups but also some features within the same group are discarded if their weights are zero. The features whose weights are nonzeros are selected.”; Zhao, pg. 1942, col. 2, para. 1, “According to this importance vector, the feature groups with nonzero weights are selected and they are considered more relevant to the current task. These features are used for the final recognition task. At the same time, if the sparsity parameter λ1≠0, some features within the same group are also left out in order to improve the efficiency of the model.”; Zhao, pg. 1945, col. 1 para. 5, “It can be observed that our method can effectively filter the noisy feature groups that are deemed to be irrelevant to the final classification task. 
All the random noise and the noisy original feature groups are weighted zero in all the three datasets. It endows only those feature groups that are relevant and informative to the classification task with a proper value. In contrast, the other three methods assign incorrect weights to those irrelevant features because of the distinction of the heterogeneous features. For example, with the GLLR method using the original features, high weights have been assigned to the group of random noise. Even though the MKL method assigns every feature group with a different weight, it cannot select those feature groups that are more relevant to the classification task. We notice that the results of MtBGS are close to ours. For the dataset of Animal-10, MtBGS assigns a zero weight to the random noise group but a relatively high weight to the noisy feature group. For the NUS-WIDE-Object dataset, MtBGS sets the weights for all the feature groups to nonzeros. With higher sparsity coefficients, MtBGS can filter out most of the feature groups, nevertheless it gives a poor classification performance.”).

Regarding claim 4, 
The Zhao/Laliberte/Long/Dutta combination teaches the method of claim 1, (and thus the rejection of claim 1 is incorporated). Zhao further teaches wherein generating the classification using the neural network comprises generating candidate mixtures by additively combining the multimodal vectors. (Zhao, pg. 1939, col. 1, para. 5 - pg. 1939, col. 2, para. 1, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities. When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector.” (Concatenating is interpreted as adding or combining feature vectors together to generate one feature vector, i.e., "additively combining".)).


Regarding claim 5,
The Zhao/Laliberte/Long/Dutta combination teaches the method of claim 1, (and thus the rejection of claim 1 is incorporated). Zhao further teaches wherein the selected informative vectors include one or more … generated candidate mixtures. (Zhao, pg. 1939, col. 1, para. 5 - pg. 1939, col. 2, para. 1, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities. When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector. Then the Feature Selection Component takes this concatenation as its input and derives an optimal solution of the weight vector. According to this weight vector, the most relevant feature groups with respect to the current task are picked out. Finally, we use these selected features in the final recognition task.” (This concatenation is interpreted as one or more generated candidate mixtures.)).

Regarding claim 6, The Zhao/Laliberte/Long/Dutta combination teaches the method of claim 1, (and thus the rejection of claim 1 is incorporated). Laliberte further teaches wherein the data item is a user of a network site and the multimodal dataset comprises different types of user data of the user (“For instance, upon selecting the accounts function 202, the interface may be operated to render an accounts page 246 that displays a user account window 248 in which different user account-related data may be presented (e.g. posts to date 250 using the interface or system, total compensation earned 252 using the system, a redeemable balance left in the account 254, and an average compensation rate overall 256, to name a few examples)” [¶0071])
The same motivation to combine the teachings of Zhao/Laliberte/Long/Dutta as set forth in the rejection of claim 1 applies.


Regarding claim 7,
The Zhao/Laliberte/Long/Dutta combination teaches the method of claim 1, (and thus the rejection of claim 1 is incorporated). Zhao further teaches wherein the machine learning schemes include one or more of: a convolutional neural network, a recurrent neural network, a bidirectional recurrent neural network, a fully connected neural network. (Zhao, pg. 1940, col. 1, para. 2; “In this paper, we choose SDA as the base deep architecture for the sub-networks as the inputs are numerical vectors.”).

Regarding claim 8,
The Zhao/Laliberte/Long/Dutta combination teaches the method of claim 1, (and thus the rejection of claim 1 is incorporated). Zhao further teaches wherein the multimodal dataset comprises … image data generated by a client device of the user, and text data authored by the user. (Zhao, pg. 1943, col. 2, para. 1, “The NUS-WIDE-Object dataset3 consists of 30000 images from Flickr. Text description tags are attached to every image by the authors of the photos.”).
However, the Zhao/Laliberte/Long/Dutta combination does not explicitly teach profile data of a user of the network site, but Laliberte teaches this limitation. (Laliberte Fig. 20; Fig. 23; [0139] “with reference now to FIG. 20, an exemplary screenshot 2000 is provided of a user profile page for user “MIKE123” showing profile posting and contact details such as a number of posted images or videos, a number of users following and a number of users being followed by this user.”; “in FIG. 23, a screenshot 2300 is provided of an image and video sharing platform interface in which a user's text-based post 2302 (e.g. a text message otherwise destined for Twitter™, SnapChat™ or other such text-based sharing platforms) is set using an embodiment of the system”.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include profile data of a user of the network site. The motivation to do so is that registered users can provide useful information to the service to receive compensation for their use of the system. (“Since registered users of the content integration platform will generally provide accurate/legitimate contact and/or demographic information in registering to the service in order to receive compensation for their use of the system, external content providers will in response receive valuable information not only on the content originators selecting to push their brand, and their social network to which the branded content was pushed, but also confirmed viewership of this embedded branded content by virtue of the scan code concurrently embedded within this integrated content.” (Laliberte [0102])).

Regarding claim 9,
The Zhao/Laliberte/Long/Dutta/Laliberte combination teaches the method of claim 8, (and thus the rejection of claim 8 is incorporated). Zhao further teaches wherein the electronic message …. (Zhao, pg. 1943, col. 2, para. 1, “The NUS-WIDE-Object dataset3 consists of 30000 images from Flickr. Text description tags are attached to every image by the authors of the photos.”).
However, Zhao does not explicitly teach that the electronic message includes an ephemeral message, but Laliberte teaches this limitation. (Laliberte Fig. 23; [0139] “in the post 2304 of FIG. 23, the integration engine automatically identifies the user's use of the sad face emoticon or shortcut keys, such as :(, and thus integrates the user's message into a background sad face image 2306 or other image invoking sadness or disappointment as reflective of the emotion intended by user's message.”; “the system may be configured to integrate the text-message in a dynamically selected user-related image, selected for example as a function of the time of day (e.g. night time vs. daytime scenery), … profile status or the like.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include an ephemeral message. The motivation to do so is that the emotion intended by the user’s message can be reflected. (“the integration engine automatically identifies the user's use of the sad face emoticon or shortcut keys, such as :(, and thus integrates the user's message into a background sad face image 2306 or other image invoking sadness or disappointment as reflective of the emotion intended by user's message.” (Laliberte Fig. 23; [0139])).

Regarding claim 10,
Zhao teaches a system comprising: one or more processors of a machine; and a memory storing instructions that, when executed by the one or more processors, cause the machine to perform operations comprising: (Zhao, pg. 1944, col. 2, par. 3, “The GLLR and MtBGS are implemented with the sparse learning package of SLEP. The SVM in all our experiments is implemented with the LIBSVM5 software package. We implemented the multi-modal neural networks with the deep learning library of Theano.6 Considering the computational demand for training the multi-modal neural networks, we run our algorithm on GPU to accelerate the training procedure.”), in which a processor and a memory storing instructions are inherent. Zhao clearly implements the method on a computer.
identifying a multimodal dataset of a data item; (Zhao, pg. 1939, fig. 2; col. 1, section III.A, para. 1, “we assign heterogeneous sub-networks to different modalities”; para.2, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities.”; pg. 1943, col. 2, para. 1, “The NUS-WIDE-Object dataset3 consists of 30000 images from Flickr. Text description tags are attached to every image by the authors of the photos. These 30 thousands images are classified into 31 classes”).
generating, using a multimodal classification system implemented by one or more processors of a machine (See pg. 1936, bottom left col, “We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research”), multimodal vectors in different modalities from the multimodal dataset using different machine learning schemes; (Zhao, pg. 1939, col. 1, para. 5 – pg. 1939, col. 2, para. 1, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities. When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector.”).
forming, at an additive layer of a multimodal selection engine, modality mixture candidates from the first data modality and the second data modality; 
(“Specifically, to avoid the interference across modalities, we connect each sub-network to the objective function layer with part of the nodes in this layer. In terms of implementation, we can pre-train and fine-tune different subnetworks separately. The only connection across modalities is the same concept prior (i.e. label information). This auxiliary layer is used only for fine-tuning all the networks and it is discarded once all the networks are well trained. In this way, we fine-tune the whole sub-networks to yield high-level abstract feature representations for the classification task. After a series of non-linear transformations, these abstract features are able to express complex patterns. With this additional auxiliary layer in the fine-tuning stage, we combine the concept prior with deep generative learning. Meanwhile, we obtain the refined feature representations from the top layer of each branch sub-network, on a group-by-group basis. These new feature representations are concatenated as the input of the feature selection component.” [pg. 1940, right col, bottom para; Examiner is interpreting the auxiliary layer to be equivalent to “an additive layer” as it concatenates features from other layers of the network.])
wherein the multimodal classification system comprises the modal generator and the multimodal selection engine, the multimodal selection engine comprising the additive layer (“With this additional auxiliary layer in the fine-tuning stage, we combine the concept prior with deep generative learning. Meanwhile, we obtain the refined feature representations from the top layer of each branch sub-network, on a group-by-group basis. These new feature representations are concatenated as the input of the feature selection component.” [pg. 1940, right col, bottom para])
generating, using the multimodal selection engine implemented by the one or more processors of the machine, a classification of the data item based on the output data from a neural network of the multimodal classification system trained to select informative vectors of the multimodal vectors; (Zhao, pg. 1937, col. 2, para. 1, “We applied our method to three real world datasets with several irrelevant noisy feature groups mixed for image classification tasks. Experimental results show that this framework can discover the relevant feature groups effectively and achieves better classification accuracies compared with several baseline approaches for heterogenous feature selection.”; Zhao, pg. 1939, col. 1, para. 2, “In the proposed framework, we utilize these multi-modal networks and the sparse group lasso jointly to select the feature groups that are relevant to classification tasks.”; Zhao, pg. 1939, col. 1, para. 5 – pg. 1939, col. 2, para. 1, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities. When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector. Then the Feature Selection Component takes this concatenation as its input and derives an optimal solution of the weight vector. According to this weight vector, the most relevant feature groups with respect to the current task are picked out. Finally, we use these selected features in the final recognition task.”; Zhao, pg. 1945, col. 1, para. 5, “It can be observed that our method can effectively filter the noisy feature groups that are deemed to be irrelevant to the final classification task. All the random noise and the noisy original feature groups are weighted zero in all the three datasets. It endows only those feature groups that are relevant and informative to the classification task with a proper value.”).
	However, Zhao does not explicitly teach the multimodal dataset comprising a username and user profile data on a network site;
	generating, at a first model generator, a first data modality from the username; and generating, at a second model generator, a second data modality from the user profile data.
	Laliberte teaches the multimodal dataset comprising a username and user profile data on a network site (“To view page 260 in the present example, the user selects the profile function 206 as well as the touch-selectable icon 218 of the sharing platform of interest. In response, the interface opens page 260 which is directed to the user's profile for the selected platform, as indicated by the corresponding static platform icon 262, and which displays a user profile data window 264, as well as a touch-sensitive button 266 to update selected profile credentials/parameters, and a touch-sensitive button 268 to have the user add another profile. In this example, the displayed and updatable profile includes a username and password 270 of the user for this sharing platform (e.g. usable in enabling the sharing and integration system to post content on the user's behalf, and optionally track a visibility thereof once posted), as well as different preset posting preferences such as a preferred brand type 272 (showing a preference for luxury brands over other selectable types such as family, entertainment, local, dining, etc., to name a few examples), a preferred branding level for this sharing platform 274 (showing a low level selection) and a condition for applying these preferences 276 (showing a preference that the user be asked to confirm before posting).” [¶0072]);
	generating, at a first model generator, a first data modality from the username (“In this example, the displayed and updatable profile includes a username and password 270 of the user for this sharing platform (e.g. usable in enabling the sharing and integration system to post content on the user's behalf, and optionally track a visibility thereof once posted), as well as different preset posting preferences such as a preferred brand type 272 (showing a preference for luxury brands over other selectable types such as family, entertainment, local, dining, etc., to name a few examples), a preferred branding level for this sharing platform 274 (showing a low level selection) and a condition for applying these preferences 276 (showing a preference that the user be asked to confirm before posting).” [¶0072]; see also [¶0039], which discloses sharing personal content: “For instance, it is currently commonplace for individuals to post or share personal content (e.g. text, images, pictures, videos, etc.) via one or more sharing platforms, be they social media platforms such as Facebook™ Twitter™, Pinterest™, Instagram™, etc.”);
generating, at a second model generator, a second data modality from the user profile data (“To view page 260 in the present example, the user selects the profile function 206 as well as the touch-selectable icon 218 of the sharing platform of interest. In response, the interface opens page 260 which is directed to the user's profile for the selected platform, as indicated by the corresponding static platform icon 262, and which displays a user profile data window 264, as well as a touch-sensitive button 266 to update selected profile credentials/parameters, and a touch-sensitive button 268 to have the user add another profile. In this example, the displayed and updatable profile includes a username and password 270 of the user for this sharing platform (e.g. usable in enabling the sharing and integration system to post content on the user's behalf, and optionally track a visibility thereof once posted), as well as different preset posting preferences such as a preferred brand type 272 (showing a preference for luxury brands over other selectable types such as family, entertainment, local, dining, etc., to name a few examples), a preferred branding level for this sharing platform 274 (showing a low level selection) and a condition for applying these preferences 276 (showing a preference that the user be asked to confirm before posting).” [¶0072]; see also [¶0039], which discloses sharing personal content: “For instance, it is currently commonplace for individuals to post or share personal content (e.g. text, images, pictures, videos, etc.) via one or more sharing platforms, be they social media platforms such as Facebook™ Twitter™, Pinterest™, Instagram™, etc.”).
	Zhao teaches a heterogeneous feature selection method with multi-modal neural networks. Laliberte teaches a user content sharing system for social media networks. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao’s teachings by substituting the modality data of Zhao with the user modality data (username and user profile data) taught by Laliberte. Using multi-modal data is well known in the field of machine learning, and thus one would have been motivated to make this modification in order to yield predictable results.
	Zhao/Laliberte fails to explicitly teach 
combining, at a multiplicative layer of the multimodal selection engine, the modality mixture candidates multiplicatively and identifying top performing modalities as output data, 
wherein the multimodal classification system comprises the modal generator and the multimodal selection engine, the multimodal selection engine comprising the additive layer and the multiplicative layer;
Long teaches combining, at a multiplicative layer of the multimodal selection engine, the modality mixture candidates multiplicatively and identifying top performing modalities as output data (“Our DAG nets learn to combine coarse, high layer information with fine, low layer information. Pooling and prediction layers are shown as grids that reveal relative spatial coarseness, while intermediate layers are shown as vertical lines. First row (FCN-32s): Our single-stream net, described in Section 4.1, upsamples stride 32 predictions back to pixels in a single step. Second row (FCN-16s): Combining predictions from both the final layer and the pool4 layer, at stride 16, lets our net predict finer details, while retaining high-level semantic information. Third row (FCN-8s): Additional predictions from pool3, at stride 8, provide further precision.” [Long, pg. 3435, Figure 3 caption]; see further Long, pg. 3433, top left para., which discloses combining multiplicatively: “a matrix multiplication for convolution or average pooling, a spatial max for max pooling, or an elementwise nonlinearity for an activation function, and so on for other types of layers.”),
wherein the multimodal classification system comprises the modal generator and the multimodal selection engine, the multimodal selection engine comprising the multiplicative layer (see Long, pg. 3435, Figure 3 caption: “Our DAG nets learn to combine coarse, high layer information with fine, low layer information. Pooling and prediction layers are shown as grids that reveal relative spatial coarseness, while intermediate layers are shown as vertical lines. First row (FCN-32s): Our single-stream net, described in Section 4.1, upsamples stride 32 predictions back to pixels in a single step. Second row (FCN-16s): Combining predictions from both the final layer and the pool4 layer, at stride 16, lets our net predict finer details, while retaining high-level semantic information. Third row (FCN-8s): Additional predictions from pool3, at stride 8, provide further precision.”).
	Zhao teaches a heterogeneous feature selection method with multi-modal neural networks. Laliberte teaches a user content sharing system for social media networks. Long teaches a method using CNNs for semantic segmentation. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Zhao/Laliberte to implement the multiplicative layer and to output top performing features as taught by Long. One would have been motivated to make this modification in order to produce more accurate outputs with efficient inference. (Long, Abstract).
Zhao/Laliberte/Long fails to explicitly teach storing, in a storage device of the machine, the classification of the data item; selecting display content corresponding to the classification of the data item; generating an electronic message from the multimodal dataset; 
overlaying the display content on the electronic message; and 
publishing the electronic message that includes the display content on the network site. 
Dutta, however, teaches storing, in a storage device of the machine, the classification of the data item; (Dutta, fig. 7: MEMORY 718, CLASSIFICATION DATA 102; Dutta, col. 20, ln. 67 – col. 21, ln. 2, “The classification module may be configured to generate classification data 102 or modify existing classification data 102 based on user interaction data 120”; Dutta, col. 5, ln. 64-66, “The classification data 102(1) and item data 106 may be stored in association with one or more classification servers 108 or other types of computing devices.”).
selecting display content corresponding to the classification of the data item; (Dutta, col. 6, ln. 4-9, “the user interface 112 may present at least a portion of the classification labels 104 on a display or other type of output device. As user input is received, selecting or otherwise interacting with one or more of the classification labels 104, additional classification labels 104 may be displayed.”; Dutta, col. 6, ln. 17-20, “In some implementations, selection of one or more of the classification labels 104 may also cause item data 106 for the items associated with the selected classification label(s) 104 to be displayed.”).
generating an electronic message (Dutta, Fig. 1A:106) from the multimodal dataset; (Dutta, Fig. 1A; col. 5, ln. 26-33; “FIG. 1A depicts … Classification data 102 may include a plurality of classification labels 104 which may be applied to items. For example, classification labels 104 may include alphanumeric descriptors, images, or other types of data that may be used to differentiate particular types of items from other types of items.”; col. 6, ln. 17-22; “selection of one or more of the classification labels 104 may also cause item data 106 for the items associated with the selected classification label(s) 104 to be displayed. For example, selection of the “Running” label may cause item data 106 associated with different types of running shoes to be displayed in the user interface 112.”).
overlaying the display content (Dutta, Fig. 1A:104) on the electronic message (Dutta, Fig. 1A:106); and publishing the electronic message (Dutta, Fig. 1A:106) that includes the display content (Dutta, Fig. 1A:104) on the network site (Dutta, Fig. 1A:114). (Dutta, col. 6, ln. 17-21, “selection of one or more of the classification labels 104 may also cause item data 106 for the items associated with the selected classification label(s) 104 to be displayed.”; col. 5, ln. 62-64; “item data 106 indicative of characteristics of particular items may include an indication of one or more of the classification labels 104.”). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao/Laliberte/Long by storing a classification of a data item, selecting display content, generating an electronic message, overlaying the display content, and publishing the electronic message as taught by Dutta. The motivation to do so is that the selected content can be presented on a display over a network site. (“the user interface 112 may present at least a portion of the classification labels 104 on a display or other type of output device. As user input is received, selecting or otherwise interacting with one or more of the classification labels 104, additional classification labels 104 may be displayed.” (Dutta, Fig. 1A; col. 6, ln. 4-9)).

Regarding claim 11,
The Zhao/Laliberte/Long/Dutta combination teaches the system of claim 10, (and thus the rejection of claim 10 is incorporated). Zhao further teaches wherein generating the classification using the neural network comprises multiplicatively combining the multimodal vectors to select the informative vectors. (Zhao, pg. 1939, col. 1, para. 4, “The Feature Selection Component aims to find the optimal weights for all the feature groups by solving the optimization problem with sparse group lasso. As a result, the features with small weights are dropped out.”; Zhao, pg. 1939, col. 1, para. 5, “When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector. Then the Feature Selection Component takes this concatenation as its input and derives an optimal solution of the weight vector. According to this weight vector, the most relevant feature groups with respect to the current task are picked out.”), wherein multiplicatively combining the multimodal vectors nulls non-informative vectors of the multimodal vectors. (Zhao, pg. 1941, col. 1, para. 3, “Not only some feature groups but also some features within the same group are discarded if their weights are zero. The features whose weights are nonzeros are selected.”; Zhao, pg. 1942, col. 2, para. 1, “According to this importance vector, the feature groups with nonzero weights are selected and they are considered more relevant to the current task. These features are used for the final recognition task. At the same time, if the sparsity parameter λ1≠0, some features within the same group are also left out in order to improve the efficiency of the model.”; Zhao, pg. 1945, col. 1, para. 5, “It can be observed that our method can effectively filter the noisy feature groups that are deemed to be irrelevant to the final classification task. All the random noise and the noisy original feature groups are weighted zero in all the three datasets. It endows only those feature groups that are relevant and informative to the classification task with a proper value. In contrast, the other three methods assign incorrect weights to those irrelevant features because of the distinction of the heterogeneous features. For example, with the GLLR method using the original features, high weights have been assigned to the group of random noise. Even though the MKL method assigns every feature group with a different weight, it cannot select those feature groups that are more relevant to the classification task. We notice that the results of MtBGS are close to ours. For the dataset of Animal-10, MtBGS assigns a zero weight to the random noise group but a relatively high weight to the noisy feature group. For the NUS-WIDE-Object dataset, MtBGS sets the weights for all the feature groups to nonzeros. With higher sparsity coefficients, MtBGS can filter out most of the feature groups, nevertheless it gives a poor classification performance.”).
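For illustration only, the weighting behavior described in the cited passages of Zhao can be pictured with the following minimal sketch. It is not code from Zhao. The group names, feature values, and weights are hypothetical, and a single weight per feature group is assumed for simplicity; Zhao derives its weight vector by solving a sparse group lasso problem, which is not reproduced here.

```python
def select_groups(groups, weights):
    """Scale each feature group by its weight; drop groups whose weight is zero.

    groups  -- dict mapping a (hypothetical) group name to a list of feature values
    weights -- dict mapping the same names to one assumed weight per group
    """
    selected = {}
    for name, features in groups.items():
        w = weights[name]
        if w != 0.0:
            # Multiplying by a nonzero weight keeps the group (rescaled).
            selected[name] = [w * x for x in features]
        # A zero weight nulls the whole group, so it is left out.
    return selected


# Hypothetical feature groups and weights (values are made up for the sketch).
groups = {"color": [0.5, 0.25], "noise": [0.9, 0.1], "text": [1.0]}
weights = {"color": 2.0, "noise": 0.0, "text": 0.5}
selected = select_groups(groups, weights)
```

In this sketch the "noise" group receives a zero weight and is therefore absent from the result, while the remaining groups are carried forward.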

Regarding claim 13,
The Zhao/Laliberte/Long/Dutta combination teaches the system of claim 10, (and thus the rejection of claim 10 is incorporated). Zhao further teaches wherein generating the classification using the neural network comprises generating candidate mixtures by additively combining the multimodal vectors. (Zhao, pg. 1939, col. 1, para. 5 – pg. 1939, col. 2, para. 1, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities. When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector.” (Concatenating is interpreted as adding or combining feature vectors together to generate one feature vector, i.e., “additively combining”.)).
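For illustration only, the concatenation step described in the cited passage of Zhao can be sketched as follows. This is not code from Zhao; the function name, branch names, and feature values are hypothetical.

```python
def concatenate_modalities(branch_outputs):
    """Join the top-layer output of each branch into one feature vector."""
    combined = []
    for features in branch_outputs:
        combined.extend(features)  # append this branch's features in order
    return combined


# Hypothetical refined features from two branches (values are made up).
image_features = [0.2, 0.7, 0.1]
text_features = [0.9, 0.4]
combined = concatenate_modalities([image_features, text_features])
```

The resulting vector has one entry per feature across all branches (five entries here), with each branch's features kept in their original order.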

Regarding claim 14,
The Zhao/Laliberte/Long/Dutta combination teaches the system of claim 13, (and thus the rejection of claim 13 is incorporated). Zhao further teaches wherein the selected informative vectors include one or more … generated candidate mixtures. (Zhao, pg. 1939, col. 1, para. 5 – pg. 1939, col. 2, para. 1, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities. When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector. Then the Feature Selection Component takes this concatenation as its input and derives an optimal solution of the weight vector. According to this weight vector, the most relevant feature groups with respect to the current task are picked out. Finally, we use these selected features in the final recognition task.” (This concatenation is interpreted as one or more generated candidate mixtures.)).

Regarding claim 15,
The Zhao/Laliberte/Long/Dutta combination teaches the system of claim 10, (and thus the rejection of claim 10 is incorporated). Laliberte further teaches wherein the data item is a user of a network site and the multimodal dataset comprises different types of user data of the user. (“For instance, upon selecting the accounts function 202, the interface may be operated to render an accounts page 246 that displays a user account window 248 in which different user account-related data may be presented (e.g. posts to date 250 using the interface or system, total compensation earned 252 using the system, a redeemable balance left in the account 254, and an average compensation rate overall 256, to name a few examples)” [¶0071]).
The motivation to combine the teachings of Zhao/Laliberte/Long/Dutta is the same as that set forth in the rejection of claim 10.

Regarding claim 16,
The Zhao/Laliberte/Long/Dutta combination teaches the system of claim 10, (and thus the rejection of claim 10 is incorporated). Zhao further teaches wherein the machine learning schemes include one or more of: a convolutional neural network, a recurrent neural network, a bidirectional recurrent neural network, a fully connected neural network. (Zhao, pg. 1939, col. 2, para. 3 – pg. 1940, col. 1, para. 1; “To process different data, several architectures have been developed to construct the internal structure of the deep neural networks, including deep neural networks (DNN) [40], deep belief networks (DBN) [41], stacked denoising autoencoders (SDA) [42], and convolutional neural networks (CNN) [43]. With these deep architectures, different performance can be achieved for a variety of data sources.”).



Regarding claim 17,
The Zhao/Laliberte/Long/Dutta combination teaches the system of claim 10, (and thus the rejection of claim 10 is incorporated). Zhao further teaches wherein the multimodal dataset comprises … image data generated by a client device of the user, and text data authored by the user. (Zhao, pg. 1943, col. 2, para. 1, “The NUS-WIDE-Object dataset consists of 30000 images from Flickr. Text description tags are attached to every image by the authors of the photos.”).
However, the Zhao/Laliberte/Long/Dutta combination as applied above does not explicitly teach profile data of a user of the network site, but Laliberte further teaches this limitation. (Laliberte, Fig. 20; Fig. 23; [0139], “with reference now to FIG. 20, an exemplary screenshot 2000 is provided of a user profile page for user “MIKE123” showing profile posting and contact details such as a number of posted images or videos, a number of users following and a number of users being followed by this user.”; “in FIG. 23, a screenshot 2300 is provided of an image and video sharing platform interface in which a user's text-based post 2302 (e.g. a text message otherwise destined for Twitter™, SnapChat™ or other such text-based sharing platforms) is set using an embodiment of the system”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include profile data of a user of the network site. The motivation to do so is that registered users can provide useful information to the service to receive compensation for their use of the system. (“Since registered users of the content integration platform will generally provide accurate/legitimate contact and/or demographic information in registering to the service in order to receive compensation for their use of the system, external content providers will in response receive valuable information not only on the content originators selecting to push their brand, and their social network to which the branded content was pushed, but also confirmed viewership of this embedded branded content by virtue of the scan code concurrently embedded within this integrated content.” (Laliberte [0102])).


Regarding claim 18,
The Zhao/Laliberte/Long/Dutta combination teaches the system of claim 17, (and thus the rejection of claim 17 is incorporated). Zhao further teaches wherein the electronic message …. (Zhao, pg. 1943, col. 2, para. 1, “The NUS-WIDE-Object dataset consists of 30000 images from Flickr. Text description tags are attached to every image by the authors of the photos.”).
However, the Zhao/Laliberte/Long/Dutta combination as applied above does not explicitly teach that the electronic message includes an ephemeral message, but Laliberte further teaches this limitation. (Laliberte, Fig. 23; [0139], “in the post 2304 of FIG. 23, the integration engine automatically identifies the user's use of the sad face emoticon or shortcut keys, such as :(, and thus integrates the user's message into a background sad face image 2306 or other image invoking sadness or disappointment as reflective of the emotion intended by user's message.”; “the system may be configured to integrate the text-message in a dynamically selected user-related image, selected for example as a function of the time of day (e.g. night time vs. daytime scenery), … profile status or the like.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include an ephemeral message. The motivation to do so is that the emotion intended by the user’s message can be reflected. (“the integration engine automatically identifies the user's use of the sad face emoticon or shortcut keys, such as :(, and thus integrates the user's message into a background sad face image 2306 or other image invoking sadness or disappointment as reflective of the emotion intended by user's message.” (Laliberte, Fig. 23; [0139])).

Regarding claim 19,
The Zhao/Laliberte/Long/Dutta combination teaches the system of claim 10, (and thus the rejection of claim 10 is incorporated). Zhao further teaches wherein generating the classification using the neural network comprises multiplicatively combining the multimodal vectors to select the informative vectors. (Zhao, pg. 1939, col. 1, para. 4, “The Feature Selection Component aims to find the optimal weights for all the feature groups by solving the optimization problem with sparse group lasso. As a result, the features with small weights are dropped out.”; Zhao, pg. 1939, col. 1, para. 5, “When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector. Then the Feature Selection Component takes this concatenation as its input and derives an optimal solution of the weight vector. According to this weight vector, the most relevant feature groups with respect to the current task are picked out.”).

Regarding claim 20, 
Zhao teaches a non-transitory machine-readable storage device embodying instructions that, when executed by a machine, cause the machine to perform operations comprising: (Zhao, pg. 1944, col. 2, para. 3, “The GLLR and MtBGS are implemented with the sparse learning package of SLEP. The SVM in all our experiments is implemented with the LIBSVM software package. We implemented the multi-modal neural networks with the deep learning library of Theano. Considering the computational demand for training the multi-modal neural networks, we run our algorithm on GPU to accelerate the training procedure.”), in which a processor and a memory storing instructions are inherent. Zhao clearly implements the method on a computer.
identifying a multimodal dataset of a data item; (Zhao, pg. 1939, fig. 2; col. 1, section III.A, para. 1, “we assign heterogeneous sub-networks to different modalities”; para. 2, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities.”; pg. 1943, col. 2, para. 1, “The NUS-WIDE-Object dataset consists of 30000 images from Flickr. Text description tags are attached to every image by the authors of the photos. These 30 thousands images are classified into 31 classes”).
generating, using a multimodal classification system implemented by one or more processors of a machine (see Zhao, pg. 1936, bottom left col., “We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research”), multimodal vectors in different modalities from the multimodal dataset using different machine learning schemes; (Zhao, pg. 1939, col. 1, para. 5 – pg. 1939, col. 2, para. 1, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities. When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector.”).
forming, at an additive layer of a multimodal selection engine, modality mixture candidates from the first data modality and the second data modality; 
(“Specifically, to avoid the interference across modalities, we connect each sub-network to the objective function layer with part of the nodes in this layer. In terms of implementation, we can pre-train and fine-tune different subnetworks separately. The only connection across modalities is the same concept prior (i.e. label information). This auxiliary layer is used only for fine-tuning all the networks and it is discarded once all the networks are well trained. In this way, we fine-tune the whole sub-networks to yield high-level abstract feature representations for the classification task. After a series of non-linear transformations, these abstract features are able to express complex patterns. With this additional auxiliary layer in the fine-tuning stage, we combine the concept prior with deep generative learning. Meanwhile, we obtain the refined feature representations from the top layer of each branch sub-network, on a group-by-group basis. These new feature representations are concatenated as the input of the feature selection component.” [pg. 1940, right col, bottom para; Examiner is interpreting the auxiliary layer to be equivalent to “an additive layer” as it concatenates features from other layers of the network.])
wherein the multimodal classification system comprises the modal generator and the multimodal selection engine, the multimodal selection engine comprising the additive layer (“With this additional auxiliary layer in the fine-tuning stage, we combine the concept prior with deep generative learning. Meanwhile, we obtain the refined feature representations from the top layer of each branch sub-network, on a group-by-group basis. These new feature representations are concatenated as the input of the feature selection component.” [pg. 1940, right col, bottom para])
generating, using the multimodal selection engine implemented by the one or more processors of the machine, a classification of the data item based on the output data from a neural network of the multimodal classification system trained to select informative vectors of the multimodal vectors; (Zhao, pg. 1937, col. 2, para. 1, “We applied our method to three real world datasets with several irrelevant noisy feature groups mixed for image classification tasks. Experimental results show that this framework can discover the relevant feature groups effectively and achieves better classification accuracies compared with several baseline approaches for heterogenous feature selection.”; Zhao, pg. 1939, col. 1, para. 2, “In the proposed framework, we utilize these multi-modal networks and the sparse group lasso jointly to select the feature groups that are relevant to classification tasks.”; Zhao, pg. 1939, col. 1, para. 5 – pg. 1939, col. 2, para. 1, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities. When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector. Then the Feature Selection Component takes this concatenation as its input and derives an optimal solution of the weight vector. According to this weight vector, the most relevant feature groups with respect to the current task are picked out. Finally, we use these selected features in the final recognition task.”; Zhao, pg. 1945, col. 1, para. 5, “It can be observed that our method can effectively filter the noisy feature groups that are deemed to be irrelevant to the final classification task. 
All the random noise and the noisy original feature groups are weighted zero in all the three datasets. It endows only those feature groups that are relevant and informative to the classification task with a proper value.”). 
	However, Zhao does not explicitly teach the multimodal dataset comprising a username and user profile data on a network site;
	generating, at a first model generator, a first data modality from the username; generating, at a second model generator, a second data modality from the user profile data;
	Laliberte teaches the multimodal dataset comprising a username and a user profile data on a network site (“To view page 260 in the present example, the user selects the profile function 206 as well as the touch-selectable icon 218 of the sharing platform of interest. In response, the interface opens page 260 which is directed to the user's profile for the selected platform, as indicated by the corresponding static platform icon 262, and which displays a user profile data window 264, as well as a touch-sensitive button 266 to update selected profile credentials/parameters, and a touch-sensitive button 268 to have the user add another profile. In this example, the displayed and updatable profile includes a username and password 270 of the user for this sharing platform (e.g. usable in enabling the sharing and integration system to post content on the user's behalf, and optionally track a visibility thereof once posted), as well as different preset posting preferences such as a preferred brand type 272 (showing a preference for luxury brands over other selectable types such as family, entertainment, local, dining, etc., to name a few examples), a preferred branding level for this sharing platform 274 (showing a low level selection) and a condition for applying these preferences 276 (showing a preference that the user be asked to confirm before posting).” [¶0072])
	generating, at a first model generator, a first data modality from the username (“In this example, the displayed and updatable profile includes a username and password 270 of the user for this sharing platform (e.g. usable in enabling the sharing and integration system to post content on the user's behalf, and optionally track a visibility thereof once posted), as well as different preset posting preferences such as a preferred brand type 272 (showing a preference for luxury brands over other selectable types such as family, entertainment, local, dining, etc., to name a few examples), a preferred branding level for this sharing platform 274 (showing a low level selection) and a condition for applying these preferences 276 (showing a preference that the user be asked to confirm before posting).” [¶0072]; see also ¶0039, which discloses sharing personal content: “For instance, it is currently commonplace for individuals to post or share personal content (e.g. text, images, pictures, videos, etc.) via one or more sharing platforms, be they social media platforms such as Facebook™ Twitter™, Pinterest™, Instagram™, etc.”);
generating, at a second model generator, a second data modality from the user profile data (“To view page 260 in the present example, the user selects the profile function 206 as well as the touch-selectable icon 218 of the sharing platform of interest. In response, the interface opens page 260 which is directed to the user's profile for the selected platform, as indicated by the corresponding static platform icon 262, and which displays a user profile data window 264, as well as a touch-sensitive button 266 to update selected profile credentials/parameters, and a touch-sensitive button 268 to have the user add another profile. In this example, the displayed and updatable profile includes a username and password 270 of the user for this sharing platform (e.g. usable in enabling the sharing and integration system to post content on the user's behalf, and optionally track a visibility thereof once posted), as well as different preset posting preferences such as a preferred brand type 272 (showing a preference for luxury brands over other selectable types such as family, entertainment, local, dining, etc., to name a few examples), a preferred branding level for this sharing platform 274 (showing a low level selection) and a condition for applying these preferences 276 (showing a preference that the user be asked to confirm before posting).” [¶0072]; see also ¶0039, which discloses sharing personal content: “For instance, it is currently commonplace for individuals to post or share personal content (e.g. text, images, pictures, videos, etc.) via one or more sharing platforms, be they social media platforms such as Facebook™ Twitter™, Pinterest™, Instagram™, etc.”);
	Zhao teaches a heterogenous feature selection method with multi-modal neural networks. Laliberte teaches a user content sharing system for social media networks. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao’s teachings by substituting the user data modalities taught by Laliberte for the modality data of Zhao. Using multi-modal data is well known in the field of machine learning, and thus one would have been motivated to make this modification in order to yield predictable results.
	However, Zhao/Laliberte fails to explicitly teach:
combining, at a multiplicative layer of the multimodal selection engine, the modality mixture candidates multiplicatively and identifying top performing modalities as output data, 
wherein the multimodal classification system comprises the modal generator and the multimodal selection engine, the multimodal selection engine comprising the additive layer and the multiplicative layer;
Long teaches combining, at a multiplicative layer of the multimodal selection engine, the modality mixture candidates multiplicatively and identifying top performing modalities as output data (“Our DAG nets learn to combine coarse, high layer information with fine, low layer information. Pooling and prediction layers are shown as grids that reveal relative spatial coarseness, while intermediate layers are shown as vertical lines. First row (FCN-32s): Our single-stream net, described in Section 4.1, upsamples stride 32 predictions back to pixels in a single step. Second row (FCN-16s): Combining predictions from both the final layer and the pool4 layer, at stride 16, lets our net predict finer details, while retaining high-level semantic information. Third row (FCN-8s): Additional predictions from pool3, at stride 8, provide further precision.” [pg. 3435, Figure 3 caption; see further pg. 3433, top left para, which discloses combining multiplicatively: “a matrix multiplication for convolution or average pooling, a spatial max for max pooling, or an elementwise nonlinearity for an activation function, and so on for other types of layers.”]),
wherein the multimodal classification system comprises the modal generator and the multimodal selection engine, the multimodal selection engine comprising the multiplicative layer (“Our DAG nets learn to combine coarse, high layer information with fine, low layer information. Pooling and prediction layers are shown as grids that reveal relative spatial coarseness, while intermediate layers are shown as vertical lines. First row (FCN-32s): Our single-stream net, described in Section 4.1, upsamples stride 32 predictions back to pixels in a single step. Second row (FCN-16s): Combining predictions from both the final layer and the pool4 layer, at stride 16, lets our net predict finer details, while retaining high-level semantic information. Third row (FCN-8s): Additional predictions from pool3, at stride 8, provide further precision.” [pg. 3435, Figure 3 caption])
	Zhao teaches a heterogenous feature selection method with multi-modal neural networks. Laliberte teaches a user content sharing system for social media networks. Long teaches a method using CNNs for semantic segmentation. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Zhao/Laliberte to implement the multiplicative layer and to output top performing features as taught by Long. One would have been motivated to make this modification in order to produce more accurate outputs with efficient inference. [Abstract, Long]
Zhao/Laliberte/Long fails to explicitly teach storing, in a storage device of the machine, the classification of the data item; selecting display content corresponding to the classification of the data item; generating an electronic message from the multimodal dataset; 
overlaying the display content on the electronic message; and 
publishing the electronic message that includes the display content on the network site. 
Dutta, however, teaches storing, in a storage device of the machine, the classification of the data item; (Dutta, fig. 7: MEMORY 718, CLASSIFICATION DATA 102; Dutta, col. 20, ln. 67 – col. 21, ln. 2, “The classification module may be configured to generate classification data 102 or modify existing classification data 102 based on user interaction data 120”; Dutta, col. 5, ln. 64-66, “The classification data 102(1) and item data 106 may be stored in association with one or more classification servers 108 or other types of computing devices.”);
selecting display content corresponding to the classification of the data item; (Dutta, col. 6, ln. 4-9, “the user interface 112 may present at least a portion of the classification labels 104 on a display or other type of output device. As user input is received, selecting or otherwise interacting with one or more of the classification labels 104, additional classification labels 104 may be displayed.”; Dutta, col. 6, ln. 17-20, “In some implementations, selection of one or more of the classification labels 104 may also cause item data 106 for the items associated with the selected classification label(s) 104 to be displayed.”);
generating an electronic message (Dutta, Fig. 1A:106) from the multimodal dataset; (Dutta, Fig. 1A; col. 5, ln. 26-33; “FIG. 1A depicts … Classification data 102 may include a plurality of classification labels 104 which may be applied to items. For example, classification labels 104 may include alphanumeric descriptors, images, or other types of data that may be used to differentiate particular types of items from other types of items.”; col. 6, ln. 17-22; “selection of one or more of the classification labels 104 may also cause item data 106 for the items associated with the selected classification label(s) 104 to be displayed. For example, selection of the “Running” label may cause item data 106 associated with different types of running shoes to be displayed in the user interface 112.”);
overlaying the display content (Dutta, Fig. 1A:104) on the electronic message (Dutta, Fig. 1A:106); and publishing the electronic message (Dutta, Fig. 1A:106) that includes the display content (Dutta, Fig. 1A:104) on the network site (Dutta, Fig. 1A:114). (Dutta, col. 6, ln. 17-21, “selection of one or more of the classification labels 104 may also cause item data 106 for the items associated with the selected classification label(s) 104 to be displayed.”; col. 5, ln. 62-64; “item data 106 indicative of characteristics of particular items may include an indication of one or more of the classification labels 104.”). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao/Laliberte/Long by storing a classification of a data item, selecting display content, generating an electronic message, overlaying the display content, and publishing the electronic message as taught by Dutta. The motivation to do so is that the selected content can be presented on a display on a network site. (“the user interface 112 may present at least a portion of the classification labels 104 on a display or other type of output device. As user input is received, selecting or otherwise interacting with one or more of the classification labels 104, additional classification labels 104 may be displayed.” (Dutta, Fig. 1A; col. 6 ln. 4-9)).

Claims 3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao in view of Laliberte/Long/Dutta and further in view of Meisheri et al. ("Textmining at EmoInt-2017: A Deep Learning Approach to Sentiment Intensity Scoring of English Tweets", hereinafter "Meisheri").
Regarding claim 3,
The Zhao/Laliberte/Long/Dutta combination teaches the method of claim 1 (and thus the rejection of claim 1 is incorporated). However, the combination fails to explicitly teach wherein the first model generator includes a Long Short-Term Memory (LSTM) generator, and the second model generator includes a Deep Neural Network (DNN) generator.
Meisheri teaches wherein the first model generator includes a Long Short-Term Memory (LSTM) generator, and the second model generator includes a Deep Neural Network (DNN) generator (“Proposed system architecture is presented in Figure 3, which integrates convolutional neural network (CNN) and Long short term memory networks (LSTM). As shown, output of CNN and LSTM is merged, along with feature sets A and B. Before merging output of CNN layer is flatten to match dimension of other features. This is achieved through the Merge layer as shown. Output of merge layer is then propagated to fully connected neural network layer with 10 hidden units. Finally, output layer is defined with single hidden unit.” [pg. 195, §3.6 Unified Model, ¶1; CNN corresponds to a Deep Neural Network generator])
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao/Laliberte/Long/Dutta by using the unified architecture with an LSTM and a CNN as taught by Meisheri. One would have been motivated to make this modification in order to use the models for sentiment analysis in social media networks. [pg. 193, Introduction, Meisheri]


Regarding claim 12,
The Zhao/Laliberte/Long/Dutta combination teaches the method of claim 10 (and thus the rejection of claim 10 is incorporated). However, the combination fails to explicitly teach wherein the first model generator includes a Long Short-Term Memory (LSTM) generator, and the second model generator includes a Deep Neural Network (DNN) generator.
Meisheri teaches wherein the first model generator includes a Long Short-Term Memory (LSTM) generator, and the second model generator includes a Deep Neural Network (DNN) generator (“Proposed system architecture is presented in Figure 3, which integrates convolutional neural network (CNN) and Long short term memory networks (LSTM). As shown, output of CNN and LSTM is merged, along with feature sets A and B. Before merging output of CNN layer is flatten to match dimension of other features. This is achieved through the Merge layer as shown. Output of merge layer is then propagated to fully connected neural network layer with 10 hidden units. Finally, output layer is defined with single hidden unit.” [pg. 195, §3.6 Unified Model, ¶1; CNN corresponds to a Deep Neural Network generator])
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao/Laliberte/Long/Dutta by using the unified architecture with an LSTM and a CNN as taught by Meisheri. One would have been motivated to make this modification in order to use the models for sentiment analysis in social media networks. [pg. 193, Introduction, Meisheri]


Response to Arguments
Applicant's arguments filed 05/25/2022 have been fully considered but they are not persuasive. 

Regarding the 35 U.S.C. §101 Rejection:
Applicant’s arguments on pg. 8 regarding the 101 rejection have been considered but are not persuasive. Applicant argues that the claims are directed to an improvement to an existing technology or technological field; however, the Examiner respectfully disagrees. The claims as currently recited, under BRI, appear to merely use deep neural networks as tools to perform the abstract idea. The claims do not appear to focus on an improvement to the functioning of a computer or processor, nor on an improvement to the training process of the neural network. Rather, the claims appear to focus on an improvement to the abstract idea itself, and an improvement to an abstract idea is still an abstract idea. As noted above, the newly amended limitations are considered to be additional mental steps and additional elements which neither provide an inventive concept nor add significantly more to the claim. Please see the updated 101 rejection above.

Regarding the 35 U.S.C. §103 Rejection:
Applicant’s arguments that the previously cited prior art references of Zhao and Dutta fail to teach the newly amended limitations have been considered but are moot because the newly amended limitations are now taught by the prior art references of Laliberte and Long. Please see the updated 103 rejection above.
Applicant’s arguments with respect to the rejections of the dependent claims have been fully considered, but they are not persuasive as they rely upon the allowability of the independent claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/M.H.H./Examiner, Art Unit 2122   

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122