Detailed Action
This action is in response to claims filed December 28, 2021 for application 16/230,909 filed December 21, 2018. Claims 1, 5, 6, 8, 9, 10, 14, 15, 17, 18, and 20 are amended. Claims 1-20 are pending. 

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 4/8/2019. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 USC 101 because the claimed invention is directed to an abstract idea without significantly more. 



Regarding claim 1,
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
identifying a multimodal dataset of a data item;
generating multimodal vectors in different modalities from the multimodal dataset using different machine learning schemes;
generating a classification of the data item … 
select informative vectors of the multimodal vectors; 
selecting display content corresponding to the classification of the data item;
generating … message from the multimodal dataset;
	These limitations each recite a mental process of deciding, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the following additional elements:
	a neural network
storing the classification of the data item
electronic message
overlaying the display content on the … message 
publishing the … message that includes the display content on a network site.
Using a neural network to perform the classification, generally recited, is only indicating a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)). 
Storing the results produced by an abstract idea is insignificant extra-solution activity that does not meaningfully limit the claim (see MPEP 2106.05(g)).  
“Electronic message” is generally linked to the use of the judicial exception to a particular technological environment or field of use.
Regarding “publishing the message that includes the display content on a network site” and “overlaying the display content on the message”, transmitting data to a network site and displaying it is "insignificant extra-solution activity" (see MPEP 2106.04(d) and 2106.05(g)). 
Thus, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. 
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, either alone or in combination.
With regards to “using a neural network”, specifying a particular technological environment in which to apply the judicial exception does not provide an inventive concept (see MPEP 2106.05(h)).
The limitation of storing the results of the abstract idea is identified by MPEP 2106.05(d)(II)(iv), “storing and retrieving information in memory” as well-understood, routine, and conventional and thus cannot provide an inventive concept.  
Further, limiting the “generating a classification” step to being performed by a neural network only indicates a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)). Additionally, transmitting a message and displaying it is the well-understood, routine, conventional activity of receiving or transmitting data over a network (see MPEP 2106.05(d)(II)(i)) and thus cannot provide an inventive concept. With regard to “electronic message”, specifying a particular technological environment in which to apply the judicial exception does not provide an inventive concept (see MPEP 2106.05(h)). Thus, these additional elements do not add an inventive concept or provide significantly more than the abstract idea.
The claim is not patent eligible.
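
For illustration only, the order of the inquiries applied above (Step 1, Step 2A Prong 1, Step 2A Prong 2, Step 2B) can be sketched as a short decision procedure. The class name, field names, and function name are hypothetical and were chosen for this sketch; the findings entered for claim 1 simply mirror the conclusions stated above and are not an independent determination.

```python
from dataclasses import dataclass


@dataclass
class ClaimFindings:
    """Findings for one claim; all field names are illustrative assumptions."""
    statutory_category: bool     # Step 1: process, machine, manufacture, or composition
    recites_exception: bool      # Step 2A Prong 1: recites a judicial exception
    practical_application: bool  # Step 2A Prong 2: exception integrated into a practical application
    significantly_more: bool     # Step 2B: additional elements amount to significantly more


def is_eligible(f: ClaimFindings) -> bool:
    """Walk the inquiries in order and stop at the first one that decides eligibility."""
    if not f.statutory_category:
        return False  # fails Step 1
    if not f.recites_exception:
        return True   # no judicial exception recited
    if f.practical_application:
        return True   # integrated at Step 2A Prong 2
    return f.significantly_more  # Step 2B decides


# Findings stated above for claim 1: a method, reciting an abstract idea,
# not integrated into a practical application, and without significantly more.
claim_1 = ClaimFindings(True, True, False, False)
print(is_eligible(claim_1))
```

Under the findings stated for claim 1, the sketch reaches Step 2B and returns False, matching the conclusion above.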

Regarding claim 2,
Claim 2 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
… multiplicatively combining the multimodal vectors to select the informative vectors.
	This limitation is a mental process, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.

Regarding claim 3,
Claim 3 incorporates the rejection of claim 2.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
… nulls non-informative vectors of the multimodal vectors.
	This limitation is a mental process, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.

Regarding claim 4,
Claim 4 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
… generating candidate mixtures by additively combining the multimodal vectors.
	This limitation is a mental process, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.
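
As a hedged illustration of the arithmetic at issue in claims 2-4, the sketch below works through one possible reading of the multiplicative and additive combinations on small hand-sized vectors. It is an assumption made for illustration, not taken from the claims or the specification: the multiplicative combination is modeled as scaling each vector by a gate value, where a gate of 0.0 nulls a non-informative vector, and the additive combination is modeled as an element-wise sum of each pair of vectors. The vector values, gate values, and modality names are hypothetical.

```python
from itertools import combinations

# Hypothetical 3-dimensional vectors, one per modality.
vectors = {
    "image": [0.5, 1.0, 0.0],
    "text": [1.0, 0.0, 2.0],
    "noise": [0.25, 0.75, 0.5],
}

# Hypothetical gate per modality; 0.0 nulls the vector (assumed reading of claims 2-3).
gates = {"image": 1.0, "text": 1.0, "noise": 0.0}


def scale(vec, gate):
    """Multiply every element of a vector by its gate value."""
    return [gate * x for x in vec]


def add(a, b):
    """Element-wise sum of two vectors of equal length."""
    return [x + y for x, y in zip(a, b)]


# Multiplicative step: gated-out vectors become all zeros and are dropped.
gated = {name: scale(vec, gates[name]) for name, vec in vectors.items()}
informative = [name for name, vec in gated.items() if any(x != 0.0 for x in vec)]

# Additive step: one candidate mixture per pair of vectors (assumed reading of claim 4).
mixtures = {f"{a}+{b}": add(vectors[a], vectors[b]) for a, b in combinations(vectors, 2)}

print(informative)
print(mixtures["image+text"])
```

With these hypothetical values, the "noise" vector is nulled, "image" and "text" remain, and the "image+text" mixture is [1.5, 1.0, 2.0]. Each step is a handful of multiplications or additions.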

Regarding claim 5,
Claim 5 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
the selected informative vectors include one or more of the generated candidate mixtures.
	This limitation is a mental process, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.


Regarding claim 6,
Claim 6 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim does not recite any additional abstract idea.
Step 2A Prong 2: The claim recites the additional element of
the data item is a user of a network site and the multimodal dataset comprises different types of user data of the user.
Using the multimodal selection method on multimodal user data of a network site, generally recited, only indicates a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)). Thus, the additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, because limiting the multimodal selection method to being performed on network site user data only indicates a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)), and thus does not add an inventive concept or provide significantly more than the abstract idea.
The claim is not patent eligible.

Regarding claim 7,
Claim 7 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim does not recite any additional abstract idea.
Step 2A Prong 2: The claim recites the additional element of
the machine learning schemes include one or more of: a convolutional neural network, a recurrent neural network, a bidirectional recurrent neural network, a fully connected neural network
Using a neural network to generate the multimodal vectors, generally recited, is only indicating a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)). Thus, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. 

Step 2B:  The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, because limiting the “generating multimodal vectors” step to being performed by a neural network is only indicating a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)), and thus does not add an inventive concept or provide significantly more to the abstract idea.
The claim is not patent eligible.
	
Regarding claim 8,
Claim 8 incorporates the rejection of claim 1.
Further, claim 8 recites only a more specific version of the judicial exceptions recited in claim 1.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the following additional elements:
network site
client device
Network site and client device are generally linked to the use of the judicial exception to a particular technological environment or field of use.
Thus, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B: These additional elements are not sufficient to amount to significantly more than the judicial exception, or to provide an inventive concept, for the reasons given in Step 2A Prong 2 (see MPEP § 2106.05.I.A.).
With regard to network site and client device, specifying a particular technological environment in which to apply the judicial exception does not provide an inventive concept (see MPEP 2106.05(h)).
The claim is not patent eligible.

Regarding claim 9,
Claim 9 incorporates the rejection of claim 8.
Further, claim 9 recites only a more specific version of the judicial exceptions recited in claim 8, and does not recite any further additional elements.
Therefore, this claim is not patent eligible for the reasons set forth in claim 8 above.  

Regarding claim 10,
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations, using a processor and a memory to perform the steps:
identifying a multimodal dataset of a data item;
generating multimodal vectors in different modalities from the multimodal dataset using different machine learning schemes;
generating a classification of the data item … 
select informative vectors of the multimodal vectors; 
selecting display content corresponding to the classification of the data item;
generating … message from the multimodal dataset;
	These limitations are a mental process of deciding, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the following additional elements: 
processors
machine
memory
neural network
storing the classification of the data item
electronic message
overlaying the display content on the … message
publishing the … message that includes the display content on a network site.
The processor in both steps is recited at a high level of generality (i.e., as a generic processor performing a generic computer function of ranking information based on a determined amount of use) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
Using a neural network to perform the classification, generally recited, is only indicating a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)). 
Storing the results produced by an abstract idea is insignificant extra-solution activity that does not meaningfully limit the claim (see MPEP 2106.05(g)).  
“Electronic message” is generally linked to the use of the judicial exception to a particular technological environment or field of use.
Regarding “publishing the message that includes the display content on a network site” and “overlaying the display content on the message”, transmitting data to a network site and displaying it is "insignificant extra-solution activity" (see MPEP 2106.04(d) and 2106.05(g)). 
Thus, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. 
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, either alone or in combination.
As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a processor and a memory to perform the steps amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.
With regards to “using a neural network”, specifying a particular technological environment in which to apply the judicial exception does not provide an inventive concept (see MPEP 2106.05(h)).
The limitation of storing the results of the abstract idea is identified by MPEP 2106.05(d)(II)(iv), “storing and retrieving information in memory” as well-understood, routine, and conventional and thus cannot provide an inventive concept.  
Further, limiting the “generating a classification” step to being performed by a neural network only indicates a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)). Additionally, transmitting a message and displaying it is the well-understood, routine, conventional activity of receiving or transmitting data over a network (see MPEP 2106.05(d)(II)(i)) and thus cannot provide an inventive concept. With regard to “electronic message”, specifying a particular technological environment in which to apply the judicial exception does not provide an inventive concept (see MPEP 2106.05(h)). Thus, these additional elements do not add an inventive concept or provide significantly more than the abstract idea.
The claim is not patent eligible.


Regarding claim 11,
Claim 11 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
… multiplicatively combining the multimodal vectors to select the informative vectors.
	This limitation is a mental process, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.

Regarding claim 12,
Claim 12 incorporates the rejection of claim 11.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
… nulls non-informative vectors of the multimodal vectors.
	This limitation is a mental process, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.

Regarding claim 13,
Claim 13 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
… generating candidate mixtures by additively combining the multimodal vectors.
	This limitation is a mental process, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.


Regarding claim 14,
Claim 14 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
the selected informative vectors include one or more of the generated candidate mixtures.
	This limitation is a mental process, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.

Regarding claim 15,
Claim 15 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim does not recite any additional abstract idea.
Step 2A Prong 2: The claim recites the additional element of
the data item is a user of a network site and the multimodal dataset comprises different types of user data of the user.
Using the multimodal selection method on multimodal user data of a network site, generally recited, only indicates a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)). Thus, the additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, because limiting the multimodal selection method to being performed on network site user data only indicates a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)), and thus does not add an inventive concept or provide significantly more than the abstract idea.
The claim is not patent eligible.

Regarding claim 16,
Claim 16 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
the machine learning schemes include one or more of: a convolutional neural network, a recurrent neural network, a bidirectional recurrent neural network, a fully connected neural network.
	This limitation is a mental process, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.    

Regarding claim 17,
Claim 17 incorporates the rejection of claim 10.
Further, claim 17 recites only a more specific version of the judicial exceptions recited in claim 10.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the following additional elements:
network site
client device
Network site and client device are generally linked to the use of the judicial exception to a particular technological environment or field of use.
Thus, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B: These additional elements are not sufficient to amount to significantly more than the judicial exception, or to provide an inventive concept, for the reasons given in Step 2A Prong 2 (see MPEP § 2106.05.I.A.).
With regard to network site and client device, specifying a particular technological environment in which to apply the judicial exception does not provide an inventive concept (see MPEP 2106.05(h)).
The claim is not patent eligible.

Regarding claim 18,
Claim 18 incorporates the rejection of claim 17.
Further, claim 18 recites only a more specific version of the judicial exceptions recited in claim 17, and does not recite any further additional elements.
Therefore, this claim is not patent eligible for the reasons set forth in claim 17 above.

Regarding claim 19,
Claim 19 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
… multiplicatively combining the multimodal vectors to select the informative vectors.
	This limitation is a mental process, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.


Regarding claim 20,
Step 1: The claim recites a composition of matter, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations, using a processor and a memory to perform the steps:
identifying a multimodal dataset of a data item;
generating multimodal vectors in different modalities from the multimodal dataset using different machine learning schemes;
generating a classification of the data item … 
select informative vectors of the multimodal vectors; 
selecting display content corresponding to the classification of the data item;
generating … message from the multimodal dataset;
	These limitations are a mental process of deciding, which can reasonably be performed in the mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the following additional elements: 
processors
machine
memory
neural network
storing the classification of the data item
electronic message
overlaying the display content on the … message
publishing the … message that includes the display content on a network site.
The processor in both steps is recited at a high level of generality (i.e., as a generic processor performing a generic computer function of ranking information based on a determined amount of use) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
Using a neural network to perform the classification, generally recited, is only indicating a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)). 
Storing the results produced by an abstract idea is insignificant extra-solution activity that does not meaningfully limit the claim (see MPEP 2106.05(g)).  
“Electronic message” is generally linked to the use of the judicial exception to a particular technological environment or field of use.
Regarding “publishing the message that includes the display content on a network site” and “overlaying the display content on the message”, transmitting data to a network site and displaying it is "insignificant extra-solution activity" (see MPEP 2106.04(d) and 2106.05(g)). 
Thus, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. 
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, either alone or in combination.
As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a processor and a memory to perform the steps amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.
With regards to “using a neural network”, specifying a particular technological environment in which to apply the judicial exception does not provide an inventive concept (see MPEP 2106.05(h)).
The limitation of storing the results of the abstract idea is identified by MPEP 2106.05(d)(II)(iv), “storing and retrieving information in memory” as well-understood, routine, and conventional and thus cannot provide an inventive concept.  
Further, limiting the “generating a classification” step to being performed by a neural network only indicates a technological environment in which to apply the judicial exception (see MPEP 2106.05(h)). Additionally, transmitting a message and displaying it is the well-understood, routine, conventional activity of receiving or transmitting data over a network (see MPEP 2106.05(d)(II)(i)) and thus cannot provide an inventive concept. With regard to “electronic message”, specifying a particular technological environment in which to apply the judicial exception does not provide an inventive concept (see MPEP 2106.05(h)). Thus, these additional elements do not add an inventive concept or provide significantly more than the abstract idea.
The claim is not patent eligible.


Claim 20 is rejected under 35 USC 101 because the claimed invention is directed to non-statutory subject matter.  The claim does not fall within at least one of the four categories of patent eligible subject matter because the claim recites “a machine readable storage device embodying instructions” where the scope of the recited storage device, under broadest reasonable interpretation in light of the specification, includes transitory embodiments such as signals per se.  Examiner suggests amendment to read “a non-transitory machine readable storage device embodying instructions”.


Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.


Claims 1-5, 7, 10-14, 16, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao (“Heterogeneous Feature Selection With Multi-Modal Deep Neural Networks and Sparse Group LASSO”) in view of Dutta (US 10783167 B1).

Regarding claim 1, 
Zhao teaches a method comprising:
identifying a multimodal dataset of a data item; (Zhao, pg. 1939, fig. 2; col. 1, section III.A, para. 1, “we assign heterogeneous sub-networks to different modalities”; para. 2, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities.”; pg. 1943, col. 2, para. 1, “The NUS-WIDE-Object dataset consists of 30000 images from Flickr. Text description tags are attached to every image by the authors of the photos. These 30 thousands images are classified into 31 classes”).
generating multimodal vectors in different modalities from the multimodal dataset using different machine learning schemes; (Zhao, pg. 1939, col. 1, para. 5 – pg. 1939, col. 2, para. 1, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities. When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector.”).
generating a classification of the data item from a neural network trained to select informative vectors of the multimodal vectors; (Zhao, pg. 1937, col. 2, para. 1, “We applied our method to three real world datasets with several irrelevant noisy feature groups mixed for image classification tasks. Experimental results show that this framework can discover the relevant feature groups effectively and achieves better classification accuracies compared with several baseline approaches for heterogenous feature selection.”; Zhao, pg. 1939, col. 1, para. 2, “In the proposed framework, we utilize these multi-modal networks and the sparse group lasso jointly to select the feature groups that are relevant to classification tasks.”; Zhao, pg. 1939, col. 1, para. 5 – pg. 1939, col. 2, para. 1, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities. When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector. Then the Feature Selection Component takes this concatenation as its input and derives an optimal solution of the weight vector. According to this weight vector, the most relevant feature groups with respect to the current task are picked out. Finally, we use these selected features in the final recognition task.”; Zhao, pg. 1945, col. 1, para. 5, “It can be observed that our method can effectively filter the noisy feature groups that are deemed to be irrelevant to the final classification task. All the random noise and the noisy original feature groups are weighted zero in all the three datasets. 
It endows only those feature groups that are relevant and informative to the classification task with a proper value.”). 
Zhao does not appear to explicitly teach storing the classification of the data item; selecting display content corresponding to the classification of the data item; generating an electronic message from the multimodal dataset; 
overlaying the display content on the electronic message; and 
publishing the electronic message that includes the display content on a network site. 
Dutta, however, teaches storing the classification of the data item; (Dutta, fig. 7: MEMORY 718, CLASSIFICATION DATA 102; Dutta, col. 20, ln. 67 – col. 21, ln. 2, “The classification module may be configured to generate classification data 102 or modify existing classification data 102 based on user interaction data 120”; Dutta, col. 5, ln. 64-66, “The classification data 102(1) and item data 106 may be stored in association with one or more classification servers 108 or other types of computing devices.”).
selecting display content corresponding to the classification of the data item; (Dutta, col. 6, ln. 4-9, “the user interface 112 may present at least a portion of the classification labels 104 on a display or other type of output device. As user input is received, selecting or otherwise interacting with one or more of the classification labels 104, additional classification labels 104 may be displayed.”; Dutta, col. 6, ln. 17-20, “In some implementations, selection of one or more of the classification labels 104 may also cause item data 106 for the items associated with the selected classification label(s) 104 to be displayed.”).
generating an electronic message (Dutta, Fig. 1A:106) from the multimodal dataset; (Dutta, Fig. 1A; col. 5, ln. 26-33; “FIG. 1A depicts … Classification data 102 may include a plurality of classification labels 104 which may be applied to items. For example, classification labels 104 may include alphanumeric descriptors, images, or other types of data that may be used to differentiate particular types of items from other types of items.”; col. 6, ln. 17-22; “selection of one or more of the classification labels 104 may also cause item data 106 for the items associated with the selected classification label(s) 104 to be displayed. For example, selection of the “Running” label may cause item data 106 associated with different types of running shoes to be displayed in the user interface 112.”).
overlaying the display content (Dutta, Fig. 1A:104) on the electronic message (Dutta, Fig. 1A:106); and
publishing the electronic message (Dutta, Fig. 1A:106) that includes the display content (Dutta, Fig. 1A:104) on a network site (Dutta, Fig. 1A:114). (Dutta, col. 6, ln. 17-21, “selection of one or more of the classification labels 104 may also cause item data 106 for the items associated with the selected classification label(s) 104 to be displayed.”; col. 5, ln. 62-64; “item data 106 indicative of characteristics of particular items may include an indication of one or more of the classification labels 104.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao by storing a classification of a data item, selecting display content, generating an electronic message, overlaying the display content, and publishing the electronic message as taught by Dutta. The motivation to do so is that the selected content can be presented on a display over a network site. (“the user interface 112 may present at least a portion of the classification labels 104 on a display or other type of output device. As user input is received, selecting or otherwise interacting with one or more of the classification labels 104, additional classification labels 104 may be displayed.” (Dutta, Fig. 1A; col. 6, ln. 4-9)).

Regarding claim 2,
The Zhao/Dutta combination teaches the method of claim 1, (and thus the rejection of claim 1 is incorporated). Zhao further teaches wherein generating the classification using the neural network comprises multiplicatively combining the multimodal vectors to select the informative vectors. (Zhao, pg. 1939, col. 1, para. 4, “The Feature Selection Component aims to find the optimal weights for all the feature groups by solving the optimization problem with sparse group lasso. As a result, the features with small weights are dropped out.”; Zhao, pg. 1939, col. 1, para. 5, “When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector. Then the Feature Selection Component takes this concatenation as its input and derives an optimal solution of the weight vector. According to this weight vector, the most relevant feature groups with respect to the current task are picked out.”). 
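
For illustration only, the cited passages can be read as scaling each feature group of the concatenated vector by a learned weight and dropping the groups whose weight is small. The following is a minimal sketch under that reading; the function name, group names, weights, and threshold are all hypothetical and are not taken from Zhao.

```python
# Illustrative sketch only: names, weights, and the threshold are assumptions,
# not values or code from Zhao.

def select_groups(groups, weights, threshold=1e-6):
    """Scale each feature group by its weight and keep groups above threshold.

    groups  -- dict mapping a modality name to its refined feature list
    weights -- dict mapping the same names to a learned scalar weight
    """
    selected = {}
    for name, features in groups.items():
        w = weights[name]
        if abs(w) > threshold:
            # Multiplying by the weight rescales the group's contribution.
            selected[name] = [w * x for x in features]
    return selected


# Hypothetical example: the "noise" group has weight zero and is dropped.
groups = {"image": [0.5, 1.0], "text": [2.0, 4.0], "noise": [9.0, 9.0]}
weights = {"image": 0.5, "text": 1.5, "noise": 0.0}
selected = select_groups(groups, weights)
```

In this sketch the weight acts multiplicatively on each group, and a group with a zero weight contributes nothing to the later recognition step.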

Regarding claim 3,
The Zhao/Dutta combination teaches the method of claim 2, (and thus the rejection of claim 2 is incorporated). Zhao further teaches wherein multiplicatively combining the multimodal vectors nulls non-informative vectors of the multimodal vectors. (Zhao, pg. 1941, col. 1, para. 3, “Not only some feature groups but also some features within the same group are discarded if their weights are zero. The features whose weights are nonzeros are selected.”; Zhao, pg. 1942, col. 2, para. 1, “According to this importance vector, the feature groups with nonzero weights are selected and they are considered more relevant to the current task. These features are used for the final recognition task. At the same time, if the sparsity parameter λ1≠0, some features within the same group are also left out in order to improve the efficiency of the model.”; Zhao, pg. 1945, col. 1, para. 5, “It can be observed that our method can effectively filter the noisy feature groups that are deemed to be irrelevant to the final classification task. All the random noise and the noisy original feature groups are weighted zero in all the three datasets. It endows only those feature groups that are relevant and informative to the classification task with a proper value. In contrast, the other three methods assign incorrect weights to those irrelevant features because of the distinction of the heterogeneous features. For example, with the GLLR method using the original features, high weights have been assigned to the group of random noise. Even though the MKL method assigns every feature group with a different weight, it cannot select those feature groups that are more relevant to the classification task. We notice that the results of MtBGS are close to ours. For the dataset of Animal-10, MtBGS assigns a zero weight to the random noise group but a relatively high weight to the noisy feature group. For the NUS-WIDE-Object dataset, MtBGS sets the weights for all the feature groups to nonzeros. With higher sparsity coefficients, MtBGS can filter out most of the feature groups, nevertheless it gives a poor classification performance.”).
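
For illustration only, exact zero weights of the kind described in these passages are the characteristic effect of a group-sparse penalty. The following is a minimal sketch of the standard group soft-thresholding step (the proximal operator of the group lasso penalty). It is a generic textbook operation and is not asserted to be the solver used in Zhao; the parameter `lam` and the example vectors are assumed.

```python
import math

# Illustrative sketch only: a generic group soft-thresholding step,
# not code or parameter values from Zhao.

def group_soft_threshold(group, lam):
    """Shrink a weight group toward zero; null it entirely if its norm <= lam."""
    norm = math.sqrt(sum(x * x for x in group))
    if norm <= lam:
        # The whole group is set to exactly zero (the group is "nulled").
        return [0.0] * len(group)
    scale = 1.0 - lam / norm
    return [scale * x for x in group]


# Hypothetical example: a strong group survives, a weak group is zeroed.
strong = group_soft_threshold([3.0, 4.0], lam=1.0)
weak = group_soft_threshold([0.3, 0.4], lam=1.0)
```

Under these assumed inputs, the group with the larger norm is only shrunk, while the group whose norm does not exceed `lam` becomes all zeros.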

Regarding claim 4, 
The Zhao/Dutta combination teaches the method of claim 1, (and thus the rejection of claim 1 is incorporated). Zhao further teaches wherein generating the classification using the neural network comprises generating candidate mixtures by additively combining the multimodal vectors. (Zhao, pg. 1939, col. 1, para. 5 - pg. 1939, col. 2, para. 1, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities. When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector.” (Concatenating is interpreted as adding or combining feature vectors together to generate one feature vector, i.e., "additively combining".)).
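
For illustration only, the concatenation step quoted above can be pictured as joining the per-branch outputs end to end into one feature vector. The following is a minimal sketch; the branch names and values are hypothetical, and the bookkeeping of group boundaries is an illustrative addition that is not taken from Zhao.

```python
# Illustrative sketch only: branch names and values are assumptions,
# not data or code from Zhao.

def concatenate_branches(branch_outputs):
    """Join per-modality feature lists into one vector.

    Returns the combined vector and, for each modality, the (start, end)
    positions its features occupy in that vector.
    """
    vector = []
    slices = {}
    for name, features in branch_outputs.items():
        start = len(vector)
        vector.extend(features)
        slices[name] = (start, len(vector))
    return vector, slices


# Hypothetical example with two modalities.
vector, slices = concatenate_branches(
    {"image": [0.1, 0.2, 0.3], "text": [0.4, 0.5]}
)
```

Recording where each group sits in the combined vector is one way a later selection step could address whole feature groups rather than individual entries.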

Regarding claim 5,
The Zhao/Dutta combination teaches the method of claim 1, (and thus the rejection of claim 1 is incorporated). Zhao further teaches wherein the selected informative vectors include one or more … generated candidate mixtures. (Zhao, pg. 1939, col. 1, para. 5 - pg. 1939, col. 2, para. 1, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities. When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector. Then the Feature Selection Component takes this concatenation as its input and derives an optimal solution of the weight vector. According to this weight vector, the most relevant feature groups with respect to the current task are picked out. Finally, we use these selected features in the final recognition task.” (This concatenation is interpreted as one or more generated candidate mixtures.)).

Regarding claim 7,
The Zhao/Dutta combination teaches the method of claim 1, (and thus the rejection of claim 1 is incorporated). Zhao further teaches wherein the machine learning schemes include one or more of: a convolutional neural network, a recurrent neural network, a bidirectional recurrent neural network, a fully connected neural network. (Zhao, pg. 1940, col. 1, para. 2; “In this paper, we choose SDA as the base deep architecture for the sub-networks as the inputs are numerical vectors.”).

Regarding claim 10, 
Zhao teaches a system comprising: one or more processors of a machine; and a memory storing instructions that, when executed by the one or more processors, cause the machine to perform operations comprising: (Zhao, pg. 1944, col. 2, para. 3, “The GLLR and MtBGS are implemented with the sparse learning package of SLEP. The SVM in all our experiments is implemented with the LIBSVM5 software package. We implemented the multi-modal neural networks with the deep learning library of Theano.6 Considering the computational demand for training the multi-modal neural networks, we run our algorithm on GPU to accelerate the training procedure.”), in which a processor and a memory storing instructions are inherent. Zhao clearly implements their method on a computer.

identifying a multimodal dataset of a data item; (Zhao, pg. 1939, fig. 2; col. 1, section III.A, para. 1, “we assign heterogeneous sub-networks to different modalities”; para.2, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities.”; pg. 1943, col. 2, para. 1, “The NUS-WIDE-Object dataset3 consists of 30000 images from Flickr. Text description tags are attached to every image by the authors of the photos. These 30 thousands images are classified into 31 classes”).
generating multimodal vectors in different modalities from the multimodal dataset using different machine learning schemes; (Zhao, pg. 1939, col. 1, para. 5 – pg. 1939, col. 2, para. 1, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities. When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector.”).
generating a classification of the data item from a neural network trained to select informative vectors of the multimodal vectors; (Zhao, pg. 1937, col. 2, para. 1, “We applied our method to three real world datasets with several irrelevant noisy feature groups mixed for image classification tasks. Experimental results show that this framework can discover the relevant feature groups effectively and achieves better classification accuracies compared with several baseline approaches for heterogenous feature selection.”; Zhao, pg. 1939, col. 1, para. 2, “In the proposed framework, we utilize these multi-modal networks and the sparse group lasso jointly to select the feature groups that are relevant to classification tasks.”; Zhao, pg. 1939, col. 1, para. 5 – pg. 1939, col. 2, para. 1, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities. When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector. Then the Feature Selection Component takes this concatenation as its input and derives an optimal solution of the weight vector. According to this weight vector, the most relevant feature groups with respect to the current task are picked out. Finally, we use these selected features in the final recognition task.”; Zhao, pg. 1945, col. 1, para. 5, “It can be observed that our method can effectively filter the noisy feature groups that are deemed to be irrelevant to the final classification task. All the random noise and the noisy original feature groups are weighted zero in all the three datasets. 
It endows only those feature groups that are relevant and informative to the classification task with a proper value.”). 
Zhao does not appear to explicitly teach storing the classification of the data item; selecting display content corresponding to the classification of the data item; generating an electronic message from the multimodal dataset; 
overlaying the display content on the electronic message; and 
publishing the electronic message that includes the display content on a network site. 
Dutta, however, teaches storing the classification of the data item; (Dutta, fig. 7: MEMORY 718, CLASSIFICATION DATA 102; Dutta, col. 20, ln. 67 – col. 21, ln. 2, “The classification module may be configured to generate classification data 102 or modify existing classification data 102 based on user interaction data 120”; Dutta, col. 5, ln. 64-66, “The classification data 102(1) and item data 106 may be stored in association with one or more classification servers 108 or other types of computing devices.”).
selecting display content corresponding to the classification of the data item; (Dutta, col. 6, ln. 4-9, “the user interface 112 may present at least a portion of the classification labels 104 on a display or other type of output device. As user input is received, selecting or otherwise interacting with one or more of the classification labels 104, additional classification labels 104 may be displayed.”; Dutta, col. 6, ln. 17-20, “In some implementations, selection of one or more of the classification labels 104 may also cause item data 106 for the items associated with the selected classification label(s) 104 to be displayed.”).
generating an electronic message (Dutta, Fig. 1A:106) from the multimodal dataset; (Dutta, Fig. 1A; col. 5, ln. 26-33; “FIG. 1A depicts … Classification data 102 may include a plurality of classification labels 104 which may be applied to items. For example, classification labels 104 may include alphanumeric descriptors, images, or other types of data that may be used to differentiate particular types of items from other types of items.”; col. 6, ln. 17-22; “selection of one or more of the classification labels 104 may also cause item data 106 for the items associated with the selected classification label(s) 104 to be displayed. For example, selection of the “Running” label may cause item data 106 associated with different types of running shoes to be displayed in the user interface 112.”).
overlaying the display content (Dutta, Fig. 1A:104) on the electronic message (Dutta, Fig. 1A:106); and
publishing the electronic message (Dutta, Fig. 1A:106) that includes the display content (Dutta, Fig. 1A:104) on a network site (Dutta, Fig. 1A:114). (Dutta, col. 6, ln. 17-21, “selection of one or more of the classification labels 104 may also cause item data 106 for the items associated with the selected classification label(s) 104 to be displayed.”; col. 5, ln. 62-64; “item data 106 indicative of characteristics of particular items may include an indication of one or more of the classification labels 104.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao by storing a classification of a data item, selecting display content, generating an electronic message, overlaying the display content, and publishing the electronic message as taught by Dutta. The motivation to do so is that the selected content can be presented on a display over a network site. (“the user interface 112 may present at least a portion of the classification labels 104 on a display or other type of output device. As user input is received, selecting or otherwise interacting with one or more of the classification labels 104, additional classification labels 104 may be displayed.” (Dutta, Fig. 1A; col. 6, ln. 4-9)).

Regarding claim 11,
The Zhao/Dutta combination teaches the system of claim 10, (and thus the rejection of claim 10 is incorporated). Zhao further teaches wherein generating the classification using the neural network comprises multiplicatively combining the multimodal vectors to select the informative vectors. (Zhao, pg. 1939, col. 1, para. 4, “The Feature Selection Component aims to find the optimal weights for all the feature groups by solving the optimization problem with sparse group lasso. As a result, the features with small weights are dropped out.”; Zhao, pg. 1939, col. 1, para. 5, “When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector. Then the Feature Selection Component takes this concatenation as its input and derives an optimal solution of the weight vector. According to this weight vector, the most relevant feature groups with respect to the current task are picked out.”).

Regarding claim 12,
The Zhao/Dutta combination teaches the system of claim 11, (and thus the rejection of claim 11 is incorporated). Zhao further teaches wherein multiplicatively combining the multimodal vectors nulls non-informative vectors of the multimodal vectors. (Zhao, pg. 1941, col. 1, para. 3, “Not only some feature groups but also some features within the same group are discarded if their weights are zero. The features whose weights are nonzeros are selected.”; Zhao, pg. 1942, col. 2, para. 1, “According to this importance vector, the feature groups with nonzero weights are selected and they are considered more relevant to the current task. These features are used for the final recognition task. At the same time, if the sparsity parameter λ1≠0, some features within the same group are also left out in order to improve the efficiency of the model.”; Zhao, pg. 1945, col. 1, para. 5, “It can be observed that our method can effectively filter the noisy feature groups that are deemed to be irrelevant to the final classification task. All the random noise and the noisy original feature groups are weighted zero in all the three datasets. It endows only those feature groups that are relevant and informative to the classification task with a proper value. In contrast, the other three methods assign incorrect weights to those irrelevant features because of the distinction of the heterogeneous features. For example, with the GLLR method using the original features, high weights have been assigned to the group of random noise. Even though the MKL method assigns every feature group with a different weight, it cannot select those feature groups that are more relevant to the classification task. We notice that the results of MtBGS are close to ours. For the dataset of Animal-10, MtBGS assigns a zero weight to the random noise group but a relatively high weight to the noisy feature group. For the NUS-WIDE-Object dataset, MtBGS sets the weights for all the feature groups to nonzeros. With higher sparsity coefficients, MtBGS can filter out most of the feature groups, nevertheless it gives a poor classification performance.”).

Regarding claim 13,
The Zhao/Dutta combination teaches the system of claim 10, (and thus the rejection of claim 10 is incorporated). Zhao further teaches wherein generating the classification using the neural network comprises generating candidate mixtures by additively combining the multimodal vectors. (Zhao, pg. 1939, col. 1, para. 5 - pg. 1939, col. 2, para. 1, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities. When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector.” (Concatenating is interpreted as adding or combining feature vectors together to generate one feature vector, i.e., "additively combining".)).

Regarding claim 14,
The Zhao/Dutta combination teaches the system of claim 10, (and thus the rejection of claim 10 is incorporated). Zhao further teaches wherein the selected informative vectors include one or more … generated candidate mixtures. (Zhao, pg. 1939, col. 1, para. 5 - pg. 1939, col. 2, para. 1, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities. When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector. Then the Feature Selection Component takes this concatenation as its input and derives an optimal solution of the weight vector. According to this weight vector, the most relevant feature groups with respect to the current task are picked out. Finally, we use these selected features in the final recognition task.” (This concatenation is interpreted as one or more generated candidate mixtures.)).

Regarding claim 16,
The Zhao/Dutta combination teaches the system of claim 10, (and thus the rejection of claim 10 is incorporated). Zhao further teaches wherein the machine learning schemes include one or more of: a convolutional neural network, a recurrent neural network, a bidirectional recurrent neural network, a fully connected neural network. (Zhao, pg. 1939, col. 2, para. 3 – pg. 1940, col. 1, para. 1; “To process different data, several architectures have been developed to construct the internal structure of the deep neural networks, including deep neural networks (DNN) [40], deep belief networks (DBN) [41], stacked denoising autoencoders (SDA) [42], and convolutional neural networks (CNN) [43]. With these deep architectures, different performance can be achieved for a variety of data sources.”).

Regarding claim 19,
The Zhao/Dutta combination teaches the system of claim 10, (and thus the rejection of claim 10 is incorporated). Zhao further teaches wherein generating the classification using the neural network comprises multiplicatively combining the multimodal vectors to select the informative vectors. (Zhao, pg. 1939, col. 1, para. 4, “The Feature Selection Component aims to find the optimal weights for all the feature groups by solving the optimization problem with sparse group lasso. As a result, the features with small weights are dropped out.”; Zhao, pg. 1939, col. 1, para. 5, “When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector. Then the Feature Selection Component takes this concatenation as its input and derives an optimal solution of the weight vector. According to this weight vector, the most relevant feature groups with respect to the current task are picked out.”).

Regarding claim 20, 
Zhao teaches a machine-readable storage device embodying instructions that, when executed by a machine, cause the machine to perform operations comprising: (Zhao, pg. 1944, col. 2, para. 3, “The GLLR and MtBGS are implemented with the sparse learning package of SLEP. The SVM in all our experiments is implemented with the LIBSVM5 software package. We implemented the multi-modal neural networks with the deep learning library of Theano.6 Considering the computational demand for training the multi-modal neural networks, we run our algorithm on GPU to accelerate the training procedure.”), in which a processor and a memory storing instructions are inherent. Zhao clearly implements their method on a computer.

identifying a multimodal dataset of a data item; (Zhao, pg. 1939, fig. 2; col. 1, section III.A, para. 1, “we assign heterogeneous sub-networks to different modalities”; para.2, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities.”; pg. 1943, col. 2, para. 1, “The NUS-WIDE-Object dataset3 consists of 30000 images from Flickr. Text description tags are attached to every image by the authors of the photos. These 30 thousands images are classified into 31 classes”).
generating multimodal vectors in different modalities from the multimodal dataset using different machine learning schemes; (Zhao, pg. 1939, col. 1, para. 5 – pg. 1939, col. 2, para. 1, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities. When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector.”).
generating a classification of the data item from a neural network trained to select informative vectors of the multimodal vectors; (Zhao, pg. 1937, col. 2, para. 1, “We applied our method to three real world datasets with several irrelevant noisy feature groups mixed for image classification tasks. Experimental results show that this framework can discover the relevant feature groups effectively and achieves better classification accuracies compared with several baseline approaches for heterogenous feature selection.”; Zhao, pg. 1939, col. 1, para. 2, “In the proposed framework, we utilize these multi-modal networks and the sparse group lasso jointly to select the feature groups that are relevant to classification tasks.”; Zhao, pg. 1939, col. 1, para. 5 – pg. 1939, col. 2, para. 1, “Each independent modality is characterized by a single feature group, and then these different modalities are sent to different branches of the Multi-modal Neural Networks, yielding refined feature representations with multiple nonlinear transformations based upon the given original modalities. When all the feature groups are transformed by the multi-modal neural networks, the outputs of the refined features extracted from the top layer of each branch are concatenated into a new feature vector. Then the Feature Selection Component takes this concatenation as its input and derives an optimal solution of the weight vector. According to this weight vector, the most relevant feature groups with respect to the current task are picked out. Finally, we use these selected features in the final recognition task.”; Zhao, pg. 1945, col. 1, para. 5, “It can be observed that our method can effectively filter the noisy feature groups that are deemed to be irrelevant to the final classification task. All the random noise and the noisy original feature groups are weighted zero in all the three datasets. It endows only those feature groups that are relevant and informative to the classification task with a proper value.”). 
Zhao does not appear to explicitly teach storing the classification of the data item; selecting display content corresponding to the classification of the data item; generating an electronic message from the multimodal dataset; 
overlaying the display content on the electronic message; and 
publishing the electronic message that includes the display content on a network site. 
Dutta, however, teaches storing the classification of the data item; (Dutta, fig. 7: MEMORY 718, CLASSIFICATION DATA 102; Dutta, col. 20, ln. 67 – col. 21, ln. 2, “The classification module may be configured to generate classification data 102 or modify existing classification data 102 based on user interaction data 120”; Dutta, col. 5, ln. 64-66, “The classification data 102(1) and item data 106 may be stored in association with one or more classification servers 108 or other types of computing devices.”).
selecting display content corresponding to the classification of the data item; (Dutta, col. 6, ln. 4-9, “the user interface 112 may present at least a portion of the classification labels 104 on a display or other type of output device. As user input is received, selecting or otherwise interacting with one or more of the classification labels 104, additional classification labels 104 may be displayed.”; Dutta, col. 6, ln. 17-20, “In some implementations, selection of one or more of the classification labels 104 may also cause item data 106 for the items associated with the selected classification label(s) 104 to be displayed.”).
generating an electronic message (Dutta, Fig. 1A:106) from the multimodal dataset; (Dutta, Fig. 1A; col. 5, ln. 26-33; “FIG. 1A depicts … Classification data 102 may include a plurality of classification labels 104 which may be applied to items. For example, classification labels 104 may include alphanumeric descriptors, images, or other types of data that may be used to differentiate particular types of items from other types of items.”; col. 6, ln. 17-22; “selection of one or more of the classification labels 104 may also cause item data 106 for the items associated with the selected classification label(s) 104 to be displayed. For example, selection of the “Running” label may cause item data 106 associated with different types of running shoes to be displayed in the user interface 112.”).
overlaying the display content (Dutta, Fig. 1A:104) on the electronic message (Dutta, Fig. 1A:106); and publishing the electronic message (Dutta, Fig. 1A:106) that includes the display content (Dutta, Fig. 1A:104) on a network site (Dutta, Fig. 1A:114). (Dutta, col. 6, ln. 17-21, “selection of one or more of the classification labels 104 may also cause item data 106 for the items associated with the selected classification label(s) 104 to be displayed.”; col. 5, ln. 62-64; “item data 106 indicative of characteristics of particular items may include an indication of one or more of the classification labels 104.”). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao by storing a classification of a data item, selecting display content, generating an electronic message, overlaying the display content, and publishing the electronic message as taught by Dutta. The motivation to do so is that the selected content can be presented on a display over a network site. (“the user interface 112 may present at least a portion of the classification labels 104 on a display or other type of output device. As user input is received, selecting or otherwise interacting with one or more of the classification labels 104, additional classification labels 104 may be displayed.” (Dutta, Fig. 1A; col. 6 ln. 4-9)).


Claims 8, 9, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao (“Heterogeneous Feature Selection With Multi-Modal Deep Neural Networks and Sparse Group LASSO”) in view of Dutta (US 10783167 B1) in view of Laliberte (US 20160359987 A1).

Regarding claim 8,
The Zhao/Dutta combination teaches the method of claim 1, (and thus the rejection of claim 1 is incorporated). Zhao further teaches wherein the multimodal dataset comprises … image data generated by a client device of the user, and text data authored by the user. (Zhao, pg. 1943, col. 2, para. 1, “The NUS-WIDE-Object dataset3 consists of 30000 images from Flickr. Text description tags are attached to every image by the authors of the photos.”).
However, the Zhao/Dutta combination does not explicitly teach profile data of a user of the network site, but Laliberte teaches this limitation. (Laliberte Fig. 20; Fig. 23; [0139] “with reference now to FIG. 20, an exemplary screenshot 2000 is provided of a user profile page for user “MIKE123” showing profile posting and contact details such as a number of posted images or videos, a number of users following and a number of users being followed by this user.”; “in FIG. 23, a screenshot 2300 is provided of an image and video sharing platform interface in which a user's text-based post 2302 (e.g. a text message otherwise destined for Twitter™, SnapChat™ or other such text-based sharing platforms) is set using an embodiment of the system”.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include profile data of a user of the network site. The motivation to do so is that registered users can provide useful information to the service to receive compensation for their use of the system. (“Since registered users of the content integration platform will generally provide accurate/legitimate contact and/or demographic information in registering to the service in order to receive compensation for their use of the system, external content providers will in response receive valuable information not only on the content originators selecting to push their brand, and their social network to which the branded content was pushed, but also confirmed viewership of this embedded branded content by virtue of the scan code concurrently embedded within this integrated content.” (Laliberte [0102])).

Regarding claim 9,
The Zhao/Dutta/Laliberte combination teaches the method of claim 8, (and thus the rejection of claim 8 is incorporated). Zhao further teaches wherein the electronic message …. (Zhao, pg. 1943, col. 2, para. 1, “The NUS-WIDE-Object dataset3 consists of 30000 images from Flickr. Text description tags are attached to every image by the authors of the photos.”).
However, Zhao does not explicitly teach that the electronic message includes an ephemeral message, but Laliberte teaches this limitation. (Laliberte Fig. 23; [0139] “in the post 2304 of FIG. 23, the integration engine automatically identifies the user's use of the sad face emoticon or shortcut keys, such as :(, and thus integrates the user's message into a background sad face image 2306 or other image invoking sadness or disappointment as reflective of the emotion intended by user's message.”; “the system may be configured to integrate the text-message in a dynamically selected user-related image, selected for example as a function of the time of day (e.g. night time vs. daytime scenery), … profile status or the like.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include an ephemeral message. The motivation to do so is that the emotion intended by the user’s message can be reflected. (“the integration engine automatically identifies the user's use of the sad face emoticon or shortcut keys, such as :(, and thus integrates the user's message into a background sad face image 2306 or other image invoking sadness or disappointment as reflective of the emotion intended by user's message.” (Laliberte Fig. 23; [0139])).

Regarding claim 17,
The Zhao/Dutta combination teaches the system of claim 10, (and thus the rejection of claim 10 is incorporated). Zhao further teaches wherein the multimodal dataset comprises … image data generated by a client device of the user, and text data authored by the user. (Zhao, pg. 1943, col. 2, para. 1, “The NUS-WIDE-Object dataset3 consists of 30000 images from Flickr. Text description tags are attached to every image by the authors of the photos.”).
However, the Zhao/Dutta combination does not explicitly teach profile data of a user of the network site, but Laliberte teaches this limitation. (Laliberte Fig. 20; Fig. 23; [0139] “with reference now to FIG. 20, an exemplary screenshot 2000 is provided of a user profile page for user “MIKE123” showing profile posting and contact details such as a number of posted images or videos, a number of users following and a number of users being followed by this user.”; “in FIG. 23, a screenshot 2300 is provided of an image and video sharing platform interface in which a user's text-based post 2302 (e.g. a text message otherwise destined for Twitter™, SnapChat™ or other such text-based sharing platforms) is set using an embodiment of the system”.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include profile data of a user of the network site. The motivation to do so is that registered users can provide useful information to the service to receive compensation for their use of the system. (“Since registered users of the content integration platform will generally provide accurate/legitimate contact and/or demographic information in registering to the service in order to receive compensation for their use of the system, external content providers will in response receive valuable information not only on the content originators selecting to push their brand, and their social network to which the branded content was pushed, but also confirmed viewership of this embedded branded content by virtue of the scan code concurrently embedded within this integrated content.” (Laliberte [0102])).

Regarding claim 18,
The Zhao/Dutta/Laliberte combination teaches the system of claim 17, (and thus the rejection of claim 17 is incorporated). Zhao further teaches wherein the electronic message …. (Zhao, pg. 1943, col. 2, para. 1, “The NUS-WIDE-Object dataset3 consists of 30000 images from Flickr. Text description tags are attached to every image by the authors of the photos.”).
However, Zhao does not explicitly teach that the electronic message includes an ephemeral message, but Laliberte teaches this limitation. (Laliberte Fig. 23; [0139] “in the post 2304 of FIG. 23, the integration engine automatically identifies the user's use of the sad face emoticon or shortcut keys, such as :(, and thus integrates the user's message into a background sad face image 2306 or other image invoking sadness or disappointment as reflective of the emotion intended by user's message.”; “the system may be configured to integrate the text-message in a dynamically selected user-related image, selected for example as a function of the time of day (e.g. night time vs. daytime scenery), … profile status or the like.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include an ephemeral message. The motivation to do so is that the emotion intended by the user’s message can be reflected. (“the integration engine automatically identifies the user's use of the sad face emoticon or shortcut keys, such as :(, and thus integrates the user's message into a background sad face image 2306 or other image invoking sadness or disappointment as reflective of the emotion intended by user's message.” (Laliberte Fig. 23; [0139])).


Claims 6 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao (“Heterogeneous Feature Selection With Multi-Modal Deep Neural Networks and Sparse Group LASSO”) in view of Dutta (US 10783167 B1) in view of Cao (US 9684852 B2).

Regarding claim 6,
The Zhao/Dutta combination teaches the method of claim 1, (and thus the rejection of claim 1 is incorporated). The Zhao/Dutta combination does not explicitly teach wherein the data item is a user of a network site and the multimodal dataset comprises different types of user data of the user, but Cao (Cao, Abstract, “a multimodal information fusion device for combining, using multimodal information fusion, the visual-based gender predictions, the textual-based gender predictions, and the semantic scores to infer a gender of a user”) teaches this limitation. (Cao, col. 2, ln. 46 – col. 3, ln. 12, “The present principles are directed to systems and methods for inferring gender by fusion of multimodal content.
In an embodiment, the present principles advantageously provide an inferred correlation between a user's gender and automatically recognized semantics of his/her image/video collection. In an embodiment, the present principles employ a novel filtered fusion to effectively combine complementary sources of information (visual and textual) with the aim of inferring user gender. As used herein, visual information, visual content and visual-based interchangeably refer to non-textual objects (e.g., cars, purses, etc.) that appear in images, while textual information, textual content, and textual-based refer to textual (e.g., words, phrases, names, etc.) objects that appear in images. It is to be appreciated that as used herein, the term “image” encompasses still images and videos, as the latter includes a series of images.
In an embodiment, the present principles look at the content of a user profile (from social media or other sources) and infer gender by: (1) analyzing visual information (profile picture, header picture, collection of images/videos, color, and so forth) by applying a set of pre-trained visual classifiers that can recognize semantic concepts in images and videos with a confidence score, and then learning a gender classifier; (2) analyzing textual information (text, description, name, and so forth) and providing a prediction for each source of information based on the response of a set of pre-trained textual classifiers; and (3) performing a filtered fusion of different prediction channels to produce a final prediction score. The semantic concepts capable of being recognized by the visual classifiers are pervasive and can include, but are not limited to, scenes (nature, sky, urban, gym), events (sports, entertainment), living entities (people, animals), type (animation, black-and-white), and so forth.”).
It would have been obvious to one of ordinary skill in the art to use the multimodal selection method of Zhao on the multimodal network site user data of Cao in order to generate a gender prediction. The motivation to do so is “filtered fusion involves selecting which information to use directly and which information to aggregate, in order to provide a final prediction of gender from the system.” (Cao, col. 6, ln. 36-39).

Regarding claim 15,
The Zhao/Dutta combination teaches the system of claim 10, (and thus the rejection of claim 10 is incorporated). Cao (Cao, Abstract, “a multimodal information fusion device for combining, using multimodal information fusion, the visual-based gender predictions, the textual-based gender predictions, and the semantic scores to infer a gender of a user”) teaches wherein the data item is a user of a network site and the multimodal dataset comprises different types of user data of the user. (Cao, col. 2, ln. 46 – col. 3, ln. 12, “The present principles are directed to systems and methods for inferring gender by fusion of multimodal content.
In an embodiment, the present principles advantageously provide an inferred correlation between a user's gender and automatically recognized semantics of his/her image/video collection. In an embodiment, the present principles employ a novel filtered fusion to effectively combine complementary sources of information (visual and textual) with the aim of inferring user gender. As used herein, visual information, visual content and visual-based interchangeably refer to non-textual objects (e.g., cars, purses, etc.) that appear in images, while textual information, textual content, and textual-based refer to textual (e.g., words, phrases, names, etc.) objects that appear in images. It is to be appreciated that as used herein, the term “image” encompasses still images and videos, as the latter includes a series of images.
In an embodiment, the present principles look at the content of a user profile (from social media or other sources) and infer gender by: (1) analyzing visual information (profile picture, header picture, collection of images/videos, color, and so forth) by applying a set of pre-trained visual classifiers that can recognize semantic concepts in images and videos with a confidence score, and then learning a gender classifier; (2) analyzing textual information (text, description, name, and so forth) and providing a prediction for each source of information based on the response of a set of pre-trained textual classifiers; and (3) performing a filtered fusion of different prediction channels to produce a final prediction score. The semantic concepts capable of being recognized by the visual classifiers are pervasive and can include, but are not limited to, scenes (nature, sky, urban, gym), events (sports, entertainment), living entities (people, animals), type (animation, black-and-white), and so forth.”).
It would have been obvious to one of ordinary skill in the art to use the multimodal selection method of Zhao on the multimodal network site user data of Cao in order to generate a gender prediction. The motivation to do so is “filtered fusion involves selecting which information to use directly and which information to aggregate, in order to provide a final prediction of gender from the system.” (Cao, col. 6, ln. 36-39).

Response to Arguments
Applicant’s arguments filed December 28, 2021 have been fully considered but they are not persuasive.

Regarding the rejection of claims 1, 10, and 20 under 35 U.S.C. §101:
Applicant’s arguments regarding the 35 U.S.C. § 101 rejection of amended claims 1, 10, and 20 have been considered but are not persuasive. The arguments are directed to the following limitations:

selecting display content corresponding to the classification of the data item; 
generating an electronic message from the multimodal dataset; 
overlaying the display content on the electronic message; and 
publishing the electronic message that includes the display content on a network site.

The applicant appears to argue that the above limitations in the amended claims do not recite a mental process and cannot be performed with the aid of pencil and paper. 
However, Examiner respectfully disagrees. “Selecting” and “generating … message from the multimodal dataset” are mental processes and can be performed in the human mind. The remaining elements of the above limitations are additional elements, which are not integrated into a practical application and do not provide an inventive concept. “Electronic message” merely generally links the use of the judicial exception to a particular technological environment or field of use. “Overlaying the display content on the electronic message” and “publishing the electronic message on a network site” amount to “transmitting or receiving data over a network site and displaying it” (see MPEP 2106.04(d) and 2106.05(g)), which is insignificant extra-solution activity and also well-understood, routine, and conventional activity. Please see the updated §101 rejection above. Thus, applicant’s arguments are not persuasive. 

Regarding the rejection of claims 5 and 14 under 35 U.S.C. §112(b):
Applicant’s arguments regarding the 35 U.S.C. § 112(b) rejection of amended claims 5 and 14 have been considered and are persuasive. Therefore, the rejection has been withdrawn.

Regarding the rejection of claims 1, 5, 6, 8, 9, 10, 14, 15, 17, 18, and 20 under 35 U.S.C. §103:

Applicant’s arguments with respect to claims 1, 5, 6, 8, 9, 10, 14, 15, 17, 18, and 20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. 
Applicant’s arguments with respect to the rejections of the dependent claims have been fully considered but they are not persuasive as they rely upon the allowability of the independent claims. 

Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Deuk Lee whose telephone number is 571-272-8440.  The examiner can normally be reached on Monday-Friday 8:30am-5:30pm CDT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on 571-272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DL/
Examiner, Art Unit 2122  

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122