Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Amendments
Per Applicant’s request, claims 1-17 are amended. Claims 1-17 are pending and have been considered by Examiner.

Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Claim Objections
Claims 1, 2, 12, 13, and 15 are objected to because of minor informalities, where underline represents an insertion and double brackets [[ ]] or strikethrough represents a deletion.
Claim 1 should recite: “generating, by the plurality of encoders, a plurality of pieces of characteristic information”
Claim 2 should recite:
in the first paragraph: “the processor learns the decoders in the plurality of decoders”
in the second paragraph: “the respective classification[[s]] of each piece of the output information is [[has]] the same classification as that of each [[the]] piece[[s]] of input information input to the different encoders in the plurality of encoders.” 
Claim 12, line 7 should recite: “encoders that generate a plurality of pieces of characteristic information”
Claim 13, line 11 should recite: “a plurality of pieces of
Claim 15, second-to-last paragraph should recite: “generating, by a plurality of decoders”
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 12, 14, and 16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 12, 14, and 16 recite in the last paragraph of each claim, with numbers added by Examiner: “output (1) the generated corresponding content and (2) a sub-set of the plurality of pieces of output information having a same classification based on outputs of the plurality of decoders.” It is unclear whether the limitation “having a same classification based on outputs of the plurality of decoders” is supposed to describe (1), (2), or both (1) and (2). For examining purposes, Examiner interprets the claim as the limitation describing only (2).

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-17 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
CLAIM 1
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
	(1) execute a machine learning process by implementing a model 
(2) evaluating the plurality of pieces of input information (BRI includes performing any type of analysis on the input data, such as viewing an input image and reading an input caption)
(3) generating a plurality of characteristic information indicating characteristics of the plurality of pieces of input information (BRI includes describing the input data, such as providing a tone of the image and providing a tone of the caption (e.g., serious, playful))
(4) synthesizing a combination of the plurality of pieces of characteristic information (BRI includes making any type of combination, such as combining the tone of the image with the tone of the caption)
(5) generating synthesized information (BRI includes making any type of combination, such as combining the tone of the image with the tone of the caption)
(6) generating a plurality of pieces of output information of different classifications from the synthesized information (BRI includes creating new data from the combination of input data, such as creating a new image that matches the tone of the input image and the tone of the input caption, and creating a new caption that matches tone of the input image and the tone of the input caption)
“Execute a machine learning process” (limitation 1) is a mental process, as further defined by limitations 2 to 5. Limitations 2 to 6 are mental processes of evaluation that can reasonably be performed in one’s mind with the aid of pencil and paper, but for the recitation of a processor. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
a memory
a processor operatively coupled to the memory
a plurality of encoders 
a plurality of decoders
acquire a plurality of pieces of input information, the plurality of pieces of input information having classifications;
receives the plurality of pieces of input information as inputs, 
outputs a plurality of pieces of output information corresponding to the respective pieces of input information
output a sub-set of the plurality of pieces of output information having a same classification based on outputs of the plurality of decoders.
A memory and a processor operatively coupled to the memory amount to mere instructions to apply the abstract idea on a generic computer. See MPEP 2106.05(f). The plurality of encoders and the plurality of decoders generally link the abstract idea to the particular technological environment of machine learning and are not an improvement to machine learning technology. Therefore, they are not meaningful limitations. See MPEP 2106.05(e). The acquiring and receiving limitations amount to mere data gathering, which is insignificant extra-solution activity. Outputting data is insignificant extra-solution activity because it is well-known. See MPEP 2106.05(g).
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. Additionally, the acquiring and receiving are well-understood, routine, conventional activities of receiving data over a network. See MPEP 2106.05(d)(II)(i):
The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network, e.g., using the Internet to gather data
Outputting information is well-known in the art, as disclosed by Wical (US Patent 6,460,034, published 2002) at C. 9, L. 30-32: “A screen module, such as screen module 230, which processes information for display on a computer output display, is well known in the art.”
	The claim is not patent eligible.
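For illustration only, the data flow that the Step 2A Prong 1 analysis above attributes to claim 1 (per-classification encoding, synthesis, and per-classification decoding) can be sketched as follows. This is a minimal sketch using the Examiner's tone-of-image and tone-of-caption example. Every function name, the tone vocabulary, and the string-matching rules are hypothetical and do not come from the application or the claims.

```python
# Hypothetical sketch of the data flow characterized in the Step 2A analysis.
# Every function name and the tone vocabulary are assumptions for illustration.

def encode_image(image):
    """Encoder for the 'image' classification: returns a tone label."""
    return "playful" if "balloon" in image else "serious"

def encode_text(caption):
    """Encoder for the 'text' classification: returns a tone label."""
    return "playful" if caption.endswith("!") else "serious"

def synthesize(characteristics):
    """Combine the pieces of characteristic information into one value."""
    return "+".join(sorted(set(characteristics)))

def decode_image(synthesized):
    """Decoder producing output information of the 'image' classification."""
    return {"classification": "image", "content": "new image, tone: " + synthesized}

def decode_text(synthesized):
    """Decoder producing output information of the 'text' classification."""
    return {"classification": "text", "content": "new caption, tone: " + synthesized}

ENCODERS = {"image": encode_image, "text": encode_text}
DECODERS = {"image": decode_image, "text": decode_text}

def run_model(inputs):
    """inputs maps a classification ('image' or 'text') to a piece of input information."""
    characteristics = [ENCODERS[c](value) for c, value in inputs.items()]
    synthesized = synthesize(characteristics)
    # One piece of output information per input classification.
    return [DECODERS[c](synthesized) for c in inputs]

def select(outputs, classification):
    """Return the sub-set of output information having the given classification."""
    return [o for o in outputs if o["classification"] == classification]
```

Calling `run_model({"image": "red balloon at a party", "text": "Quarterly results"})` yields one piece of output information per classification, and `select(outputs, "text")` returns only the text output.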

CLAIM 2 incorporates the rejection of claim 1.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. The claim recites the following limitations:
(1) generate the pieces of output information from the synthesized information (BRI includes generating a new caption that matches tone of the input image and the tone of the input caption.)
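(2) the respective classifications of each piece of the output information are different and the respective classifications of each piece of the output information has the same classification of the pieces of input information to the different plurality of encoders.
These limitations are mental processes of evaluating which can reasonably be performed in one’s mind with the aid of pencil and paper but for the recitation of the processor. Accordingly, the claim recites an abstract idea.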
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
the processor 
learns the plurality of decoders
plurality of encoders
A processor amounts to mere instructions to apply the abstract idea on a generic computer. See MPEP 2106.05(f). The element “learns the plurality of decoders” is interpreted as “trains the plurality of decoders”. Training decoders is insignificant extra-solution activity of training a machine learning model, which is well-known. See MPEP 2106.05(g):
“When determining whether an additional element is insignificant extra-solution activity, examiners may consider the following: (1) Whether the extra-solution limitation is well known”. 
Adding insignificant extra-solution activity is not sufficient to integrate the additional elements into a practical application. A plurality of encoders and a plurality of decoders are generally linking the abstract idea to the particular technological environment of machine learning, and they are not an improvement to machine learning technology. Therefore, they are not meaningful limitations. See MPEP 2106.05(e). 
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. Additionally, training a plurality of decoders is well-understood, routine, conventional activity.
Chung et al. (US 20170068888 A1, published March 9, 2017) teaches at ¶ [0028]: “An auto-encoder is a common pre-training technique in deep neural networks.” Chung states an auto-encoder includes a decoder layer W’ in ¶ [0030]: “Once the auto-encoder is trained, the decoder layer (W′) can be removed, and the encoded layer (W) is used as input for stacking the next auto-encoder.” Chung in Fig. 2 shows a decoder as Reconstruction 108.
The claim is not patent eligible.

CLAIM 3 incorporates the rejection of claim 1.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
the processor 
learns the plurality of encoders that have learned the pieces of characteristic information of different classifications
learns the plurality of decoders that have learned the pieces of characteristic information of the same classification as the respective plurality of encoders.
A processor amounts to mere instructions to apply the abstract idea on a generic computer. See MPEP 2106.05(f). Training encoders and decoders is insignificant extra-solution activity of training a machine learning model, which is well-known. See MPEP 2106.05(g):
“When determining whether an additional element is insignificant extra-solution activity, examiners may consider the following: (1) Whether the extra-solution limitation is well known”. 
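Adding insignificant extra-solution activity is not sufficient to integrate the additional elements into a practical application.
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.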
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. Additionally, training a plurality of encoders and decoders is well-understood, routine, conventional activity.
Chung et al. (US 20170068888 A1, published March 9, 2017) teaches at ¶ [0028]: “An auto-encoder is a common pre-training technique in deep neural networks, and the goal of pre-training is to find a good starting point in weight space to obtain a model with faster or better convergence.” Chung states an auto-encoder includes an encoded (encoder) layer W and a decoder layer W’ in ¶ [0030]: “Once the auto-encoder is trained, the decoder layer (W′) can be removed, and the encoded layer (W) is used as input for stacking the next auto-encoder.” Chung in Fig. 2 shows an encoder as Input Layer 104 and a decoder as Reconstruction 108.
The claim is not patent eligible.
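For illustration only, the stacking procedure described in the quoted passage from Chung (train an auto-encoder, discard the decoder layer W′, and feed the encoded layer W into the next auto-encoder) can be sketched with a one-weight linear auto-encoder. This is not Chung's implementation. The scalar weights, the gradient-descent loop, the learning rate, and the sample data are all assumptions chosen to keep the sketch short.

```python
# Illustrative sketch only: a one-weight linear auto-encoder.
# w and w_prime mirror the layer names W and W' in the quoted passage; the
# training loop, learning rate, and data are assumptions for this example.

def train_autoencoder(samples, steps=500, lr=0.05):
    """Fit encoder weight w and decoder weight w_prime so w_prime * (w * x) ~ x."""
    w, w_prime = 0.5, 0.5
    for _ in range(steps):
        for x in samples:
            error = w_prime * w * x - x           # reconstruction error
            grad_w = 2 * error * w_prime * x      # d(error^2)/dw
            grad_w_prime = 2 * error * w * x      # d(error^2)/dw_prime
            w -= lr * grad_w
            w_prime -= lr * grad_w_prime
    return w, w_prime

def stack(samples, depth=2):
    """Train auto-encoders in sequence, keeping each encoder and dropping its decoder."""
    encoders = []
    current = list(samples)
    for _ in range(depth):
        w, _discarded_decoder = train_autoencoder(current)  # decoder used only in training
        encoders.append(w)
        current = [w * x for x in current]  # encoded output feeds the next stage
    return encoders
```

In this sketch the decoder weight exists only to define the reconstruction error during training, which is why `stack` can drop it once each stage has been fitted.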

CLAIM 4 incorporates the rejection of claim 1.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. The claim recites the following limitations:
(1) generates a characteristic of an image (BRI includes describing the tone of an input image)
(2) generates a characteristic of text (BRI includes describing the tone of an input caption)
(3) generates synthesized information obtained by synthesizing the characteristic of the image and the characteristic of the text respectively generated by the first encoder and the second encoder (BRI includes combining the tone of the input image and the tone of the input caption)

(5) generates output information corresponding to the text from the synthesized information (BRI includes creating new data sharing any qualities with the input caption)
The claim limitations are mental processes of evaluating which can reasonably be performed in one’s mind with the aid of pencil and paper but for the recitation of a processor. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
the processor 
learns at least a first encoder of the plurality of encoders, 
learns a second encoder of the plurality of encoders, 
learns a synthesizer 
learns a first decoder of the plurality of decoders 
learns a second decoder of the plurality of decoders
A processor amounts to mere instructions to apply the abstract idea on a generic computer. See MPEP 2106.05(f). A first encoder, a second encoder, a synthesizer, a first decoder, and a second decoder generally link the abstract idea to the particular technological environment of machine learning and are not an improvement to machine learning technology. Therefore, they are not meaningful limitations. See MPEP 2106.05(e). Training encoders, a synthesizer, and decoders is insignificant extra-solution activity of training a machine learning model, which is well-known. See MPEP 2106.05(g):
“When determining whether an additional element is insignificant extra-solution activity, examiners may consider the following: (1) Whether the extra-solution limitation is well known”.
Adding insignificant extra-solution activity is not sufficient to integrate the additional elements into a practical application.
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. Additionally, training encoders and decoders is well-understood, routine, conventional activity.
Chung et al. (US 20170068888 A1, published March 9, 2017) teaches at ¶ [0028]: “An auto-encoder is a common pre-training technique in deep neural networks, and the goal of pre-training is to find a good starting point in weight space to obtain a model with faster or better convergence.” Chung states an auto-encoder includes an encoded (encoder) layer W and a decoder layer W’ in ¶ [0030]: “Once the auto-encoder is trained, the decoder layer (W′) can be removed, and the encoded layer (W) is used as input for stacking the next auto-encoder.” Chung in Fig. 2 shows that the auto-encoder includes an encoder as Input Layer 104, a synthesizer as Hidden Layer 106, and a decoder as Reconstruction 108.
The claim is not patent eligible.

CLAIM 5 incorporates the rejection of claim 1.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. The claim recites the following limitation:
generates synthesized information obtained by synthesizing the pieces of characteristic information generated by the plurality of encoders in a synthesizing mode corresponding to an output 
	This limitation is a mental process of evaluating which can reasonably be performed in one’s mind with the aid of pencil and paper but for the recitation of the processor. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
the processor
learns a synthesizer
A processor amounts to mere instructions to apply the abstract idea on a generic computer. See MPEP 2106.05(f). A synthesizer is generally linking the abstract idea to the particular technological environment of machine learning, and it is not an improvement to machine learning technology. Therefore, it is not a meaningful limitation. See MPEP 2106.05(e). Training a synthesizer is insignificant extra-solution activity of training a machine learning model, which is well-known. See MPEP 2106.05(g):
“When determining whether an additional element is insignificant extra-solution activity, examiners may consider the following: (1) Whether the extra-solution limitation is well known”. 
Adding insignificant extra-solution activity is not sufficient to integrate the additional elements into a practical application.
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. Additionally, training a synthesizer is well-understood, routine, conventional activity.
Chung et al. (US 20170068888 A1, published March 9, 2017) teaches at ¶ [0028]: “An auto-encoder is a common pre-training technique in deep neural networks, and the goal of pre-training is to find a good starting point in weight space to obtain a model with faster or better convergence.” Chung in Fig. 2 shows that the auto-encoder includes a synthesizer as Hidden Layer 106.
The claim is not patent eligible.

CLAIM 6 incorporates the rejection of claim 5.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 5 are incorporated. The claim recites the following limitations:
generates synthesized information obtained by synthesizing the pieces of characteristic information generated by the plurality of encoders in a synthesizing mode corresponding to an attribute of a user that is an output destination of the output information. (BRI includes making any type of combination, such as combining the tone of the image with the tone of the caption)
This limitation is a mathematical operation and/or a mental process of evaluating which can reasonably be performed in one’s mind with the aid of pencil and paper but for the recitation of the processor. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
the processor
learns a synthesizer
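A processor amounts to mere instructions to apply the abstract idea on a generic computer. See MPEP 2106.05(f). A synthesizer is generally linking the abstract idea to the particular technological environment of machine learning, and it is not an improvement to machine learning technology. Therefore, it is not a meaningful limitation. See MPEP 2106.05(e). Training a synthesizer is insignificant extra-solution activity of training a machine learning model, which is well-known. See MPEP 2106.05(g):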
“When determining whether an additional element is insignificant extra-solution activity, examiners may consider the following: (1) Whether the extra-solution limitation is well known”. 
Adding insignificant extra-solution activity is not sufficient to integrate the additional elements into a practical application.
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. Additionally, training a synthesizer is well-understood, routine, conventional activity.
Chung et al. (US 20170068888 A1, published March 9, 2017) teaches at ¶ [0028]: “An auto-encoder is a common pre-training technique in deep neural networks, and the goal of pre-training is to find a good starting point in weight space to obtain a model with faster or better convergence.” Chung in Fig. 2 shows that the autoencoder includes a synthesizer as Hidden Layer 106.
The claim is not patent eligible.

CLAIM 7 incorporates the rejection of claim 5.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 5 are incorporated. The claim recites the following limitations:

(2) generates synthesized information corresponding to an output mode of the output information from combined information (BRI includes making any type of combination, such as combining the tone of the image with the tone of the caption)
The first limitation is a mathematical operation and/or a mental process of evaluating which can reasonably be performed in one’s mind with the aid of pencil and paper but for the recitation of the processor. The second limitation is a mental process of evaluating which can reasonably be performed in one’s mind with the aid of pencil and paper but for the recitation of the processor. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
the processor
learns a synthesizer
A processor amounts to mere instructions to apply the abstract idea on a generic computer. See MPEP 2106.05(f). A synthesizer is generally linking the abstract idea to the particular technological environment of machine learning, and it is not an improvement to machine learning technology. Therefore, it is not a meaningful limitation. See MPEP 2106.05(e). Training a synthesizer is insignificant extra-solution activity of training a machine learning model, which is well-known. See MPEP 2106.05(g):
“When determining whether an additional element is insignificant extra-solution activity, examiners may consider the following: (1) Whether the extra-solution limitation is well known”. 
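Adding insignificant extra-solution activity is not sufficient to integrate the additional elements into a practical application.
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.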
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. Additionally, training a synthesizer is well-understood, routine, conventional activity.
Chung et al. (US 20170068888 A1, published March 9, 2017) teaches at ¶ [0028]: “An auto-encoder is a common pre-training technique in deep neural networks, and the goal of pre-training is to find a good starting point in weight space to obtain a model with faster or better convergence.” Chung in Fig. 2 shows that the autoencoder includes a synthesizer as Hidden Layer 106.
The claim is not patent eligible.

CLAIM 8 incorporates the rejection of claim 1.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. The claim recites the following limitations:
(1) generate an intermediate representation indicating the characteristic of input information, (BRI includes altering any quality of the image and caption, such as converting the image to grayscale and converting the caption to plain text.)
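(2) generate the characteristic information from the intermediate representation generated by each model of the plurality of models. (BRI includes describing the input data, such as providing a tone of the image and providing a tone of the caption (e.g., serious, playful))
These limitations are mental processes of evaluating which can reasonably be performed in one’s mind with the aid of pencil and paper but for the recitation of the processor. Accordingly, the claim recites an abstract idea.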
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
the processor 
learns a plurality of models that have a structure corresponding to a classification of the input information 
learns the plurality of encoders 
A processor amounts to mere instructions to apply the abstract idea on a generic computer. See MPEP 2106.05(f). A plurality of models is generally linking the abstract idea to the particular technological environment of machine learning, and it is not an improvement to machine learning technology. Therefore, it is not a meaningful limitation. See MPEP 2106.05(e). Training a plurality of models and training a plurality of encoders is insignificant extra-solution activity of training a machine learning model, which is well-known. See MPEP 2106.05(g):
“When determining whether an additional element is insignificant extra-solution activity, examiners may consider the following: (1) Whether the extra-solution limitation is well known”. 
Adding insignificant extra-solution activity is not sufficient to integrate the additional elements into a practical application.
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. Additionally, training a plurality of models and a plurality of encoders is well-understood, routine, conventional activity.
Chung et al. (US 20170068888 A1, published March 9, 2017) teaches at ¶ [0028]: “An auto-encoder is a common pre-training technique in deep neural networks.” Chung states an auto-encoder includes an encoder layer W in ¶ [0030]: “Once the auto-encoder is trained, the decoder layer (W′) can be removed, and the encoded layer (W) is used as input for stacking the next auto-encoder.” Chung in Fig. 2 shows an encoder as Input Layer 104.
The claim is not patent eligible.

CLAIM 9 incorporates the rejection of claim 8.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 8 are incorporated. The claim recites the following limitations:
	(1) generates an intermediate representation of the input information that is text (BRI includes altering any quality of the caption, such as converting the caption to plain text.)
	(2) generates an intermediate representation of the input information that is an image (BRI includes altering any quality of the image, such as converting the image to grayscale)
These limitations are mental processes of evaluating which can reasonably be performed in one’s mind with the aid of pencil and paper but for the recitation of the processor. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
the processor 
learns a model that is a recurrent neural network (RNN)
learns a model that is a convolution neural network (CNN)
A processor amounts to mere instructions to apply the abstract idea on a generic computer. See MPEP 2106.05(f). An RNN model and a CNN model generally link the abstract idea to the particular technological environment of machine learning and are not an improvement to machine learning technology. Therefore, they are not meaningful limitations. See MPEP 2106.05(e). Training an RNN model and training a CNN model are insignificant extra-solution activities of training a machine learning model, which are well-known. See MPEP 2106.05(g):
“When determining whether an additional element is insignificant extra-solution activity, examiners may consider the following: (1) Whether the extra-solution limitation is well known”. 
Adding insignificant extra-solution activity is not sufficient to integrate the additional elements into a practical application.
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. Additionally, training an RNN model and training a CNN model are well-understood, routine, conventional activities.
Ghosh et al. (US Patent 7,181,768, published 2007) states at C. 12, L. 32-34, “The adjustable weights for each link are adjusted according to well-known recurrent neural network training techniques.” Miller et al. (US 20100004915 A1, published 2010) states at ¶ [0040], “In box 405, the CNN 104 is trained with the patches according to well known principles of stochastic gradient descent and back-propagation 507 (FIG. 5), to produce a trained version 508 (FIG. 5) of the CNN 10”.
	The claim is not patent eligible.

CLAIM 10 incorporates the rejection of claim 1.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
the processor 
learns the plurality of encoders and the plurality of decoders 
a plurality of groups of an encoder and a decoder, and each of the plurality of groups has learned characteristics of pieces of information belonging to the different classifications.
A processor amounts to mere instructions to apply the abstract idea on a generic computer. See MPEP 2106.05(f). A plurality of groups of an encoder and a decoder is generally linking the abstract idea to the particular technological environment of machine learning, and it is not an improvement to machine learning technology. Therefore, it is not a meaningful limitation. See MPEP 2106.05(e). Training the plurality of encoders and the plurality of decoders is insignificant extra-solution activity of training a machine learning model, which is well-known. See MPEP 2106.05(g):
“When determining whether an additional element is insignificant extra-solution activity, examiners may consider the following: (1) Whether the extra-solution limitation is well known”. 
Adding insignificant extra-solution activity is not sufficient to integrate the additional elements into a practical application.
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. Additionally, training a plurality of encoders and a plurality of decoders is well-understood, routine, conventional activity.
Chung et al. (US 20170068888 A1, published March 9, 2017) teaches at ¶ [0028]: “An auto-encoder is a common pre-training technique in deep neural networks, and the goal of pre-training is to find a good starting point in weight space to obtain a model with faster or better convergence.” Chung states an auto-encoder includes an encoded (encoder) layer W and a decoder layer W’ in ¶ [0030]: “Once the auto-encoder is trained, the decoder layer (W′) can be removed, and the encoded layer (W) is used as input for stacking the next auto-encoder.” Chung in Fig. 2 shows an encoder as Input Layer 104 and a decoder as Reconstruction 108.
The claim is not patent eligible.
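For illustration only, the pre-training procedure quoted from Chung above (train an auto-encoder, remove the decoder layer W′, and feed the encoded layer W into the next auto-encoder) can be sketched as follows. This is a minimal sketch under stated assumptions, not Chung's implementation: the linear layers, the plain gradient-descent update, the dimensions, and the names `train_autoencoder`, `W`, and `W_dec` are all hypothetical choices made for the example.

```python
import numpy as np

def train_autoencoder(x, hidden_dim, steps=300, lr=0.05, seed=0):
    """Train a linear auto-encoder x -> x @ W -> (x @ W) @ W_dec.

    Hypothetical sketch: squared reconstruction error, plain gradient descent
    (gradients are correct up to a constant factor absorbed by lr).
    Returns only the encoded layer W; the decoder W_dec is discarded.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    W = rng.normal(scale=0.1, size=(d, hidden_dim))      # encoded layer (W)
    W_dec = rng.normal(scale=0.1, size=(hidden_dim, d))  # decoder layer (W')
    losses = []
    for _ in range(steps):
        h = x @ W                      # encode
        err = h @ W_dec - x            # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        grad_dec = h.T @ err / n
        grad_enc = x.T @ (err @ W_dec.T) / n
        W_dec -= lr * grad_dec
        W -= lr * grad_enc
    return W, losses

# Synthetic input data (assumption: 64 samples, 8 features).
x = np.random.default_rng(1).normal(size=(64, 8))

# First auto-encoder: keep W1, drop its decoder.
W1, losses1 = train_autoencoder(x, hidden_dim=4)
h1 = x @ W1

# Second auto-encoder is stacked on the encoded output of the first.
W2, losses2 = train_autoencoder(h1, hidden_dim=2)
h2 = h1 @ W2
```

The sketch only shows the stacking pattern; it makes no claim about the layer types or training objective used in the cited reference.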

CLAIM 11 incorporates the rejection of claim 1.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
the processor 
outputs the pieces of output information having content with a same characteristic from a plurality of the pieces of input information included in predetermined content.
A processor amounts to mere instructions to apply the abstract idea on a generic computer. See MPEP 2106.05(f). Outputting data is an insignificant extra-solution activity because it is well-known. See MPEP 2106.05(g). 
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

 Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. Additionally, outputting information is well-known in the art, as disclosed by Wical (US Patent 6,460,034, published 2002) at C. 9, L. 30-32: “A screen module, such as screen module 230, which processes information for display on a computer output display, is well known in the art.”
	The claim is not patent eligible.

CLAIM 12
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
(1) executing a machine learning process
(2) generate pieces of characteristic information indicating characteristics of the plurality of pieces of input information of different classifications (BRI includes describing the input data, such as providing a tone of the image and providing a tone of the caption (e.g., serious, playful))
 (3) synthesizing a combination of the plurality of pieces of characteristic information generated by the plurality of encoders (BRI includes making any type of combination, such as combining the tone of the image with the tone of the caption)
(4) generates synthesized information (BRI includes making any type of combination, such as combining the tone of the image with the tone of the caption)
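(5) generate a plurality of pieces of output information corresponding to the plurality of pieces of input information of different classifications from the synthesized information generated by the synthesizer (BRI includes creating new data from the combination of input data, such as creating a new image that matches the tone of the input image and the tone of the input caption, and creating a new caption that matches tone of the input image and the tone of the input caption)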
(6) generate corresponding content corresponding to the predetermined content from the plurality of pieces of output information (BRI includes generating new data or limiting data from the pieces of output information that match predetermined content)
Executing a machine learning process is a mental process, as further defined by limitations 2 to 5. Limitations 2 to 6 are mental processes of evaluating which can reasonably be performed in one’s mind with the aid of pencil and paper but for the recitation of a processor. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
a processor 
a memory
a plurality of encoders 
a plurality of decoders
acquire a plurality of pieces of output information corresponding to a plurality of pieces of input information included in a predetermined content
a synthesizer 
output the generated corresponding content and a sub-set of the plurality of pieces of output information having a same classification based on outputs of the plurality of decoders.
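A memory and a processor operatively coupled to the memory amount to mere instructions to apply the abstract idea on a generic computer. See MPEP 2106.05(f). A plurality of encoders, a plurality of decoders, and a synthesizer are generally linking the abstract idea to the particular technological environment of machine learning, and they are not an improvement to machine learning technology. Therefore, they are not meaningful limitations. See MPEP 2106.05(e). The acquiring limitation amounts to mere data-gathering, which is an insignificant extra-solution activity. Outputting information is insignificant extra-solution activity because it is well-known. See MPEP 2106.05(g).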
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. Additionally, the acquiring limitation is well-understood, routine, conventional activities of receiving data over a network. See MPEP 2106.05(d)(II)(i). Outputting information is well-known in the art, as disclosed by Wical (US Patent 6,460,034, published 2002) at C. 9, L. 30-32: “A screen module, such as screen module 230, which processes information for display on a computer output display, is well known in the art.”
The claim is not patent eligible.

CLAIM 13
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
(1) executing a machine learning process by implementing a model
(2) evaluating the plurality of pieces of input information (BRI includes performing any type of analysis on the input data, such as viewing an input image and reading an input caption)
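(3) generating a plurality of characteristic information indicating characteristics of the plurality of pieces of input information (BRI includes describing the input data, such as providing a tone of the image and providing a tone of the caption (e.g., serious, playful))
(4) synthesizing a combination of the plurality of pieces of characteristic information generated by the plurality of encoders (BRI includes making any type of combination, such as combining the tone of the image with the tone of the caption)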
(5) generating synthesized information (BRI includes making any type of combination, such as combining the tone of the image with the tone of the caption)
(6) generating a plurality of pieces of output information of different classifications from the synthesized information (BRI includes creating new data from the combination of input data, such as creating a new image that matches the tone of the input image and the tone of the input caption, and creating a new caption that matches tone of the input image and the tone of the input caption)
Executing a machine learning process by implementing a model is a mental process as further defined by limitations 2 to 6. Limitations 2 to 6 are mental processes of evaluating which can reasonably be performed in one’s mind with the aid of pencil and paper but for the recitation of a processor. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
acquiring a plurality of pieces of input information, the plurality of input information having different classifications;
receives the plurality of pieces of input information as inputs, 
outputs a plurality of pieces of output information corresponding to the respective pieces of input information
a plurality of encoders
a plurality of decoders
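outputting a sub-set of the plurality of pieces of output information having a same classification based on outputs of the plurality of decoders.
The acquiring and receiving limitations amount to mere data-gathering, which is an insignificant extra-solution activity. The outputting limitations are insignificant extra-solution activity because outputting information is well-known. See MPEP 2106.05(g). A plurality of encoders and a plurality of decoders are generally linking the abstract idea to the particular technological environment of machine learning, and they are not an improvement to machine learning technology. Therefore, they are not meaningful limitations. See MPEP 2106.05(e).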
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. Additionally the acquiring and receiving limitations are well-understood, routine, conventional activities of receiving data over a network. See MPEP 2106.05(d)(II)(i). Outputting information is well-known in the art, as disclosed by Wical (US Patent 6,460,034, published 2002) at C. 9, L. 30-32: “A screen module, such as screen module 230, which processes information for display on a computer output display, is well known in the art.” 
The claim is not patent eligible.

CLAIM 14
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
(1) executing a machine learning process 
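(2) generate pieces of characteristic information indicating characteristics of the plurality of pieces of input information of different classifications, (BRI includes describing the input data, such as providing a tone of the image and providing a tone of the caption (e.g., serious, playful))
(3) synthesizing a combination of the plurality of pieces of characteristic information generated by the plurality of encoders (BRI includes making any type of combination, such as combining the tone of the image with the tone of the caption)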
(4) generates synthesized information (BRI includes making any type of combination, such as combining the tone of the image with the tone of the caption)
(5) generate a plurality of pieces of output information corresponding to the plurality of pieces of input information of different classifications from the synthesized information generated by the synthesizer; (BRI includes creating new data from the combination of input data, such as creating a new image that matches the tone of the input image and the tone of the input caption, and creating a new caption that matches tone of the input image and the tone of the input caption)
(6) generating corresponding content corresponding to the predetermined content from the acquired plurality of pieces of output information (BRI includes generating new data or limiting data from the pieces of output information that match predetermined content)
Executing a machine learning process is a mental process, as further defined by limitations 2-5. Limitations 2 to 6 are mental processes of evaluating which can reasonably be performed in one’s mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
acquiring a plurality of pieces of output information corresponding to a plurality of pieces of input information included in a predetermined content 
a plurality of encoders 
a synthesizer 
a plurality of decoders 
outputting the generated corresponding content 
outputting a sub-set of the plurality of pieces of output information having a same classification based on outputs of the plurality of decoders.
A plurality of encoders, a synthesizer, and a plurality of decoders are generally linking the abstract idea to the particular technological environment of machine learning, and they are not an improvement to machine learning technology. Therefore, they are not meaningful limitations. See MPEP 2106.05(e). The acquiring limitation amounts to mere data-gathering, which is an insignificant extra-solution activity. The outputting limitations are insignificant extra-solution activity because they are well-known. See MPEP 2106.05(g). 
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. Additionally, the acquiring limitation is well-understood, routine, conventional activities of receiving data over a network. See MPEP 2106.05(d)(II)(i). Outputting information is well-known in the art, as disclosed by Wical (US Patent 6,460,034, published 2002) at C. 9, L. 30-32: “A screen module, such as screen module 230, which processes information for display on a computer output display, is well known in the art.”
The claim is not patent eligible.

CLAIM 15
Step 1: The claim recites a product, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
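(1) executing a machine learning process by implementing a model
(2) generating a plurality of pieces of characteristic information indicating characteristics of the plurality of pieces of input information (BRI includes describing the input data, such as providing a tone of the image and providing a tone of the caption (e.g., serious, playful))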
(3) synthesizing a combination of the plurality of pieces of characteristic information generated by the plurality of encoders  (BRI includes making any type of combination, such as combining the tone of the image with the tone of the caption)
(4) generating synthesized information  (BRI includes making any type of combination, such as combining the tone of the image with the tone of the caption)
(5) generating a plurality of pieces of output information of different classifications from the synthesized information (BRI includes creating new data from the combination of input data, such as creating a new image that matches the tone of the input image and the tone of the input caption, and creating a new caption that matches tone of the input image and the tone of the input caption)
Executing a machine learning process is a mental process, as further defined by limitations 2 to 5. Limitations 2 to 5 are mental processes of evaluating which can reasonably be performed in one’s mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
A non-transitory computer-readable storage medium having stored therein a learning program 
a computer
acquiring a plurality of pieces of input information, the plurality of input information having different classifications; 
receives the plurality of pieces of input information as inputs, 
outputs a plurality of pieces of output information corresponding to the respective pieces of input information
a plurality of encoders
a plurality of decoders
outputting a sub-set of the plurality of pieces of output information having a same classification based on outputs of the plurality of decoders.
A storage medium and a computer amount to mere instructions to apply the abstract idea on a generic computer. See MPEP 2106.05(f). The acquiring and receiving limitations amount to mere data-gathering, which is an insignificant extra-solution activity. Outputting information is insignificant extra-solution activity because it is well-known. See MPEP 2106.05(g). A plurality of encoders and a plurality of decoders are generally linking the abstract idea to the particular technological environment of machine learning, and they are not an improvement to machine learning technology. Therefore, they are not meaningful limitations. See MPEP 2106.05(e). 
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. Additionally, the acquiring and receiving limitations are well-understood, routine, conventional activities of receiving data over a network. See MPEP 2106.05(d)(II)(i). Outputting information is well-known in the art, as disclosed by Wical (US Patent 6,460,034, published 2002) at C. 9, L. 30-32: “A screen module, such as screen module 230, which processes information for display on a computer output display, is well known in the art.”
The claim is not patent eligible.

CLAIM 16
Step 1: The claim recites a product, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
(1) executing a machine learning process 
(2) generate pieces of characteristic information indicating characteristics of the plurality of pieces of input information of different classifications (BRI includes describing the input data, such as providing a tone of the image and providing a tone of the caption (e.g., serious, playful))
(3) synthesizing a combination of the plurality of pieces of characteristic information generated by the plurality of encoders (BRI includes making any type of combination, such as combining the tone of the image with the tone of the caption)
(4) generates synthesized information (BRI includes making any type of combination, such as combining the tone of the image with the tone of the caption)
(5) generate a plurality of pieces of output information corresponding to the plurality of pieces of input information of different classifications from the synthesized information generated by the synthesizer; (BRI includes creating new data from the combination of input data, such as creating a new image that matches the tone of the input image and the tone of the input caption, and creating a new caption that matches tone of the input image and the tone of the input caption)
(6) generating corresponding content corresponding to the predetermined content from the acquired plurality of pieces of output information; (BRI includes generating new data or limiting data from the pieces of output information that match predetermined content)
Executing a machine learning process is a mental process, as further defined by limitations 2 to 5. Limitations 2 to 6 are mental processes of evaluating which can reasonably be performed in one’s mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
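Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements: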
A non-transitory computer-readable storage medium having stored therein a generation program
a computer
acquiring a plurality of pieces of output information corresponding to a plurality of pieces of input information included in a predetermined content 
a plurality of encoders 
a synthesizer 
a plurality of decoders 
outputting the generated corresponding content and a sub-set of the plurality of pieces of output information having a same classification based on outputs of the plurality of decoders.
A storage medium and a computer amount to mere instructions to apply the abstract idea on a generic computer. See MPEP 2106.05(f). The acquiring limitation amounts to mere data-gathering, which is an insignificant extra-solution activity. Outputting information is insignificant extra-solution activity because it is well known. See MPEP 2106.05(g). A plurality of encoders, a synthesizer, and a plurality of decoders are generally linking the abstract idea to the particular technological environment of machine learning, and they are not an improvement to machine learning technology. Therefore, they are not meaningful limitations. See MPEP 2106.05(e). 
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. Additionally, the acquiring limitation is well-understood, routine, conventional activities of receiving data over a network. See MPEP 2106.05(d)(II)(i). Outputting information is well-known in the art, as disclosed by Wical (US Patent 6,460,034, published 2002) at C. 9, L. 30-32: “A screen module, such as screen module 230, which processes information for display on a computer output display, is well known in the art.”
The claim is not patent eligible.

CLAIM 17
Step 1: The claim recites a product, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
(1) execute a machine learning process by implementing a model 
(2) evaluating the plurality of pieces of input information of different classifications; (BRI includes performing any type of analysis on the input data, such as viewing an input image and reading an input caption)
(3) generating a plurality of pieces of characteristic information indicating characteristics of the plurality of pieces of input information (BRI includes describing the input data, such as providing a tone of the image and providing a tone of the caption (e.g., serious, playful))
(4) synthesizing a combination of the plurality of pieces of characteristic information generated by the plurality of encoders; (BRI includes making any type of combination, such as combining the tone of the image with the tone of the caption)
(5) generating synthesized information (BRI includes making any type of combination, such as combining the tone of the image with the tone of the caption)
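(6) generating a plurality of pieces of output information corresponding to the plurality of pieces of input information of different classifications from the generated synthesized information. (BRI includes creating new data from the combination of input data, such as creating a new image that matches the tone of the input image and the tone of the input caption, and creating a new caption that matches tone of the input image and the tone of the input caption)
Executing a machine learning process by implementing a model is a mental process, as further defined by limitations 2 to 6. Limitations 2 to 6 are mental processes of evaluating which can reasonably be performed in one’s mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.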
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
A non-transitory computer-readable storage medium having stored therein a program
a computer 
receives a plurality of pieces of input information as inputs 
outputs a plurality of pieces of output information corresponding to the respective pieces of input information
a plurality of encoders
a plurality of decoders
outputting a sub-set of the plurality of pieces of output information having a same classification based on outputs of the plurality of decoders.
A storage medium and a computer amount to mere instructions to apply the abstract idea on a generic computer. See MPEP 2106.05(f). The receiving limitation amounts to mere data-gathering, which is an insignificant extra-solution activity. The outputting limitations are insignificant extra-solution activity because outputting information is well-known. See MPEP 2106.05(g). A plurality of encoders and a plurality of decoders are generally linking the abstract idea to the particular technological environment of machine learning, and they are not an improvement to machine learning technology. Therefore, they are not meaningful limitations. See MPEP 2106.05(e). 
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. Additionally, the receiving limitation is a well-understood, routine, conventional activity of receiving data over a network. See MPEP 2106.05(d)(II)(i). Outputting information is well-known in the art, as disclosed by Wical (US Patent 6,460,034, published 2002) at C. 9, L. 30-32: “A screen module, such as screen module 230, which processes information for display on a computer output display, is well known in the art.”
	The claim is not patent eligible.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-3, 5-7, and 11-17 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by “Multimodal Deep Learning” to Ngiam et al. Fig. 3(b) from Ngiam is a bimodal deep autoencoder with audio and video modalities. An annotated copy is shown below.


Ngiam Fig. 3b (annotated)

Regarding CLAIM 1, Ngiam teaches (where the terms surrounded by single brackets [ ] are assumed by Examiner):
A learning device comprising: (Ngiam teaches multimodal autoencoder Fig. 3b on p. 4, col. 1)
a memory; and a processor operatively coupled to the memory, the processor including a plurality of encoders and a plurality of decoders, the processor being programmed to: (The experimental results on p. 4 are evidence of a memory and processor as claimed. Fig. 3b above shows the model encompasses encoders and decoders.)
acquire a plurality of pieces of input information, the plurality of pieces of input information having classifications; (The acquiring step is shown by the arrows after the input layer in Fig. 3b. One piece of input information is the audio input data, and another piece of input information is the video input data. Audio and video comprise different classifications. Input data is described in § 4.1 – “10 contiguous audio frames were used as the input to our models” and “We used 4 contiguous video frames for input”)
execute a machine learning process by implementing a model that receives the plurality of pieces of input information as inputs, and outputs a plurality of pieces of output information corresponding to the respective pieces of input information, the machine learning process including: (A plurality of pieces of output information is interpreted as a first piece of output information being the audio reconstruction and a second piece of output information being the video reconstruction. The model as indicated above in Fig. 3b encompasses encoders, the synthesizer, and the decoder. The model inputs the audio and video input data and outputs the audio and video reconstructions. Ngiam teaches implementing a model as training the model (p. 3, ¶ 3): “we propose training the bimodal deep autoencoder (Figure 3b) using an augmented but noisy dataset with additional examples that have only a single-modality as input.”)
generating, by the plurality of encoders, a plurality of [pieces of] characteristic information indicating characteristics of the plurality of pieces of input information based on evaluating the plurality of pieces of input information, (Fig. 3b above shows two encoders each receiving a piece of input information. In the limitation “a plurality of [pieces of] characteristic information indicating characteristics,” the term “characteristics” is broadly interpreted as the modalities of audio and video, and the term “[pieces of] characteristic information” is broadly interpreted as hidden variables output by the encoders indicating the modality of the corresponding input data. Thus, the arrow exiting each encoder is a hidden variable indicating a modality. Ngiam states, regarding the Restricted Boltzmann Machines of Figs. 2 and 3, “Informally, the first layer representations correspond to phonemes and visemes” (p. 3, ¶1).)
generating synthesized information by synthesizing a combination of the plurality of pieces of characteristic information generated by the plurality of encoders, and (Fig. 3b above shows a synthesizer receiving two pieces of characteristic information from the encoders. The two arrows exiting the synthesizer are pieces of synthesized information.)
generating, by the plurality of decoders, a plurality of pieces of output information of different classifications from the synthesized information; and (Fig. 3b above shows two decoders each receiving a piece of synthesized information. The arrow exiting each decoder is a piece of output information of either the audio or the video modality. “Different” is interpreted as either audio or video.)
output a sub-set of the plurality of pieces of output information having a same classification based on outputs of the plurality of decoders. (Fig. 3b above shows the Audio Reconstruction output on the left and the Video Reconstruction output on the right. A set is interpreted as the union of the Audio Reconstruction output and the Video Reconstruction output, and a sub-set is interpreted as the Audio Reconstruction output alone. The Audio Reconstruction output is “based on outputs of the plurality of decoders” having itself been output by a decoder.)
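For illustration only, the data flow that the claim 1 mapping reads onto annotated Fig. 3b (two encoders, a synthesizer, two decoders, and output of a sub-set of the decoder outputs) can be sketched as a forward pass. This is a hedged sketch, not Ngiam's implementation: the `tanh` layers, the concatenation in the synthesizer, the random untrained weights, the dimensions, and all names (`init_params`, `forward`, `enc_audio`, `synth`, and so on) are assumptions made for the example.

```python
import numpy as np

# Hypothetical layer sizes chosen only for the example.
AUDIO_DIM, VIDEO_DIM, HIDDEN_DIM, SHARED_DIM = 10, 4, 6, 5

def init_params(seed=0):
    """Random, untrained weights for each block of the sketch."""
    rng = np.random.default_rng(seed)

    def w(in_dim, out_dim):
        return rng.normal(scale=0.1, size=(in_dim, out_dim))

    return {
        "enc_audio": w(AUDIO_DIM, HIDDEN_DIM),    # audio encoder
        "enc_video": w(VIDEO_DIM, HIDDEN_DIM),    # video encoder
        "synth": w(2 * HIDDEN_DIM, SHARED_DIM),   # synthesizer (shared layer)
        "dec_audio": w(SHARED_DIM, AUDIO_DIM),    # audio decoder
        "dec_video": w(SHARED_DIM, VIDEO_DIM),    # video decoder
    }

def forward(audio, video, p):
    # Each encoder turns one piece of input information into hidden variables.
    h_audio = np.tanh(audio @ p["enc_audio"])
    h_video = np.tanh(video @ p["enc_video"])
    # The synthesizer combines both hidden representations into one.
    shared = np.tanh(np.concatenate([h_audio, h_video], axis=-1) @ p["synth"])
    # Each decoder produces one reconstruction from the shared representation.
    return {
        "audio": shared @ p["dec_audio"],
        "video": shared @ p["dec_video"],
    }

params = init_params()
audio = np.ones((3, AUDIO_DIM))   # batch of 3 audio inputs (placeholder data)
video = np.ones((3, VIDEO_DIM))   # batch of 3 video inputs (placeholder data)

outputs = forward(audio, video, params)
# A sub-set of the outputs having a same classification: audio only.
subset = {"audio": outputs["audio"]}
```

Under the interpretation above, `outputs` corresponds to the set of audio and video reconstructions and `subset` to the audio reconstruction alone.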

	Regarding CLAIM 2, Ngiam teaches: The learning device according to claim 1, wherein:
	the processor learns the plurality of decoders that generate the pieces of output information from the synthesized information, and (Fig. 3b above shows two decoders each receiving a piece of synthesized information and outputting a piece of output information, i.e., audio and video reconstructions. The model is trained according to p. 3, ¶ 3. A processor is implied by the experiments.)
	the respective classifications of each piece of the output information are different and (A classification is broadly interpreted as a modality of audio or video. Fig. 3b shows audio and video reconstruction outputs.) 
the respective classifications of each piece of the output information has the same classification of the pieces of input information to the different plurality of encoders. (In Fig. 3b, audio reconstruction is the same classification as audio input, namely an audio modality, and video reconstruction is the same classification as video input, namely a video modality.)

	Regarding CLAIM 3, Ngiam teaches: The learning device according to claim 1, wherein 
the processor learns the plurality of encoders (The model is trained according to p. 3, ¶ 3. A processor is implied by the experiments.)
that have learned the pieces of characteristic information of different classifications, and (The Fig. 3 caption on p. 3 states that the autoencoder was pre-trained using sparse RBMs, which is an undirected graphical model with hidden and visible variables according to §2.1. The encoders in Fig. 3b have learned characteristics (i.e., either audio or video modality types) of pieces of information (i.e., hidden variables) of different classifications (audio and video modalities are different types of classifications))
wherein the processor learns the plurality of decoders (The model is trained according to p. 3, ¶ 3. A processor is implied by the experiments.)
that have learned the pieces of characteristic information of the same classification as the respective plurality of encoders. (In Fig. 3b, the decoders have learned characteristics (i.e., either audio or video modality types) of pieces of information (i.e., visible variables). “Of the same classification as the respective plurality of encoders” is interpreted as having either of the classifications of the two different encoders.)

Regarding CLAIM 5, Ngiam teaches: The learning device according to claim 1, 
wherein the processor learns a synthesizer (The model in Fig. 3b has a synthesizer. The model is trained according to p. 3, ¶ 3. A processor is implied by the experiments.) 
that generates synthesized information (Arrows exiting synthesizer in Fig. 3b) 
obtained by synthesizing the pieces of characteristic information generated by the plurality of encoders (Arrows entering synthesizer in Fig. 3b) 
in a synthesizing mode corresponding to an output mode of the output information. (The synthesizer outputs audio modality data corresponding to the audio reconstruction output, and the synthesizer outputs video modality data corresponding to the video reconstruction output.)

Regarding CLAIM 6, Ngiam teaches: The learning device according to claim 5, 
wherein the processor learns a synthesizer (The model in Fig. 3b has a synthesizer. The model is trained according to p. 3, ¶ 3. A processor is implied by the experiments.)
that generates synthesized information (Arrows exiting synthesizer in Fig. 3b)
obtained by synthesizing the pieces of characteristic information generated by the plurality of encoders (Arrows entering synthesizer in Fig. 3b)
in a synthesizing mode corresponding to an attribute of a user that is an output destination of the output information. (“A synthesizing mode” is the Bimodal Deep Autoencoder synthesizer in Fig. 3b. “An attribute of a user that is an output destination” is broadly interpreted as the experimental setup chosen by authors Ngiam et al. The experimental results are evidence of outputting the audio and video representations to an output destination.)

Regarding CLAIM 7, Ngiam teaches: The learning device according to claim 5, 
wherein the processor learns a synthesizer (The model in Fig. 3b has a synthesizer. The model is trained according to p. 3, ¶ 3. A processor is implied by the experiments.)
that generates synthesized information (Arrows exiting synthesizer in Fig. 3b)
corresponding to an output mode of the output information from combined information obtained by linearly combining the pieces of characteristic information generated by the plurality of encoders. (Page 7: “In particular, we suggest using canonical correlation analysis… , which finds linear transformations of audio and video data, to form a shared representation.” Additionally, Equations 1 and 2 on p. 2 show linear combinations of hidden and visible vectors for the sparse RBM upon which the autoencoder in Fig. 3b is built.)
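For illustration only, the claim term "linearly combining the pieces of characteristic information" can be sketched as a weighted element-wise sum of feature vectors. This is a minimal sketch under stated assumptions: the function name `linear_combine`, the weights, and the vector values are hypothetical, and the code reproduces neither Ngiam's canonical correlation analysis nor Ngiam's Equations 1 and 2.

```python
def linear_combine(features, weights):
    """Return the element-wise weighted sum of equally sized feature vectors.

    Hypothetical helper: `features` holds one vector per encoder and
    `weights` holds one scalar coefficient per vector.
    """
    if not features or len(features) != len(weights):
        raise ValueError("need one weight per feature vector")
    length = len(features[0])
    if any(len(vector) != length for vector in features):
        raise ValueError("all feature vectors must have the same length")
    return [
        sum(w * vector[j] for w, vector in zip(weights, features))
        for j in range(length)
    ]


# Assumed example values standing in for an audio and a video feature vector.
audio_features = [1.0, 2.0]
video_features = [3.0, 4.0]
combined = linear_combine([audio_features, video_features], [0.5, 0.5])
```

With equal weights the combined vector is the element-wise mean of the two inputs; other weights would change the relative contribution of each modality.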

Regarding CLAIM 11, Ngiam teaches: The learning device according to claim 1, wherein the processor outputs the pieces of output information having content with a same characteristic from a plurality of the pieces of input information included in predetermined content. (The decoders in Fig. 3b output either an audio reconstruction with an audio modality or a video reconstruction with a video modality. The model is pre-trained only on inputs with only audio or video modalities according to Fig. 3 caption.)

Regarding CLAIM 12, Ngiam teaches: A generation device comprising:
	a processor operatively coupled to a memory, the processor including a plurality of encoders and a plurality of decoders, the processor being programmed to: (The experimental results on p. 4 are evidence of a memory and processor as claimed. Fig. 3b above shows the model encompasses encoders and decoders.)
		acquire a plurality of pieces of output information corresponding to a plurality of pieces of input information included in a predetermined content by executing a machine learning process by using (Acquiring a plurality of pieces of output information is shown by the audio and video reconstructions in Fig. 3b. The limitation “corresponding to a plurality of pieces of input information in a predetermined content” is interpreted as the audio and video reconstructions corresponding to the same modalities of the pre-training data used for training the audio and video encoders and decoders, as discussed in the Fig. 2 and 3 captions. The pre-training data is interpreted as predetermined content. Ngiam states the TIMIT dataset is used for unsupervised audio feature pre-training (p. 5, col. 1).)
a plurality of encoders that generate pieces of characteristic information indicating characteristics of the plurality of pieces of input information of different classifications, (Fig. 3b above shows two encoders each receiving a piece of input information. The arrow exiting each encoder is a piece of characteristic information. A “characteristic” and a “classification” are broadly interpreted as a modality such as audio or video.)
a synthesizer that generates synthesized information obtained by synthesizing a combination of the plurality of pieces of characteristic information generated by the plurality of encoders, and (Fig. 3b above shows a synthesizer receiving two pieces of characteristic information from the encoders. The two arrows exiting the synthesizer are pieces of synthesized information.)
a plurality of decoders that generate a plurality of pieces of output information corresponding to the plurality of pieces of input information of different classifications from the synthesized information generated by the synthesizer; (Fig. 3b above shows two decoders each receiving a piece of synthesized information. The arrow exiting each decoder is a piece of output information with a modality of either audio or video. “Different” is interpreted as either audio or video.)
		generate corresponding content corresponding to the predetermined content from the plurality of pieces of output information; and (Referring to Fig. 3b and the captions of Fig. 2-3, the output information generated by the decoder already corresponds to the predetermined content. The audio reconstruction is output from the encoders and decoders which had been pre-trained on audio training data, and the video reconstruction is output from the encoders and decoders which had been pre-trained on video training data.)
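output the generated corresponding content and a sub-set of the plurality of pieces of output information having a same classification based on outputs of the plurality of decoders. (Fig. 3b above shows the Audio Reconstruction output on the left and the Video Reconstruction output on the right. A set is interpreted as the union of the Audio Reconstruction output and the Video Reconstruction output, and a sub-set is interpreted as the Audio Reconstruction output alone. The Audio Reconstruction output is “based on outputs of the plurality of decoders” having itself been output by a decoder.)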

Regarding CLAIM 13, Ngiam teaches (where the terms surrounded by single brackets [ ] are interpreted as being recited by Examiner): A learning method executed by a learning device, the method comprising: (Ngiam teaches a learning device, namely the multimodal autoencoder of Fig. 3b on p. 4, col. 1.)
acquiring a plurality of pieces of input information, the plurality of input information having different classifications; (The acquiring step is shown by the arrows after the input layer in Fig. 3b. One piece of input information is the audio input data, and another piece of input information is the video input data. Audio and video comprise different classifications. Input data is described in § 4.1 – “10 contiguous audio frames were used as the input to our models” and “We used 4 contiguous video frames for input”)
executing a machine learning process by implementing a model that receives the plurality of pieces of input information as inputs, and outputs a plurality of pieces of output information corresponding to the respective pieces of input information, the machine learning process including: (A plurality of pieces of output information is interpreted as a first piece of output information being the audio reconstruction and a second piece of output information being the video reconstruction. The model as indicated above in Fig. 3b encompasses encoders, the synthesizer, and the decoder. The model inputs the audio and video input data and outputs the audio and video reconstructions. Ngiam teaches implementing a model as training the model (p. 3, ¶ 3): “we propose training the bimodal deep autoencoder (Figure 3b) using an augmented but noisy dataset with additional examples that have only a single-modality as input.”)
	generating, by a plurality of encoders, a plurality of [pieces of] characteristic information indicating characteristics of the plurality of pieces of input information based on evaluating the plurality of pieces of input information, (Fig. 3b above shows two encoders each receiving a piece of input information. In the limitation “a plurality of [pieces of] characteristic information indicating characteristics,” the term “characteristics” is broadly interpreted as the modalities of audio and video, and the term “[pieces of] characteristic information” is broadly interpreted as hidden variables output by the encoders indicating the modality of the corresponding input data. Thus, the arrow exiting each encoder is a hidden variable indicating a modality. Ngiam states, regarding the Restricted Boltzmann Machines of Figs. 2 and 3, “Informally, the first layer representations correspond to phonemes and visemes” (p. 3, ¶1).)
	generating synthesized information by synthesizing a combination of the plurality of pieces of characteristic information generated by the plurality of encoders, and (Fig. 3b above shows a synthesizer receiving two pieces of characteristic information from the encoders. The two arrows exiting the synthesizer are pieces of synthesized information.)
	generating, by a plurality of decoders, a plurality of pieces of output information of different classifications from the synthesized information; and (Fig. 3b above shows two decoders each receiving a piece of synthesized information. The arrow exiting each decoder is a piece of output information of either audio or video modalities. “Different” is interpreted as either audio or video.)
outputting a sub-set of the plurality of pieces of output information having a same classification based on outputs of the plurality of decoders. (Fig. 3b above shows the Audio Reconstruction output on the left and the Video Reconstruction output on the right. A set is interpreted as the union of the Audio Reconstruction output and the Video Reconstruction output, and a sub-set is interpreted as the Audio Reconstruction output alone. The Audio Reconstruction output is “based on outputs of the plurality of decoders” having itself been output by a decoder.)

	Regarding CLAIM 14, Ngiam teaches: A generation method executed by a generation device, the method comprising: 
acquiring a plurality of pieces of output information corresponding to a plurality of pieces of input information included in a predetermined content by executing a machine learning process by using (Acquiring a plurality of pieces of output information is shown by the audio and video reconstructions in Fig. 3b. The limitation “corresponding to a plurality of pieces of input information in a predetermined content” is interpreted as the audio and video reconstructions corresponding to the same modalities of the pre-training data used for training the audio and video encoders and decoders, as discussed in the Fig. 2 and 3 captions. The pre-training data is interpreted as predetermined content. Ngiam states the TIMIT dataset is used for unsupervised audio feature pre-training (p. 5, col. 1).) 
a plurality of encoders that generate pieces of characteristic information indicating characteristics of the plurality of pieces of input information of different classifications, (Fig. 3b above shows two encoders each receiving a piece of input information. The arrow exiting each encoder is a piece of characteristic information. A “characteristic” and a “classification” are broadly interpreted as a modality such as audio or video.)
a synthesizer that generates synthesized information obtained by synthesizing a combination of the plurality of pieces of characteristic information generated by the plurality of encoders, and (Fig. 3b above shows a synthesizer receiving two pieces of characteristic information from the encoders. The two arrows exiting the synthesizer are pieces of synthesized information.)
a plurality of decoders that generate a plurality of pieces of output information corresponding to the plurality of pieces of input information of different classifications from the synthesized information generated by the synthesizer; (Fig. 3b above shows two decoders each receiving a piece of synthesized information. The arrow exiting each decoder is a piece of output information with a modality of either audio or video. “Different” is interpreted as either audio or video.)
generating corresponding content corresponding to the predetermined content from the acquired plurality of pieces of output information; and (Referring to Fig. 3b and the captions of Fig. 2-3, the output information generated by the decoder already corresponds to the predetermined content. The audio reconstruction is output from the encoders and decoders which had been pre-trained on audio training data, and the video reconstruction is output from the encoders and decoders which had been pre-trained on video training data.)
outputting the generated corresponding content and a sub-set of the plurality of pieces of output information having a same classification based on outputs of the plurality of decoders. (Fig. 3b above shows the Audio Reconstruction output on the left and the Video Reconstruction output on the right. A set is interpreted as the union of the Audio Reconstruction output and the Video Reconstruction output, and a sub-set is interpreted as the Audio Reconstruction output alone. The Audio Reconstruction output is “based on outputs of the plurality of decoders” having itself been output by a decoder.)

Regarding CLAIM 15, Ngiam teaches (where the terms surrounded by single brackets [ ] are interpreted as being recited by Examiner): A non-transitory computer-readable storage medium having stored therein a learning program that causes a computer to execute a process comprising: (The experimental results on p. 4 are evidence of a storage medium)
acquiring a plurality of pieces of input information, the plurality of input information having different classifications; (The acquiring step is shown by the arrows after the input layer in Fig. 3b. One piece of input information is the audio input data, and another piece of input information is the video input data. Audio and video comprise different classifications. Input data is described in § 4.1 – “10 contiguous audio frames were used as the input to our models” and “We used 4 contiguous video frames for input”)
	executing a machine learning model by implementing a model that receives the plurality of pieces of input information as inputs, and outputs a plurality of pieces of output information corresponding to the respective pieces of input information, the machine learning process including: (A plurality of pieces of output information is interpreted as a first piece of output information being the audio reconstruction and a second piece of output information being the video reconstruction. The model as indicated above in Fig. 3b encompasses encoders, the synthesizer, and the decoder. The model inputs the audio and video input data and outputs the audio and video reconstructions. Ngiam teaches implementing a model as training the model (p. 3, ¶ 3): “we propose training the bimodal deep autoencoder (Figure 3b) using an augmented but noisy dataset with additional examples that have only a single-modality as input.”)
		generating, by a plurality of encoders, a plurality of pieces of characteristic information indicating characteristics of the plurality of pieces of input information; (Fig. 3b above shows two encoders each receiving a piece of input information. In the limitation “a plurality of [pieces of] characteristic information indicating characteristics,” the term “characteristics” is broadly interpreted as the modalities of audio and video, and the term “[pieces of] characteristic information” is broadly interpreted as hidden variables output by the encoders indicating the modality of the corresponding input data. Thus, the arrow exiting each encoder is a hidden variable indicating a modality. Ngiam states, regarding the Restricted Boltzmann Machines of Figs. 2 and 3, “Informally, the first layer representations correspond to phonemes and visemes” (p. 3, ¶1).)
		generating synthesized information by synthesizing a combination of the plurality of pieces of characteristic information generated by the plurality of encoders; and (Fig. 3b above shows a synthesizer receiving two pieces of characteristic information from the encoders. The two arrows exiting the synthesizer are pieces of synthesized information.)
		generating, a plurality of decoders, a plurality of pieces of output information of different classifications from the synthesized information; and (Fig. 3b above shows two decoders each receiving a piece of synthesized information. The arrow exiting each decoder is a piece of output information of either audio or video modalities. “Different” is interpreted as either audio or video.)
		outputting a sub-set of the plurality of pieces of output information having a same classification based on outputs of the plurality of decoders. (Fig. 3b above shows the Audio Reconstruction output on the left and the Video Reconstruction output on the right. A set is interpreted as the union of the Audio Reconstruction output and the Video Reconstruction output, and a sub-set is interpreted as the Audio Reconstruction output alone. The Audio Reconstruction output is “based on outputs of the plurality of decoders” having itself been output by a decoder.)

	Regarding CLAIM 16, Ngiam teaches: A non-transitory computer-readable storage medium having stored therein a generation program that causes a computer to execute a process comprising: (The experimental results on p. 4 are evidence of a storage medium and a computer as claimed. Fig. 3b above shows the model encompasses encoders and decoders.)
acquiring a plurality of pieces of output information corresponding to a plurality of pieces of input information included in a predetermined content by executing a machine learning process by using (Acquiring a plurality of pieces of output information is shown by the audio and video reconstructions in Fig. 3b. The limitation “corresponding to a plurality of pieces of input information in a predetermined content” is interpreted as the audio and video reconstructions corresponding to the same modalities of the pre-training data used for training the audio and video encoders and decoders, as discussed in the Fig. 2 and 3 captions. The pre-training data is interpreted as predetermined content. Ngiam states the TIMIT dataset is used for unsupervised audio feature pre-training (p. 5, col. 1).)
a plurality of encoders that generate pieces of characteristic information indicating characteristics of the plurality of pieces of input information of different classifications, (Fig. 3b above shows two encoders each receiving a piece of input information. The arrow exiting each encoder is a piece of characteristic information. A “characteristic” and a “classification” are broadly interpreted as a modality such as audio or video.)
a synthesizer that generates synthesized information obtained by synthesizing a combination of the plurality of pieces of characteristic information generated by the plurality of encoders, and (Fig. 3b above shows a synthesizer receiving two pieces of characteristic information from the encoders. The two arrows exiting the synthesizer are pieces of synthesized information.)
a plurality of decoders that generate a plurality of pieces of output information corresponding to the plurality of pieces of input information of different classifications from the synthesized information generated by the synthesizer; (Fig. 3b above shows two decoders each receiving a piece of synthesized information. The arrow exiting each decoder is a piece of output information with a modality of either audio or video. “Different” is interpreted as either audio or video.)
generating corresponding content corresponding to the predetermined content from the acquired plurality of pieces of output information; and (Referring to Fig. 3b and the captions of Fig. 2-3, the output information generated by the decoder already corresponds to the predetermined content. The audio reconstruction is output from the encoders and decoders which had been pre-trained on audio training data, and the video reconstruction is output from the encoders and decoders which had been pre-trained on video training data.)
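outputting the generated corresponding content and a sub-set of the plurality of pieces of output information having a same classification based on outputs of the plurality of decoders. (Fig. 3b above shows the Audio Reconstruction output on the left and the Video Reconstruction output on the right. A set is interpreted as the union of the Audio Reconstruction output and the Video Reconstruction output, and a sub-set is interpreted as the Audio Reconstruction output alone. The Audio Reconstruction output is “based on outputs of the plurality of decoders” having itself been output by a decoder.)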

Regarding CLAIM 17, Ngiam teaches: A non-transitory computer-readable storage medium having stored therein a program that causes a computer to (The experimental results on p. 4 are evidence of a memory and processor as claimed. Fig. 3b above shows the model encompasses encoders and decoders.)
execute a machine learning process by implementing a model that receives a plurality of pieces of input information as inputs and outputs a plurality of pieces of output information corresponding to the respective pieces of input information, the machine learning process comprising: (A plurality of pieces of output information is interpreted as a first piece of output information being the audio reconstruction and a second piece of output information being the video reconstruction. The model as indicated above in Fig. 3b encompasses encoders, the synthesizer, and the decoder. The model inputs the audio and video input data and outputs the audio and video reconstructions. Ngiam teaches implementing a model as training the model (p. 3, ¶ 3): “we propose training the bimodal deep autoencoder (Figure 3b) using an augmented but noisy dataset with additional examples that have only a single-modality as input.”)
generating, by a plurality of encoders, a plurality of pieces of characteristic information indicating characteristics of the plurality of pieces of input information based on evaluating the plurality of pieces of input information of different classifications; (Fig. 3b above shows two encoders each receiving a piece of input information. In the limitation “a plurality of [pieces of] characteristic information indicating characteristics,” the term “characteristics” is broadly interpreted as the modalities of audio and video, and the term “[pieces of] characteristic information” is broadly interpreted as hidden variables output by the encoders indicating the modality of the corresponding input data. Thus, the arrow exiting each encoder is a hidden variable indicating a modality. Ngiam states, regarding the Restricted Boltzmann Machines of Figs. 2 and 3, “Informally, the first layer representations correspond to phonemes and visemes” (p. 3, ¶1).)
generating synthesized information by synthesizing a combination of the plurality of pieces of characteristic information generated by the plurality of encoders; (Fig. 3b above shows a synthesizer receiving two pieces of characteristic information from the encoders. The two arrows exiting the synthesizer are pieces of synthesized information.)
generating, by a plurality of decoders, a plurality of pieces of output information corresponding to the plurality of pieces of input information of different classifications from the generated synthesized information; and (Fig. 3b above shows two decoders each receiving a piece of synthesized information. The arrow exiting each decoder is a piece of output information of either audio or video modalities. “Different” is interpreted as either audio or video.)
outputting a sub-set of the plurality of pieces of output information having a same classification based on outputs of the plurality of decoders. (Fig. 3b above shows the Audio Reconstruction output on the left and the Video Reconstruction output on the right. A set is interpreted as the union of the Audio Reconstruction output and the Video Reconstruction output, and a sub-set is interpreted as the Audio Reconstruction output alone. The Audio Reconstruction output is “based on outputs of the plurality of decoders” having itself been output by a decoder.)
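For illustration only, the data flow onto which the above mappings read (two encoders, a shared synthesizer, two decoders, and output of a single-classification sub-set) can be sketched as follows. This is a minimal sketch under stated assumptions: the class and function names, the layer sizes, the sigmoid activation, and the random untrained weights are all hypothetical. It is neither Ngiam's implementation nor Applicant's, and it performs no training.

```python
import math
import random


def init_layer(rng, n_out, n_in):
    """Create small random weights and zero biases (hypothetical initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def dense(weights, bias, x):
    """Affine map followed by a logistic sigmoid (assumed activation)."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
        for row, b in zip(weights, bias)
    ]


class BimodalAutoencoderSketch:
    """Untrained forward pass only: encoders -> synthesizer -> decoders."""

    def __init__(self, audio_dim, video_dim, hidden_dim, shared_dim, seed=0):
        rng = random.Random(seed)
        self.enc_audio = init_layer(rng, hidden_dim, audio_dim)   # audio encoder
        self.enc_video = init_layer(rng, hidden_dim, video_dim)   # video encoder
        self.synth = init_layer(rng, shared_dim, 2 * hidden_dim)  # shared layer
        self.dec_audio = init_layer(rng, audio_dim, shared_dim)   # audio decoder
        self.dec_video = init_layer(rng, video_dim, shared_dim)   # video decoder

    def forward(self, audio, video):
        h_audio = dense(*self.enc_audio, audio)           # characteristic information
        h_video = dense(*self.enc_video, video)
        shared = dense(*self.synth, h_audio + h_video)    # synthesized information
        return {
            "audio": dense(*self.dec_audio, shared),      # audio reconstruction
            "video": dense(*self.dec_video, shared),      # video reconstruction
        }


def select_subset(outputs, classification):
    """Return only the output of one classification, e.g. the audio reconstruction."""
    return {classification: outputs[classification]}


model = BimodalAutoencoderSketch(audio_dim=4, video_dim=3, hidden_dim=5, shared_dim=2)
outputs = model.forward([0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7])
audio_only = select_subset(outputs, "audio")
```

In this sketch the set is the dictionary holding both reconstructions, and `select_subset` returns the audio reconstruction alone, mirroring the set and sub-set interpretation stated in the mappings.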


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Ngiam in view of “Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models” to Kiros et al.

Regarding CLAIM 4, Ngiam teaches: The learning device according to claim 1, wherein the processor learns at least (The model is trained according to p. 3, ¶ 3. A processor is implied by the experiments.)
a first encoder of the plurality of encoders that generates a characteristic of an image, (Videos consist of frames of images. A first encoder is the video encoder in Fig. 3b)
a second encoder of the plurality of encoders that generates a characteristic of …, (The audio encoder in Fig. 3b)
a synthesizer that generates synthesized information obtained by synthesizing the characteristic of the image and the characteristic of the text respectively generated by the first encoder and the second encoder, (The synthesizer in Fig. 3b)
a first decoder of the plurality of decoders that generates output information corresponding to the image from the synthesized information, and (The video decoder in Fig. 3b)
a second decoder of the plurality of decoders that generates output information corresponding to the … from the synthesized information. (The audio decoder in Fig. 3b)

While Ngiam teaches training a bimodal autoencoder with the modalities of audio and video, Ngiam does not explicitly teach training an autoencoder with a text modality. Kiros in Fig. 2 teaches training an encoder-decoder with text inputs. Kiros’ encoder-decoder is analogous to Ngiam’s autoencoder. Because both Ngiam and Kiros teach bimodal encoder-decoder neural networks, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have replaced Ngiam’s audio modality with a text modality for the purpose of generating image captions. (Kiros p. 1, last paragraph: “This paper describes a new approach to the problem of image caption generation, casted into the framework of encoder-decoder models. For the encoder, we learn a joint image-sentence embedding where sentences are encoded using long short-term memory (LSTM) recurrent neural networks [1]. Image features from a deep convolutional network are projected into the embedding space of the LSTM hidden states.”)

Claims 8 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Ngiam in view of “Learning Grounded Meaning Representations with Autoencoders” to Silberer et al. Silberer Fig. 1 is shown below with annotations.

Silberer Fig. 1 (annotated)

Regarding CLAIM 8, Ngiam teaches: The learning device according to claim 1, 
wherein the processor learns a plurality of models that have a structure corresponding to a classification of the input information …, and (A “plurality of models” are broadly interpreted as the right and left sides of Fig. 3b, hereinafter called sub-models. A first sub-model is the audio encoder and decoder and the second sub-model is the video encoder and decoder, with the synthesizer being shared by both. The sub-models are trained according to p. 3, ¶ 3. A processor is implied by the experiments.)
learns the plurality of encoders that generate the characteristic information… generated by each model of the plurality of models. (The sub-models are trained according to p. 3, ¶ 3. A processor is implied by the experiments.)
	However, Ngiam does not explicitly teach: and generate an intermediate representation indicating the characteristic of input information and from the intermediate representation
	But Silberer teaches: and generate an intermediate representation indicating the characteristic of input information and from the intermediate representation (Both of these limitations are taught by Silberer Fig. 1, in which the first encoding layer generates an intermediate representation and the second encoding layer is an encoder.).
Silberer is in the same field of endeavor as the claimed invention, namely multimodal autoencoders. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have added a second encoding layer for each sub-model/modality in Ngiam’s autoencoder according to Silberer’s teachings, with a motivation to learn higher-level embeddings from the inputs. (Silberer’s Abstract: “We introduce a new model which uses stacked autoencoders to learn higher-level embeddings from textual and visual input.”)


Regarding CLAIM 10, Ngiam teaches: The learning device according to claim 1, 
wherein the processor learns the plurality of encoders and the plurality of decoders (The plurality of encoders and plurality of decoders are trained according to p. 3, ¶ 3. A processor is implied by the experiments.)
However, Ngiam does not explicitly teach: included in a plurality of groups of an encoder and a decoder, and each of the plurality of groups has learned characteristics of pieces of information belonging to the different classifications.
But Silberer teaches: included in a plurality of groups of an encoder and a decoder, and each of the plurality of groups has learned characteristics of pieces of information belonging to the different classifications. (Referring to Silberer Fig. 1, a first group of an encoder consists of the two encoding layers of the text modality, and a second group of an encoder consists of the two encoding layers of the image modality. A first group of a decoder consists of the two decoding layers of the text modality, and a second group of a decoder consists of the two decoding layers of the image modality.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have included Silberer’s two encoding layers per modality and two decoding layers per modality in Ngiam’s autoencoder with a motivation to learn higher-level embeddings from textual and visual input. (Silberer’s Abstract: “We introduce a new model which uses stacked autoencoders to learn higher-level embeddings from textual and visual input.”)

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Ngiam et al. (“Multimodal Deep Learning”), in view of Silberer et al. (“Learning Grounded Meaning Representations with Autoencoders”), and further in view of Kiros et al. (“Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models”).

Regarding CLAIM 9, the combination of Ngiam and Silberer teaches: The learning device according to claim 8, 
wherein the processor learns a model (Ngiam’s implied processor trains a model according to p. 3, ¶ 3.)
that is a… neural network as a model that generates an intermediate representation of the input information that is text, and learns a model that is a… neural network as a model that generates an intermediate representation of the input information that is an image. (Neural network is Ngiam’s 
However, the combination of Ngiam and Silberer does not explicitly teach: recurrent neural network and convolution neural network 
But Kiros teaches: recurrent neural network and convolution neural network (Kiros’ Fig. 2 caption on page 3 states the encoder contains a deep convolutional network (CNN) and a long short-term memory recurrent network (LSTM) for learning a joint image-sentence embedding.) 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have included Kiros’ CNN and LSTM in the autoencoder of Ngiam and Silberer by processing the image modality with a CNN and the text modality with an LSTM, with a motivation to “generate descriptions by sampling from conditional neural language models” (Kiros p. 3, Neural Network Methods). Although Kiros’ LSTM has a single hidden layer (p. 5), the combination with Silberer expands it to multiple hidden layers.

Response to Arguments
The following is Examiner’s response to Applicant’s remarks, claim amendments, and specification amendment filed on 09/02/2021.

Objection to the title: The objection to the title is withdrawn due to the specification amendment. The amendment has been entered.

Objections to the claims: The objections to the claims 1, 2, 13, and 15 are withdrawn due to the claim amendments. Examiner withdraws an objection to claim 2 made in the previous office action (“In claim 2, line 6 should read ‘is of the same classification’ or similarly.”) because the term “is” was used correctly. In pending claim 2, “has” should be reverted to “is”. 

Claim Interpretations: The pending claims do not invoke 35 U.S.C. 112(f). They are not being interpreted under 35 U.S.C. 112(f). The previous 112(f) interpretation is withdrawn.

Claim Rejection Under 35 U.S.C. 112: The rejection of claim 11 under 35 U.S.C. 112(b) is withdrawn because the relative term has been deleted from the pending claims.

Claim Rejections Under 35 U.S.C. 101 (Remarks pp. 12-15): Applicant's arguments have been fully considered but they are not persuasive.
Applicant’s argument #1: 

[Image: excerpt from Applicant’s remarks]

Examiner’s response #1: None of pending claims 1 and 12-17 positively recites learning/training a model. They recite “execute”/“executing” a machine learning process, which is broadly interpreted as a mental process further defined by limitations also classified as mental processes, as demonstrated in the 35 U.S.C. 101 rejection of this office action. Each abstract idea limitation in this office action includes an example of how the limitation could reasonably be performed in one’s mind. 
Applicant’s argument #2: 

[Image: media_image4.png]

Examiner’s response #2: The pending claims are directed to abstract ideas for the reasons stated in the 35 U.S.C. 101 rejections in this action.

Applicant’s argument #3: 

[Image: media_image5.png]
 
Examiner’s response #3: The pending claims are directed to abstract ideas and the claim as a whole does not integrate the judicial exception into a practical application for the reasons stated in the 35 U.S.C. 101 rejections in this action.

Applicant’s argument #4: 

[Image: media_image6.png]

[Image: media_image7.png]

[Image: media_image8.png]

Examiner’s response #4: Pending claims 1 and 12-17 do not recite any improvements in machine learning technology. They do not positively recite the improvements quoted in the specification. 
The rejections of claims 1-17 under 35 U.S.C. 101 are maintained.

Claim Rejections Under 35 U.S.C. 102 (Remarks pp. 15-16): Claims 1-3, 5-7, and 11-17 were rejected under 35 U.S.C. 102(a)(1) over “Multimodal Deep Learning” to Ngiam. Applicant's arguments have been fully considered but they are not persuasive.
Applicant recites part of pending claim 1 on p. 15 and then states, “Ngiam fails to disclose at least these features of independent claim 1 (and similarly claims 12-17).” In the first paragraph on p. 16, Applicant states that Ngiam’s deep autoencoder does not correspond to the limitations of claim 1. 
[Image: media_image9.png]

Examiner’s response: Applicant’s statement that Ngiam’s teaching “is a broad and vague description of machine learning using a training set” fails to demonstrate how the pending claims overcome the specific prior art claim mappings of the previous rejection. Ngiam teaches the specific features (i), (ii), and (iii) of the claim, as demonstrated in this action’s claim rejections under 35 U.S.C. 102. The rejections of pending claims 1-3, 5-7, and 11-17 are maintained.

Claim Rejections Under 35 U.S.C. 103 (Remarks p. 17): Claim 4 was rejected under 35 U.S.C. 103 as being unpatentable over Ngiam in view of “Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models” to Kiros et al. Claims 8 and 10 were rejected under 35 U.S.C. 103 as being unpatentable over Ngiam in view of “Learning Grounded Meaning Representations with Autoencoders” to Silberer et al. Claim 9 was rejected under 35 U.S.C. 103 as being unpatentable over Ngiam, in view of .

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Neumann et al. (US 20170071671 A1) discloses a multimodal autoencoder with three types of input data in Fig. 4 and ¶ [0056] and [0058].
Kramer et al. (US 20170293736 A1) discloses at ¶ [0028]: “The example of FIG. 2 shows three modality-specific sub-networks 206, 208, 210.”
Kecskemethy et al. (US 20190197366 A1) discloses a bimodal autoencoder 1 in Fig. 1. The input to each encoder 20, 21 is preceded by decoders 10, 11 which may have different modalities - see ¶ [0041]. Finally, ¶ [0032] states any encoders or decoders used are CNNs and/or RNNs.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Al Kawsar, can be reached at (571) 270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ASHER JABLON/ Examiner, Art Unit 2127
/ABDULLAH AL KAWSAR/ Supervisory Patent Examiner, Art Unit 2127