Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-10 are pending.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-10 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1, 
2A Prong 1: The limitation “extract one or more feature values from input data” recites a mental process, as it merely recites determining what features the input data has, which can be performed in the human mind or with the aid of pen and paper. Extracting features also amounts to observation, which is a mental process. The limitation “each of the plurality of autoencoders is assigned any one class of the classes to be classified as a target class and learns the one or more feature values according to whether the class with which the input data is labeled is identical to the target class” recites a mental process, as it merely recites analysis and judgment.
2A Prong 2: This judicial exception is not integrated into a practical application. The additional elements, a backbone network and a plurality of autoencoders as many of which are provided as the number of classes to be classified of the input data, merely link the use of the judicial exception to a particular field of use or technological environment (MPEP 2106.05(h)).
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, the backbone network and the plurality of autoencoders as many of which are provided as the number of classes to be classified of the input data merely link the use of the judicial exception to a particular field of use or technological environment (MPEP 2106.05(h)).
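For illustration only, the claim 1 structure discussed above (a backbone network followed by one autoencoder per class, each assigned a single target class and trained according to whether the input's label matches that target class) may be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the names backbone, ClassAutoencoder, and build_autoencoders, and the two summary features returned by the stand-in backbone, are hypothetical.

```python
# Hypothetical illustration of the claim 1 structure. All names are
# assumptions; nothing here is taken from the application or the cited art.
from dataclasses import dataclass


def backbone(x):
    """Stand-in feature extractor (assumption): mean and range of the input."""
    return [sum(x) / len(x), max(x) - min(x)]


@dataclass
class ClassAutoencoder:
    target_class: int

    def training_signal(self, label):
        """True if the input's label equals this autoencoder's target class."""
        return label == self.target_class


def build_autoencoders(num_classes):
    # As many autoencoders as there are classes to be classified.
    return [ClassAutoencoder(target_class=k) for k in range(num_classes)]


autoencoders = build_autoencoders(3)
features = backbone([1.0, 2.0, 3.0])
signals = [ae.training_signal(label=1) for ae in autoencoders]
print(features, signals)  # [2.0, 2.0] [False, True, False]
```

Each autoencoder sees the same feature values; only the per-autoencoder signal (label equal to target class or not) differs, which is the distinction the claim relies on.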

Claim 6 is a method claim having limitations similar to those of apparatus claim 1. Therefore, it is rejected under the same rationale as claim 1 above.

Regarding claim 2, the limitation “the autoencoder includes: an encoder learned so as to receive the one or more feature values and output different encoding values according to whether the labeled class is identical to the target class and a decoder learned so as to receive the encoding value and output the same value as the feature value input to the encoder” is mere data gathering, i.e., insignificant extra-solution activity (MPEP 2106.05(g)), as the limitation recites receiving input data.
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Claim 7 is a method claim having limitations similar to those of apparatus claim 2. Therefore, it is rejected under the same rationale as claim 2 above.

Regarding claim 3, the limitation “an absolute value of the encoding value approaches zero when the labeled class is identical to the target class and so that the absolute value of the encoding value becomes farther from zero when the labeled class is different from the target class” recites a mental process, as it merely recites a calculation result that becomes larger the further the classification result is from the target class, which can be performed with the aid of pen and paper. The limitation also recites a mathematical concept, as it recites a calculation.
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element that the encoder is learned merely links the use of the judicial exception to a particular field of use or technological environment (MPEP 2106.05(h)).
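As an illustration of the calculation recited in claim 3, one possible per-sample training objective is sketched below. It is a minimal sketch under assumptions: the encoding value is treated as a scalar, and the hinge form with a margin of 1.0 is a hypothetical choice, not the applicant's disclosed loss.

```python
# One possible objective for the claim 3 limitation (assumption): pull |z|
# toward zero when the labeled class equals the target class, and push |z|
# at least `margin` away from zero otherwise.
def encoding_loss(z, label, target_class, margin=1.0):
    if label == target_class:
        return abs(z)                    # |z| -> 0 for the target class
    return max(0.0, margin - abs(z))     # |z| >= margin for other classes


print(encoding_loss(0.1, label=2, target_class=2))  # 0.1
print(encoding_loss(0.1, label=0, target_class=2))  # 0.9
print(encoding_loss(1.5, label=0, target_class=2))  # 0.0
```

The loss is small only when the magnitude of the encoding value agrees with whether the label matches the target class, which is the relationship the claim language describes.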

Claim 8 is a method claim having limitations similar to those of apparatus claim 3. Therefore, it is rejected under the same rationale as claim 3 above.

Regarding claim 4, the limitation “wherein, when the labeled class is not present in the input data, the algorithm is learned so that marginal entropy loss of encoding values output from the plurality of encoders is minimized” recites a mental process, as it amounts to observing and judging whether the class exists and then learning accordingly.
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element that a plurality of encoders provided in each of the plurality of autoencoders are learned merely links the use of the judicial exception to a particular field of use or technological environment (MPEP 2106.05(h)).
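For illustration only, a marginal entropy over the encoding values of several encoders on unlabeled data may be computed as sketched below. The precise form of the claimed “marginal entropy loss” is not set out in this action, so the following are assumptions: a smaller |z| is treated as stronger class membership, per-sample class probabilities are taken as softmax(-|z|), and the marginal distribution is the batch average of those probabilities.

```python
import math


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def marginal_entropy(batch_encodings):
    """batch_encodings[i][k]: encoding value from encoder k for unlabeled sample i.

    Assumption: class probabilities are softmax(-|z|); the marginal is their
    average over the batch, and its Shannon entropy (in nats) is returned.
    """
    probs = [softmax([-abs(z) for z in row]) for row in batch_encodings]
    num_classes = len(probs[0])
    marginal = [sum(row[k] for row in probs) / len(probs) for k in range(num_classes)]
    return -sum(p * math.log(p) for p in marginal if p > 0)


print(round(marginal_entropy([[0.0, 10.0], [10.0, 0.0]]), 4))  # 0.6931
```

A batch whose samples all favor the same encoder yields a marginal entropy near zero, while a batch split evenly between two encoders yields about log 2.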

Claim 9 is a method claim having limitations similar to those of apparatus claim 4. Therefore, it is rejected under the same rationale as claim 4 above.

Regarding claim 5, the limitation “further comprising a predictor configured to, when test data is input, compare sizes of encoding values output and determine a target class corresponding to a smallest encoding value as a class to which the test data belongs as a result of the comparison” recites a mental process, as it merely recites inputting data and comparing the sizes of the outputs.
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The backbone network and the plurality of encoders provided in each of the plurality of autoencoders merely link the use of the judicial exception to a particular field of use or technological environment (MPEP 2106.05(h)).
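The comparison recited in claim 5 reduces to selecting the target class whose encoder outputs the smallest encoding value, as in the minimal sketch below. Comparing absolute values is an assumption made here for consistency with claim 3; the function name predict_class is hypothetical.

```python
def predict_class(encodings):
    """encodings[k]: encoding value output by the encoder whose target class is k.

    Returns the target class with the smallest encoding value in magnitude
    (comparing absolute values is an assumption).
    """
    return min(range(len(encodings)), key=lambda k: abs(encodings[k]))


print(predict_class([0.8, -0.05, 1.3]))  # 1
```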

Claim 10 is a method claim having limitations similar to those of apparatus claim 5. Therefore, it is rejected under the same rationale as claim 5 above.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2 and 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Yang (US 20160140438 A1) in view of Soliz (US 10610098 B1).

Regarding claim 1, Yang teaches a semi-supervised learning apparatus comprising: a backbone network configured to extract one or more feature values from input data ([Yang, Claim 11] “11. A learning system, comprising: low level feature extractors; high level feature extractors coupled to the low level feature extractors; and a plurality of classifiers receiving high and low level features, with a softmax loss on auxiliary data and softmax loss on fine-grained data, the classifiers forming a hyper-class augmented and regularized deep Convolution Neural Network (CNN)”; in [Yang, Fig. 1A], the ‘the lower level features’ and ‘high level features’ networks correspond to the backbone network); and
a plurality of autoencoders as many of which are provided as the number of classes to be classified of the input data ([Yang, 0016; Figs. 1A and 1B] “Next, the Hyper-class Regularized Learning Model engine is discussed. Before describing the details of our model engine, we first introduce some notations and terms used throughout the paper. Let $D_t = \{(x_1^t, y_1^t), \ldots, (x_n^t, y_n^t)\}$ be a set of training fine-grained images with $y_i^t \in \{1, \ldots, C\}$ indicating the fine-grained class label (e.g., make, model and year of a car) of image $x_i^t$, and let $D_a = \{(x_1^a, v_1^a), \ldots, (x_m^a, v_m^a)\}$ be a set of auxiliary images, where $v_i^a \in \{1, \ldots, K\}$ indicates the hyper-class label of image $x_i^a$ (e.g., view-point of a car). If $v$ denotes a super-class, then we let $v_c$ be the super-class of the fine-grained class $c$. In the sequel, the two terms ‘classifier’ and ‘recognition model’/‘model engine’ are used interchangeably”; as shown in Figs. 1A and 1B of Yang, the classification system classifies the input data into hyper-classes and fine-grained classes, and each of these two types of classes is provided to a respective classifier neural network. The classifier neural networks are interpreted as the autoencoders),
wherein each of the plurality of autoencoders is assigned any one class of the classes to be classified as a target class and learns the one or more feature values according to whether the class with which the input data is labeled, is identical to the target class ([Yang, 0017] “The goal is to learn a recognition model engine that can predict the fine-grained class label of an image. In particular, we aim to learn a prediction function given by Pr(y|x), i.e., given the input image how likely it belongs to different fine-grained classes. Similarly, we let Pr(v|x) denote the hyper-class classification model engine. Given the fine-grained training images and the auxiliary hyper-classes labeled images, a straightforward strategy is to train a multi-task deep CNN, by sharing common features and learning classifiers separately. Multi-task deep learning has been observed to improve the performance of individual tasks. To further improve this simple strategy, we disclose a novel multi-task regularized learning framework by exploiting regularization between the fine-grained classifier and the hyper-class classifier. We begin with the description of the model engine regularized by factor-class”,
[Yang, 0019] “where Pr(v|x) is the probability of any factor-class v and Pr(y|v, x) specifies the probability of any fine-grained class given the factor-class and the input image x. If we let h(x) denote the high level features of x, we model the probability Pr(v|x) by a softmax function”; each of the neural networks is assigned to either the fine-grained classes or the hyper-classes and learns the high level features of x. The classifier neural networks are interpreted as the autoencoders).
However, Yang does not specifically teach an autoencoder connected to the feature extractor.
Soliz teaches an autoencoder connected to the feature extractor ([Soliz, column 9, lines 18-27] “(25) Referring now to FIG. 7, an encoding method according to one embodiment of the present invention is applied to retinal images. The image obtained by a camera that captures images having a low fidelity data set is processed using a feature extraction step such as AMFM to produce a processed feature set at the feature extraction step. The processed feature set is pooled and serves as the input for the auto-encoders comprising the steps of encoding and decoding. The encoded feature set is unpooled and reconstructed using an AMFM step, for example”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Yang and Soliz, to use the connection of the feature extractor and the autoencoder of Soliz to implement the machine learning system of Yang. The suggestion and/or motivation to do so is to accelerate the feature extraction process, as the structure is designed to learn useful filters that bring out key features from images ([Soliz, column 2, line 64 to column 3, line 3]).

Claim 6 is a method claim having limitations similar to those of apparatus claim 1. Therefore, it is rejected under the same rationale as claim 1 above.

Regarding claim 2, Yang in view of Soliz teaches wherein the autoencoder includes: an encoder learned so as to receive the one or more feature values and output different encoding values according to whether the labeled class is identical to the target class ([Soliz, column 9, lines 23-48] “The processed feature set is pooled and serves as the input for the auto-encoders comprising the steps of encoding and decoding. The encoded feature set is unpooled and reconstructed using an AMFM step, for example. Regarding FIG. 8A and FIG. 8B, the lowest fidelity's ultimate encoded feature set for each image is illustrated according to an exemplary processing of the images with an encoder system and method as discussed herein. An example ‘encoding’ of these images are in the following encoded feature set wherein FIG. 8A is the left image with no pathology and having the encoded feature set [0000110000000011]. FIG. 8B is the image on the right with pathology and having the encoded feature set [1011111110011111]. A plurality of images obtained with a variety of imagers having different fidelity will make up a database of images. Given sufficient examples in each class (i.e., with or without pathology), these encoded feature sets would group or cluster into the same set or bucket. That is, other images with no pathology would have similar vectors. This grouping expands/grows to N number of classes. Thus auto-encoding produces more classes than what humans would typically label. Further still an encoded feature set could be used to index and/or sort the information and be treated as encrypted while still being mappable to the original image”; the encoder outputs different encoded feature sets according to the class of the image, i.e., with or without pathology); and
a decoder learned so as to receive the encoding value and output the same value as the feature value input to the encoder ([Soliz, column 2, lines 41-46] “An autoencoder is an artificial neural network (ANN) model that learns parameters for a series of nonlinear transformations that encodes the input into a lower dimensional space or feature set and decodes a reconstructed output, with the goal of minimizing the differences between the original input and the decoded output”).
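The autoencoder objective quoted from Soliz above (encode the input into a lower-dimensional code, decode a reconstruction, and minimize the difference between the input and the decoded output) can be illustrated with the minimal sketch below. For brevity it assumes a linear encoder and decoder (2-D input, 1-D code) trained by full-batch gradient descent on mean squared reconstruction error, rather than the nonlinear transformations Soliz describes; the data, initial weights, learning rate, and step count are hypothetical.

```python
# Minimal linear autoencoder: 2-D input -> 1-D code -> 2-D reconstruction,
# trained by full-batch gradient descent on mean squared reconstruction error.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reconstruction_loss(w_enc, w_dec, data):
    total = 0.0
    for x in data:
        z = dot(w_enc, x)                    # encode to a single value
        x_hat = [z * w for w in w_dec]       # decode back to 2-D
        total += sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x))
    return total / len(data)


def train(data, w_enc, w_dec, lr=0.05, steps=500):
    n = len(data)
    for _ in range(steps):
        g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
        for x in data:
            z = dot(w_enc, x)
            r = [z * wd - xi for wd, xi in zip(w_dec, x)]  # residual x_hat - x
            coeff = dot(w_dec, r)
            for j in range(2):
                g_dec[j] += 2.0 * r[j] * z / n
                g_enc[j] += 2.0 * coeff * x[j] / n
        w_enc = [w - lr * g for w, g in zip(w_enc, g_enc)]
        w_dec = [w - lr * g for w, g in zip(w_dec, g_dec)]
    return w_enc, w_dec


data = [[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [0.5, 0.5]]
w_enc0, w_dec0 = [0.5, 0.1], [0.3, 0.2]
initial_loss = reconstruction_loss(w_enc0, w_dec0, data)
w_enc, w_dec = train(data, w_enc0, w_dec0)
final_loss = reconstruction_loss(w_enc, w_dec, data)
print(initial_loss, final_loss)
```

Because the sample points lie on a single line, a one-value code is enough to reconstruct them, so the reconstruction error is driven close to zero; the decoder then outputs approximately the same value that was input to the encoder.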
Claim 7 is a method claim having limitations similar to those of apparatus claim 2. Therefore, it is rejected under the same rationale as claim 2 above.

Claims 3 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Yang (US 20160140438 A1) in view of Soliz (US 10610098 B1), in view of Schwartz (Schwartz et al., 2018, “Δ-encoder: an effective sample synthesis method for few-shot object recognition”), and further in view of ANNAPUREDDY (US 20150269481 A1).

Regarding claim 3, Yang in view of Soliz teaches the semi-supervised learning apparatus of claim 2. 
Yang in view of Soliz does not specifically teach wherein the encoder is learned so that an absolute value of the encoding value approaches zero when the labeled class is identical to the target class and so that the absolute value of the encoding value becomes farther from zero when the labeled class is different from the target class.
Schwartz teaches an encoder that is learned so that its output is the difference between the labeled class and the target class ([Schwartz, page 3, Section 3 “The Δ-encoder”, 2nd paragraph, lines 1-2, through page 4, 2nd paragraph, lines 1-7] “The encoder learns to extract transferable deformations between pairs of examples of the same class … Following training, at the sample synthesis phase, we use the trained network to sample from $P(X|C, Y)$. We use the non-parametric distribution of $Z$ by sampling random pairs $\{X^s, Y^s\}$ from the classes seen during training (such that $X^s$ and $Y^s$ belong to the same category) and generating from them $Z = E(X^s, Y^s)$ using the trained encoder. Thus, we end up with a set of samples $\{Z_i\}$. In each of the one-shot experiments, for a novel unseen class $U$ we are provided with an example $Y^u$, from which we synthesize a set of samples for the class $U$ using our trained generator model: $\{D(Z_i, Y^u)\}$.”; the encoder of Schwartz extracts the deformation (difference) between a pair of inputs, which corresponds to outputting the difference between the labeled class and the target class).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Yang, Soliz, and Schwartz, to use the encoder of Schwartz, which is learned so that its output is the difference between the labeled class and the target class, to implement the machine learning system of Yang and Soliz. The suggestion and/or motivation to do so is to simplify the system and reduce the computational resources used, as the method of Schwartz is simpler and more effective than other encoders in learning to sample from a class distribution after being provided with one or a few examples of that class ([Schwartz, Section 1 “Introduction”, last paragraph]).
Yang in view of Soliz and further in view of Schwartz does not explicitly teach wherein the encoder is learned so that an absolute value of the encoding value approaches zero when the labeled data is identical to the target data and so that the absolute value of the encoding value becomes farther from zero when the labeled data is different from the target data.
ANNAPUREDDY teaches wherein the encoder is learned so that an absolute value of the encoding value approaches zero when the labeled data is identical to the target data and so that the absolute value of the encoding value becomes farther from zero when the labeled data is different from the target data ([ANNAPUREDDY, 0008] “An apparatus for performing differential encoding in a spiking neural network in accordance with another aspect of the present disclosure includes means for predicting an activation value for a neuron in the neural network based on at least one previous activation value for the neuron. Such an apparatus further includes means for encoding a value based on a difference between the predicted activation value and an activation value for the neuron in the neural network”; the labeled data corresponds to the predicted data, and the target data corresponds to the actual data.
[ANNAPUREDDY, 0042] “… as a function of time difference between spike time $t_{pre}$ of the presynaptic neuron and spike time $t_{post}$ of the postsynaptic neuron (i.e., $t = t_{post} - t_{pre}$). A typical formulation of the STDP is to increase the synaptic weight (i.e., potentiate the synapse) if the time difference is positive (the presynaptic neuron fires before the postsynaptic neuron), and decrease the synaptic weight (i.e., depress the synapse) if the time difference is negative (the postsynaptic neuron fires before the presynaptic neuron)”; this passage teaches how to calculate the difference. The absolute value of such a difference is zero when the two values are the same and increases as they diverge).
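The differential encoding quoted from ANNAPUREDDY (encode a value based on the difference between a predicted activation and the actual activation) can be illustrated as follows. This is a minimal sketch under an assumed predictor: each activation is predicted to equal the previous activation, with the first prediction taken as 0.0; ANNAPUREDDY's actual predictor may differ, and the function name is hypothetical.

```python
def differential_encode(activations):
    """Encode each activation as (actual - predicted).

    Assumption: the predicted activation is the previous activation, and the
    first prediction is 0.0.
    """
    encoded = []
    predicted = 0.0
    for actual in activations:
        encoded.append(actual - predicted)  # zero when the prediction is exact
        predicted = actual                  # next prediction: previous activation
    return encoded


print(differential_encode([0.5, 0.5, 0.5, 1.0]))  # [0.5, 0.0, 0.0, 0.5]
```

The encoded value is zero whenever the prediction matches the actual activation and grows in magnitude as they diverge, which is the property relied on for the claim 3 mapping.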
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Yang, Soliz, Schwartz, and ANNAPUREDDY, to use the differential encoding of ANNAPUREDDY, in which the encoded value becomes larger the further the predicted value is from the actual value, to implement the machine learning system of Yang, Soliz, and Schwartz. The suggestion and/or motivation to do so is to measure how good the prediction is, as a larger difference between the predicted and actual values indicates a worse prediction ([ANNAPUREDDY, 0068]).

Claim 8 is a method claim having limitations similar to those of apparatus claim 3. Therefore, it is rejected under the same rationale as claim 3 above.

Claims 4 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Yang (US 20160140438 A1) in view of Soliz (US 10610098 B1), in view of Akselrod-Ballin (US 10223610 B1), and further in view of Liu (US 20190138860 A1).

Regarding claim 4, Yang in view of Soliz teaches the semi-supervised learning apparatus of claim 2, and a plurality of encoders provided in each of the plurality of autoencoders ([Soliz, column 4, lines 18-29] “One aspect of one embodiment of the present invention provides an automated approach based on combining a plurality of stacked autoencoders to create a deep neural network for feature extraction. The layering is done to create a span of cameras with different resolution/quality of images, ranging from state-of-art fundus cameras to portable hand-held retinal cameras, in decreasing order of resolution/quality from top to bottom of the stack. Each layer of the stack is its own, individual AM-FM-enhanced stacked autoencoder. In the next section, we detail different components of the system”).
Yang in view of Soliz does not specifically teach wherein, when the labeled class is not present in the input data, a plurality of encoders provided in each of the plurality of autoencoders are learned so that marginal entropy loss of encoding values output from the plurality of encoders is minimized.
Akselrod-Ballin teaches performing a specific operation when the labeled class is not present in the input data ([Akselrod-Ballin, column 15, lines 31-52] “Optionally, the one or more classifications are stored in the ERM system as projections marked on the received image. Optionally, the one or more classifications are stored in the ERM system as an extraction from the received image. Optionally the one or more classifications stored in the ERM system have a class label, a bounding box describing a locality within the received image and a probability score that a finding of the labeled class exists in the received image at the locality described by the bounding box … An abnormal finding may be determined by comparing the class label of the one or more classifications with a predefined set of abnormal classes, and by comparing the probability score of the one or more classifications with a threshold score value. When at least one classification of the one or more classifications has a class in the predefined set of abnormal classes and a probability score greater than the threshold score value, the system may determine an abnormal finding and optionally issue an alert to a person…”, which teaches issuing an alert when a specific classification exists in the input data).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Yang, Soliz, and Akselrod-Ballin, to use the method of Akselrod-Ballin of performing a specific operation when the labeled class is not present in the input data to implement the machine learning system of Yang and Soliz. The suggestion and/or motivation to do so is to improve the accuracy of the classifier by finding an abnormality ([Akselrod-Ballin, column 15, lines 31-52]).
Yang in view of Soliz and further in view of Akselrod-Ballin does not specifically teach wherein an encoder is learned so that marginal entropy loss of encoding values output from the plurality of encoders is minimized.
Liu teaches wherein an encoder is learned so that marginal entropy loss of encoding values output from the plurality of encoders is minimized ([Liu, 0107] “In Equation 2, $L_1$ represents the cross-entropy loss for the font classifier, and $L_2$ represents the cross-entropy loss for the glyph classifier. In addition, $\theta_G$ represents the parameters in the encoder, $\theta_{D_1}$ represents the parameters in the font classifier, and $\theta_{D_2}$ represents the parameters in the glyph classifier. As shown, the minimax objective function minimizes font classification loss by optimizing the encoder parameters and font classifier parameters. In addition, the minimax objective function also tries to minimize glyph classification loss by optimizing the glyph classification parameters. At the same time, the minimax objective function tries to adversarially maximize the glyph classification loss by tuning the encoder parameters to discount glyph extraction features, such that the encoder generates feature vectors that are indistinguishable between different glyphs”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Yang, Soliz, Akselrod-Ballin, and Liu, to use the encoder of Liu, which is learned so that the marginal entropy loss of the encoding values output from the plurality of encoders is minimized, to implement the machine learning system of Yang, Soliz, and Akselrod-Ballin. The suggestion and/or motivation to do so is to improve the accuracy of the prediction, as the entropy loss measures the error of the prediction system ([Liu, 0048]).

Claim 9 is a method claim having limitations similar to those of apparatus claim 4. Therefore, it is rejected under the same rationale as claim 4 above.

Claims 5 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Yang (US 20160140438 A1) in view of Soliz (US 10610098 B1), and further in view of Hemmer (US 20200099954 A1).

Regarding claim 5, Yang in view of Soliz teaches the semi-supervised learning apparatus of claim 2, but does not explicitly teach the apparatus further comprising a predictor configured to, when test data is input to the backbone network, compare sizes of encoding values output from a plurality of encoders provided in each of the plurality of autoencoders and determine a target class corresponding to a smallest encoding value as a class to which the data belongs as a result of the comparison.
Hemmer teaches a predictor configured to, when test data is input to the backbone network, compare sizes of encoding values output from a plurality of encoders provided in each of the plurality of autoencoders and determine a target class corresponding to a smallest encoding value as a class to which the data belongs as a result of the comparison ([Hemmer, 0008] “For example, the compressing of the frame of the video using the color prediction scheme based on the 3D object and the stored 3D object can include generating a first 3D object proxy based on the stored 3D object, encoding the first 3D object proxy using an auto encoder”, which teaches that the encoder belongs to an autoencoder,
[Hemmer, 0071] “The compression size comparator 215 can be configured to select one of the outputs of the first encoder 205, the at least one second encoder 210-1, 210-2, . . . , or 210-i for saving in the video storage 125 (e.g., for later streaming to a client device). In an example implementation, the encoder output can be selected based on compression efficiency. For example, the compressed video (and/or compressed frames of a video) having the fewest number of bits (e.g., smallest value for x) can be saved”, which teaches comparing the sizes of the encoder outputs and selecting the smallest one. Although Hemmer does not explicitly teach that the comparison is performed when test data is input to the backbone network, the primary reference Yang teaches this aspect ([Yang, 0034; Figs. 1A and 1B] “The success of the disclosed framework has been tested on both publicly available small-scale fine-grained datasets and self-collected big car data. We anticipate that one could consider multi-task deep learning by considering regularization between different tasks”); as shown in Figs. 1A and 1B, inputs are provided to the feature extractor (backbone network). Further, claim 11 of Yang shows the connection between the feature extractors and the plurality of classifiers).
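The selection performed by Hemmer's compression size comparator (encode the same data with several encoders and keep the output with the fewest bits) can be illustrated with the minimal sketch below. It is an assumption-laden stand-in, not Hemmer's implementation: each zlib compression level plays the role of one candidate encoder, and the payload is hypothetical.

```python
import zlib


def smallest_encoding(data, levels=(0, 1, 6, 9)):
    """Encode the same data with several encoders and keep the smallest output.

    Assumption: each zlib compression level stands in for one candidate encoder.
    """
    outputs = {level: zlib.compress(data, level) for level in levels}
    best = min(outputs, key=lambda level: len(outputs[level]))
    return best, outputs[best]


payload = b"frame-data " * 200
level, encoded = smallest_encoding(payload)
print(level, len(encoded), len(payload))
```

Level 0 stores the data without compression and is therefore never selected for this repetitive payload; the comparator simply keeps whichever candidate output is shortest.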
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Yang, Soliz, and Hemmer, to use the method of Hemmer of comparing encoder outputs and selecting the smallest one when the test data is input to the backbone network, as taught by Yang. The suggestion and/or motivation to do so is to improve the efficiency of the system, as a shorter encoded value reduces memory usage ([Hemmer, 0009]).

Claim 10 is a method claim having limitations similar to those of apparatus claim 5. Therefore, it is rejected under the same rationale as claim 5 above.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
The following references relate to a feature extractor with multiple classifiers:
Fukuda et al., 2005, “Designing multiple distinctive phonetic feature extractors for canonicalization by using clustering technique”
Fu et al., 2016, “Semi-supervised classification of hyperspectral imagery based on stacked autoencoders”
Du, 2015, “Deep Neural Networks with Parallel Autoencoders for Learning Pairwise Relations: Handwritten Digits Subtraction”
US 5860032 A
US 20040218637 A1
US 6853318 B1
US 5644305 A
The references teach semi-supervised learning using autoencoder structures. In particular, Fu 2016 teaches a stacked autoencoder, i.e., a neural network built from stacked autoencoder layers.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JUN KWON whose telephone number is (571)272-2072. The examiner can normally be reached on M-F 7:30AM – 4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar, can be reached on (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/JUN KWON/
Patent Examiner, Art Unit 2127
/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126