Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicants' Amendment filed on August 8, 2022, in which claims 1, 3-4, 6-7, and 9 are currently amended. Claims 1, 3-4, 6-7, and 9 are currently pending. 
Response to Arguments
Applicant’s arguments with respect to the rejection of claims 1, 3-4, 6-7, and 9 under 35 U.S.C. 103, based on the amendment, have been considered but are not persuasive.
Applicant asserts without direct support that Majumdar is completely silent on modifying the cost function to represent an asymmetric SAE layer.  Examiner respectfully disagrees for the reasons set out below.  
With respect to Applicant's arguments that Majumdar does not explicitly teach using the stacked autoencoder for regression, Examiner agrees.  As taught in the non-final office action mailed 12/29/2021 as well as the final office action mailed 4/06/2022, the combination of Yu and Majumdar is relied upon to show the obviousness of using a stacked autoencoder for regression.  In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
With respect to Applicant's arguments that the equation used in the instant specification is different because the third weight vector in the instant specification is a regression weight vector and not a third autoencoder vector, Examiner respectfully disagrees.  Examiner asserts that the equations are fundamentally identical and that the only difference is the naming of the vectors.  Similarly, Majumdar explicitly discloses a hyper-parameter (mu) used for controlling the weightage of autoencoders, synonymous with the lambda term of the claimed invention.  Examiner notes the similarity between equation 4 in Majumdar and the argmin equation in ¶0005 of the instant specification, and notes that both the Majumdar equations (which show a lambda term synonymous with that of the claimed invention) and the claimed invention's equations are based on this relationship, such that the difference between the equations outlined in Applicant's remarks amounts to simplification.  A more representative version of the simplification of Majumdar's equation is equation 6, although equations 5-8 show several derivations of the same formula.  Examiner further asserts that many of Applicant's remarks regarding the perceived differences between the claimed invention and Majumdar are argued from the specification and not from the claim language.  Examiner further asserts that interpreting the combination of Majumdar and Yu as covering the claim language is reasonable.  This last point, however, is moot in view of the new grounds of rejection below, in which a new reference, Hinton et al., is used to provide further support for alternatively using regression in an autoencoder neural network system.
With respect to Applicant's arguments that Majumdar does not teach "wherein the non-convex joint optimization function incorporates the [regression] and modifies a Euclidean cost function of a SAE framework, wherein the modifications to the Euclidean cost function include reducing number of decoder layers to a single decoder layer which represents an asymmetric SAE and wherein incorporating the set of output variables and the regression weight vector to the Euclidean cost function enables joint learning, and wherein weights of the output variables are learned simultaneously", Examiner respectfully disagrees.  The cost function which is minimized in Majumdar is a Euclidean distance with only a single decoder layer ([p. 912 §I] "By reducing the number of decoder layers, we are reducing the number of parameters (network weights) to learn") which represents an asymmetric autoencoder.  Majumdar further explicitly teaches that the cost function enables joint learning ([p. 913 §IIA] "There the regularization term used is the joint sparse l2,1-norm for the features of each class. The joint-sparse formulation ensures that the features from each class have the same sparsity pattern, i.e. have non-zero values at the same positions") wherein the weights of the output variables are learned simultaneously (see Eqns. 1-9).  One of ordinary skill in the art would recognize that a joint learning model's parameters are learned simultaneously by definition.
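As a rough illustration of the parameter reduction described in the quoted passage, the following sketch counts network weights for a symmetric stack versus a stack with a single decoder layer. The layer sizes are an assumption borrowed from the 784-1000-500-250-30 autoencoder described in Hinton; they are not taken from Majumdar. Bias terms are ignored, and the helper name weight_count is hypothetical.

```python
# Illustrative weight count only. Layer sizes are assumed (borrowed from the
# 784-1000-500-250-30 autoencoder in Hinton); bias terms are ignored.

def weight_count(layer_sizes):
    """Number of weights in a chain of fully connected layers."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


encoder_sizes = [784, 1000, 500, 250, 30]
encoder_weights = weight_count(encoder_sizes)

# Symmetric stack: the decoder mirrors every encoder layer.
symmetric_total = encoder_weights + weight_count(encoder_sizes[::-1])

# Asymmetric stack: one decoder layer maps the deepest code back to the input.
asymmetric_total = encoder_weights + encoder_sizes[-1] * encoder_sizes[0]

print("symmetric :", symmetric_total)
print("asymmetric:", asymmetric_total)
```

Under these assumed sizes the single-decoder arrangement carries roughly half the weights of the mirrored arrangement, which is the kind of reduction the quoted passage refers to.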
With respect to Applicant's argument that Majumdar does not explicitly teach "wherein the non-convex joint optimization function representing the [regression] model is minimized to learn values of the [regression] weight vector, values of the plurality of encoder weight matrices and values of the decoder weight matrix", Examiner respectfully disagrees.  Majumdar teaches solving the non-convex joint optimization problem for classification tasks ([p. 912 §II] "Here W is the encoder (between the input and the representation layer) and W’ is the decoder (between the representation layer and the outputs). Here X=[x1|x2|…….|xN] consists all the training samples stacked as columns; each of the training samples are concatenated by one – this corresponds to the bias terms. f is the activation function. In most cases the activation is a non-linear squashing function. But linear ones have been used in the past [16]; they are easier to analyze mathematically [17]. It has been shown that for classification tasks, even non-linear autoencoders operate in the linear region of non-linear activation function [18].") but does not explicitly teach using the stacked autoencoder for regression.  The combination of Majumdar and Yu however shows that stacked autoencoders may be used for regression as well as classification, therefore it would be obvious to use the autoencoder in Majumdar for this purpose.  
With respect to Applicant's arguments that Majumdar does not teach or disclose "wherein an error is computed after each iteration of the multiple iterations and stopping training of the SAE at a specific iteration of the multiple iterations to check whether the error is decreased to a specific value and thereby detecting training failures", Examiner notes that Majumdar teaches minimizing an error over a number of iterations ([p. 915 §IV] "we need only 10 iterations to converge whereas backpropagation takes about 1000 epochs") and further teaches converging to a solution ([p. 916 §IV] "iterations are more expensive (but efficient solvers are available for the sub-problems); but since we converge much faster the overall time is smaller."), which is interpreted as synonymous with decreasing to a specific value and stopping training.
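The relationship between convergence and "decreasing to a specific value and stopping training" can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Majumdar or from the instant specification: the function train_with_error_check, the toy quadratic error, the step size, and the tolerance are all hypothetical.

```python
# Hypothetical sketch: compute an error after every iteration, stop once the
# error has decreased to a target value, and report a training failure if the
# target is not reached within the iteration budget.

def train_with_error_check(step_fn, error_fn, state, target_error, max_iterations):
    errors = []
    for _ in range(max_iterations):
        state = step_fn(state)           # one training iteration
        err = error_fn(state)            # error computed after the iteration
        errors.append(err)
        if err <= target_error:          # error decreased to the specific value
            return state, errors, True   # stop training: converged
    return state, errors, False          # budget exhausted: training failure


# Toy problem (assumed): minimize (w - 3)^2 by gradient descent, step size 0.25.
def gradient_step(w):
    return w - 0.25 * 2.0 * (w - 3.0)


def squared_error(w):
    return (w - 3.0) ** 2


_, errors, converged = train_with_error_check(
    gradient_step, squared_error, 0.0, 1e-6, 100)
_, short_errors, short_converged = train_with_error_check(
    gradient_step, squared_error, 0.0, 1e-6, 5)

print(converged, len(errors))
print(short_converged, len(short_errors))
```

The second call uses a deliberately small iteration budget so that the failure branch is exercised.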
These arguments are further detailed in the new grounds of rejection below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


	Claims 1, 3-4, 6-7, and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Majumdar (“Asymmetric Stacked Autoencoder”, 2017) in view of Hinton ("Reducing the Dimensionality of Data with Neural Networks", 2006), and in further view of Geva (US20200193299A1).

	Regarding claim 1, Majumdar teaches A processor implemented method for incorporating [regression] into a Stacked Auto Encoder (SAE), the method comprising: ([p. 911 Abstract] "We specifically address two tasks – 1. Classification capacity as deep neural network and 2. Compressibility of stacked autoencoder." [regression] and classification are interpreted as having similar outcomes.  Logistic [regression] specifically is commonly used for classification, in which case they are synonymous.  Similarly Support Vector [regression] is a form of Support Vector Machine which is commonly used in classification.)
	generating a [regression] model for the SAE for solving a [regression] problem by formulating the [regression] model as a non-convex joint optimization function, ([p. 912 Col. 2] "The problem (1) is clearly non-convex. However, it is solved easily by gradient descent techniques since the activation function chosen (φ) sigmoid function is smooth and continuously differentiable." [p. 913 Col. 1] "The joint-sparse formulation ensures that the features from each class have the same sparsity pattern, i.e. have non-zero values at the same positions."  [p. 911 Abstract] "We find that such autoencoders are more accurate compared to traditional symmetrically stacked autoencoders for classification accuracy")
	wherein the non-convex joint optimization function comprises a first set of training values associated with a set of input variables applied at an input encoder layer among a plurality of encoder layers of the SAE, ([p. 914 Col. 2] "Notice that Z1 has a specific meaning in this context. It represents the representation of the training samples at the output of the first encoder" See also FIG. 1 for the plurality of encoder layers.)
	a second set of training values for a set of output variables, ([p. 914 Col. 2] "The variables Z2 and Z3 represents training data at encoding layers 2 and 3." Z2 interpreted as second set of training values.)
	a plurality of encoder weight matrices associated with the plurality of encoder layers, a decoder weight matrix associated with a decoder layer of the SAE, ([p. 914 Col. 1] "By reducing the number of decoder layers, we are reducing the number of parameters (network weights) to learn." [p. 914 Col. 2] Eqn. 5 "Here WD is the single decoder, WE1, WE2 and WE3 are three encoders." [p. 912 Col. 2] Eqn. 2 "The weights are usually learned in a greedy fashion – one layer at a time")
	a [regression] weight vector associated with the set of output variables, (See WE3 in Eqn. 5 [p. 915 Col. 2] "The features from the deepest encoder level is used for testing" [p. 915 Col. 2] Testing features interpreted as synonymous with output variables.)
	a parameter for controlling weightage of a [regression] term of the non-convex joint optimization function ([p. 915 Col. 1] "In this work, the hyper-parameter μ balances the scale between the importance of learning weights versus the important of learning features" the importance of learning features interpreted as synonymous with [regression] term.)
	and a non-linear activation function; ([p. 912 Col. 2] " φ is the activation function. In most cases the activation is a non-linear squashing function.")
	wherein the non-convex joint optimization function incorporates [the regression] and modifies a Euclidean cost function of a SAE framework, wherein the modifications to the Euclidean cost function include reducing number of decoder layers to a single decoder layer which represents an asymmetric SAE and wherein incorporating the set of output variables and the [regression] weight vector to the Euclidean cost function enables joint learning (The cost function which is minimized in Majumdar is a Euclidean distance with only a single decoder layer ([p. 912 §I] "By reducing the number of decoder layers, we are reducing the number of parameters (network weights) to learn") which represents an asymmetric autoencoder.  Majumdar further explicitly teaches that the cost function enables joint learning ([p. 913 §IIA] "There the regularization term used is the joint sparse l2,1-norm for the features of each class. The joint-sparse formulation ensures that the features from each class have the same sparsity pattern, i.e. have non-zero values at the same positions") wherein the weights of the output variables are learned simultaneously (see Eqns. 1-9).  One of ordinary skill in the art would recognize that a joint learning model's parameters are learned simultaneously by definition), and wherein weights of the plurality of encoder weight matrices, the decoder weight matrix and the [regression] weight vector associated with the output variables are learned simultaneously and wherein the asymmetric SAE enables a robust abstraction capacity of deep learning without over-fitting (Majumdar also explicitly teaches reducing overfitting [p. 912 §I] "By reducing the number of decoder layers, we are reducing the number of parameters (network weights) to learn. This means that with limited training data, we will have fewer parameters (multiple encoders and single decoder) to learn. This in turn is likely to reduce over-fitting, improve generalizability and increase classification accuracy" [p. 914 §III] "Mathematically the training is expressed as [Eqn. 5] Here WD is the single decoder, WE1, WE2 and WE3 are three encoders. This (5) is a complex optimization problem. It cannot be solved using the greedy sub-optimal techniques. It needs to be solved in one go...As long the equality holds at convergence, the constraint can be relaxed in intermediate steps. Therefore instead of formulating the Lagrangian, we use the augmented Lagrangian. This is given by [Eqn. 6]...Z1 has a specific meaning in this context. It represents the representation of the training samples at the output of the first encoder" Solving the mathematical learning equation in one go is interpreted as synonymous with learning simultaneously.)
	reformulating the non-convex joint optimization function as an Augmented Lagrangian formulation in terms of a plurality of proxy variables and a plurality of hyper parameters, wherein the plurality of proxy variables provide representations of the set of input variables at each encoder layer among the plurality of encoder layers of the SAE; ([p. 914 Col. 1] "In this work, we will show that there is a far more efficient way to learn such asymmetric autoencoders using state-of-the art optimization techniques like Augmented Lagrangian Alternating Direction Method of Multipliers (ADMM)" [p. 914 Col. 2] "Therefore instead of formulating the Lagrangian, we use the augmented Lagrangian. This is given by [Eqn. 6]" Eqn. 6 shows substitution of proxy variables Z1 with respect to the Lagrangian.  Activation function and hyperparameter mu are both interpreted as hyperparameters.  Proxy variables Z1,Z2,Z3 are representations of the set of input variables at each encoder layer.)
	wherein the non-convex joint optimization function representing the [regression] model is minimized to learn values of the [regression] weight vector, values of the plurality of encoder weight matrices and values of the decoder weight matrix; ([p. 913 §II] "There the regularization term used is the joint sparse l2,1-norm for the features of each class. The joint-sparse formulation ensures that the features from each class have the same sparsity pattern…The regularization term is usually chosen so that they are differentiable and hence minimized using gradient descent techniques." [p. 914 §III] "Mathematically the training is expressed as [Eqn. 5] Here WD is the single decoder, WE1, WE2 and WE3 are three encoders.")
	splitting the Augmented Lagrangian formulation into a set of derived functions using Alternating Direction Method of Multipliers (ADMM), wherein the derived functions are sub-problems of the Augmented Lagrangian formulation; and ([p. 914 Col. 2] "This (5) is a complex optimization problem. It cannot be solved using the greedy sub-optimal techniques. It needs to be solved in one go. This work follows the variable splitting technique...we can segregate (8) into relative easier sub-problems.")
	learning values of the plurality of encoder weight matrices, the decoder weight matrix, the [regression] weight vector and the plurality of proxy variables for the [regression] model to train the SAE for the [regression] problem by obtaining argument minimum of each derived function among the set of derived functions in multiple iterations. ([p. 914 Col. 2] See argmin eqns P1-P7. WE1,WE2 interpreted as encoder weight matrices.  WE3 interpreted as [regression] weight vector.  Vector and matrix are interpreted as synonymous.  WD interpreted as decoder weight matrix.).
	wherein the values of the plurality of encoder weight matrices, the decoder weight matrix and the [regression] weight vector associated with the output variables are learnt jointly ([p. 914 §III] "we will show that there is a far more efficient way to learn such asymmetric autoencoders using state-of-the art optimization techniques like Augmented Lagrangian Alternating Direction Method of Multipliers" WE3 interpreted as synonymous with [regression] weight vector.)
	wherein an error is computed after each iteration of the multiple iterations and stopping training of the SAE at a specific iteration of the multiple iterations to check whether the error is decreased to a specific value and thereby detecting training failures (Majumdar teaches minimizing an error over a number of iterations ([p. 915 §IV] "we need only 10 iterations to converge whereas backpropagation takes about 1000 epochs") and further teaches converging to a solution ([p. 916 §IV] "iterations are more expensive (but efficient solvers are available for the sub-problems); but since we converge much faster the overall time is smaller.").  Converging in training is interpreted as synonymous with decreasing to a specific value and stopping training.  Training failures are interpreted as synonymous with model error.  FIG. 2 shows the model error as a function of the number of epochs (iterations).)
	applying a set of test values associated with the set of input variables to an output SAE function of the SAE to estimate a set of values for the set of output variables, wherein the set of output variables are unknown, wherein the output SAE function comprises the learned values of the plurality of encoder weight matrices and the [regression] weight vector ([p. 916 §B] "We compare the performance of the stacked symmetric and asymmetric autoencoders...In the first set of experiments the designated training data is used for training the autoencoder and the recovery accuracy is tested on the designated test set. The results are shown in Fig. 2.")
	wherein the learned values comprises of small sizes ([p. 913 §II] "It must be remembered that transform coding is not democratic. High valued coefficients have more importance than smaller valued coefficients (even within the ones that are preserved).")
	and enable a testing phase to perform in at least one of a real time and offline. ([p. 915 §IV] "The first one is the MNIST. The MNIST digit classification task is composed of 28x28 images of the 10 handwritten digits. There are 60,000 training images with 10,000 test images in this benchmark. No preprocessing has been done on this dataset." Training and testing with a static dataset such as MNIST is synonymous with performing an offline testing phase. Majumdar explicitly teaches that there are 10,000 images in the test set.).
	However, Majumdar does not explicitly teach that the stacked autoencoder can be used for regression, or changing number of encoding layers, number of iterations and values of one or more parameters, values of the plurality of hyper-parameters as per an application scenario if the error is not decreased to the specific value and thereby checking error reduction during the training;  

	While it would be obvious to one of ordinary skill in the art that a neural network could be trained for classification or regression, Majumdar does not explicitly teach that the network is used for regression.  However, Majumdar does cite the foundational work of Hinton ([p. 911 §I] "the possibility of this task was showed by Hinton more than half a decade back. In a presentation, he showed how stacked autoencoders can compress the MNIST dataset (784 dimensional images) to only 30 coefficients. Using stacked autoencoders, Hinton also compressed a medical corpus of 2000 bag-of-words to only 2 dimensions!").  
Hinton, in the same field of endeavor, reinforces the obviousness of an autoencoder being trained for regression or classification tasks ([p. 507 middle Col.] “Layer-by-layer pretraining can also be used for classification and regression”).   It would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teachings of Majumdar with the teachings of Hinton by training the stacked autoencoder for regression.  It would be obvious to one of ordinary skill in the art that expanding an asymmetric stacked autoencoder for classification to include regression tasks would greatly broaden the application of the stacked autoencoder.  Hinton provides as an additional motivation for combination ([p. 507] "we used a 784-1000-500-250-30 autoencoder to extract codes for all the handwritten digits in the MNIST training set (11). The Matlab code that we used for the pretraining and fine-tuning is available in (8). Again, all units were logistic except for the 30 linear units in the code layer. After fine-tuning on all 60,000 training images, the autoencoder was tested on 10,000 new images and produced much better reconstructions than did PCA").  This motivation for combination also applies to the remaining claims depending on this combination.  
However, the combination of Majumdar and Hinton does not explicitly teach changing number of encoding layers, number of iterations and values of one or more parameters, values of the plurality of hyper-parameters as per an application scenario if the error is not decreased to the specific value and thereby checking error reduction during the training.

Geva, in the same field of endeavor, teaches changing number of encoding layers, number of iterations and values of one or more parameters, values of the plurality of hyper-parameters as per an application scenario if the error is not decreased to the specific value and thereby checking error reduction during the training; ([¶0138] "3. Adding additional autoencoder layer for unsupervised feature learning to better represent the input image...4. Forming a block of N image patches...Repeating stages 2-6 in several iterations until pre-defined conversion threshold is achieved." Repeating steps in iterations until convergence threshold is achieved is interpreted as synonymous with changing the number of iterations if the error is not decreased to the specific value.  Forming a block of image patches interpreted as synonymous with changing values of one or more parameters.). 

Majumdar, Hinton, and Geva are all directed towards image classification using autoencoders.  Therefore, Majumdar, Hinton, and Geva are all analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Majumdar and Hinton with the teachings of Geva by modifying the autoencoder network hyperparameters including layers at runtime.  Geva explicitly teaches for example that one might add additional autoencoder layers at runtime ([¶0138] “to better represent the input image”).  Geva also provides as a motivation for combination with regards to an autoencoder system for image classification ([¶0151] “The accuracy of the detection and the efficiency of the training can be optionally and preferably improved”).  This motivation for combination also applies to the remaining claims which depend on this combination.   
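The variable-splitting approach referenced in the claim 1 mapping (a proxy variable per encoder output, with each weight solved as its own sub-problem over multiple iterations) can be sketched on a toy problem. The sketch below is an assumption-laden simplification and does not reproduce any equation from Majumdar or from the instant specification: it uses scalar weights, a linear activation, one encoder, and a penalty-based alternating minimization without multiplier updates, so it is not full ADMM. The names fit_split, lam, and mu are hypothetical.

```python
# Toy sketch (assumed): jointly fit an encoder weight w_e, a decoder weight w_d
# and a regression weight w_r by splitting on a proxy variable z ~ w_e * x.
# Cost: sum (x - w_d z)^2 + lam * sum (y - w_r z)^2 + mu * sum (z - w_e x)^2

def objective(x, y, z, w_e, w_d, w_r, lam, mu):
    recon = sum((xi - w_d * zi) ** 2 for xi, zi in zip(x, z))
    regr = sum((yi - w_r * zi) ** 2 for yi, zi in zip(y, z))
    split = sum((zi - w_e * xi) ** 2 for xi, zi in zip(x, z))
    return recon + lam * regr + mu * split


def fit_split(x, y, lam=1.0, mu=1.0, iterations=20):
    w_e = 0.5                          # initial encoder weight (arbitrary)
    z = [w_e * xi for xi in x]         # proxy variable, one value per sample
    xx = sum(xi * xi for xi in x)
    history = []
    for _ in range(iterations):
        zz = sum(zi * zi for zi in z)
        # Each update is the closed-form argmin of the cost over one block.
        w_d = sum(xi * zi for xi, zi in zip(x, z)) / zz
        w_r = sum(yi * zi for yi, zi in zip(y, z)) / zz
        w_e = sum(zi * xi for xi, zi in zip(x, z)) / xx
        denom = w_d ** 2 + lam * w_r ** 2 + mu
        z = [(w_d * xi + lam * w_r * yi + mu * w_e * xi) / denom
             for xi, yi in zip(x, y)]
        history.append(objective(x, y, z, w_e, w_d, w_r, lam, mu))
    return w_e, w_d, w_r, history


# Made-up training data for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
w_e, w_d, w_r, history = fit_split(x, y)

# Estimate the output for a new input using only the encoder and regression weights.
estimate = w_r * (w_e * 5.0)
print(history[0], history[-1], estimate)
```

Because every block update is an exact minimizer of the same cost, the recorded cost cannot increase from one sweep to the next, which makes a per-iteration error check of the kind recited in the claim straightforward to apply.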

	Regarding claim 3, the combination of Majumdar, Hinton, and Geva teaches The method of claim 1, wherein the SAE is trained by generating the regression model, and wherein the trained SAE is the asymmetric SAE. (Majumdar [p. 911 Col. 1] "It is unsupervised in the sense that the training does not require any class specific information. The autoencoder learns a representation of the data (at the hidden layer) such that the data can be regenerated from the representation" [p. 914 Col. 1] "The architecture of asymmetric autoencoder is shown in Fig. 1. It has multiple encoders and a single decoder."). 

Claims 4 and 6 are directed towards a system for performing the method of claims 1 and 3.  Therefore, the rejection applied to claims 1 and 3 also applies to claims 4 and 6.  Claims 4 and 6 also recite the additional elements of a processor, a memory, a repository, and an I/O interface (Geva [¶0036] “As software, selected tasks according to embodiments of the invention could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system. In an exemplary embodiment of the invention, one or more tasks according to exemplary embodiments of method and/or system as described herein are performed by a data processor, such as a computing platform for executing a plurality of instructions. Optionally, the data processor includes a volatile memory for storing instructions and/or data and/or a non-volatile storage, for example, a magnetic hard-disk and/or removable media, for storing instructions and/or data. Optionally, a network connection is provided as well. A display and/or a user input device such as a keyboard or mouse are optionally provided as well.”).

	Claims 7 and 9 are substantially similar to claims 4 and 6.  Therefore, the rejection applied to claims 4 and 6 also applies to claims 7 and 9.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720.  The examiner can normally be reached on M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached on (571)270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/SB/Examiner, Art Unit 2124

/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126