DETAILED ACTION
This action is in response to the claims filed 07/20/2021 for application 16/253366, filed 01/22/2019. Claims 1, 11, 15, 16, and 20 are amended. Claims 1-20 are pending and have been considered.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-5 and 7-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US 20160155136 A1, cited by Applicant in the IDS filed on 07/09/2020, hereinafter "Zhang") in view of Wong et al. ("Recurrent Auto-Encoder Model for Large-Scale Industrial Sensor Signal Analysis", cited by Applicant in the IDS filed on 07/09/2020, hereinafter "Wong").

Regarding claim 1, Zhang teaches A method of training a machine learning model, the method comprising: 
receiving input data from at least one remote device (“For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including, but not limited to, acoustic, speech, or tactile input. Other possible input devices include, but are not limited to, touch screens or other touch-sensitive devices such as single or multi-point resistive or capacitive trackpads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices and associated interpretation software, and the like.” [¶0057, lines 10-20]);
jointly training the classifier and a selected context autoencoder (“In some implementations, the accompanying auto-encoder shares the same modeling input data records with the unsupervised fraud detection model. During the model development phase, the unsupervised fraud detection model and auto-encoder network are designed and “learned” on the same data set. The auto-encoder is learned to minimize the loss function L, which is also the reconstruction error on the development data sets” [¶0039; Examiner interprets the cited underlined portions as equivalent to “jointly training”. Note: Wong teaches the classifier as cited below.]) of a knowledge bank of autoencoders including at least one autoencoder using the input data (“wherein the auto-encoder is further configured to select, from a plurality of models stored by the analytics module, a best model according to a lowest reconstruction error.” [Zhang, claim 2; see also ¶0046. It is implicit that the analytics module (i.e., the knowledge bank) stores multiple autoencoder diagnostic models.]);
applying a training data matrix of the input data (“In a principal component analysis (PCA), as yet another example of latent variable creation, the latent variables are uncorrelated with each other, and capture most of the variance through eigenvalue decomposition of a covariance matrix of observed data.” [¶0034, lines 1-5]) to the selected context autoencoder (“These auto-encoder networks monitoring the production data and feature vectors are of critical importance in go-live monitoring of new models, but the same modules can continue to perform ongoing monitoring of the production data and derived features, looking for drifts and changes in customer transaction behaviors over time.” [¶0052; this corresponds to a context autoencoder.]) and determining the training data matrix is out of context for the selected context autoencoder (“the auto-encoder diagnostic component runs periodically to check the reconstruction error on a selected sampled data set. This is done through data extraction, which is fed into the auto-encoder network to compute the reconstruction error.” [¶0041, lines 2-4; the reconstruction error would be used to determine whether the training data is out of context.]);
applying the training data matrix to each other context autoencoder of the at least one autoencoder and determining the training data matrix is out of context for each other context autoencoder (“In some instances, several candidate consortium models are available, and a decision must be made as to which one is the appropriate model for certain client. The same process as above can be used to send the data through multiple auto-encoder diagnostic modules to check the percentile of the error term from the total loss function. Based on the diagnostic outcomes, the best suitable model can be decided for this specific client.” [¶0046, lines 1-8; the multiple auto-encoder diagnostic modules correspond to the other context autoencoders, and the reconstruction error of each would determine for which of the other context autoencoders the training data is deemed “out of context”.]);
and constructing a new context autoencoder (“In some implementations, the auto-encoder network can be used to monitor go-live raw data and derived feature vectors when a new model is installed or a model is upgraded during the model go-live.” [¶0049, lines 1-4; it is implicit that the newly installed model would be a new autoencoder. See ¶0046, “In some instances, several candidate consortium models are available, and a decision must be made as to which one is the appropriate model for certain client. The same process as above can be used to send the data through multiple auto-encoder diagnostic modules to check the percentile of the error term from the total loss function. Based on the diagnostic outcomes, the best suitable model can be decided for this specific client.” Examiner interprets Zhang’s auto-encoder module as selecting a new autoencoder based on the diagnostic outcomes; installing the newly selected model corresponds to constructing a new autoencoder.]).
Zhang fails to explicitly teach evaluating a classifier by determining a classification accuracy of the input data;
Wong teaches evaluating a classifier by determining a classification accuracy of the input data (“Once all the context vectors are labelled with their corresponding clusters, supervised classification algorithms can be used to learn the relationship between them using the training set. For instance, support vector machine (SVM) classifier with J classes can be used. The trained classifier can then be applied to the context vectors in the held-out validation set for cluster assignment.” [pg. 207, § 2.2 Sampling, ¶4]);
Zhang and Wong are both in the same field of endeavor of training autoencoder models. Zhang discloses a diagnostic system that determines the best model based on the reconstruction error of a trained autoencoder. Wong teaches using autoencoder modeling for industrial sensor signal analysis. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang’s diagnostic system with the classifier taught by Wong to determine the classification of the input data. One would have been motivated to use a classifier to classify context vectors into proper clusters and learn from similar streaming data. [§ 3.2 Context Vector, Wong]

Regarding claim 2, the combination of Zhang and Wong teaches The method of claim 1, where Zhang further teaches further comprising storing the new context autoencoder with the knowledge bank of autoencoders (“wherein the auto-encoder is further configured to select, from a plurality of models stored by the analytics module, a best model according to a lowest reconstruction error.” [Zhang, claim 2; this best model corresponds to a new context autoencoder, as explained in the rejection of claim 1.]).

Regarding claim 3, the combination of Zhang and Wong teaches The method of claim 1, where Zhang further teaches further comprising initializing the new context autoencoder (“The same process as above can be used to send the data through multiple auto-encoder diagnostic modules to check the percentile of the error term from the total loss function. Based on the diagnostic outcomes, the best suitable model can be decided for this specific client. With this diagnostic mechanism, better recommendations can be provided for existing consortium clients as well as new clients to utilize the model best designed to resemble their production data, and to obtain optimal model performance and improve client satisfaction.” [¶0046, lines 6-12; selecting the best model would be equivalent to initializing a new context autoencoder, since the model would be trained using set modeling parameters as disclosed in ¶0006.]).

Regarding claim 4, the combination of Zhang and Wong teaches The method of claim 1, where Zhang further teaches further comprising applying a semantic meaning to the new context autoencoder (“To provide for interaction with a user, one or more aspects or features of the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT), a liquid crystal display (LCD) or a light emitting diode (LED) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including, but not limited to, acoustic, speech, or tactile input.” [¶0057, lines 1-14; Examiner interprets applying a semantic meaning as equivalent to receiving acoustic, speech, or tactile input from the user.]).

Regarding claim 5, the combination of Zhang and Wong teaches The method of claim 1, where Zhang further teaches wherein determining the input data is out of context includes determining a reconstruction error for a respective one of the at least one context autoencoder (“When applied to supervised models, the diagnostic system can determine the most appropriate model for the client based on a reconstruction error of a trained auto-encoder for each associated model.” [¶0016, lines 3-9; the trained auto-encoder corresponds to the at least one context autoencoder.]).

Regarding claim 7, the combination of Zhang and Wong teaches The method of claim 1, where Zhang further teaches wherein the input data is streaming data from the at least one remote device (“To provide for interaction with a user, one or more aspects or features of the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT), a liquid crystal display (LCD) or a light emitting diode (LED) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including, but not limited to, acoustic, speech, or tactile input. Other possible input devices include, but are not limited to, touch screens or other touch-sensitive devices such as single or multi-point resistive or capacitive trackpads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices and associated interpretation software, and the like.” [¶0057]).

Regarding claim 8, the combination of Zhang and Wong teaches The method of claim 4, where Wong further teaches wherein the creation of a new context autoencoder induces an alarm (“Alarm can be triggered when the context vector travels beyond the boundary of a predefined neighbourhood.” [pg. 211, ¶1, lines 3-4]).
Zhang and Wong are both in the same field of endeavor of training autoencoder models. Zhang discloses a diagnostic system that determines the best model based on the reconstruction error of a trained autoencoder. Wong teaches using autoencoder modeling for industrial sensor signal analysis. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang’s diagnostic system with the alarm taught by Wong. One would have been motivated to trigger an alarm to alert a user or operator when the input data is out of context. [pg. 211, ¶1, Wong]

Regarding claim 9, the combination of Zhang and Wong teaches The method of claim 1, where Zhang further teaches wherein the at least one autoencoder is part of a machine learning model (“When applied to supervised models, the system can determine the most appropriate model for the client based on a reconstruction error of a trained auto-encoder for each associated model.” [¶0016, lines 3-9; the supervised model corresponds to a machine learning model, and the trained auto-encoder would be part of that supervised model.]).

Regarding claim 10, the combination of Zhang and Wong teaches The method of claim 1, where Zhang further teaches wherein each autoencoder of the at least one autoencoder is part of a respective machine learning model of at least one machine learning model (“When applied to supervised models, the system can determine the most appropriate model for the client based on a reconstruction error of a trained auto-encoder for each associated model.” [¶0016, lines 3-9; each associated model corresponds to a respective autoencoder, and each autoencoder would be applied to (i.e., be part of) a supervised model.]).

Regarding claim 11, Zhang teaches A system for context-based training of a machine learning model, the system comprising: 
a memory unit configured to store data and processor-executable instructions (“A memory, which can include a computer-readable storage medium, may include, encode, store, or the like one or more programs that cause one or more processors to perform one or more of the operations described herein.” [¶0018, lines 9-13]);
a processor unit in communication with the memory unit, the processor unit configured to execute the processor-executable instructions stored in the memory unit to (“Similarly, computer systems are also described that may include one or more processors and one or more memories coupled to the one or more processors.” [¶0018, lines 6-9]):
receive streaming data from at least one remote device (“To provide for interaction with a user, one or more aspects or features of the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT), a liquid crystal display (LCD) or a light emitting diode (LED) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including, but not limited to, acoustic, speech, or tactile input. Other possible input devices include, but are not limited to, touch screens or other touch-sensitive devices such as single or multi-point resistive or capacitive trackpads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices and associated interpretation software, and the like.” [¶0057]);
jointly training the classifier and a selected context autoencoder (“In some implementations, the accompanying auto-encoder shares the same modeling input data records with the unsupervised fraud detection model. During the model development phase, the unsupervised fraud detection model and auto-encoder network are designed and “learned” on the same data set. The auto-encoder is learned to minimize the loss function L, which is also the reconstruction error on the development data sets” [¶0039; Examiner interprets the cited underlined portions as equivalent to “jointly training”. Note: Wong teaches the classifier as cited below.]) of a knowledge bank of autoencoders including at least one autoencoder using the input data (“wherein the auto-encoder is further configured to select, from a plurality of models stored by the analytics module, a best model according to a lowest reconstruction error.” [Zhang, claim 2; see also ¶0046. It is implicit that the analytics module (i.e., the knowledge bank) stores multiple autoencoder diagnostic models.]);
apply a training data matrix of the input data (“In a principal component analysis (PCA), as yet another example of latent variable creation, the latent variables are uncorrelated with each other, and capture most of the variance through eigenvalue decomposition of a covariance matrix of observed data.” [¶0034, lines 1-5]) to the selected context autoencoder (“These auto-encoder networks monitoring the production data and feature vectors are of critical importance in go-live monitoring of new models, but the same modules can continue to perform ongoing monitoring of the production data and derived features, looking for drifts and changes in customer transaction behaviors over time.” [¶0052; this corresponds to a context autoencoder.]) and determine the training data matrix is out of context for the selected context autoencoder (“the auto-encoder diagnostic component runs periodically to check the reconstruction error on a selected sampled data set. This is done through data extraction, which is fed into the auto-encoder network to compute the reconstruction error.” [¶0041, lines 2-4; the reconstruction error would be used to determine whether the training data is out of context.]);
apply the training data matrix to each other context autoencoder of the at least one autoencoder and determining the training data matrix is out of context for each other context autoencoder (“In some instances, several candidate consortium models are available, and a decision must be made as to which one is the appropriate model for certain client. The same process as above can be used to send the data through multiple auto-encoder diagnostic modules to check the percentile of the error term from the total loss function. Based on the diagnostic outcomes, the best suitable model can be decided for this specific client.” [¶0046, lines 1-8; the multiple auto-encoder diagnostic modules correspond to the other context autoencoders, and the reconstruction error of each would determine for which of the other context autoencoders the training data is deemed “out of context”.]);
and construct a new context autoencoder (“In some implementations, the auto-encoder network can be used to monitor go-live raw data and derived feature vectors when a new model is installed or a model is upgraded during the model go-live.” [¶0049, lines 1-4; it is implicit that the newly installed model would be a new autoencoder. See ¶0046, “In some instances, several candidate consortium models are available, and a decision must be made as to which one is the appropriate model for certain client. The same process as above can be used to send the data through multiple auto-encoder diagnostic modules to check the percentile of the error term from the total loss function. Based on the diagnostic outcomes, the best suitable model can be decided for this specific client.” Examiner interprets Zhang’s auto-encoder module as selecting a new autoencoder based on the diagnostic outcomes; installing the newly selected model corresponds to constructing a new autoencoder.]).
Zhang fails to explicitly teach evaluate a classifier by determining a classification accuracy of the input data;
Wong teaches evaluate a classifier by determining a classification accuracy of the input data (“Once all the context vectors are labelled with their corresponding clusters, supervised classification algorithms can be used to learn the relationship between them using the training set. For instance, support vector machine (SVM) classifier with J classes can be used. The trained classifier can then be applied to the context vectors in the held-out validation set for cluster assignment.” [pg. 207, § 2.2 Sampling, ¶4]);
Zhang and Wong are both in the same field of endeavor of training autoencoder models. Zhang discloses a diagnostic system that determines the best model based on the reconstruction error of a trained autoencoder. Wong teaches using autoencoder modeling for industrial sensor signal analysis. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang’s diagnostic system with the classifier taught by Wong to determine the classification of the input data. One would have been motivated to use a classifier to classify context vectors into proper clusters and learn from similar streaming data. [§ 3.2 Context Vector, Wong]

Regarding claim 12, the combination of Zhang and Wong teaches The system of claim 11, where Zhang further teaches wherein the processor unit is configured to execute the processor-executable instructions stored in the memory unit to store the new context autoencoder with the knowledge bank of autoencoders (“wherein the auto-encoder is further configured to select, from a plurality of models stored by the analytics module, a best model according to a lowest reconstruction error.” [Zhang, claim 2]).

Regarding claim 13, the combination of Zhang and Wong teaches The system of claim 11, where Zhang further teaches wherein the processor unit is configured to execute the processor-executable instructions stored in the memory unit to initialize the new context autoencoder (“The same process as above can be used to send the data through multiple auto-encoder diagnostic modules to check the percentile of the error term from the total loss function. Based on the diagnostic outcomes, the best suitable model can be decided for this specific client. With this diagnostic mechanism, better recommendations can be provided for existing consortium clients as well as new clients to utilize the model best designed to resemble their production data, and to obtain optimal model performance and improve client satisfaction.” [¶0046, lines 6-12; selecting the best model would be equivalent to initializing a new context autoencoder, since the model would be trained using set modeling parameters as disclosed in ¶0006.]).

Regarding claim 14, the combination of Zhang and Wong teaches The system of claim 11, where Zhang further teaches further comprising a user interface in communication with the processor unit, the user interface configured to apply a semantic meaning provided by a user to the new context autoencoder (“To provide for interaction with a user, one or more aspects or features of the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT), a liquid crystal display (LCD) or a light emitting diode (LED) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including, but not limited to, acoustic, speech, or tactile input.” [¶0057, lines 1-14; Examiner interprets applying a semantic meaning as equivalent to receiving acoustic, speech, or tactile input from the user. User interaction with a display device corresponds to a user interface.]).

Regarding claim 15, the combination of Zhang and Wong teaches The system of claim 11, where Zhang further teaches wherein determining the streaming data is out of context includes determining a reconstruction error with a respective one of the at least one context autoencoder (“When applied to supervised models, the diagnostic system can determine the most appropriate model for the client based on a reconstruction error of a trained auto-encoder for each associated model.” [¶0016, lines 3-9; the trained auto-encoder corresponds to the at least one context autoencoder.]).

Regarding claim 16, Zhang teaches A system for context-based training of a machine learning model, the system comprising:
a memory unit configured to store data and processor-executable instructions (“A memory, which can include a computer-readable storage medium, may include, encode, store, or the like one or more programs that cause one or more processors to perform one or more of the operations described herein.” [¶0018, lines 9-13]);
a processor unit in communication with the memory unit and the at least one sensor, the processor unit configured to execute the processor-executable instructions stored in the memory unit to (“Similarly, computer systems are also described that may include one or more processors and one or more memories coupled to the one or more processors.” [¶0018, lines 6-9]):
receive streaming data from at least one remote sensor (“To provide for interaction with a user, one or more aspects or features of the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT), a liquid crystal display (LCD) or a light emitting diode (LED) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including, but not limited to, acoustic, speech, or tactile input. Other possible input devices include, but are not limited to, touch screens or other touch-sensitive devices such as single or multi-point resistive or capacitive trackpads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices and associated interpretation software, and the like.” [¶0057]);
jointly training the classifier and a selected context autoencoder (“In some implementations, the accompanying auto-encoder shares the same modeling input data records with the unsupervised fraud detection model. During the model development phase, the unsupervised fraud detection model and auto-encoder network are designed and “learned” on the same data set. The auto-encoder is learned to minimize the loss function L, which is also the reconstruction error on the development data sets” [¶0039; Examiner interprets the cited underlined portions as equivalent to “jointly training”. Note: Wong teaches the classifier as cited below.]) of a knowledge bank of autoencoders including at least one autoencoder using the input data (“wherein the auto-encoder is further configured to select, from a plurality of models stored by the analytics module, a best model according to a lowest reconstruction error.” [Zhang, claim 2; see also ¶0046. It is implicit that the analytics module (i.e., the knowledge bank) stores multiple autoencoder diagnostic models.]);
apply a training data matrix of the input data (“In a principal component analysis (PCA), as yet another example of latent variable creation, the latent variables are uncorrelated with each other, and capture most of the variance through eigenvalue decomposition of a covariance matrix of observed data.” [¶0034, lines 1-5]) to the selected context autoencoder (“These auto-encoder networks monitoring the production data and feature vectors are of critical importance in go-live monitoring of new models, but the same modules can continue to perform ongoing monitoring of the production data and derived features, looking for drifts and changes in customer transaction behaviors over time.” [¶0052; this corresponds to a context autoencoder.]) and determine the training data matrix is out of context for the selected context autoencoder (“the auto-encoder diagnostic component runs periodically to check the reconstruction error on a selected sampled data set. This is done through data extraction, which is fed into the auto-encoder network to compute the reconstruction error.” [¶0041, lines 2-4; the reconstruction error would be used to determine whether the training data is out of context.]);
apply the training data matrix to each other context autoencoder of the at least one autoencoder and determining the training data matrix is out of context for each other context autoencoder (“In some instances, several candidate consortium models are available, and a decision must be made as to which one is the appropriate model for certain client. The same process as above can be used to send the data through multiple auto-encoder diagnostic modules to check the percentile of the error term from the total loss function. Based on the diagnostic outcomes, the best suitable model can be decided for this specific client.” [¶0046, lines 1-8; the multiple auto-encoder diagnostic modules correspond to the other context autoencoders, and the reconstruction error of each would determine for which of the other context autoencoders the training data is deemed “out of context”.]);
and construct a new context autoencoder (“In some implementations, the auto-encoder network can be used to monitor go-live raw data and derived feature vectors when a new model is installed or a model is upgraded during the model go-live.” [¶0049, lines 1-4; it is implicit that the newly installed model would be a new autoencoder. See ¶0046, “In some instances, several candidate consortium models are available, and a decision must be made as to which one is the appropriate model for certain client. The same process as above can be used to send the data through multiple auto-encoder diagnostic modules to check the percentile of the error term from the total loss function. Based on the diagnostic outcomes, the best suitable model can be decided for this specific client.” Examiner interprets Zhang’s auto-encoder module as selecting a new autoencoder based on the diagnostic outcomes; installing the newly selected model corresponds to constructing a new autoencoder.]).
Zhang fails to explicitly teach evaluate a classifier by determining a classification accuracy of the input data;
Wong teaches evaluate a classifier by determining a classification accuracy of the input data (“Once all the context vectors are labelled with their corresponding clusters, supervised classification algorithms can be used to learn the relationship between them using the training set. For instance, support vector machine (SVM) classifier with J classes can be used. The trained classifier can then be applied to the context vectors in the held-out validation set for cluster assignment.” [pg. 207, § 2.2 Sampling, ¶4]);
Zhang and Wong are both in the same field of endeavor of training autoencoder models. Zhang discloses a diagnostic system to determine the best model based on a reconstruction error of a trained autoencoder. Wong teaches using autoencoder modeling for industrial sensing analysis. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Zhang’s diagnostic system with the classifier as taught by Wong to determine the classification of input data. One would have been motivated to use a classifier to classify context vectors into proper clusters and learn from similar streaming data. [§ 3.2 Context Vector, Wong]
Regarding claim 17, the combination of Zhang and Wong teaches The system of claim 16, where Zhang further teaches wherein the processor unit is configured to execute the processor-executable instructions stored in the memory unit to store the new context autoencoder with the knowledge bank of autoencoders (“wherein the auto-encoder is further configured to select, from a plurality of models stored by the analytics module, a best model according to a lowest reconstruction error.” [Shown in Claim 2]).

	Regarding claim 18, the combination of Zhang and Wong teaches The system of claim 16, where Zhang further teaches wherein the processor unit is configured to execute the processor-executable instructions stored in the memory unit to initialize the new context autoencoder (“The same process as above can be used to send the data through multiple auto-encoder diagnostic modules to check the percentile of the error term from the total loss function. Based on the diagnostic outcomes, the best suitable model can be decided for this specific client. With this diagnostic mechanism, better recommendations can be provided for existing consortium clients as well as new clients to utilize the model best designed to resemble their production data, and to obtain optimal model performance and improve client satisfaction.” [¶0046, lines 6-12; selecting the best model would be equivalent to initializing a new context autoencoder, as the selected model would be trained using the set modeling parameters disclosed in ¶0006]).

	Regarding claim 19, the combination of Zhang and Wong teaches The system of claim 16, where Zhang further teaches further comprising a user interface in communication with the processor unit, the user interface configured to apply a semantic meaning provided by a user to the new context autoencoder (“To provide for interaction with a user, one or more aspects or features of the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT), a liquid crystal display (LCD) or a light emitting diode (LED) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including, but not limited to, acoustic, speech, or tactile input.” [¶0057, lines 1-14; Examiner interprets applying a semantic meaning as equivalent to an acoustic, speech, or tactile input from the user. User interaction with a display device corresponds to a user interface.]).

	Regarding claim 20, the combination of Zhang and Wong teaches The system of claim 16, where Zhang further teaches wherein determining the input data is out of context includes determining a reconstruction error with a respective one of the at least one context autoencoder (“When applied to supervised models, the diagnostic system can determine the most appropriate model for the client based on a reconstruction error of a trained auto-encoder for each associated model.” [¶0016, lines 3-9; the trained auto-encoder corresponds to the at least one context autoencoder.]). 
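For illustration only, the reconstruction-error check and model selection described in Zhang (¶0041, ¶0046), together with the construction of a new context autoencoder recited in claim 16, can be sketched as follows. This is a minimal Python sketch, not taken from Zhang or the application: each autoencoder is assumed to be a plain callable that returns a reconstruction of its input, mean squared error is assumed as the reconstruction error, and the threshold, the `build_new` factory, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
# Hypothetical representation: an autoencoder maps an input vector to its reconstruction.
Autoencoder = Callable[[Vector], Vector]


def reconstruction_error(model: Autoencoder, x: Vector) -> float:
    """Mean squared error between x and the model's reconstruction of x."""
    x_hat = model(x)
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def matrix_error(model: Autoencoder, data: Sequence[Vector]) -> float:
    """Average reconstruction error over every row of a data matrix."""
    return sum(reconstruction_error(model, row) for row in data) / len(data)


def select_or_construct(
    bank: Dict[str, Autoencoder],
    data: Sequence[Vector],
    threshold: float,
    build_new: Callable[[Sequence[Vector]], Autoencoder],
) -> str:
    """Return the name of the in-context autoencoder with the lowest error.

    If the data is out of context for every autoencoder in the bank
    (error above the hypothetical threshold), build a new autoencoder,
    store it in the bank, and return its name.
    """
    errors = {name: matrix_error(m, data) for name, m in bank.items()}
    in_context = {n: e for n, e in errors.items() if e <= threshold}
    if in_context:
        return min(in_context, key=lambda n: in_context[n])
    # Pick an unused name for the newly constructed context autoencoder.
    idx = len(bank)
    while f"context_{idx}" in bank:
        idx += 1
    name = f"context_{idx}"
    bank[name] = build_new(data)
    return name
```

With a bank holding an identity model and an all-zeros model, data that the identity model reconstructs perfectly is assigned to it; with a bank holding only the all-zeros model, the same data is out of context and a new entry is added to the bank.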

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Wong and further in view of Guo et al. ("Multidimensional Time Series Anomaly Detection: A GRU-based Gaussian Mixture Variational Autoencoder Approach", hereinafter "Guo").

Regarding claim 6, the combination of Zhang and Wong teaches The method of claim 5; however, the combination of Zhang and Wong fails to explicitly teach wherein the input data is out of context when [image: media_image1.png]

Guo teaches wherein the input data is out of context when [image: media_image1.png] (“By fitting the input data sample x(i) into the the Gaussian distribution with the reconstructed mean vector and the reconstructed standard deviation vector, we can get the corresponding reconstruction probability [image: media_image2.png] of the l-th generated latent vector. After averaging over the L reconstruction probabilities, we can obtain the final reconstruction probability RP(x|x̂)[i] for the input x(i). By comparing whether the reconstruction probability is smaller than a given threshold α, the system can determine whether the input data sample is anomalous.” [pg. 102, para below Algorithm 1; see Algorithm 1 (image: media_image3.png); note: Examiner is interpreting anomalous to be equivalent to “out of context”.])
Zhang, Wong, and Guo are all in the same field of endeavor of training autoencoder models. Zhang discloses a diagnostic system to determine the best model based on a reconstruction error of a trained autoencoder. Wong teaches using autoencoder modeling for industrial sensing analysis. Guo teaches an autoencoder-based anomaly detection method. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Zhang’s diagnostic system and Wong’s autoencoder model with the reconstruction probability as taught by Guo. One would have been motivated to use a reconstruction probability in order to determine that input data is anomalous when the probability falls below a threshold value. [pg. 100, § 3.1. Autoencoder based Anomaly Detection, Guo] 
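For illustration only, the thresholding quoted from Guo (averaging L reconstruction probabilities and comparing the result with a threshold α) can be sketched as below. This is a minimal Python sketch under stated assumptions: the per-sample reconstruction probability is assumed to be the likelihood of x(i) under a diagonal Gaussian with the reconstructed mean and standard deviation vectors (Guo’s exact expression appears only as an image in this action), the L reconstructions are supplied directly rather than generated by Guo’s GRU-based variational autoencoder, and the function names and `alpha` values are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Assumed input format: one (mean vector, standard deviation vector) pair
# per generated latent vector, L pairs in total.
Reconstruction = Tuple[Sequence[float], Sequence[float]]


def gaussian_likelihood(
    x: Sequence[float], mu: Sequence[float], sigma: Sequence[float]
) -> float:
    """Likelihood of x under a diagonal Gaussian N(mu, diag(sigma**2))."""
    log_p = 0.0
    for xi, m, s in zip(x, mu, sigma):
        log_p += -0.5 * math.log(2.0 * math.pi * s * s) - (xi - m) ** 2 / (2.0 * s * s)
    return math.exp(log_p)


def reconstruction_probability(
    x: Sequence[float], reconstructions: Sequence[Reconstruction]
) -> float:
    """Average the likelihood of x over the L reconstructions."""
    total = sum(gaussian_likelihood(x, mu, sigma) for mu, sigma in reconstructions)
    return total / len(reconstructions)


def is_anomalous(
    x: Sequence[float], reconstructions: Sequence[Reconstruction], alpha: float
) -> bool:
    """Flag x as anomalous ("out of context") when its reconstruction
    probability is smaller than the threshold alpha."""
    return reconstruction_probability(x, reconstructions) < alpha
```

For a one-dimensional input x = [0.0] reconstructed with mean [0.0] and standard deviation [1.0], the reconstruction probability is 1/√(2π), so the input is not flagged with `alpha=0.1`; an input of [5.0] against the same reconstruction falls far below that threshold and is flagged.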

Response to Arguments
Applicant's arguments filed 07/20/2021 have been fully considered but they are not persuasive. 

Applicant’s argument on pgs. 6-7, with respect to independent claims 1, 11, and 16, that the cited references fail to explicitly teach the newly amended limitation of “jointly training the classifier and a selected context autoencoder of a knowledge bank of autoencoders including at least one autoencoder using the input data” has been considered but is not persuasive. As cited above in the prior art rejection, Zhang teaches this limitation in [¶0039], in combination with Wong, which teaches “the classifier”. Additionally, applicant appears to argue that Zhang fails to teach “A method of training a machine learning model”. Examiner respectfully disagrees. Zhang discloses in [¶0039]: “During the model development phase, the unsupervised fraud detection model and auto-encoder network are designed and “learned” on the same data set. The auto-encoder is learned to minimize the loss function L, which is also the reconstruction error on the development data sets”. This would be considered “a method of training a machine learning model”. Zhang further discloses training the auto-encoder in [¶0050] – [¶0054].

Applicant’s argument on pg. 8 that Zhang fails to disclose or suggest “constructing a new autoencoder” as recited in claim 1 has been considered but is not persuasive. Zhang does teach “constructing a new autoencoder” in [¶0049]: a new model being installed would be equivalent to “constructing a new autoencoder”; for further clarification, see [¶0037] - [¶0038]. Applicant also fails to explicitly and clearly state why the cited portion of Zhang fails to teach “constructing a new autoencoder”. Simply stating that Zhang does not disclose or suggest the limitation does not make the argument persuasive. Furthermore, as noted above and cited in the prior art rejection, Zhang does disclose a method for training a machine learning model (see [¶0039], [¶0050]-[¶0054]).

In response to applicant’s argument that Zhang fails to show certain features of applicant’s invention, it is noted that the feature upon which applicant relies (i.e., “online training”) is not recited in the rejected claims. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491.  The examiner can normally be reached on Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.H.H./Examiner, Art Unit 2122                                                                                                                                                                                                        





/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122