DETAILED ACTION
This action is in response to the claims filed on 01/11/2022 for application 16/586,223. Claims 1-7 and 11-21 have been amended. Claims 1-21 are currently pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
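3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.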

Claims 1-3, 7, 8, 11-13, 17, 18, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Ladjal et al. ("A PCA-LIKE AUTOENCODER" cited by Applicant in the IDS filed 04/05/2021, hereinafter "Ladjal") in view of Yoo et al. ("Density Estimation and Incremental Learning of Latent Vector for Generative Autoencoders" cited by Applicant in the IDS filed 04/05/2021, hereinafter "Yoo") and further in view of Kadav ("US 20170039485 A1", hereinafter "Kadav").

Regarding claim 1, Ladjal teaches A method of training a machine learning model to generate embeddings of inputs to the machine learning model, the machine learning model having an encoder that generates the embeddings from the inputs and a decoder that generates outputs from the generated embeddings (“An autoencoder is a neural network which data projects to and from a lower dimensional latent space, where this data is easier to understand and model. The autoencoder consists of two sub-networks, the encoder and the decoder, which carry out these transformations. The neural network is trained such that the output is as close to the input as possible, the data having gone through an information bottleneck : the latent space” [Abstract]), wherein the embedding is partitioned into a sequence of embedding partitions that each includes one or more dimensions of the embedding (“We note with Z = Rd the latent space, d being the dimensionality of this latent space. We denote the encoder with E : X → Z, and the decoder with D : X → Z. We denote with z(i) the ith component of z.” [pg. 4, § 3 PCA Autoencoder, ¶1; See further pg. 4, § 3 PCA Autoencoder, ¶4: “We then increase the size of the latent space by 1, while maintaining the same first component from the previous training : only the second component is trained. This is repeated iteratively until a certain predefined size dmax is attained.”]), the method comprising: 
for a first embedding partition in the sequence of embedding partitions, performing initial training to train the encoder and the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition (“For this, inspired by the PCA, we start by training an autoencoder with a latent space of size 1…. At each step, the decoder is discarded, and a new one is trained from scratch. This continues until we reach the required latent space size. Furthermore, we add a latent space covariance loss term to the autoencoder loss to ensure that each component is statistically independent. We refer to this network as a “PCA Autoencoder”.” [pg. 2, § 1 Introduction, ¶5; note: Examiner is interpreting “replica” to be equivalent to a new encoder/decoder network. Training an autoencoder would imply training both the encoder and decoder.]),
wherein during the initial training the decoder replica receives as input a first masked embedding, the first masked embedding including values generated by the encoder for the first embedding partition (“Therefore, we impose a notion of importance by training a series of autoencoders of increasing latent space size, starting with a latent space of size 1 (a scalar). In this first autoencoder, we can suppose that the information of greatest “importance” will be encoded, in the sense of the cost of the $\ell_2$ autoencoder loss. Perhaps, for example, the average colour of the background).” [pg. 4, § 3 PCA Autoencoder, ¶4; Examiner is interpreting encoding only information with the greatest importance to correspond to values generated by the encoder.]);
for each particular embedding partition that is after the first embedding partition in the sequence of embedding partitions, 
performing incremental training to train the encoder and the decoder replica of the plurality of decoder replicas corresponding to the particular embedding partition (“For this, inspired by the PCA, we start by training an autoencoder with a latent space of size 1. Once this is trained, we fix the values of this first element in the latent space, and train an autoencoder with a latent space of size 2, where only the second component is trained. At each step, the decoder is discarded, and a new one is trained from scratch. This continues until we reach the required latent space size. Furthermore, we add a latent space covariance loss term to the autoencoder loss to ensure that each component is statistically independent. We refer to this network as a “PCA Autoencoder”.” [pg. 2, § 1 Introduction, ¶5; See further algorithm 1 on pg. 5]),
wherein during the incremental training the decoder replica corresponding to the particular partition receives as input an incrementally masked embedding for the particular embedding partition, the incrementally masked embedding including values generated by the encoder for the particular embedding partition and each embedding partition that precedes the particular embedding partition in the sequence of embedding partitions; (“Therefore, we impose a notion of importance by training a series of autoencoders of increasing latent space size, starting with a latent space of size 1 (a scalar). In this first autoencoder, we can suppose that the information of greatest “importance” will be encoded, in the sense of the cost of the $\ell_2$ autoencoder loss. Perhaps, for example, the average colour of the background). We then increase the size of the latent space by 1, while maintaining the same first component from the previous training : only the second component is trained. This is repeated iteratively until a certain predefined size dmax is attained. Note that at each iteration, the previous decoder is thrown away, and a new one is trained from scratch. Indeed, we wish to impose some structure on the latent space via the training of the encoder, but the decoder must be allowed to do as it sees fit.” [pg. 4, § 3. PCA Autoencoder, ¶4; maintaining the same first component while training a second component would be equivalent to incrementally training the decoder replica. See further: algorithm 1 on pg. 5.])
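For illustration only, the incremental schedule described in the passages quoted above from Ladjal (grow the latent space by one component per step, keep the earlier components fixed, train only the newest component, and start each step with a newly initialized decoder) can be sketched as follows. This is a minimal sketch under stated assumptions: the function pca_autoencoder_schedule and the Step record are hypothetical names, components are indexed from 0 rather than as Ladjal's z(1), z(2), ..., and no network is trained; the sketch only enumerates which components are frozen and which one is trained at each step. It is not a reproduction of Ladjal's Algorithm 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    latent_size: int     # size of the latent space at this step
    frozen: List[int]    # components kept fixed from earlier steps (0-indexed)
    trained: int         # the single component trained at this step
    fresh_decoder: bool  # previous decoder discarded, new one initialized


def pca_autoencoder_schedule(d_max: int) -> List[Step]:
    """Hypothetical helper: enumerate the incremental steps described in the
    quoted passages, growing the latent space by one component per step."""
    if d_max < 1:
        raise ValueError("d_max must be at least 1")
    steps = []
    for size in range(1, d_max + 1):
        steps.append(Step(latent_size=size,
                          frozen=list(range(size - 1)),  # all earlier components
                          trained=size - 1,              # only the newest one
                          fresh_decoder=True))           # decoder retrained from scratch
    return steps


for step in pca_autoencoder_schedule(3):
    print(step)
```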
However, Ladjal fails to explicitly teach and masked out values for all subsequent embedding partitions in the sequence of embedding partitions;
and masked out values for any subsequent embedding partitions that are after the particular embedding partition in the sequence of embedding partitions.
Yoo teaches and masked out values for all subsequent embedding partitions in the sequence of embedding partitions and masked out values for any subsequent embedding partitions that are after the particular embedding partition in the sequence of embedding partitions (“In the training process, initially only a small part of the latent vector is used to learn the autoencoder. Then, as the iteration goes on, the effective size of the latent vector is increased gradually. Here, the unused part of the latent vector is masked to zero and is not back-propagated. This incremental learning strategy of the latent variables induces the autoencoder to learn the most important representation of data first, instead of just focusing on reconstruction.” [pg. 5, § 3.4 Incremental learning of latent vector, ¶1; Examiner is interpreting the latent vector being masked to zero to be equivalent to “masked out values”.]).
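As a further illustration, not drawn from Yoo's text, the masking recited in claim 1 can be sketched as follows, assuming the embedding is a flat list of floats and the partitions are contiguous runs of dimensions whose sizes are given in order. The function masked_embedding and the example values are hypothetical. With k = 0 the function returns what the claim calls the first masked embedding; with k > 0 it returns the incrementally masked embedding for partition k, with zeros in place of every later partition, consistent with Yoo's "masked to zero." The sketch does not model Yoo's statement that the masked part "is not back-propagated"; in an autodiff framework, multiplying the embedding by a 0/1 mask would have that effect, because the gradient reaching a zeroed entry is multiplied by zero.

```python
from typing import List, Sequence


def masked_embedding(embedding: Sequence[float],
                     partition_sizes: Sequence[int],
                     k: int) -> List[float]:
    """Keep the encoder values for partitions 0..k and set every dimension
    belonging to a later partition to zero."""
    if sum(partition_sizes) != len(embedding):
        raise ValueError("partition sizes must cover the embedding exactly")
    if not 0 <= k < len(partition_sizes):
        raise ValueError("k must index an existing partition")
    keep = sum(partition_sizes[: k + 1])  # number of leading dimensions kept
    return [v if i < keep else 0.0 for i, v in enumerate(embedding)]


z = [0.9, -0.4, 0.7, 0.1, -0.8]       # hypothetical encoder output
sizes = [1, 2, 2]                     # partitions: dims {0}, {1, 2}, {3, 4}
print(masked_embedding(z, sizes, 0))  # [0.9, 0.0, 0.0, 0.0, 0.0]
print(masked_embedding(z, sizes, 1))  # [0.9, -0.4, 0.7, 0.0, 0.0]
```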
Ladjal and Yoo are both in the same field of endeavor of training autoencoder networks. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s training algorithm by masking out values of subsequent partitions as taught by Yoo. One would have been motivated to make this modification in order to learn the most important representation of data. [pg. 5, § 3.4 Incremental learning of latent vector, ¶1, Yoo]
Ladjal/Yoo fails to explicitly teach generating a plurality of decoder replicas, each decoder replica of the plurality of decoder replicas corresponding to a respective embedding partition in the sequence of embedding partitions, and each decoder replica of the plurality of decoder replicas having one or more parameter values set equal to one or more parameter values of the decoder;
and synchronously applying any changes made to any parameter values of the one or more parameter values of a particular decoder replica to each other decoder replica of the plurality of decoder replicas.
Kadav teaches generating a plurality of decoder replicas, each decoder replica of the plurality of decoder replicas corresponding to a respective embedding partition in the sequence of embedding partitions (“In one aspect, a machine learning method includes installing a plurality of model replicas for training on a plurality of computer learning nodes; receiving training data at each model replica and updating parameters for the model replica after trailing” [¶0005]), and each decoder replica of the plurality of decoder replicas having one or more parameter values set equal to one or more parameter values of the decoder (“In this system, a plurality of model replicas train in parallel using parameter updates. The model replicas train and compute new model weights. They send/receive parameters from everyone and apply them to their own model.” [¶0015]);
and synchronously applying any changes made to any parameter values of the one or more parameter values of a particular decoder replica to each other decoder replica of the plurality of decoder replicas (“The method achieves balancing computation and communication in distributed machine learning. The communication batch sizes can be adjusted to automatically balance processor and network loads. The method includes ensuring accurate convergence and high accuracy machine learning models by adjusting training sizes with communication batch sizes. A plurality of model replicas can train in parallel using parameter updates. The model replicas can train and compute new model weights. The method includes sending or receiving parameters from all other model replicas and applying the parameters to the current model replica model.” [¶0008; See further: “Advantages of the preferred embodiments may include one or more of the following. Balancing CPU and network provides an efficient system that trains machine-learning models quickly and with low running costs. Ensuring all replicas converge at the same time, improves model accuracy. More accurate models, with faster training times ensures that all NEC businesses and applications such as job recommendations, internet helpdesks, etc. provide more accurate results.” [¶0009; The model replicas converging at the same time would imply the parameter changes to each replica would be synchronous.]]).
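For illustration only, the two limitations mapped to Kadav above (decoder replicas whose parameter values are set equal to those of the decoder, and any change to one replica's parameters applied to every other replica) can be sketched as follows. Assumptions: parameters are modeled as a dictionary of named floats, and the class ReplicatedDecoder and its methods are hypothetical; they are not taken from Kadav, which describes replicas exchanging parameters across computer learning nodes.

```python
import copy
from typing import Dict, List

Params = Dict[str, float]


class ReplicatedDecoder:
    """Hypothetical helper: one decoder replica per embedding partition, each
    starting from a copy of the base decoder's parameter values."""

    def __init__(self, base_params: Params, num_partitions: int):
        # Independent copies, so the base decoder's dict is never mutated.
        self.replicas: List[Params] = [copy.deepcopy(base_params)
                                       for _ in range(num_partitions)]

    def apply_update(self, source: int, deltas: Params) -> None:
        # Synchronous broadcast: the change computed for replica `source`
        # is added to every replica before this call returns.
        if not 0 <= source < len(self.replicas):
            raise IndexError("no such replica")
        for replica in self.replicas:
            for name, delta in deltas.items():
                replica[name] += delta

    def in_sync(self) -> bool:
        return all(r == self.replicas[0] for r in self.replicas)


base = {"w": 0.5, "b": 0.0}
dec = ReplicatedDecoder(base, num_partitions=3)
dec.apply_update(source=1, deltas={"w": -0.25})
print(dec.replicas[2]["w"], dec.in_sync())
```

A design note: in a single-process framework the same effect is often obtained by letting all replicas share one set of parameter objects, so no copying is needed; the separate copies here simply make the "apply to each other replica" step explicit.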
Ladjal, Yoo, and Kadav are all in the same field of endeavor of training machine learning models. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. Kadav discloses training a plurality of machine learning model replicas in parallel. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s/Yoo’s teachings by generating model replicas and applying parallel parameter updates as taught by Kadav. One would have been motivated to make this modification in order to balance computation and communication overhead. [Abstract, Kadav]

Regarding claim 2, Ladjal/Yoo/Kadav teaches The method of claim 1, where Ladjal further teaches wherein performing incremental training further comprises, for each preceding embedding partition that precedes the particular embedding partition in the sequence of embedding partitions: 
training the encoder and the decoder replica of the plurality of decoder replicas corresponding to the preceding embedding partition, wherein during the incremental training the decoder replica of the plurality of decoder replicas corresponding to the preceding embedding partitions receives as input incrementally masked embedding for the preceding partition (“Therefore, we impose a notion of importance by training a series of autoencoders of increasing latent space size, starting with a latent space of size 1 (a scalar). We then increase the size of the latent space by 1, while maintaining the same first component from the previous training : only the second component is trained. This is repeated iteratively until a certain predefined size dmax is attained.” [pg. 4, § 3 PCA Autoencoder, ¶4; See further algorithm 1 on pg. 5]).

Regarding claim 3, Ladjal/Yoo/Kadav teaches The method of claim 2, where Yoo further teaches wherein during the incremental training, parameters of the decoder replicas of the plurality of decoder replicas corresponding to the particular partition and the preceding embedding partitions are constrained to have the same values (“From now on, we will call the autoencoder whose latent vector is trained by applying our incremental learning strategy in Section 3.4 as IAE. On the other hand, AE will denote an autoencoder trained without any regularization and incremental learning of the latent vector. The term LDE is used for the proposed latent density estimator of Section 3.3. For fair comparison, the autoencoders implemented for comparison and verification use the same structure. The details of network structure is stated in the supplementary (section A).” [pg. 5, § 4. Experiments, ¶1; Examiner interprets the autoencoders implemented with the same structure to imply having the same parameter values.]).
Ladjal, Yoo, and Kadav are all in the same field of endeavor of training machine learning models. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. Kadav discloses training a plurality of machine learning model replicas in parallel. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s/Kadav’s teachings by using the same decoder network structure for each partition as taught by Yoo. One would have been motivated to make this modification in order to use structural characteristics of the network to determine salient factors of data through dimensionality. [pg. 5, § 3.4 Incremental learning of latent vector, ¶1, Yoo]

Regarding claim 7, Ladjal/Yoo/Kadav teaches The method of claim 1, where Yoo further teaches wherein the masked out values for all subsequent embedding partitions in the sequence of embedding partitions are zero (“In the training process, initially only a small part of the latent vector is used to learn the autoencoder. Then, as the iteration goes on, the effective size of the latent vector is increased gradually. Here, the unused part of the latent vector is masked to zero and is not back-propagated.” [pg. 5, § 3.4 Incremental learning of latent vector, ¶1]).
Ladjal, Yoo, and Kadav are all in the same field of endeavor of training machine learning models. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. Kadav discloses training a plurality of machine learning model replicas in parallel. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s/Kadav’s teachings by masking out values of subsequent partitions as taught by Yoo. One would have been motivated to make this modification in order to learn the most important representation of data. [pg. 5, § 3.4 Incremental learning of latent vector, ¶1, Yoo]

Regarding claim 8, Ladjal/Yoo/Kadav teaches The method of claim 1, where Yoo further teaches wherein the encoder applies an activation function having a fixed output range to an intermediate encoder output to generate the embedding (“Architecture Figure 9 in this supplimentary material shows the architecture of autoencoder used in experiments. Our autoencoder architecture consists of several layers of convolution, transposed convolution, batch normalization, leaky ReLU and TanH. The default filter size of convolution and transposed convolution is 4 × 4 and the negative slope of leaky ReLU activation is 0.2. In this figure, C is the number of channels in the input data, n is the index of convolution blocks (n = 1, · · · , N), S is filter size of encoder’s last convolution layer and D is the dimension of latent vector z.” [pg. 11, § Architecture, ¶1; See further Figure 9.]).
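For illustration only, claim 8's "activation function having a fixed output range" can be sketched with tanh, one of the layers listed in the quoted Yoo passage. Assumption: tanh is applied as the encoder's last operation on a hypothetical intermediate output; the quoted passage lists TanH among the architecture's layers but does not by itself state where it is applied. The function name bounded_embedding is hypothetical.

```python
import math
from typing import List, Sequence


def bounded_embedding(intermediate: Sequence[float]) -> List[float]:
    """Apply tanh element-wise to an intermediate encoder output.

    Mathematically tanh maps every real input into the open interval (-1, 1);
    in floating point, inputs of large magnitude saturate to exactly -1.0 or
    1.0, so the computed values always lie in the closed interval [-1, 1].
    """
    return [math.tanh(x) for x in intermediate]


print(bounded_embedding([-50.0, -1.0, 0.0, 2.5, 300.0]))
```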
Ladjal, Yoo, and Kadav are all in the same field of endeavor of training machine learning models. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. Kadav discloses training a plurality of machine learning model replicas in parallel. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s/Kadav’s teachings by masking out values of subsequent partitions as taught by Yoo. One would have been motivated to make this modification in order to learn the most important representation of data. [pg. 5, § 3.4 Incremental learning of latent vector, ¶1, Yoo]

Regarding claim 11, Ladjal teaches A system comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers (“Therefore, we could consider increasing the latent space by small packets of codes, to give it the freedom it needs. However, this will increase the computational load required to compute the covariance loss term. Therefore, we will need to find an efficient way to calculate the covariances between each of the components of this packet.” [pg. 8, Future work, ¶1; implies use of computers and memory]), to cause the one or more computers to perform operations for training a machine learning model to generate embeddings of inputs to the machine learning model, the machine learning model having an encoder that generates the embeddings from the inputs and a decoder that generates outputs from the generated embeddings (“An autoencoder is a neural network which data projects to and from a lower dimensional latent space, where this data is easier to understand and model. The autoencoder consists of two sub-networks, the encoder and the decoder, which carry out these transformations. The neural network is trained such that the output is as close to the input as possible, the data having gone through an information bottleneck : the latent space” [Abstract]), wherein the embedding is partitioned into a sequence of embedding partitions that each includes one or more dimensions of the embedding (“We note with Z = Rd the latent space, d being the dimensionality of this latent space. We denote the encoder with E : X → Z, and the decoder with D : X → Z. We denote with z(i) the ith component of z.” [pg. 4, § 3 PCA Autoencoder, ¶1; See further pg. 4, § 3 PCA Autoencoder, ¶4: “We then increase the size of the latent space by 1, while maintaining the same first component from the previous training : only the second component is trained. This is repeated iteratively until a certain predefined size dmax is attained.”]), the operations comprising:
for a first embedding partition in the sequence of embedding partitions, performing initial training to train the encoder and the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition (“For this, inspired by the PCA, we start by training an autoencoder with a latent space of size 1…. At each step, the decoder is discarded, and a new one is trained from scratch. This continues until we reach the required latent space size. Furthermore, we add a latent space covariance loss term to the autoencoder loss to ensure that each component is statistically independent. We refer to this network as a “PCA Autoencoder”.” [pg. 2, § 1 Introduction, ¶5; note: Examiner is interpreting “replica” to be equivalent to a new encoder/decoder network. Training an autoencoder would imply training both the encoder and decoder.]),
wherein during the initial training the decoder replica receives as input a first masked embedding, the first masked embedding including values generated by the encoder for the first embedding partition (“Therefore, we impose a notion of importance by training a series of autoencoders of increasing latent space size, starting with a latent space of size 1 (a scalar). In this first autoencoder, we can suppose that the information of greatest “importance” will be encoded, in the sense of the cost of the $\ell_2$ autoencoder loss. Perhaps, for example, the average colour of the background).” [pg. 4, § 3 PCA Autoencoder, ¶4; Examiner is interpreting encoding only information with the greatest importance to correspond to values generated by the encoder.]);
for each particular embedding partition that is after the first embedding partition in the sequence of embedding partitions, 
performing incremental training to train the encoder and the decoder replica of the plurality of decoder replicas corresponding to the particular embedding partition (“For this, inspired by the PCA, we start by training an autoencoder with a latent space of size 1. Once this is trained, we fix the values of this first element in the latent space, and train an autoencoder with a latent space of size 2, where only the second component is trained. At each step, the decoder is discarded, and a new one is trained from scratch. This continues until we reach the required latent space size. Furthermore, we add a latent space covariance loss term to the autoencoder loss to ensure that each component is statistically independent. We refer to this network as a “PCA Autoencoder”.” [pg. 2, § 1 Introduction, ¶5; See further algorithm 1 on pg. 5]),
wherein during the incremental training the decoder replica corresponding to the particular partition receives as input an incrementally masked embedding for the particular embedding partition, the incrementally masked embedding including values generated by the encoder for the particular embedding partition and each embedding partition that precedes the particular embedding partition in the sequence of embedding partitions; (“Therefore, we impose a notion of importance by training a series of autoencoders of increasing latent space size, starting with a latent space of size 1 (a scalar). In this first autoencoder, we can suppose that the information of greatest “importance” will be encoded, in the sense of the cost of the $\ell_2$ autoencoder loss. Perhaps, for example, the average colour of the background). We then increase the size of the latent space by 1, while maintaining the same first component from the previous training : only the second component is trained. This is repeated iteratively until a certain predefined size dmax is attained. Note that at each iteration, the previous decoder is thrown away, and a new one is trained from scratch. Indeed, we wish to impose some structure on the latent space via the training of the encoder, but the decoder must be allowed to do as it sees fit.” [pg. 4, § 3. PCA Autoencoder, ¶4; maintaining the same first component while training a second component would be equivalent to incrementally training the decoder replica. See further: algorithm 1 on pg. 5.])
However, Ladjal fails to explicitly teach and masked out values for all subsequent embedding partitions in the sequence of embedding partitions;
and masked out values for any subsequent embedding partitions that are after the particular embedding partition in the sequence of embedding partitions.
Yoo teaches and masked out values for all subsequent embedding partitions in the sequence of embedding partitions and masked out values for any subsequent embedding partitions that are after the particular embedding partition in the sequence of embedding partitions (“In the training process, initially only a small part of the latent vector is used to learn the autoencoder. Then, as the iteration goes on, the effective size of the latent vector is increased gradually. Here, the unused part of the latent vector is masked to zero and is not back-propagated. This incremental learning strategy of the latent variables induces the autoencoder to learn the most important representation of data first, instead of just focusing on reconstruction.” [pg. 5, § 3.4 Incremental learning of latent vector, ¶1; Examiner is interpreting the latent vector being masked to zero to be equivalent to “masked out values”.]).
Ladjal and Yoo are both in the same field of endeavor of training autoencoder networks. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s training algorithm by masking out values of subsequent partitions as taught by Yoo. One would have been motivated to make this modification in order to learn the most important representation of data. [pg. 5, § 3.4 Incremental learning of latent vector, ¶1, Yoo]
Ladjal/Yoo fails to explicitly teach generating a plurality of decoder replicas, each decoder replica of the plurality of decoder replicas corresponding to a respective embedding partition in the sequence of embedding partitions, and each decoder replica of the plurality of decoder replicas having one or more parameter values set equal to one or more parameter values of the decoder;
and synchronously applying any changes made to any parameter values of the one or more parameter values of a particular decoder replica to each other decoder replica of the plurality of decoder replicas.
Kadav teaches generating a plurality of decoder replicas, each decoder replica of the plurality of decoder replicas corresponding to a respective embedding partition in the sequence of embedding partitions (“In one aspect, a machine learning method includes installing a plurality of model replicas for training on a plurality of computer learning nodes; receiving training data at each model replica and updating parameters for the model replica after trailing” [¶0005]), and each decoder replica of the plurality of decoder replicas having one or more parameter values set equal to one or more parameter values of the decoder (“In this system, a plurality of model replicas train in parallel using parameter updates. The model replicas train and compute new model weights. They send/receive parameters from everyone and apply them to their own model.” [¶0015]);
and synchronously applying any changes made to any parameter values of the one or more parameter values of a particular decoder replica to each other decoder replica of the plurality of decoder replicas (“The method achieves balancing computation and communication in distributed machine learning. The communication batch sizes can be adjusted to automatically balance processor and network loads. The method includes ensuring accurate convergence and high accuracy machine learning models by adjusting training sizes with communication batch sizes. A plurality of model replicas can train in parallel using parameter updates. The model replicas can train and compute new model weights. The method includes sending or receiving parameters from all other model replicas and applying the parameters to the current model replica model.” [¶0008; See further: “Advantages of the preferred embodiments may include one or more of the following. Balancing CPU and network provides an efficient system that trains machine-learning models quickly and with low running costs. Ensuring all replicas converge at the same time, improves model accuracy. More accurate models, with faster training times ensures that all NEC businesses and applications such as job recommendations, internet helpdesks, etc. provide more accurate results.” [¶0009; The model replicas converging at the same time would imply the parameter changes to each replica would be synchronous.]]).
Ladjal, Yoo, and Kadav are all in the same field of endeavor of training machine learning models. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. Kadav discloses training a plurality of machine learning model replicas in parallel. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s/Yoo’s teachings by generating model replicas and applying parallel parameter updates as taught by Kadav. One would have been motivated to make this modification in order to balance computation and communication overhead. [Abstract, Kadav]

Regarding claim 12, Ladjal/Yoo/Kadav teaches The system of claim 11, where Ladjal further teaches wherein performing incremental training further comprises, for each preceding embedding partition that precedes the particular embedding partition in the sequence of embedding partitions: 
training the encoder and the decoder replica of the plurality of decoder replicas corresponding to the preceding embedding partition, wherein during the incremental training the decoder replica of the plurality of decoder replicas corresponding to the preceding embedding partitions receives as input incrementally masked embedding for the preceding partition (“Therefore, we impose a notion of importance by training a series of autoencoders of increasing latent space size, starting with a latent space of size 1 (a scalar). We then increase the size of the latent space by 1, while maintaining the same first component from the previous training : only the second component is trained. This is repeated iteratively until a certain predefined size dmax is attained.” [pg. 4, § 3 PCA Autoencoder, ¶4; See further algorithm 1 on pg. 5]).

Regarding claim 13, Ladjal/Yoo/Kadav teaches The system of claim 12, where Yoo further teaches wherein during the incremental training, parameters of the decoder replicas of the plurality of decoder replicas corresponding to the particular embedding partition and the preceding embedding partitions are constrained to have the same values (“From now on, we will call the autoencoder whose latent vector is trained by applying our incremental learning strategy in Section 3.4 as IAE. On the other hand, AE will denote an autoencoder trained without any regularization and incremental learning of the latent vector. The term LDE is used for the proposed latent density estimator of Section 3.3. For fair comparison, the autoencoders implemented for comparison and verification use the same structure. The details of network structure is stated in the supplementary (section A).” [pg. 5, § 4. Experiments, ¶1; Examiner interprets the autoencoders implemented with the same structure to imply having the same parameter values.]).
Ladjal, Yoo, and Kadav are all in the same field of endeavor of training machine learning models. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. Kadav discloses training a plurality of machine learning model replicas in parallel. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s/Kadav’s teachings by using the same decoder network structure for each partition as taught by Yoo. One would have been motivated to make this modification in order to use structural characteristics of the network to determine salient factors of data through dimensionality. [pg. 5, § 3.4 Incremental learning of latent vector, ¶1, Yoo]

Regarding claim 17, Ladjal/Yoo/Kadav teaches The system of claim 11, where Yoo further teaches wherein the masked out values for all subsequent embedding partitions in the sequence of embedding partitions are zero (“In the training process, initially only a small part of the latent vector is used to learn the autoencoder. Then, as the iteration goes on, the effective size of the latent vector is increased gradually. Here, the unused part of the latent vector is masked to zero and is not back-propagated.” [pg. 5, § 3.4 Incremental learning of latent vector, ¶1]).
Ladjal, Yoo, and Kadav are all in the same field of endeavor of training machine learning models. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. Kadav discloses training a plurality of machine learning model replicas in parallel. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s/Kadav’s teachings by masking out values of subsequent partitions as taught by Yoo. One would have been motivated to make this modification in order to learn the most important representation of data. [pg. 5, § 3.4 Incremental learning of latent vector, ¶1, Yoo]

Regarding claim 18, Ladjal/Yoo/Kadav teaches The system of claim 11, where Yoo further teaches wherein the encoder applies an activation function having a fixed output range to an intermediate encoder output to generate the embedding (“Architecture Figure 9 in this supplimentary material shows the architecture of autoencoder used in experiments. Our autoencoder architecture consists of several layers of convolution, transposed convolution, batch normalization, leaky ReLU and TanH. The default filter size of convolution and transposed convolution is 4 × 4 and the negative slope of leaky ReLU activation is 0.2. In this figure, C is the number of channels in the input data, n is the index of convolution blocks (n = 1, · · · , N), S is filter size of encoder’s last convolution layer and D is the dimension of latent vector z.” [pg. 11, § Architecture, ¶1; See further Figure 9. TanH has a fixed output range of (-1, 1).]).
Ladjal, Yoo, and Kadav are all in the same field of endeavor of training machine learning models. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. Kadav discloses training a plurality of machine learning model replicas in parallel. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s/Kadav’s teachings by applying a TanH activation in the encoder as taught by Yoo. One would have been motivated to make this modification in order to learn the most important representation of data. [pg. 5, § 3.4 Incremental learning of latent vector, ¶1, Yoo]

Regarding claim 21, Ladjal teaches A computer storage medium encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations for training a machine learning model to generate embeddings of inputs to the machine learning model, the machine learning model having an encoder that generates the embeddings from the inputs and a decoder that generates outputs from the generated embeddings (“An autoencoder is a neural network which data projects to and from a lower dimensional latent space, where this data is easier to understand and model. The autoencoder consists of two sub-networks, the encoder and the decoder, which carry out these transformations. The neural network is trained such that the output is as close to the input as possible, the data having gone through an information bottleneck : the latent space” [Abstract]), wherein the embedding is partitioned into a sequence of embedding partitions that each includes one or more dimensions of the embedding (“We note with Z = Rd the latent space, d being the dimensionality of this latent space. We denote the encoder with E : X → Z, and the decoder with D : X → Z. We denote with z(i) the ith component of z.” [pg. 4, § 3 PCA Autoencoder, ¶1; See further pg. 4, § 3 PCA Autoencoder, ¶4: “We then increase the size of the latent space by 1, while maintaining the same first component from the previous training : only the second component is trained. This is repeated iteratively until a certain predefined size dmax is attained.”]), the operations comprising: 
for a first embedding partition in the sequence of embedding partitions, performing initial training to train the encoder and the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition (“For this, inspired by the PCA, we start by training an autoencoder with a latent space of size 1…. At each step, the decoder is discarded, and a new one is trained from scratch. This continues until we reach the required latent space size. Furthermore, we add a latent space covariance loss term to the autoencoder loss to ensure that each component is statistically independent. We refer to this network as a “PCA Autoencoder”.” [pg. 2, § 1 Introduction, ¶5; note: Examiner is interpreting “replica” to be equivalent to a new encoder/decoder network. Training an autoencoder would imply training both the encoder and decoder.]),
wherein during the initial training the decoder replica receives as input a first masked embedding, the first masked embedding including values generated by the encoder for the first embedding partition (“Therefore, we impose a notion of importance by training a series of autoencoders of increasing latent space size, starting with a latent space of size 1 (a scalar). In this first autoencoder, we can suppose that the information of greatest “importance” will be encoded, in the sense of the cost of the ℓ2 autoencoder loss. Perhaps, for example, the average colour of the background).” [pg. 4, § 3 PCA Autoencoder, ¶4; Examiner is interpreting encoding only information with the greatest importance to correspond to values generated by the encoder.]);
for each particular embedding partition that is after the first embedding partition in the sequence of embedding partitions, 
performing incremental training to train the encoder and the decoder replica of the plurality of decoder replicas corresponding to the particular embedding partition (“For this, inspired by the PCA, we start by training an autoencoder with a latent space of size 1. Once this is trained, we fix the values of this first element in the latent space, and train an autoencoder with a latent space of size 2, where only the second component is trained. At each step, the decoder is discarded, and a new one is trained from scratch. This continues until we reach the required latent space size. Furthermore, we add a latent space covariance loss term to the autoencoder loss to ensure that each component is statistically independent. We refer to this network as a “PCA Autoencoder”.” [pg. 2, § 1 Introduction, ¶5; See further algorithm 1 on pg. 5]),
wherein during the incremental training the decoder replica corresponding to the particular partition receives as input an incrementally masked embedding for the particular embedding partition, the incrementally masked embedding including values generated by the encoder for the particular embedding partition and each embedding partition that precedes the particular embedding partition in the sequence of embedding partitions; (“Therefore, we impose a notion of importance by training a series of autoencoders of increasing latent space size, starting with a latent space of size 1 (a scalar). In this first autoencoder, we can suppose that the information of greatest “importance” will be encoded, in the sense of the cost of the ℓ2 autoencoder loss. Perhaps, for example, the average colour of the background). We then increase the size of the latent space by 1, while maintaining the same first component from the previous training : only the second component is trained. This is repeated iteratively until a certain predefined size dmax is attained. Note that at each iteration, the previous decoder is thrown away, and a new one is trained from scratch. Indeed, we wish to impose some structure on the latent space via the training of the encoder, but the decoder must be allowed to do as it sees fit.” [pg. 4, § 3. PCA Autoencoder, ¶4; maintaining the same first component while training a second component would be equivalent to incrementally training the decoder replica. See further: algorithm 1 on pg. 5.]) 
However, Ladjal fails to explicitly teach and masked out values for all subsequent embedding partitions in the sequence of embedding partitions
and masked out values for any subsequent embedding partitions that are after the particular embedding partition in the sequence of embedding partitions.
Yoo teaches and masked out values for all subsequent embedding partitions in the sequence of embedding partitions and masked out values for any subsequent embedding partitions that are after the particular embedding partition in the sequence of embedding partitions. (“In the training process, initially only a small part of the latent vector is used to learn the autoencoder. Then, as the iteration goes on, the effective size of the latent vector is increased gradually. Here, the unused part of the latent vector is masked to zero and is not back-propagated. This incremental learning strategy of the latent variables induces the autoencoder to learn the most important representation of data first, instead of just focusing on reconstruction.” [pg. 5, § 3.4 Incremental learning of latent vector, ¶1; Examiner is interpreting the latent vector being masked to zero to be equivalent to “masked out values”.])
Ladjal and Yoo are both in the same field of endeavor of training autoencoder networks. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s training algorithm by masking out values of subsequent partitions as taught by Yoo. One would have been motivated to make this modification in order to learn the most important representation of data. [pg. 5, § 3.4 Incremental learning of latent vector, ¶1, Yoo]
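For illustration only (this is not code from Ladjal, Yoo, or the application), the masking relied on above, in which the unused part of the latent vector is set to zero, can be sketched as follows. The helper name `incrementally_masked_embedding`, the example partition sizes, and the plain-list representation of the embedding are hypothetical choices made for this sketch.

```python
from typing import List, Sequence


def incrementally_masked_embedding(
    embedding: Sequence[float],
    partition_sizes: Sequence[int],
    k: int,
) -> List[float]:
    """Keep the values of partitions 0..k and zero out every later partition.

    Hypothetical helper for illustration; partition_sizes must sum to len(embedding).
    """
    if sum(partition_sizes) != len(embedding):
        raise ValueError("partition sizes must cover the embedding exactly")
    keep = sum(partition_sizes[: k + 1])  # number of leading dimensions left unmasked
    return [v if i < keep else 0.0 for i, v in enumerate(embedding)]


# A 5-dimensional embedding split into partitions of sizes 1, 2, and 2.
z = [0.9, -0.4, 0.3, 0.7, -0.2]
print(incrementally_masked_embedding(z, [1, 2, 2], 0))  # [0.9, 0.0, 0.0, 0.0, 0.0]
print(incrementally_masked_embedding(z, [1, 2, 2], 1))  # [0.9, -0.4, 0.3, 0.0, 0.0]
```

With k = 0 the output corresponds to a "first masked embedding"; each larger k adds the next partition while all subsequent partitions remain zero.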
Ladjal/Yoo fails to explicitly teach generating a plurality of decoder replicas, each decoder replica of the plurality of decoder replicas corresponding to a respective embedding partition in the sequence of embedding partitions, and each decoder replica of the plurality of decoder replicas having one or more parameter values set equal to one or more parameter values of the decoder;
and synchronously applying any changes made to any parameter values of the one or more parameter values of a particular decoder replica to each other decoder replica of the plurality of decoder replicas.
Kadav teaches generating a plurality of decoder replicas, each decoder replica of the plurality of decoder replicas corresponding to a respective embedding partition in the sequence of embedding partitions (“In one aspect, a machine learning method includes installing a plurality of model replicas for training on a plurality of computer learning nodes; receiving training data at each model replica and updating parameters for the model replica after trailing” [¶0005]), and each decoder replica of the plurality of decoder replicas having one or more parameter values set equal to one or more parameter values of the decoder (“In this system, a plurality of model replicas train in parallel using parameter updates. The model replicas train and compute new model weights. They send/receive parameters from everyone and apply them to their own model.” [¶0015]);
and synchronously applying any changes made to any parameter values of the one or more parameter values of a particular decoder replica to each other decoder replica of the plurality of decoder replicas (“The method achieves balancing computation and communication in distributed machine learning. The communication batch sizes can be adjusted to automatically balance processor and network loads. The method includes ensuring accurate convergence and high accuracy machine learning models by adjusting training sizes with communication batch sizes. A plurality of model replicas can train in parallel using parameter updates. The model replicas can train and compute new model weights. The method includes sending or receiving parameters from all other model replicas and applying the parameters to the current model replica model.” [¶0008]; see further: “Advantages of the preferred embodiments may include one or more of the following. Balancing CPU and network provides an efficient system that trains machine-learning models quickly and with low running costs. Ensuring all replicas converge at the same time, improves model accuracy. More accurate models, with faster training times ensures that all NEC businesses and applications such as job recommendations, internet helpdesks, etc. provide more accurate results.” [¶0009; the model replicas converging at the same time would imply that parameter changes are applied to each replica synchronously.]).
Ladjal, Yoo, and Kadav are all in the same field of endeavor of training machine learning models. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. Kadav discloses training a plurality of machine learning model replicas in parallel. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s/Yoo’s teachings by generating model replicas and applying parallel parameter updates as taught by Kadav. One would have been motivated to make this modification in order to balance computation and communication overhead. [Abstract, Kadav]
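As a minimal, single-process illustration of the replica limitations discussed above (generating replicas whose parameter values equal the decoder's, then applying a change made on one replica to every other replica), consider the sketch below. It is not Kadav's implementation: the helper names `make_replicas` and `sync_from` and the dictionary representation of parameters are hypothetical, and a distributed system of the kind Kadav describes would exchange parameter values over a network rather than copy them in memory.

```python
import copy
from typing import Dict, List

Params = Dict[str, float]


def make_replicas(decoder: Params, n: int) -> List[Params]:
    """Create n decoder replicas whose parameter values start equal to the decoder's."""
    return [copy.deepcopy(decoder) for _ in range(n)]


def sync_from(replicas: List[Params], source: int) -> None:
    """Copy the current parameter values of replicas[source] into every other replica."""
    for i, replica in enumerate(replicas):
        if i != source:
            replica.update(replicas[source])


decoder = {"w": 0.5, "b": 0.0}
replicas = make_replicas(decoder, 3)
replicas[1]["w"] -= 0.25  # a local update made on one particular replica
sync_from(replicas, 1)    # the same change is applied to all other replicas
print(replicas)  # [{'w': 0.25, 'b': 0.0}, {'w': 0.25, 'b': 0.0}, {'w': 0.25, 'b': 0.0}]
```

Because every replica is updated before the next training step, all replicas hold identical values at each step, which is the sense in which the updates are "synchronous" in this sketch.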

Claims 4-6 and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Ladjal in view of Yoo and Kadav and further in view of Haidar et al. ("US 20200134463 A1", hereinafter "Haidar").

Regarding claim 4, Ladjal/Yoo/Kadav teaches The method of claim 1, where Ladjal further teaches wherein performing initial training to train the encoder and the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition (“For this, inspired by the PCA, we start by training an autoencoder with a latent space of size 1” [pg. 2, § 1 Introduction, ¶5]) comprises:
determining a gradient of an objective function with respect to an output generated by the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition (“[image: Ladjal, Equation 3] The pseudo-code for our algorithm can be seen in Algorithm 1. Note that in this pseudo-code, we have used a standard gradient descent, but any gradient-descent based algorithm can be used” [pg. 4, § 3 PCA Autoencoder, Equation 3; note: “Let y = D ◦ E(x) be the output of the autoencoder”, pg. 4, § 3 PCA Autoencoder, ¶1]); 
However, Ladjal/Yoo/Kadav fails to explicitly teach backpropagating the gradient from the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition only to a corresponding portion of the encoder that generates the first embedding partition in the sequence of embedding partitions; and
updating, using the backpropagated gradient, respective parameter values of the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition and the corresponding portion of the encoder.
Haidar teaches:
backpropagating the gradient from the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition only to a corresponding portion of the encoder that generates the first embedding partition in the sequence of embedding partitions (“[image: Haidar, ¶0082 excerpt]” [¶0082]); and 
updating, using the backpropagated gradient, respective parameter values of the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition and the corresponding portion of the encoder (“The LATEXT-GANs then use backpropagation and the reconstruction loss LAE(φ,ψ) to update the neural network parameters φ of the encoder neural network and the neural network parameters ψ of the decoder neural network.” [¶0082]).
Ladjal, Yoo, Kadav, and Haidar are all in the same field of endeavor of training machine learning models. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. Kadav discloses training a plurality of machine learning model replicas in parallel. Haidar discloses a text-based generative adversarial network using an autoencoder. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s/Yoo’s/Kadav’s teachings by using back-propagation to update the parameters of the encoder and decoder as taught by Haidar. Back-propagation is a well-known technique in the art and thus one would have been motivated to make this modification to train the autoencoder and to update parameters to minimize loss. [¶0082, Haidar]
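To illustrate the limitation of backpropagating the gradient "only to a corresponding portion of the encoder," the sketch below computes gradients for a toy linear autoencoder whose later latent dimensions are zero-masked. It is an assumption-laden illustration, not code from Haidar, Ladjal, Yoo, or the application: the linear encoder and decoder, the squared-error loss 0.5 * ||y - x||^2, and the helper name `masked_gradients` are all hypothetical. It shows that zero-masking (as Yoo describes, "masked to zero and is not back-propagated") leaves the encoder rows for the masked dimensions with zero gradient.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def masked_gradients(W: Matrix, V: Matrix, x: List[float], keep: int) -> Tuple[Matrix, Matrix]:
    """Gradients of 0.5 * ||y - x||^2 for a toy linear autoencoder with a masked latent.

    Encoder: z = W x (one row of W per latent dimension).
    Mask:    latent dimensions with index >= keep are set to zero.
    Decoder: y = V z_masked.
    """
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    z_masked = [zj if j < keep else 0.0 for j, zj in enumerate(z)]
    y = [sum(v * zj for v, zj in zip(row, z_masked)) for row in V]
    r = [yi - xi for yi, xi in zip(y, x)]  # dLoss/dy
    grad_V = [[ri * zj for zj in z_masked] for ri in r]
    # The mask blocks the gradient path: dLoss/dz_j is zero for masked dimensions.
    grad_z = [
        sum(r[i] * V[i][j] for i in range(len(V))) if j < keep else 0.0
        for j in range(len(W))
    ]
    grad_W = [[gz * xi for xi in x] for gz in grad_z]
    return grad_W, grad_V


W = [[0.5, 0.5], [1.0, -1.0]]
V = [[1.0, 0.0], [2.0, 0.0]]
x = [1.0, 2.0]
grad_W, grad_V = masked_gradients(W, V, x, keep=1)
print(grad_W)  # [[2.5, 5.0], [0.0, 0.0]]
print(grad_V)  # [[0.75, 0.0], [1.5, 0.0]]
```

A gradient-descent step using these values changes only the encoder row that produces the unmasked (first) partition, together with the decoder weights that read it.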

Regarding claim 5, Ladjal/Yoo/Kadav teaches The method of claim 1, where Ladjal further teaches wherein performing incremental training to train the encoder and the decoder replica of the plurality of decoder replicas corresponding to the particular embedding partition (“For this, inspired by the PCA, we start by training an autoencoder with a latent space of size 1. Once this is trained, we fix the values of this first element in the latent space, and train an autoencoder with a latent space of size 2, where only the second component is trained. At each step, the decoder is discarded, and a new one is trained from scratch. This continues until we reach the required latent space size. Furthermore, we add a latent space covariance loss term to the autoencoder loss to ensure that each component is statistically independent. We refer to this network as a “PCA Autoencoder”.” [pg. 2, § 1 Introduction, ¶5]) comprises: 
determining a gradient of an objective function with respect to an output generated by the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition (“[image: Ladjal, Equation 3] The pseudo-code for our algorithm can be seen in Algorithm 1. Note that in this pseudo-code, we have used a standard gradient descent, but any gradient-descent based algorithm can be used” [pg. 4, § 3 PCA Autoencoder, Equation 3; note: “Let y = D ◦ E(x) be the output of the autoencoder”, pg. 4, § 3 PCA Autoencoder, ¶1]); 
However, Ladjal/Yoo/Kadav fails to explicitly teach backpropagating the gradient from the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition only to a corresponding portion of the encoder that generates the first embedding partition in the sequence of embedding partitions; and
updating, using the backpropagated gradient, respective parameter values of the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition and the corresponding portion of the encoder.
Haidar teaches:
backpropagating the gradient from the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition only to a corresponding portion of the encoder that generates the first embedding partition in the sequence of embedding partitions (“[image: Haidar, ¶0082 excerpt]” [¶0082]); and 
updating, using the backpropagated gradient, respective parameter values of the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition and the corresponding portion of the encoder (“The LATEXT-GANs then use backpropagation and the reconstruction loss LAE(φ,ψ) to update the neural network parameters φ of the encoder neural network and the neural network parameters ψ of the decoder neural network.” [¶0082]).
Ladjal, Yoo, Kadav, and Haidar are all in the same field of endeavor of training machine learning models. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. Kadav discloses training a plurality of machine learning model replicas in parallel. Haidar discloses a text-based generative adversarial network using an autoencoder. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s/Yoo’s/Kadav’s teachings by using back-propagation to update the parameters of the encoder and decoder as taught by Haidar. Back-propagation is a well-known technique in the art and thus one would have been motivated to make this modification to train the autoencoder and to update parameters to minimize loss. [¶0082, Haidar]

Regarding claim 6, Ladjal/Yoo/Kadav/Haidar teaches The method of claim 4, where Ladjal further teaches further comprising: 
determining that the gradient of the objective function has converged to a predetermined value (“[image: Ladjal, Equation 3] The pseudo-code for our algorithm can be seen in Algorithm 1. Note that in this pseudo-code, we have used a standard gradient descent, but any gradient-descent based algorithm can be used” [pg. 4, § 3 PCA Autoencoder, Equation 3; note: “Let y = D ◦ E(x) be the output of the autoencoder”, pg. 4, § 3 PCA Autoencoder, ¶1; Note: It is implicit from Algorithm 1 that training of the first latent dimension stops once a predetermined value is reached, so that the algorithm can proceed to train the next latent dimensions.]); and 
in response to the determining, terminating the initial training and beginning the incremental training for the second embedding partition in the sequence of embedding partitions (“[image: Ladjal, Algorithm 1]” [pg. 5, Algorithm 1; Examiner interprets training the next latent dimensions as the beginning of the incremental training for the second partition in the sequence.]).

Regarding claim 14, Ladjal/Yoo/Kadav teaches The system of claim 11, where Ladjal further teaches wherein performing initial training to train the encoder and the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition (“For this, inspired by the PCA, we start by training an autoencoder with a latent space of size 1” [pg. 2, § 1 Introduction, ¶5]) comprises:
determining a gradient of an objective function with respect to an output generated by the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition (“[image: Ladjal, Equation 3] The pseudo-code for our algorithm can be seen in Algorithm 1. Note that in this pseudo-code, we have used a standard gradient descent, but any gradient-descent based algorithm can be used” [pg. 4, § 3 PCA Autoencoder, Equation 3; note: “Let y = D ◦ E(x) be the output of the autoencoder”, pg. 4, § 3 PCA Autoencoder, ¶1]); 
However, Ladjal/Yoo/Kadav fails to explicitly teach backpropagating the gradient from the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition only to a corresponding portion of the encoder that generates the first embedding partition in the sequence of embedding partitions; and
updating, using the backpropagated gradient, respective parameter values of the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition and the corresponding portion of the encoder.
Haidar teaches:
backpropagating the gradient from the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition only to a corresponding portion of the encoder that generates the first embedding partition in the sequence of embedding partitions (“[image: Haidar, ¶0082 excerpt]” [¶0082]); and 
updating, using the backpropagated gradient, respective parameter values of the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition and the corresponding portion of the encoder (“The LATEXT-GANs then use backpropagation and the reconstruction loss LAE(φ,ψ) to update the neural network parameters φ of the encoder neural network and the neural network parameters ψ of the decoder neural network.” [¶0082]).
Ladjal, Yoo, Kadav, and Haidar are all in the same field of endeavor of training machine learning models. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. Kadav discloses training a plurality of machine learning model replicas in parallel. Haidar discloses a text-based generative adversarial network using an autoencoder. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s/Yoo’s/Kadav’s teachings by using back-propagation to update the parameters of the encoder and decoder as taught by Haidar. Back-propagation is a well-known technique in the art and thus one would have been motivated to make this modification to train the autoencoder and to update parameters to minimize loss. [¶0082, Haidar]

Regarding claim 15, Ladjal/Yoo/Kadav teaches The system of claim 11, where Ladjal further teaches wherein performing incremental training to train the encoder and the decoder replica of the plurality of decoder replicas corresponding to the particular embedding partition (“For this, inspired by the PCA, we start by training an autoencoder with a latent space of size 1. Once this is trained, we fix the values of this first element in the latent space, and train an autoencoder with a latent space of size 2, where only the second component is trained. At each step, the decoder is discarded, and a new one is trained from scratch. This continues until we reach the required latent space size. Furthermore, we add a latent space covariance loss term to the autoencoder loss to ensure that each component is statistically independent. We refer to this network as a “PCA Autoencoder”.” [pg. 2, § 1 Introduction, ¶5]) comprises: 
determining a gradient of an objective function with respect to an output generated by the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition (“[image: Ladjal, Equation 3] The pseudo-code for our algorithm can be seen in Algorithm 1. Note that in this pseudo-code, we have used a standard gradient descent, but any gradient-descent based algorithm can be used” [pg. 4, § 3 PCA Autoencoder, Equation 3; note: “Let y = D ◦ E(x) be the output of the autoencoder”, pg. 4, § 3 PCA Autoencoder, ¶1]); 
However, Ladjal/Yoo/Kadav fails to explicitly teach backpropagating the gradient from the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition only to a corresponding portion of the encoder that generates the first embedding partition in the sequence of embedding partitions; and
updating, using the backpropagated gradient, respective parameter values of the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition and the corresponding portion of the encoder.
Haidar teaches:
backpropagating the gradient from the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition only to a corresponding portion of the encoder that generates the first embedding partition in the sequence of embedding partitions (“[image: Haidar, ¶0082 excerpt]” [¶0082]); and 
updating, using the backpropagated gradient, respective parameter values of the decoder replica of the plurality of decoder replicas corresponding to the first embedding partition and the corresponding portion of the encoder (“The LATEXT-GANs then use backpropagation and the reconstruction loss LAE(φ,ψ) to update the neural network parameters φ of the encoder neural network and the neural network parameters ψ of the decoder neural network.” [¶0082]).
Ladjal, Yoo, Kadav, and Haidar are all in the same field of endeavor of training machine learning models. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. Kadav discloses training a plurality of machine learning model replicas in parallel. Haidar discloses a text-based generative adversarial network using an autoencoder. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s/Yoo’s/Kadav’s teachings by using back-propagation to update the parameters of the encoder and decoder as taught by Haidar. Back-propagation is a well-known technique in the art and thus one would have been motivated to make this modification to train the autoencoder and to update parameters to minimize loss. [¶0082, Haidar]

Regarding claim 16, Ladjal/Yoo/Kadav/Haidar teaches The system of claim 14, where Ladjal further teaches further comprise: 
determining that the gradient of the objective function has converged to a predetermined value (“[image: Ladjal, Equation 3] The pseudo-code for our algorithm can be seen in Algorithm 1. Note that in this pseudo-code, we have used a standard gradient descent, but any gradient-descent based algorithm can be used” [pg. 4, § 3 PCA Autoencoder, Equation 3; note: “Let y = D ◦ E(x) be the output of the autoencoder”, pg. 4, § 3 PCA Autoencoder, ¶1; Note: It is implicit from Algorithm 1 that training of the first latent dimension stops once a predetermined value is reached, so that the algorithm can proceed to train the next latent dimensions.]); and 
in response to the determining, terminating the initial training and beginning the incremental training for the second embedding partition in the sequence of embedding partitions ([Ladjal, Algorithm 1, reproduced as an image in the original] [pg. 5, Algorithm 1; Examiner interprets training the next latent dimensions as the beginning of the incremental training for the second partition in the sequence.]).

Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Ladjal in view of Yoo and Kadav and further in view of Wang et al. ("TACOTRON: TOWARDS END-TO-END SPEECH SYNTHESIS", hereinafter "Wang").

Regarding claim 9, the combination of Ladjal/Yoo/Kadav teaches The method of claim 1; however, Ladjal/Yoo/Kadav fails to explicitly teach wherein the inputs are units of text, and the outputs are utterances representing the units of text.
Wang teaches wherein the inputs are units of text (“The goal of the encoder is to extract robust sequential representations of text. The input to the encoder is a character sequence, where each character is represented as a one-hot vector and embedded into a continuous vector.” [pg. 3-4, § 3.2 Encoder, ¶1]), and the outputs are utterances representing the units of text (“See Figure 1. Model architecture. The model takes characters as input and outputs the corresponding raw spectrogram, which is then fed to the Griffin-Lim reconstruction algorithm to synthesize speech.” [pg. 2, Figure 1; Examiner interprets outputting speech to be equivalent to outputting utterances]).
Ladjal, Yoo, Kadav, and Wang are all in the same field of endeavor of training machine learning models. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. Kadav discloses training a plurality of machine learning model replicas in parallel. Wang discloses a text-to-speech synthesis system using an encoder and decoder network. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s/Yoo’s/Kadav’s teachings by using their algorithms to perform speech synthesis as taught by Wang. One would have been motivated to make this modification in order to synthesize speech from text and improve upon existing text-to-speech systems. [pg. 1, § 1 Introduction, ¶1-2, Wang]

Regarding claim 19, Ladjal/Yoo/Kadav teaches The system of claim 11; however, Ladjal/Yoo/Kadav fails to explicitly teach wherein the inputs are units of text, and the outputs are utterances representing the units of text.
Wang teaches wherein the inputs are units of text (“The goal of the encoder is to extract robust sequential representations of text. The input to the encoder is a character sequence, where each character is represented as a one-hot vector and embedded into a continuous vector.” [pg. 3-4, § 3.2 Encoder, ¶1]), and the outputs are utterances representing the units of text (“See Figure 1. Model architecture. The model takes characters as input and outputs the corresponding raw spectrogram, which is then fed to the Griffin-Lim reconstruction algorithm to synthesize speech.” [pg. 2, Figure 1; Examiner interprets outputting speech to be equivalent to outputting utterances]).
Ladjal, Yoo, Kadav, and Wang are all in the same field of endeavor of training machine learning models. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. Kadav discloses training a plurality of machine learning model replicas in parallel. Wang discloses a text-to-speech synthesis system using an encoder and decoder network. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s/Yoo’s/Kadav’s teachings by using their algorithms to perform speech synthesis as taught by Wang. One would have been motivated to make this modification in order to synthesize speech from text and improve upon existing text-to-speech systems. [pg. 1, § 1 Introduction, ¶1-2, Wang]

Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ladjal in view of Yoo and Kadav, further in view of Choi et al. ("Autoencoder-Based Incremental Class Learning without Retraining on Old Data", hereinafter "Choi"), and further in view of Lample et al. ("Fader Networks: Manipulating Images by Sliding Attributes", hereinafter "Lample").

Regarding claim 10, Ladjal/Yoo/Kadav teaches The method of claim 1; however, Ladjal/Yoo/Kadav fails to explicitly teach further comprising: after performing all incremental trainings: 
receiving a new input; 
processing the new input using the trained encoder to generate an initial embedding for the new input; 
receiving a user input modifying values of a given embedding partition in the initial embedding to generate a new embedding; and 
processing the new embedding using the trained decoder to generate an output for the new embedding.
Choi teaches after performing all incremental trainings (“At training for each task, the importance measure obtained in the last task is used to compute LSI or LMAS, and a new importance measure is calculated for the task following right after. When the training is completed, {µi} is appended with new class means.” [pg. 4, § Incremental Training, ¶1]):
receiving a new input (“To further increase our model’s ability to learn new tasks, we enhance the mean prototypes of base classes via applying Local Outlier Factor” [pg. 3, § 3.4 Outlier Exclusion and Additional Training for Base Classes, ¶1]); 
processing the new input using the trained encoder to generate an initial embedding for the new input (“To further increase our model’s ability to learn new tasks, we enhance the mean prototypes of base classes via applying Local Outlier Factor and additionally training the encoder to fit to the altered mean prototypes” [pg. 3, § 3.4 Outlier Exclusion and Additional Training for Base Classes, ¶1]); and 
processing the new embedding using the trained decoder to generate an output for the new embedding (“After the final epoch, the whole training dataset is fed to the autoencoder and we separately collect the output of the encoder h(ϕ(x)) class by class to make class mean of prototypes {µi}. If the outlier exclusion and the additional training is used, LOF is applied to {µi} to obtain {µnew,i} and the encoder is trained by minimizing Ladd in Eq. 7.” [pg. 4, § Base Training, ¶1; inputting the training set into the autoencoder implies using the trained decoder to process the new task.]).
Ladjal, Yoo, Kadav, and Choi are all in the same field of endeavor of training machine learning models. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. Kadav discloses training a plurality of machine learning model replicas in parallel. Choi discloses an autoencoder based incremental learning method. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s/Yoo’s/Kadav’s training algorithms by retraining the model after incremental training as taught by Choi. One would have been motivated to make this modification in order to evaluate the trained model’s performance on new tasks. [pg. 3, § 3.4 Outlier Exclusion and Additional Training for Base Classes, Choi]
However, Ladjal/Yoo/Kadav/Choi fails to explicitly teach receiving a user input modifying values of a given embedding partition in the initial embedding to generate a new embedding.
Lample teaches receiving a user input modifying values of a given embedding partition in the initial embedding to generate a new embedding (“Our approach relies on an encoder-decoder architecture where, given an input image x with its attributes y, the encoder maps x to a latent representation z, and the decoder is trained to reconstruct x given (z, y). At inference time, a test image is encoded in the latent space, and the user chooses the attribute values y that are fed to the decoder.” [pg. 1, § 1 Introduction, ¶1]).
Ladjal, Yoo, Kadav, Choi, and Lample are all in the same field of endeavor of training autoencoder networks. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. Kadav discloses training a plurality of machine learning model replicas in parallel. Choi discloses an autoencoder based incremental learning method. Lample discloses an encoder-decoder network trained to reconstruct images. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s/Yoo’s/Kadav’s/Choi’s training algorithms by allowing a user to modify values to generate a new embedding as taught by Lample. One would have been motivated to make this modification in order to allow users to update certain features of the input data. [See Abstract and pg. 1, §1 Introduction, ¶2, Lample]

Regarding claim 20, Ladjal/Yoo/Kadav teaches The system of claim 11; however, Ladjal/Yoo/Kadav fails to explicitly teach wherein the operations further comprise:
after performing all incremental trainings: 
receiving a new input; 
processing the new input using the trained encoder to generate an initial embedding for the new input; 
receiving a user input modifying values of a given embedding partition in the initial embedding to generate a new embedding; and 
processing the new embedding using the trained decoder to generate an output for the new embedding.
Choi teaches after performing all incremental trainings (“At training for each task, the importance measure obtained in the last task is used to compute LSI or LMAS, and a new importance measure is calculated for the task following right after. When the training is completed, {µi} is appended with new class means.” [pg. 4, § Incremental Training, ¶1]):
receiving a new input (“To further increase our model’s ability to learn new tasks, we enhance the mean prototypes of base classes via applying Local Outlier Factor” [pg. 3, § 3.4 Outlier Exclusion and Additional Training for Base Classes, ¶1]); 
processing the new input using the trained encoder to generate an initial embedding for the new input (“To further increase our model’s ability to learn new tasks, we enhance the mean prototypes of base classes via applying Local Outlier Factor and additionally training the encoder to fit to the altered mean prototypes” [pg. 3, § 3.4 Outlier Exclusion and Additional Training for Base Classes, ¶1]); and 
processing the new embedding using the trained decoder to generate an output for the new embedding (“After the final epoch, the whole training dataset is fed to the autoencoder and we separately collect the output of the encoder h(ϕ(x)) class by class to make class mean of prototypes {µi}. If the outlier exclusion and the additional training is used, LOF is applied to {µi} to obtain {µnew,i} and the encoder is trained by minimizing Ladd in Eq. 7.” [pg. 4, § Base Training, ¶1; inputting the training set into the autoencoder implies using the trained decoder to process the new task.]).
Ladjal, Yoo, Kadav, and Choi are all in the same field of endeavor of training machine learning models. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. Kadav discloses training a plurality of machine learning model replicas in parallel. Choi discloses an autoencoder based incremental learning method. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s/Yoo’s/Kadav’s training algorithms by retraining the model after incremental training as taught by Choi. One would have been motivated to make this modification in order to evaluate the trained model’s performance on new tasks. [pg. 3, § 3.4 Outlier Exclusion and Additional Training for Base Classes, Choi]
However, Ladjal/Yoo/Kadav/Choi fails to explicitly teach receiving a user input modifying values of a given embedding partition in the initial embedding to generate a new embedding.
Lample teaches receiving a user input modifying values of a given embedding partition in the initial embedding to generate a new embedding (“Our approach relies on an encoder-decoder architecture where, given an input image x with its attributes y, the encoder maps x to a latent representation z, and the decoder is trained to reconstruct x given (z, y). At inference time, a test image is encoded in the latent space, and the user chooses the attribute values y that are fed to the decoder.” [pg. 1, § 1 Introduction, ¶1]).
Ladjal, Yoo, Kadav, Choi, and Lample are all in the same field of endeavor of training autoencoder networks. Ladjal discloses a method of training a PCA autoencoder. Yoo discloses learning latent vectors with generative autoencoders with incremental learning. Kadav discloses training a plurality of machine learning model replicas in parallel. Choi discloses an autoencoder based incremental learning method. Lample discloses an encoder-decoder network trained to reconstruct images. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ladjal’s/Yoo’s/Kadav’s/Choi’s training algorithms by allowing a user to modify values to generate a new embedding as taught by Lample. One would have been motivated to make this modification in order to allow users to update certain features of the input data. [See Abstract and pg. 1, §1 Introduction, ¶2, Lample]


Response to Arguments
Applicant's arguments filed 01/11/2022 have been fully considered but they are not persuasive.

Regarding the 35 U.S.C. §103 Rejections:
Applicant’s arguments on pg. 12 with respect to independent claims 1, 11, and 21 have been considered but are moot because the newly amended limitations are now taught by the newly presented art of Kadav. Please see the updated 103 rejection above.
Applicant’s arguments with respect to the rejections of the dependent claims have been fully considered but they are not persuasive as they rely upon the allowability of the independent claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Yao et al. ("A Map-reduce Method for Training Autoencoders on
Applicant's amendment necessitated the new grounds of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/M.H.H./Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122