Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Detailed Action
1.	The Examiner acknowledges the applicant’s amendment filed May 26, 2022. Claims 1, 5 and 9-13 are pending in the instant application and are ready for examination.

Response to Arguments
2.	Applicant’s arguments filed on May 26, 2022 for claims 1, 5 and 9-13 have been fully considered but are not persuasive.

3.	Applicant’s argument:
Claim Objection

Without admitting to the propriety of the objection, Applicant has removed the term “supervised learning” from claims 1 and 9. 

Examiner’s answer:
The examiner withdraws the objection.

4.	Applicant’s argument:
35 U.S.C. §112

The term “trainer unit” in the claims is supported by self-trainer-unit 110 in the specification and drawings. Further there is no in haec verba requirement for limitations used in the claims. Therefore, the present claims satisfy the written description requirement and the rejection of claims 1, 9-10 and 12 under 35 U.S.C. 112(a) should be withdrawn. 

Examiner’s answer:
The examiner withdraws the rejection.

5.	Applicant’s argument:

35 U.S.C. §103

Claims 1, 5 and 9-13 stand rejected under 35 U.S.C. §103 as being unpatentable over Rippel (U.S Publication No. 2018/0173994; hereinafter “Rippel”) in view of Mehrotra (‘Elements of artificial neural network’; hereinafter “Mehrotra”) and further in view of Milek (U.S. Patent No. 6,208,953; hereinafter “Milek”). This rejection is traversed as follows.

Accordingly, the trainer unit itself updates the learning task to generate a new learning task and generates the initial data corresponding to the new learning task. The initial data generated by the trainer unit is used by the learning model to generate training data candidates. Simply put, neither Rippel et al. or Milek et al. disclose that the trainer unit itself updates the initial learning task to generate a new learning task and generates the initial data corresponding to the content of the new learning task. Additionally, neither Rippel or Milek et al. disclose causing a learning model to generate training candidates which are generated based on the generated initial data.

Examiner’s answer:
New reference Zhang discloses learning/training in incremental steps for increasingly complex tasks.

6.	Applicant’s argument:
However, Rippel et al. do not disclose “generate initial data corresponding to the content of the new learning task, the initial data including a plurality of first input values,” as set forth in claim 1. That is, in Rippel et al. data corresponding to the content of the new learning task is not generated. Even further, Rippel et al. do not disclose generating data corresponding to the content of the new learning task that is then used by the learning model unit to generate training candidates.

Examiner’s answer:
For example, an easier model can be trained on a smaller patch size of an input image (e.g., 64×64) and a second, more complicated model can be fine-tuned from the easier model for a larger patch size (e.g., 256×256). (Rippel, 0028)

7.	Applicant’s argument:
Rippel and Mehrotra do not disclose expressly wherein the trainer unit is configured to: upon completion of learning of the initial learning task by the learning model unit, generate a copy of a learned model from the learning model unit and store the copy of the learned model in the storage unit, cause the learning model unit to generate training data candidates which are generated based on the generated initial data, the training data candidates include a plurality of first output values from the learning model unit for the plurality of first input values, determine whether each of the first output values is true for each of the first input values based on the validation rule, store pairs of first output values, which is are determined to be true, and their corresponding first input values into the storage unit, as new training data for supervised learning. 

Office Action, p. 9. Rather, Milek et al. are relied upon for allegedly curing the deficiencies.

Examiner’s answer:
Before the last current model M.sub.u-1 is changed, that is, before the model is calculated anew, the associated model parameters of the model M.sub.u-1 and a characteristic value which is representative of the quality of this model M.sub.u-1, for example, correlation matrices and/or estimated variances of the measurement noise, are stored. In the updating for the determination of the current model M.sub.u the older models, the model parameters and model uncertainties or qualities of which are stored, are preferably taken into consideration, in particular in a weighted manner. (Milek, c10:54-c11:18) Milek discloses storing elements of a model before a model is changed.

8.	Applicant’s argument:
Milek et al. disclose 

the model value ynu for the parameter xn is determined by means of the measured values x1u, x2u, . . . x(n-1)u with the help of the last current model Mu-1 or with the help of the newly determined model. See col. 10, ll. 59-63. “Then the residue rnu=xnu-ynu is formed, which is the difference between the actual measured value xnu and the model value ynu.” Id. at ll. 64-66.

Milek et al. also disclose minimizing the residue rnu and

[b]efore the last current model Mu-1 is changed, that is, before the model is calculated anew, the associated model parameters of the model Mu-1 and a characteristic value which is representative of the quality of this model Mu-1, for example, correlation matrices and/or estimated variances of the measurement noise, are stored. 

Col. 11, ll. 4-11.

Examiner’s answer:
The examiner does not cite this portion of Milek. 

9.	Applicant’s argument:
However, Milek et al. merely disclose optimizing a current model Mu. Milek et al. do not generate candidates for training data and Milek et al. do not generate new training data from training data candidates. Accordingly, Milek et al. do not disclose “cause the learning model unit to generate training data candidates which are generated based on the generated initial data,” as set forth in claim 1.

Examiner’s answer:
Milek is used to disclose saving a current model. Zhang is used to disclose the generation of a new model.

CLAIM INTERPRETATION
10. 	The following is a quotation of 35 U.S.C. 112(f):
(f) ELEMENT IN CLAIM FOR A COMBINATION – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

 	The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP §2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B)    the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C)    the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: ‘trainer unit’ in claims 1, 9-10 and 12; ‘storage unit’ in claims 1 and 9; ‘learning model unit’ in claims 1 and 10-12.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
11.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

Claim(s) 1, 5, 9-13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Rippel in view of Mehrotra in view of Milek and further in view of Zhang. (U. S. Patent Publication 20180173994, referred to as Rippel; ‘Elements of artificial neural networks’, referred to as Mehrotra; U. S. Patent 6208953, referred to as Milek; U. S. Patent 10915798, referred to as Zhang)

Claim 1
Rippel discloses an information processing system comprising: a learning model unit (Rippel, 0005; For example, one or more models can be trained once based on machine learning techniques, but the trained models can be applied to input images regardless of input image dimensions and desired target bit rate, and the one or more trained models are progressive with increased image reconstruction quality in response to increased available bits for compression.); a trainer unit configured to train the learning model unit (Rippel, 0014; FIG. 4A illustrates the training process of an adaptive arithmetic coding module, in accordance with an embodiment.); and a storage unit, wherein the storage unit stores initial training data (Rippel, 0029; Furthermore, the DLBC system 130 includes a training data store 190 where the data used to train different machine learning models are stored.)…. wherein the trainer unit is configured to input the initial training data into the learning model unit (Rippel, 0055; Reference is now made to FIG. 4A which illustrates the training process of the AAC module 225 to train a machine learning model to predict probability of context features, in accordance with an embodiment. The output of bitplane decomposition (e.g., binary code 405: B× C×H×W ϵ (0,1)) is used as input to train the model that determines context feature probabilities 420.), wherein the learning model unit learns an initial learning task based on the initial training data (Rippel, 0027; For example, a feature extraction model is trained, starting with training an easier model, e.g., for each scale of an input image,….),…. 
generate initial data corresponding to the content of the new learning task, the initial data including a plurality of first input values (Rippel, 0028; For example, an easier model can be trained on a smaller patch size of an input image (e.g., 64×64) and a second, more complicated model can be fine-tuned from the easier model for a larger patch size (e.g., 256×256).),…. train the learning model unit based on the new training data, input test data to the learning model unit after learning of the learning model unit by using the new training data (Rippel, 0028; Other examples of training the machine learning models in stages include training based on a task such as training a first model on generic images and fine-tuning a second model based on the first model on targeted domains (e.g., faces, pedestrians, cartoons, etc.).), determine an accuracy rate to the test data based on the validation rule (Rippel, 0068; For example, the discriminator module 180 can choose to either train the discriminator model or backpropagate a confusion signal through the generator pipeline a function of the prediction accuracy of the trained model.), determine whether or not to continue the learning of the current learning content of the learning model based on the accuracy rate and on a determination condition defined in advance. (Rippel, 0069; More concretely, given lower and upper accuracy bounds L, U ϵ [0, 1] and discriminator accuracy a(D), the following procedure is applied: [0070] If a<L: stop propagating confusion signal, and continuously train the discriminator model. [0071] If L≤a<U: alternate continuously between propagating confusion signal and training the discriminator model. [0072] If U≤a: continuously propagate confusion signal, and freeze the training of the discriminator model.)
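For illustration only, the three-way procedure quoted from Rippel at 0069-0072 can be sketched as follows. This is a minimal sketch and not code from Rippel: the function name and the returned labels are hypothetical, and the check that L ≤ U is an added assumption.

```python
def discriminator_schedule(a: float, lower: float, upper: float) -> str:
    """Pick the training mode for discriminator accuracy `a` given bounds L and U.

    The returned labels are hypothetical names for the three cases in the quote.
    """
    # Assumption: the bounds are ordered and lie in [0, 1].
    if not (0.0 <= lower <= upper <= 1.0):
        raise ValueError("bounds must satisfy 0 <= L <= U <= 1")
    if a < lower:
        # a < L: stop propagating the confusion signal, keep training the discriminator.
        return "train_discriminator_only"
    if a < upper:
        # L <= a < U: alternate between propagating confusion and training.
        return "alternate"
    # U <= a: keep propagating confusion, freeze discriminator training.
    return "propagate_confusion_only"
```

With L = 0.4 and U = 0.8, an accuracy of 0.8 falls in the third case because the upper bound is inclusive on that side.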
Rippel does not expressly disclose a validation rule defined in advance that indicates a condition determining that an output value of the learning model unit for an input value is determined to be true.
Mehrotra discloses a validation rule defined in advance that indicates a condition determining that an output value of the learning model unit for an input value is determined to be true. (Mehrotra, p70; ‘The backpropagation algorithm is a generalization of the least mean squared algorithm that modifies network weights to minimize the mean squared error between the desired and actual outputs of the network.’ EC: per the specification, 0044, ‘The initial configuration parameter 141 includes a configuration parameter that is referred to in the learning of the learning model unit 120. The initial configuration parameter 141 includes, for example, a loss function, an optimization approach (for example, a specific algorithm of the gradient descent method), and an optimization parameter.’ Gradient descent is part of backpropagation (Mehrotra, p65, ‘Backpropagation is similar to the LMS (least mean squared error) learning algorithm described earlier, and is based on gradient descent ...’).) It would have been obvious to one having ordinary skill in the art, having the teachings of Rippel and Mehrotra before him before the effective filing date of the claimed invention, to modify Rippel to incorporate the decision engine within a learning algorithm and the basics of training data of Mehrotra. Given the advantages of being able to stop learning and avoid a loop, and of comparing the actual output to the desired output as a measure of accuracy, one having ordinary skill in the art would have been motivated to make this obvious modification. 
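As an illustrative aside, the mean squared error between desired and actual outputs, and a single gradient descent (LMS-style) update, can be sketched as follows. The function names, the one-weight linear unit y = w·x, and the learning rate are assumptions chosen for brevity; they are not taken from Mehrotra or from the specification.

```python
from typing import Sequence


def mean_squared_error(desired: Sequence[float], actual: Sequence[float]) -> float:
    """Average of squared differences between desired and actual outputs."""
    if len(desired) != len(actual) or len(desired) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((d - a) ** 2 for d, a in zip(desired, actual)) / len(desired)


def lms_step(w: float, x: float, d: float, lr: float) -> float:
    """One gradient descent step on E = (d - w*x)**2 for a one-weight linear unit.

    dE/dw = -2*x*(d - w*x), so the update is w - lr*dE/dw.
    """
    return w + 2.0 * lr * x * (d - w * x)
```

Starting from w = 0 with input 1 and desired output 1, one step at learning rate 0.25 moves the weight to 0.5 and reduces the squared error from 1 to 0.25.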
Rippel and Mehrotra do not disclose expressly wherein the trainer unit is configured to: upon completion of learning of the initial learning task by the learning model unit, generate a copy of a learned model from the learning model unit and store the copy of the learned model in the storage unit,…. cause the learning model unit to generate training data candidates which are generated based on the generated initial data, the training data candidates include a plurality of first output values from the learning model unit for the plurality of first input values, determine whether each of the first output values is true for each of the first input values based on the validation rule, store pairs of first output values, which are determined to be true, and their corresponding first input values into the storage unit, as new training data. 
Milek discloses wherein the trainer unit is configured to: upon completion of learning of the initial learning task by the learning model unit, generate a copy of a learned model from the learning model unit and store the copy of the learned model in the storage unit (Milek, c10:54-c11:18; Before the last current model M.sub.u-1 is changed, that is, before the model is calculated anew, the associated model parameters of the model M.sub.u-1 and a characteristic value which is representative of the quality of this model M.sub.u-1, for example, correlation matrices and/or estimated variances of the measurement noise, are stored. In the updating for the determination of the current model M.sub.u the older models, the model parameters and model uncertainties or qualities of which are stored, are preferably taken into consideration, in particular in a weighted manner.),…. cause the learning model unit to generate training data candidates which are generated based on the generated initial data, the training data candidates include a plurality of first output values from the learning model unit for the plurality of first input values (Milek, c10:54-c11:18; This means that the updating is a new determination of the model on the basis of the set of measured values x.sub.1u, x.sub.2u, . . . x.sub.nu. By means of the measured values x.sub.1u, x.sub.2u, . . . x.sub.(n-1)u the model value y.sub.nu for the parameter x.sub.n is determined with the help of the last current model M.sub.u-1 or with the help of the newly determined model.), determine whether each of the first output values is true for each of the first input values based on the validation rule (Milek, fig 2; Item 103 is the decision engine or validation rule.), store pairs of first output values, which are determined to be true, and their corresponding first input values into the storage unit, as new training data. 
(Milek, c10:54-c11:18; Before the last current model M.sub.u-1 is changed, that is, before the model is calculated anew, the associated model parameters of the model M.sub.u-1 and a characteristic value which is representative of the quality of this model M.sub.u-1, for example, correlation matrices and/or estimated variances of the measurement noise, are stored. In the updating for the determination of the current model M.sub.u the older models, the model parameters and model uncertainties or qualities of which are stored, are preferably taken into consideration, in particular in a weighted manner.) It would have been obvious to one having ordinary skill in the art, having the teachings of Rippel, Mehrotra and Milek before him before the effective filing date of the claimed invention, to modify Rippel and Mehrotra to incorporate the following of Milek: once a model has been trained at a given stage, saving a copy of that stage; when a model is ready for the next stage of training, generating training data for the next model; using a decision engine to determine if training is complete; and saving the outputs of a previous model as training data for the next refined model. Given the advantages that having a copy of the previous stage avoids starting over if problems arise, that new training data is required for a different refined model, and that the use of a decision engine avoids going into a loop and avoids overtraining, one having ordinary skill in the art would have been motivated to make this obvious modification. 
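For illustration only, the sequence of steps recited in the limitation discussed above (copy the learned model, generate candidate outputs for the initial inputs, keep only the pairs the validation rule accepts) can be sketched as follows. This is a minimal sketch of the recited steps under simplifying assumptions, not of Milek's disclosed system or of the applicant's implementation; ScaledModel, build_new_training_data and is_valid are hypothetical names.

```python
import copy
from typing import Callable, List, Tuple


class ScaledModel:
    """Hypothetical stand-in for a learned model: multiplies its input by `scale`."""

    def __init__(self, scale: int) -> None:
        self.scale = scale

    def __call__(self, x: int) -> int:
        return self.scale * x


def build_new_training_data(
    model: ScaledModel,
    initial_inputs: List[int],
    is_valid: Callable[[int, int], bool],
) -> Tuple[ScaledModel, List[Tuple[int, int]]]:
    # Keep a copy of the learned model before any further training changes it.
    snapshot = copy.deepcopy(model)
    # Training data candidates: one model output per initial input value.
    candidates = [(x, model(x)) for x in initial_inputs]
    # Keep only the (input, output) pairs the validation rule determines to be true.
    accepted = [(x, y) for x, y in candidates if is_valid(x, y)]
    return snapshot, accepted
```

Here is_valid plays the role of the validation rule; any predicate over an input and its output could be substituted.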
Rippel, Mehrotra and Milek do not disclose expressly update the initial learning task to content with a higher computational complexity than content of the initial learning task to generate a new learning task.
Zhang discloses update the initial learning task to content with a higher computational complexity than content of the initial learning task to generate a new learning task. (Zhang, c4:36-50, c8:35-67; ‘An effective coarse-to-fine hierarchical learning strategy that leverages hierarchical structure of an emotion wheel to imitate easy-to-hard learning of different tasks in cognitive studies may be utilized.’ And ‘The CNN trainer module 310 may therefore execute curriculum guided training by incrementally increasing the complexity of the training dataset using more granular image tags for each successive iteration of the training.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Rippel, Mehrotra, Milek and Zhang before him before the effective filing date of the claimed invention, to modify Rippel, Mehrotra and Milek to incorporate generating a series of models of incrementally increasing complexity of Zhang. Given the advantage that testing and troubleshooting a small domain of code in steps is more efficient than testing and troubleshooting a large domain of code in a single step, one having ordinary skill in the art would have been motivated to make this obvious modification. 
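For illustration only, training stages that use progressively more granular tags, as in the quoted curriculum guided training, can be sketched as follows. This is not code from Zhang: the representation of a tag as a coarse-to-fine path, the function names, and the example labels are all hypothetical.

```python
from typing import Iterator, List, Tuple

Sample = Tuple[str, Tuple[str, ...]]  # (image id, tag path ordered coarse to fine)


def relabel(samples: List[Sample], depth: int) -> List[Sample]:
    """Truncate each tag path to `depth` levels, giving coarser labels."""
    return [(image_id, path[:depth]) for image_id, path in samples]


def curriculum_stages(samples: List[Sample], max_depth: int) -> Iterator[Tuple[int, List[Sample]]]:
    """Yield one dataset per stage, from the coarsest tags to the finest."""
    for depth in range(1, max_depth + 1):
        yield depth, relabel(samples, depth)
```

A training loop would consume the stages in order, so each successive iteration sees the same images with finer-grained tags.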

Claim 5
Rippel discloses wherein the first input values are the data for learning with computational complexity higher than that of the initial training data. (Rippel, 0028; Other examples of training the machine learning models in stages include training based on a task such as training a first model on generic images and fine-tuning a second model based on the first model on targeted domains (e.g., faces, pedestrians, cartoons, etc.).)

Claim 9
Rippel discloses a method performed in an information processing system including a learning model unit, a trainer unit configured to train the learning model, and a storage unit, wherein the storage unit stores initial training data and a validation rule defined in advance that indicates a condition determining that an output value of the learning model for an input value is true, and the method comprising: inputting, by the trainer unit, the initial training data into the learning model unit (Rippel, 0014; FIG. 4A illustrates the training process of an adaptive arithmetic coding module, in accordance with an embodiment.); and learning, by the learning model unit, an initial learning task based on the initial training data (Rippel, 0005, 0029; ‘For example, one or more models can be trained once based on machine learning techniques, but the trained models can be applied to input images regardless of input image dimensions and desired target bit rate, and the one or more trained models are progressive with increased image reconstruction quality in response to increased available bits for compression.’ And ‘Furthermore, the DLBC system 130 includes a training data store 190 where the data used to train different machine learning models are stored.’),…. generating initial data corresponding to the content of the new learning task, the initial data including a plurality of first input values (Rippel, 0028; For example, an easier model can be trained on a smaller patch size of an input image (e.g., 64×64) and a second, more complicated model can be fine-tuned from the easier model for a larger patch size (e.g., 256×256).),…. 
training the learning model unit based on the new training data; inputting test data to the learning model unit after learning of the learning model unit by using the new training data (Rippel, 0028; Other examples of training the machine learning models in stages include training based on a task such as training a first model on generic images and fine-tuning a second model based on the first model on targeted domains (e.g., faces, pedestrians, cartoons, etc.).); determining an accuracy rate to the test data based on the validation rule (Rippel, 0068; For example, the discriminator module 180 can choose to either train the discriminator model or backpropagate a confusion signal through the generator pipeline a function of the prediction accuracy of the trained model.); and determining whether or not to continue the learning of the current learning content of the learning model based on the accuracy rate and on a determination condition defined in advance. (Rippel, 0069; More concretely, given lower and upper accuracy bounds L, U ϵ [0, 1] and discriminator accuracy a(D), the following procedure is applied: [0070] If a<L: stop propagating confusion signal, and continuously train the discriminator model. [0071] If L≤a<U: alternate continuously between propagating confusion signal and training the discriminator model. [0072] If U≤a: continuously propagate confusion signal, and freeze the training of the discriminator model.)
Rippel and Mehrotra do not disclose expressly wherein the trainer unit performs: upon completion of learning of the initial learning task by the learning model unit, generating a copy of a learned model from the learning model unit and store the copy of the learned model in the storage unit,…. causing the learning model unit to generate training data candidates which are generated based on the generated initial data, the training data candidates include a plurality of output values from the learning model unit for the plurality of input values; determining whether each of the output values is true for each of the input values based on the validation rule; storing pairs of first output values, which are determined to be true, and their corresponding input values into the storage unit, as new training data. 
Milek discloses wherein the trainer unit performs: upon completion of learning of the initial learning task by the learning model unit, generating a copy of a learned model from the learning model unit and store the copy of the learned model in the storage unit (Milek, c10:54-c11:18; Before the last current model M.sub.u-1 is changed, that is, before the model is calculated anew, the associated model parameters of the model M.sub.u-1 and a characteristic value which is representative of the quality of this model M.sub.u-1, for example, correlation matrices and/or estimated variances of the measurement noise, are stored. In the updating for the determination of the current model M.sub.u the older models, the model parameters and model uncertainties or qualities of which are stored, are preferably taken into consideration, in particular in a weighted manner.),…. causing the learning model unit to generate training data candidates which are generated based on the generated initial data, the training data candidates include a plurality of output values from the learning model unit for the plurality of input values (Milek, c10:54-c11:18; This means that the updating is a new determination of the model on the basis of the set of measured values x.sub.1u, x.sub.2u, . . . x.sub.nu. By means of the measured values x.sub.1u, x.sub.2u, . . . x.sub.(n-1)u the model value y.sub.nu for the parameter x.sub.n is determined with the help of the last current model M.sub.u-1 or with the help of the newly determined model.); determining whether each of the output values is true for each of the input values based on the validation rule (Milek, fig 2; Item 103 is the decision engine or validation rule.); storing pairs of first output values, which are determined to be true, and their corresponding input values into the storage unit, as new training data. 
(Milek, c10:54-c11:18; Before the last current model M.sub.u-1 is changed, that is, before the model is calculated anew, the associated model parameters of the model M.sub.u-1 and a characteristic value which is representative of the quality of this model M.sub.u-1, for example, correlation matrices and/or estimated variances of the measurement noise, are stored. In the updating for the determination of the current model M.sub.u the older models, the model parameters and model uncertainties or qualities of which are stored, are preferably taken into consideration, in particular in a weighted manner.) It would have been obvious to one having ordinary skill in the art, having the teachings of Rippel, Mehrotra and Milek before him before the effective filing date of the claimed invention, to modify Rippel and Mehrotra to incorporate the following of Milek: once a model has been trained at a given stage, saving a copy of that stage; when a model is ready for the next stage of training, generating training data for the next model; using a decision engine to determine if training is complete; and saving the outputs of a previous model as training data for the next refined model. Given the advantages that having a copy of the previous stage avoids starting over if problems arise, that new training data is required for a different refined model, and that the use of a decision engine avoids going into a loop and avoids overtraining, one having ordinary skill in the art would have been motivated to make this obvious modification. 
Rippel, Mehrotra and Milek do not disclose expressly updating the initial learning task to content with a higher computational complexity than content of the initial learning task to generate a new learning task.
Zhang discloses updating the initial learning task to content with a higher computational complexity than content of the initial learning task to generate a new learning task. (Zhang, c4:36-50, c8:35-67; ‘An effective coarse-to-fine hierarchical learning strategy that leverages hierarchical structure of an emotion wheel to imitate easy-to-hard learning of different tasks in cognitive studies may be utilized.’ And ‘The CNN trainer module 310 may therefore execute curriculum guided training by incrementally increasing the complexity of the training dataset using more granular image tags for each successive iteration of the training.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Rippel, Mehrotra, Milek and Zhang before him before the effective filing date of the claimed invention, to modify Rippel, Mehrotra and Milek to incorporate generating a series of models of incrementally increasing complexity of Zhang. Given the advantage that testing and troubleshooting a small domain of code in steps is more efficient than testing and troubleshooting a large domain of code in a single step, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 10
Rippel discloses wherein the trainer unit is configured to: after learning of the learning model unit by using the new training data, inputting a plurality of second input values to the learning model unit to obtain a plurality of second output values. (Rippel, 0028; The example of, ‘Other examples of training the machine learning models in stages include training based on a task such as training a first model on generic images and fine-tuning a second model based on the first model on targeted domains (e.g., faces, pedestrians, cartoons, etc.).’)
Rippel does not disclose expressly using a pair of a second output value, which is determined to be true in the plurality of second output values, and the corresponding second input value, as training data for the relearning of the learning model unit. 
Mehrotra discloses using a pair of a second output value, which is determined to be true in the plurality of second output values, and the corresponding second input value, as training data for the relearning of the learning model unit. (Mehrotra, p38; In network development, therefore, available data is separated into two parts, of which one part is the training data and other part is the test data.) It would have been obvious to one having ordinary skill in the art, having the teachings of Rippel and Mehrotra before him before the effective filing date of the claimed invention, to modify Rippel to incorporate a decision engine within a learning algorithm and the basics of training data of Mehrotra. Given the advantages of having the ability to stop learning and avoid a loop, and of comparing the actual output to the desired output as a measure of accuracy, one having ordinary skill in the art would have been motivated to make this obvious modification. 
Rippel and Mehrotra do not disclose expressly determining whether each of the plurality of second output values is true for each of the plurality of second input values by referring to the validation rule.
Milek discloses determining whether each of the plurality of second output values is true for each of the plurality of second input values by referring to the validation rule. (Milek, fig 2; Item 103 is the decision engine or validation rule. EC: This is a simple iterative process where a version of the model can be item ‘A.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Rippel, Mehrotra and Milek before him before the effective filing date of the claimed invention, to modify Rippel and Mehrotra to incorporate once a model has been trained at a given stage, saving a copy of that stage; when a model is ready for the next stage of training, generating training data for the next model; using a decision engine to determine if training is complete; saving the outputs of a previous model as training data for the next refined model; of Milek. Given the advantages that having a copy of the previous stage avoids starting over if problems arise, that new training data is required for a different refined model, and that the use of a decision engine prevents looping and avoids overtraining, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 11
Rippel discloses wherein the plurality of second input values are the data for learning with computational complexity higher than that of the plurality of first input values. (Rippel, 0028; Other examples of training the machine learning models in stages include training based on a task such as training a first model on generic images and fine-tuning a second model based on the first model on targeted domains (e.g., faces, pedestrians, cartoons, etc.).)

Claim 12
Rippel discloses wherein the trainer unit performs the steps of: after learning of the learning model unit by using the new training data, inputting a plurality of second input values to the learning model unit to obtain a plurality of second output values. (Rippel, 0028; The example of, ‘Other examples of training the machine learning models in stages include training based on a task such as training a first model on generic images and fine-tuning a second model based on the first model on targeted domains (e.g., faces, pedestrians, cartoons, etc.).’)
Rippel does not disclose expressly using a pair of a second output value, which is determined to be true in the plurality of second output values, and the corresponding second input value, as training data for the relearning of the learning model unit. 
Mehrotra discloses using a pair of a second output value, which is determined to be true in the plurality of second output values, and the corresponding second input value, as training data for the relearning of the learning model unit. (Mehrotra, p38; In network development, therefore, available data is separated into two parts, of which one part is the training data and other part is the test data.) It would have been obvious to one having ordinary skill in the art, having the teachings of Rippel and Mehrotra before him before the effective filing date of the claimed invention, to modify Rippel to incorporate a decision engine within a learning algorithm and the basics of training data of Mehrotra. Given the advantages of having the ability to stop learning and avoid a loop, and of comparing the actual output to the desired output as a measure of accuracy, one having ordinary skill in the art would have been motivated to make this obvious modification. 
Rippel and Mehrotra do not disclose expressly determining whether each of the plurality of second output values is true for each of the plurality of second input values by referring to the validation rule.
Milek discloses determining whether each of the plurality of second output values is true for each of the plurality of second input values by referring to the validation rule. (Milek, fig 2; Item 103 is the decision engine or validation rule. EC: This is a simple iterative process where a version of the model can be item ‘A.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Rippel, Mehrotra and Milek before him before the effective filing date of the claimed invention, to modify Rippel and Mehrotra to incorporate once a model has been trained at a given stage, saving a copy of that stage; when a model is ready for the next stage of training, generating training data for the next model; using a decision engine to determine if training is complete; saving the outputs of a previous model as training data for the next refined model; of Milek. Given the advantages that having a copy of the previous stage avoids starting over if problems arise, that new training data is required for a different refined model, and that the use of a decision engine prevents looping and avoids overtraining, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 13
Rippel discloses wherein the plurality of second input values are the data for learning with computational complexity higher than that of the plurality of first input values. (Rippel, 0028; Other examples of training the machine learning models in stages include training based on a task such as training a first model on generic images and fine-tuning a second model based on the first model on targeted domains (e.g., faces, pedestrians, cartoons, etc.).)

12.	Claims 1, 5, 9-13 are rejected.

Conclusion – Final
13.	THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Correspondence Information
14.	Any inquiry concerning this information or related to the subject disclosure should be directed to the Examiner Mr. Peter Coughlan, whose telephone number is (571) 272-5990 (Fax 571-273-5990).  The Examiner can be reached on Monday through Friday from 7:15 a.m. to 3:45 p.m.
	If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor, Mr. Michael Huntley, can be reached at (303) 297-4307.  Any response to this office action should be mailed to:
	Commissioner of Patents and Trademarks, 
	Washington, D. C. 20231;
Hand delivered to:
	Receptionist, 
	Customer Service Window, 
	Randolph Building, 
	401 Dulany Street,
	Alexandria, Virginia 22313,
	(located on the first floor of the south side of the Randolph Building);
or faxed to:
	(571) 272-3150 (for formal communications intended for entry.)
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/MICHAEL J HUNTLEY/
Supervisory Patent Examiner, Art Unit 2129