Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Amendments
Claims 1 and 11 are amended. Claims 1-5, 7-15, and 17-20 are pending and have been considered.

Claim Objections
Claims 1 and 11 are objected to because of the following informalities:  In claims 1 and 11, second-to-last line, it is unclear whether “each of the plurality of models” is supposed to mean “each plurality” or “each model in the plurality of models”. For purposes of examination, Examiner interprets this limitation to mean “each model in the plurality of models”. Appropriate correction is required.

Drawings
In Fig. 1 of the drawings filed 12/21/2017, the model ensemble 104 contains “modules” instead of models as disclosed in para. [0023] of the amended specification filed 09/02/2021. The drawings are not objected to and no action is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-2, 4-5, 9, 11-12, 14-15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Kobayashi (US 20180365557 A1) in view of Odaibo et al. (US 20190043193 A1) and Detwiler et al. (US 20150154318 A1). All references were cited in the PTO-892 filed 05/13/2022.

Regarding CLAIM 1, Kobayashi teaches: A method of generating a model ensemble, comprising: 
training, via at least one processor, a base model including a plurality of layers; (A base model is interpreted as a seed network in ¶ [0056] - [0057] which has 10 layers. ¶ [0053], lines 6-8 state the seed network is also called the original network. ¶ [0110] and Fig. 8A disclose a seed network is trained, as indicated by the seed training error ST and seed validation error SV which are evaluated and plotted. ¶ [0204] teaches CPU 871.)
generating, via the at least one processor, a plurality of models for the model ensemble based on the base model, each model of the plurality of models being an… replica of the base model and including a respective plurality of layers that are replicas of the plurality of layers of the base model; (¶ [0059] and Fig. 1 disclose this limitation. The BRI of a base model includes neural network MN1. This network generates neural network MN2, in which every layer except for “Abs1” is a replica of a layer in MN1. Furthermore, ¶ [0142], lines 1-7 disclose that an original/seed network can generate a plurality of models.)
modifying, via the at least one processor, a layer of each of the plurality of models, each layer being modified using a different respective learning algorithm such that each model of the plurality of models includes a layer modified in a different manner as compared to a corresponding same layer of the base model and of each of the other plurality of models; (¶ [0093], [0097], and [0098] disclose changing a layer type of an existing layer and changing a parameter relating to an existing layer. ¶ [0058] and Fig. 1 teach changing a Relu activation function to a Tanh activation function. ¶ [0128]-[0129] teach changing parameters for Conv and Pool layers. Figs. 1, 10A and 10B disclose many other types of layers. The BRI of this limitation is that one layer in each of the four networks MN1 to MN4 (as disclosed by ¶ [0142], lines 1-7) mutates in a way unique to that network.)
tuning, via the at least one processor, each modified layer of the plurality of models; and (¶ [0079]-[0080] and ¶ [0143] teach training the generated neural networks.)
	However, Kobayashi does not explicitly teach: each model being an exact replica of the base model 
	While Kobayashi teaches a modified plurality of models, Kobayashi does not explicitly teach: aggregating the… plurality of models into a model ensemble in which the model ensemble is configured to solve one or more problems for which each of the plurality of models is trained.
But Odaibo teaches: aggregating the… plurality of models into a model ensemble in which the model ensemble is configured to solve one or more problems for which each of the plurality of models is trained. (¶ [0007], Fig. 10, ¶ [0056] generally teach the ensemble; ¶ [0022], [0023], and [0044] teach that convolutional networks, activation layers, and multilayer perceptrons, respectively, are members of the ensemble. ¶ [0025] discloses the ensemble and each model in the plurality of models is configured to solve a problem of classifying input data.)
Odaibo is in the same field of endeavor as the claimed invention, namely, machine learning. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have generated a model ensemble output, as taught by Odaibo, for Kobayashi’s plurality of networks. A motivation for the combination is that, within the end-to-end approaches, ensemble strategies have shown some advantage over non-ensemble approaches. (Odaibo, ¶ [0007], lines 4-6)
	However, neither Kobayashi nor Odaibo explicitly teaches: each model being an exact replica of the base model
But Detwiler teaches: each model being an exact replica of the base model (¶ [0151], col. 2, lines 4-6 teach that “copying is often employed” to generate the child models from the parent models. ¶ [0151], col. 2, lines 9-10 teach the child models are mutated. Fig. 13 shows an evolutionary loop. As explained by ¶ [0151], the step in Fig. 13 called “Generate Next Generation” copies the parent model to generate the child models and the step “Mutate” then mutates the child models.)
	Detwiler is in the same field of endeavor as the claimed invention, namely, evolutionary algorithms. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have generated Kobayashi’s child models by copying the original/parent model before mutating the child models. A motivation for the combination is that this method is often employed to produce the next generation. (Detwiler, ¶ [0151], col. 2, lines 4-6)

	Regarding CLAIM 2, the combination of Kobayashi, Odaibo, and Detwiler teaches: The method of claim 1,
Kobayashi teaches: further comprising: receiving an output from each of the plurality of models; and (¶ [0053] discloses the seed network has an output layer. ¶ [0143] is evidence that each model generates an output.)
However, neither Kobayashi nor Detwiler explicitly teaches: generating, via the at least one processor, a model ensemble output based on the output of each of the plurality of models.
But Odaibo teaches: generating, via the at least one processor, a model ensemble output based on the output of each of the plurality of models. (¶ [0007], Fig. 10, ¶ [0056] generally teach the ensemble; ¶ [0022], [0023], and [0044] teach convolutional networks, activation layers, and multilayer perceptrons, respectively, are members of the ensemble.)

Regarding CLAIM 4, the combination of Kobayashi, Odaibo, and Detwiler teaches: The method of claim 1, 
Kobayashi teaches: wherein modifying comprises modifying at least one training parameter of the layer of each of the plurality of models. (¶ [0128]-[0129] teach modifying a kernel shape relating to Conv1 and a pool shape relating to Pool2. This teaches modifying weights for one or more connections of the layer and a number of connections of the layer.)

Regarding CLAIM 5, the combination of Kobayashi, Odaibo, and Detwiler teaches: The method of claim 4, 
Kobayashi teaches: wherein modifying at least one training parameter of the layer comprises modifying at least one of a number of bits of the layer, a number of neurons of the layer, weights for one or more connections of the layer, and a number of connections of the layer. (¶ [0128]-[0129] teach modifying a kernel shape relating to Conv1 and a pool shape relating to Pool2. This teaches modifying weights for one or more connections of the layer and a number of connections of the layer.)

	Regarding CLAIM 9, the combination of Kobayashi, Odaibo, and Detwiler teaches: The method of claim 1, 
Kobayashi teaches: further comprising: arbitrarily selecting at least one additional layer in at least one model for modification; (¶ [0094], lines 1-3 teach mutations are determined randomly. ¶ [0102], lines 1-3 and ¶ [0129] teach multiple layers are changed at once.)
modifying the selected at least one additional layer; and (¶ [0102], lines 1-3 and ¶ [0129])
tuning the selected at least one additional layer. (The BRI of this claim includes training both modified layers at the same time. ¶ [0079]-[0080] and ¶ [0143] teach training the generated neural networks.)

Claims 11-12, 14-15, and 19 are product claims that recite the same features as method claims 1-2, 4-5, and 9, respectively. Claim 11 recites the additional features of one or more non-transitory computer-readable media that include instructions that, when executed by one or more processors, are configured to cause the one or more processors to perform operations, the operations comprising the method of claim 1. Kobayashi, ¶ [0205] teaches the above features. Claims 11-12, 14-15, and 19 are rejected for the reasons set forth in the rejection of claims 1-2, 4-5, and 9.

Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Kobayashi (US 20180365557 A1, cited in the PTO-892 filed 05/13/2022) in view of Odaibo et al. (US 20190043193 A1, cited in the PTO-892 filed 05/13/2022), Detwiler et al. (US 20150154318 A1, cited in the PTO-892 filed 05/13/2022), and Saldana et al. (US 20180018560, cited in the PTO-892 filed 06/03/2021).

Regarding CLAIM 3, the combination of Kobayashi, Odaibo, and Detwiler teaches: The method of claim 1,
	However, neither Kobayashi, Odaibo, nor Detwiler explicitly teaches: wherein modifying comprises modifying the layer of each of the plurality of models based on at least one of clustering and quantization.
	But Saldana teaches: wherein modifying comprises modifying the layer of each of the plurality of models based on at least one of clustering and quantization. (Saldana teaches quantizing weight values into binary values. ¶ [0055]-[0057] and Fig. 7, 720 disclose that the quantized weight is +1 if the weight is greater than or equal to zero, and the quantized weight is -1 otherwise.)
Saldana is in the same field of endeavor as the claimed invention, namely, machine learning. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the layer of each of the plurality of models by quantizing a weight into binary values ± 1, as taught by Saldana, with a motivation to reduce memory storage requirements and reduce memory bandwidth requirements. (Saldana, ¶ [0016]-[0017])
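For illustration only, and not as part of the cited disclosure, the binarization rule as characterized above (+1 for a weight greater than or equal to zero, -1 otherwise) can be sketched as follows. The function name `binarize_weights` and the sample weight values are hypothetical; the sketch is not Saldana's implementation.

```python
from typing import List


def binarize_weights(weights: List[float]) -> List[int]:
    """Quantize each weight to +1 if it is >= 0, otherwise to -1.

    Hypothetical helper for illustration; not taken from any cited reference.
    """
    return [1 if w >= 0 else -1 for w in weights]


# Hypothetical weights of a single layer.
layer_weights = [0.37, -1.20, 0.0, -0.05]
quantized = binarize_weights(layer_weights)
print(quantized)  # [1, -1, 1, -1]
```

Under this rule each weight can be represented by a single bit, which is consistent with the reduced memory storage and bandwidth motivation cited above.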

	Claim 13 recites the same features as claim 3. Claim 13 relies upon claim 11, which recites the additional features of one or more non-transitory computer-readable media that include instructions that, when executed by one or more processors, are configured to cause the one or more processors to perform operations, the operations comprising the method of claim 1. Kobayashi, ¶ [0205] teaches the above features. Claim 13 is rejected for the reasons set forth in the rejection of claim 3.

Claims 7-8 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Kobayashi (US 20180365557 A1, cited in the PTO-892 filed 05/13/2022) in view of Odaibo et al. (US 20190043193 A1, cited in the PTO-892 filed 05/13/2022), Detwiler et al. (US 20150154318 A1, cited in the PTO-892 filed 05/13/2022), and Kandaswamy et al. (“Deep Transfer Learning Ensemble for Classification”, see PTO-892 filed 06/03/2021).

Regarding CLAIM 7, the combination of Kobayashi, Odaibo, and Detwiler teaches: The method of claim 1, 
Kobayashi’s ¶ [0079]-[0080] and ¶ [0143] teach training the generated neural networks. However, neither Kobayashi, Odaibo, nor Detwiler explicitly teaches: wherein tuning each modified layer comprises tuning each modified layer with an X number of epochs.
	But Kandaswamy teaches: wherein tuning each modified layer comprises tuning each modified layer with an X number of epochs. (Kandaswamy teaches transfer learning on p. 335, § 1, first paragraph, and p. 338, paragraphs titled “Transfer Learning (TL)” and “Transferred Layers”. Kandaswamy teaches a system for transferring layers from a base network to target networks and then fine-tuning the target networks. Tuning each modified layer with a certain number of epochs is disclosed on p. 340, Algorithm 1, col. 2, Stage 2: “for each epoch in Fine-tuning do”. At least 1 epoch is performed.)
Kandaswamy is in the same field of endeavor as the claimed invention, namely, transfer learning for machine learning models. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have trained Kobayashi’s mutated networks for at least 1 epoch, as taught by Kandaswamy. Kandaswamy’s base and target networks are analogous to Kobayashi’s original and mutated networks. A motivation for the combination is to calculate a training error and a validation error by training the model for at least one epoch, wherein the errors are used to evaluate the structure of the generated model. (Kobayashi, ¶ [0080])

Regarding CLAIM 8, the combination of Kobayashi, Odaibo, Detwiler, and Kandaswamy teaches: The method of claim 7, 
However, neither Kobayashi, Odaibo, nor Detwiler explicitly teaches: wherein training the base model comprises training the base model with ten times X number of epochs. 
	But Kandaswamy teaches: wherein training the base model comprises training the base model with ten times X number of epochs. (P. 335, § 1, line 3: “train a source model”; p. 338, “Baseline (BL)”, lines 3-4: “SDA training comprises of two stages: an unsupervised pre-training phase followed by a supervised fine-tuning stage”. On p. 343, the last paragraph states that the three networks used in experiments were pre-trained on a minimum of 25, 10, and 30 epochs, respectively (lines 4, 6, and 8). The limitation “training the base model comprises” includes training the base model on at least 10 epochs.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have pre-trained Kobayashi’s original network for at least 10 epochs. When Kobayashi’s original network is mutated, the trained parameters are transferred for every layer except for the mutated ones. Transfer learning gives better generalization with less computational effort and offers several advantages over traditional machine learning, especially for non-stationary environments where the training and test samples may be drawn from different marginal distributions or the classification tasks may not be identical. (Kandaswamy, p. 335, § 1, first paragraph)

Claims 17 and 18 recite the same features as claims 7 and 8. Claims 17 and 18 rely upon claim 11, which recites the additional features of one or more non-transitory computer-readable media that include instructions that, when executed by one or more processors, are configured to cause the one or more processors to perform operations, the operations comprising the method of claim 1. Kobayashi, ¶ [0205] teaches the above features. Claims 17 and 18 are rejected for the reasons set forth in the rejection of claims 7 and 8.

Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kobayashi (US 20180365557 A1) in view of Odaibo et al. (US 20190043193 A1), Detwiler et al. (US 20150154318 A1), and Baker (US 20200285948 A1). All references were cited in the PTO-892 filed 05/13/2022.

Regarding CLAIM 10, the combination of Kobayashi, Odaibo, and Detwiler teaches: The method of claim 1, 
Kobayashi teaches: wherein training the base model comprises training the base model… (¶ [0110] and Fig. 8A disclose a seed network is trained, as indicated by the seed training error ST and seed validation error SV which are evaluated and plotted. ¶ [0204] teaches CPU 871.)
However, neither Kobayashi, Odaibo, nor Detwiler explicitly teaches: training the base model via random initialization.
But Baker teaches: training the base model via random initialization. (¶ [0041], last sentence)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have randomly initialized connection weights of Kobayashi’s base model, as taught by Baker. Such random initialization is well-known to those skilled in the art of training deep neural networks. (Baker, ¶ [0041], last sentence)

	Claim 20 recites the same features as claim 10. Claim 20 relies upon claim 11, which recites the additional features of one or more non-transitory computer-readable media that include instructions that, when executed by one or more processors, are configured to cause the one or more processors to perform operations, the operations comprising the method of claim 1. Kobayashi, ¶ [0205] teaches the above features. Claim 20 is rejected for the reasons set forth in the rejection of claim 10.


Response to Arguments
Applicant's arguments filed 08/15/2022 have been fully considered but they are not persuasive.
Claim Rejections Under 35 U.S.C. 103 (Remarks p. 6-8): Applicant argues: “The above-recited claim elements relate to a model ensemble aspect of the invention in that different changes are made to the same layers in different models to create larger diversity in a model ensemble. By comparison, the cited references merely mention that a model can be copied and then changed. The idea of making sure different changes happen across a same layer to create a more robust ensemble model, such as in the manner recited by the claims, does not appear to be found in those references.”
	Examiner respectfully disagrees. The combination of Kobayashi, Odaibo, and Detwiler teaches claims 1 and 11. Kobayashi teaches: training, via at least one processor, a base model including a plurality of layers; generating, via the at least one processor, a plurality of models for the model ensemble based on the base model, each model of the plurality of models being an… replica of the base model and including a respective plurality of layers that are replicas of the plurality of layers of the base model; modifying, via the at least one processor, a layer of each of the plurality of models, each layer being modified using a different respective learning algorithm such that each model of the plurality of models includes a layer modified in a different manner as compared to a corresponding same layer of the base model and of each of the other plurality of models; tuning, via the at least one processor, each modified layer of the plurality of models. Odaibo teaches: aggregating the… plurality of models into a model ensemble in which the model ensemble is configured to solve one or more problems for which each of the plurality of models is trained. Detwiler teaches: each model being an exact replica of the base model.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher H. Jablon whose telephone number is (571)270-7648. The examiner can normally be reached Monday - Friday, 9:00 am - 6:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Al Kawsar, can be reached on (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/A.H.J./Examiner, Art Unit 2127

/ABDULLAH AL KAWSAR/Supervisory Patent Examiner, Art Unit 2127