DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending and have been considered.

Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(4) because reference character “107” has been used to designate both input and output. In ¶[0023] lines 3 and 4, “input 107” should read “input 105” (2 instances). 
The drawings are objected to because in Fig. 3, block 306 should read “tune” instead of “turn”.  
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Objections
Claims 8 and 18 are objected to because “10X” should read “ten times X” or similar language. Appropriate correction is required.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-2, 4-12, and 14-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kandaswamy et al. (“Deep Transfer Learning Ensemble for Classification”).
Regarding claim 1, Kandaswamy teaches: A method of generating a model ensemble, comprising: 
training, via at least one processor, a base model including a plurality of layers; (In Algorithm 1 on p. 340, the Baseline Stage 1 pretrains the network and the Baseline Stage 2 fine-tunes the network. A processor is taught on p. 344: “Our code for experiments… ran with the help of an GTX 770 GPU.”)
generating, via the at least one processor, a plurality of models for the model ensemble based on the base model, each model of the plurality of models including a plurality of layers; (P. 338: “Transferred Layers: We select a set of layers of the whole baseline network to transfer.” Also taught in Algorithm 1, col. 2, Stage 1: “Select which hidden layers to transfer”.)
modifying, via the at least one processor, a layer of each of the plurality of models such that each model of the plurality of models includes a layer modified with respect to an associated layer of each of the base model and an associated layer of each of the other plurality of models; and (P. 338 states that after transferring layers, “The rest of the target network layer features are randomly initialized.” Algorithm 1, col. 2, Stage 1 teaches randomly initializing weights:

[Image media_image1.png: excerpt of Kandaswamy Algorithm 1, col. 2, Stage 1]

Fig. 2 below shows transferred layers and randomly initialized layers. For the left generated model, hidden layers 1 and 3 were transferred and hidden layer 2 was randomly initialized. For the center generated model, hidden layer 2 was transferred and hidden layers 1 and 3 were randomly initialized. For the right generated model, hidden layer 1 was transferred and hidden layers 2 and 3 were randomly initialized. Note that the hidden layers are color-coded in the printed publication. Transferred layers are green and randomly initialized layers are red.) 
[Image media_image2.png: Kandaswamy Fig. 2]

Kandaswamy Fig. 2: Ensemble of Deep Transfer Learning
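
For illustration only, the transfer pattern described for Fig. 2 can be sketched as follows. This is not code from Kandaswamy; the layer size, the `random_layer` and `build_member` names, and the use of a flat weight list per layer are assumptions made to keep the sketch small.

```python
import random

# Hypothetical sizes; Kandaswamy's actual layer dimensions are not reproduced here.
LAYER_SIZE = 4
NUM_HIDDEN = 3

def random_layer(rng):
    """Return randomly initialized weights standing in for one hidden layer."""
    return [rng.uniform(-0.1, 0.1) for _ in range(LAYER_SIZE)]

def build_member(base_layers, transferred, rng):
    """Copy the 1-indexed layers listed in `transferred` from the base model
    and randomly initialize every other layer."""
    member = []
    for idx, base in enumerate(base_layers, start=1):
        if idx in transferred:
            member.append(list(base))          # transferred layer
        else:
            member.append(random_layer(rng))   # randomly initialized layer
    return member

rng = random.Random(0)
base_model = [random_layer(rng) for _ in range(NUM_HIDDEN)]

# Transfer patterns as described for the left, center, and right models of Fig. 2.
patterns = {"left": {1, 3}, "center": {2}, "right": {1}}
ensemble = {name: build_member(base_model, kept, rng)
            for name, kept in patterns.items()}
```

Under these assumptions, each generated model differs from the base model, and from the other generated models, in at least one layer.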

tuning, via the at least one processor, each modified layer of the plurality of models. (Tuning is interpreted as fine-tuning in the following citation. P. 338: “Retraining Layers: … We have a choice to fine-tune this entire network as a multi-layer perceptron using back-propagation or lock a layer, meaning the transferred feature from source network do not change during the error propagation for the target task. Thus giving a choice of whether or not to fine-tune the certain layers of the target network.” Under the broadest reasonable interpretation, the unlock symbol in every layer of Fig. 2 means that weights in the randomly initialized layers can get updated during tuning. This is further supported by Algorithm 1, right col. Stage 2: 

[Image media_image3.png: excerpt of Kandaswamy Algorithm 1, right col., Stage 2])
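
For illustration only, the lock/unlock choice quoted from p. 338 can be sketched as a single update step in which locked layers are left unchanged. The plain gradient-descent rule, the `fine_tune_step` name, the learning rate, and the numeric values are assumptions and are not taken from Kandaswamy.

```python
def fine_tune_step(layers, grads, locked, lr=0.1):
    """Apply one gradient-descent update to every layer whose index is not in `locked`."""
    updated = []
    for idx, (weights, grad) in enumerate(zip(layers, grads)):
        if idx in locked:
            # Locked layer: weights do not change during error propagation.
            updated.append(list(weights))
        else:
            # Unlocked layer: weights are fine-tuned.
            updated.append([w - lr * g for w, g in zip(weights, grad)])
    return updated

# Hypothetical two-layer example: layer 0 is locked, layer 1 is unlocked.
layers = [[0.5, -0.5], [0.2, 0.4]]
grads = [[1.0, 1.0], [1.0, -1.0]]
tuned = fine_tune_step(layers, grads, locked={0})
```

With an empty `locked` set, every layer is updated, which corresponds to the reading in which every layer of Fig. 2 is unlocked.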


Regarding claim 2, Kandaswamy teaches: The method of claim 1, further comprising: 
receiving an output from each of the plurality of models; and (Represented in Fig. 2 by the three arrows exiting each logistic regression and entering the ensemble of posterior probability.)
generating, via the at least one processor, a model ensemble output based on the output of each of the plurality of models. (Represented in Fig. 2 by the Ensemble of Posterior Probability. Claim 2 is further disclosed by Algorithm 2 lines 12 and 14 in which an ensemble is generated.)
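
For illustration only, generating an ensemble output from the outputs of the individual models can be sketched as follows. Averaging the posterior probabilities is an assumption made for this sketch; the combination rule actually used in Kandaswamy's Algorithm 2 is not reproduced here, and the `ensemble_posteriors` name and the numeric values are hypothetical.

```python
def ensemble_posteriors(member_outputs):
    """Average per-class posterior probabilities across ensemble members."""
    n_members = len(member_outputs)
    n_classes = len(member_outputs[0])
    return [sum(out[c] for out in member_outputs) / n_members
            for c in range(n_classes)]

# Hypothetical posteriors from three models over three classes.
outputs = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.1, 0.3],
]
combined = ensemble_posteriors(outputs)

# The ensemble prediction is the class with the highest combined posterior.
predicted = max(range(len(combined)), key=combined.__getitem__)
```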

Regarding claim 4, Kandaswamy teaches: The method of claim 1, wherein modifying comprises modifying at least one training parameter of the layer of each of the plurality of models. (Algorithm 1, col. 2, Stage 1 teaches modifying a training parameter of the layer by randomly initializing weights that weren’t transferred:

[Image media_image1.png: excerpt of Kandaswamy Algorithm 1, col. 2, Stage 1])


	Regarding claim 5, Kandaswamy teaches: The method of claim 4, wherein modifying at least one training parameter of the layer comprises modifying at least one of a number of bits of the layer, a number of neurons of the layer, weights for one or more connections of the layer, and a number of connections of the layer. (Algorithm 1, col. 2, Stage 1 teaches modifying a training parameter of the layer by randomly initializing weights that weren’t transferred. 

[Image media_image1.png: excerpt of Kandaswamy Algorithm 1, col. 2, Stage 1]

The broadest reasonable interpretation of the claim only requires one alternative from the list.)
Regarding claim 6, Kandaswamy teaches: The method of claim 1, wherein generating comprises generating, via the at least one processor, each of the plurality of models as a replica of the base model. (Under BRI, the generated models in Fig. 2 above are replicated from the base model.)

Regarding claim 7, Kandaswamy teaches: The method of claim 1, wherein tuning each modified layer comprises tuning each modified layer with an X number of epochs. (Tuning each modified layer with a certain number of epochs is disclosed in Algorithm 1, col. 2, Stage 2:

[Image media_image4.png: excerpt of Kandaswamy Algorithm 1, col. 2, Stage 2]

Since the number of epochs is not disclosed, the broadest reasonable interpretation allows for an epoch number of X = 1.)

Regarding claim 8, Kandaswamy teaches: The method of claim 7, wherein training the base model comprises training the base layer with 10X number of epochs. (On p. 343, the last paragraph states that the three networks used in experiments were pre-trained on a minimum of 25, 10, and 30 epochs each. Under the broadest reasonable interpretation, the base model layers in each experiment were trained on at least 10 epochs, which is 10 times the number of epochs the modified layers were interpreted to have been trained on in claim 7.)

Regarding claim 9, Kandaswamy teaches: The method of claim 1, further comprising: arbitrarily selecting at least one additional layer in at least one model for modification; modifying the selected at least one additional layer; and tuning the selected at least one additional layer. (In Fig. 2, the center and right models each have two modified layers. The BRI allows for an interpretation that the two layers were fine-tuned while unlocked.)

Regarding claim 10, Kandaswamy teaches: The method of claim 1, wherein training the base model comprises training the base model via random initialization. (Algorithm 1 teaches randomly initializing the base model’s weights as indicated by the arrows below.)

[Image media_image5.png: excerpt of Kandaswamy Algorithm 1 with arrows marking random initialization]


Regarding claim 11, Kandaswamy teaches: One or more non-transitory computer-readable media that include instructions that, when executed by one or more processors, are configured to cause the one or more processors to perform operations, the operations comprising: (A processor is taught on p. 344: “Our code for experiments… ran with the help of an GTX 770 GPU.” The experimental results section starting on p. 343 is evidence of non-transitory computer-readable media that include instructions.)
training a base model including a plurality of layers; (In Algorithm 1 on p. 340, the Baseline Stage 1 pretrains the network and the Baseline Stage 2 fine-tunes the network.)
generating a plurality of models for a model ensemble based on the base model, each model of the plurality of models including a plurality of layers; (P. 338: “Transferred Layers: We select a set of layers of the whole baseline network to transfer.” Also taught in Algorithm 1, col. 2, Stage 1: “Select which hidden layers to transfer”.)
modifying a layer of each of the plurality of models such that each model of the plurality of models includes a layer modified with respect to an associated layer of each of the base model and an associated layer of each of the other plurality of models; and (P. 338 states that after transferring layers, “The rest of the target network layer features are randomly initialized.” Algorithm 1, col. 2, Stage 1 teaches randomly initializing weights:

[Image media_image1.png: excerpt of Kandaswamy Algorithm 1, col. 2, Stage 1]

Fig. 2 above shows transferred layers and randomly initialized layers. For the left generated model, hidden layers 1 and 3 were transferred and hidden layer 2 was randomly initialized. For the center generated model, hidden layer 2 was transferred and hidden layers 1 and 3 were randomly initialized. For the right generated model, hidden layer 1 was transferred and hidden layers 2 and 3 were randomly initialized. Note that the hidden layers are color-coded in the printed publication. Transferred layers are green and randomly initialized layers are red.) 
tuning each modified layer of the plurality of models. (Tuning is interpreted as fine-tuning in the following citation. P. 338: “Retraining Layers: … We have a choice to fine-tune this entire network as a multi-layer perceptron using back-propagation or lock a layer, meaning the transferred feature from source network do not change during the error propagation for the target task. Thus giving a choice of whether or not to fine-tune the certain layers of the target network.” Under the broadest reasonable interpretation, the unlock symbol in every layer of Fig. 2 means that weights in the randomly initialized layers can get updated during tuning. This is further supported by Algorithm 1, right col. Stage 2:

[Image media_image3.png: excerpt of Kandaswamy Algorithm 1, right col., Stage 2])


Regarding claim 12, Kandaswamy teaches: The computer-readable media of claim 11, the operations further comprising: 
receiving an output from each of the plurality of models; and (Represented in Fig. 2 by the three arrows exiting each logistic regression and entering the ensemble of posterior probability.)
generating a model ensemble output based on the output of each of the plurality of models. (Represented in Fig. 2 by the Ensemble of Posterior Probability. Claim 12 is further disclosed by Algorithm 2 lines 12 and 14 in which an ensemble is generated.)

Regarding claim 14, Kandaswamy teaches: The computer-readable media of claim 11, wherein modifying comprises modifying at least one training parameter of the layer of each of the plurality of models. (Algorithm 1, col. 2, Stage 1 teaches modifying a training parameter of the layer by randomly initializing weights that weren’t transferred:

[Image media_image1.png: excerpt of Kandaswamy Algorithm 1, col. 2, Stage 1])


	Regarding claim 15, Kandaswamy teaches: The computer-readable media of claim 14, wherein modifying at least one training parameter of the layer comprises modifying at least one of a number of bits of the layer, a number of neurons of the layer, weights for one or more connections of the layer, and a number of connections of the layer. (Algorithm 1, col. 2, Stage 1 teaches modifying a training parameter of the layer by randomly initializing weights that weren’t transferred. 

[Image media_image1.png: excerpt of Kandaswamy Algorithm 1, col. 2, Stage 1]

The broadest reasonable interpretation of the claim only requires one alternative from the list.)
	Regarding claim 16, Kandaswamy teaches: The computer-readable media of claim 11, wherein generating comprises generating, via the at least one processor, each of the plurality of models as a replica of the base model. (Under BRI, the generated models in Fig. 2 above are replicated from the base model.)

	Regarding claim 17, Kandaswamy teaches: The computer-readable media of claim 11, wherein tuning each modified layer comprises tuning each modified layer with an X number of epochs. (Tuning each modified layer with a certain number of epochs is disclosed in Algorithm 1, col. 2, Stage 2:

[Image media_image4.png: excerpt of Kandaswamy Algorithm 1, col. 2, Stage 2]

Since the number of epochs is not disclosed, the broadest reasonable interpretation allows for an epoch number of X = 1.)

	Regarding claim 18, Kandaswamy teaches: The computer-readable media of claim 17, wherein training the base model comprises training the base layer with 10X number of epochs. (On p. 343, the last paragraph states that the three networks used in experiments were pre-trained on a minimum of 25, 10, and 30 epochs each. Under the broadest reasonable interpretation, the base model layers in each experiment were trained on at least 10 epochs, which is 10 times the number of epochs the modified layers were interpreted to have been trained on in claim 17.)

Regarding claim 19, Kandaswamy teaches: The computer-readable media of claim 11, the operations further comprising: arbitrarily selecting at least one additional layer in at least one model for modification; modifying the selected at least one additional layer; and tuning the selected at least one additional layer. (In Fig. 2, the center and right models each have two modified layers. The BRI allows for an interpretation that the two layers were fine-tuned while unlocked.)

Regarding claim 20, Kandaswamy teaches: The computer-readable media of claim 11, wherein training the base model comprises training the base model via random initialization. (Algorithm 1 teaches randomly initializing the base model’s weights as indicated by the arrows below.)

[Image media_image5.png: excerpt of Kandaswamy Algorithm 1 with arrows marking random initialization]


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Kandaswamy in view of Saldana et al. (US 20180018560).
	
Regarding claim 3, Kandaswamy teaches: The method of claim 1, wherein modifying comprises modifying the layer of each of the plurality of models
However, Kandaswamy does not explicitly teach: based on at least one of clustering and quantization.
But Saldana teaches: based on at least one of clustering and quantization. (Saldana teaches quantizing a weight value into binary values. [0055]-[0057] and Fig. 7, 720 disclose that the quantized weight is +1 if the weight is greater than or equal to zero, and -1 otherwise. The BRI allows for selecting only one alternative from the list.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the layer of each of the plurality of models (as taught by Kandaswamy) by quantizing a weight into binary values ± 1, as taught by Saldana, with a motivation to reduce memory storage requirements and reduce memory bandwidth requirements (Saldana [0016]-[0017]).
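
For illustration only, the binary quantization rule attributed to Saldana above (+1 if the weight is greater than or equal to zero, -1 otherwise) can be sketched as follows. The `quantize_binary` name and the sample weights are hypothetical, and the sketch is not code from Saldana.

```python
def quantize_binary(weights):
    """Map each weight to +1 if it is greater than or equal to zero, else -1."""
    return [1 if w >= 0 else -1 for w in weights]

# Hypothetical layer weights, including the boundary case of exactly zero.
layer_weights = [0.3, -0.2, 0.0, -1.5]
quantized = quantize_binary(layer_weights)
```

Each quantized weight takes one of two values, which is consistent with the reduced storage motivation cited from Saldana.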

Regarding claim 13, Kandaswamy teaches: The computer-readable media of claim 11, wherein modifying comprises modifying the layer of each of the plurality of models
However, Kandaswamy does not explicitly teach: based on at least one of clustering and quantization.
But Saldana teaches: based on at least one of clustering and quantization. (Saldana teaches quantizing a weight value into binary values. [0055]-[0057] and Fig. 7, 720 disclose that the quantized weight is +1 if the weight is greater than or equal to zero, and -1 otherwise. The BRI allows for selecting only one alternative from the list.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the layer of each of the plurality of models (as taught by Kandaswamy) by quantizing a weight into binary values ± 1, as taught by Saldana, with a motivation to reduce memory storage requirements and reduce memory bandwidth requirements (Saldana [0016]-[0017]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Zang (U.S. 2018/0314975) teaches ensemble transfer learning.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher Jablon whose telephone number is (571)270-7648.  The examiner can normally be reached on Monday - Friday, 9:00 am - 6:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ASHER JABLON/Examiner, Art Unit 2122

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122