DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 10/22/2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(4) because reference characters "712" and "912" have both been used to designate resource orchestrator in specification paragraph [0110], and reference characters “922” and “932” have both been used to designate job scheduler in specification paragraph [0111].  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(4) because reference character “504” has been used to designate both a model M and learning rate scaling factors in Fig. 5, and reference character “932” has been used to designate both software and job scheduler in specification paragraph [0110].  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they do not include the following reference sign(s) mentioned in the description: 712 in specification paragraph [0110], and 918(1)-918(N) in specification paragraph [0108].  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference character(s) not mentioned in the description: 1318(1)-1318(N) in Fig. 9.  Corrected drawing sheets in compliance with 37 CFR 1.121(d), or amendment to the specification to add the reference character(s) in the description in compliance with 37 CFR 1.121(b) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Objections
Claims 1-30 are objected to because of the following informalities:  
In claim 1, lines 2-3, “at least in part, on inferencing output” should read “at least in part, on an inferencing output”
In claim 1, line 3, “from a one or more second neural networks trained” should read “from one or more second neural networks trained”
In claim 2, line 2, “based at least in part on output” should read “based, at least in part, on an output”
In claim 4, line 2, “based, at least in part, on output” should read “based, at least in part, on an output”
In claim 7, line 2, “are not adjusted based on input” should read “are not adjusted based on an input”
In claim 9, lines 2-3, “based, at least in part, on inferencing output” should read “based, at least in part, on an inferencing output”
In claim 9, line 3, “from a one or more second neural networks trained” should read “from one or more second neural networks trained”
In claim 10, line 2, “based, at least in part, on output” should read “based, at least in part, on an output”
In claim 13, line 2, “based, at least in part, on output” should read “based, at least in part, on an output”
In claim 16, line 3, “based, at least in part, on inferencing output” should read “based, at least in part, on an inferencing output”
In claim 16, lines 3-4, “from a one or more second neural networks trained” should read “from one or more second neural networks trained”
In claim 17, lines 3-4, “based at least in part on output” should read “based, at least in part, on an output”
In claim 20, line 2, “based, at least in part, on input” should read “based, at least in part, on an input”
In claim 22, line 2, “are not adjusted based on input” should read “are not adjusted based on an input”
In claim 24, line 3, “based, at least in part, on inferencing output” should read “based, at least in part, on an inferencing output”
In claim 24, lines 3-4, “from a one or more second neural networks trained” should read “from one or more second neural networks trained”
In claim 26, line 2, “based at least in part on output” should read “based, at least in part, on an output”
In claim 28, line 2, “based, at least in part, on input” should read “based, at least in part, on an input”
Dependent claims 2-8 are objected to based on being directly or indirectly dependent on objected claim 1. Dependent claims 10-15 are objected to based on being directly or indirectly dependent on objected claim 9. Dependent claims 17-23 are objected to based on being directly or indirectly dependent on objected claim 16. Dependent claims 25-30 are objected to based on being directly or indirectly dependent on objected claim 24.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 8, 14, 20, and 28 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 8 recites the limitation “the first one or more neural networks” in line 1. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the first one or more neural networks” has been interpreted as “the one or more first neural networks” in reference to “one or more first neural networks” in line 2 of claim 1.
Claim 8 recites the limitation “the second one or more neural networks” in lines 1-2. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the second one or more neural networks” has been interpreted as “the one or more second neural networks” in reference to “one or more second neural networks” in line 3 of claim 1.
Claim 14 recites the limitation “the output” in line 3. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the output” has been interpreted as “the inferencing output” in reference to “inferencing output” in line 3 of claim 9.
Claim 20 recites the limitation “the output” in line 1. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the output” has been interpreted as “the inferencing output” in reference to “inferencing output” in line 3 of claim 16.
Claim 28 recites the limitation “the output” in line 1. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the output” has been interpreted as “the inferencing output” in reference to “inferencing output” in line 3 of claim 24.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 16-23 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because the claims could be considered signal per se.
Independent claim 16 recites “machine-readable medium.” The broadest reasonable interpretation of a claim that recites "machine-readable medium," in view of the present specification, covers forms of non-transitory tangible media and transitory propagating signals per se in view of the ordinary and customary meaning of machine-readable medium, particularly when the specification is silent. See MPEP 2111.01. When the broadest reasonable interpretation of a claim covers a signal per se, the claim must be rejected under 35 U.S.C. § 101 as covering non-statutory subject matter. See In re Nuijten, 500 F.3d 1346, 1356-57 (Fed. Cir. 2007) (transitory embodiments are not directed to statutory subject matter) and Interim Examination Instructions for Evaluating Subject Matter Eligibility Under 35 U.S.C. § 101, Aug. 24, 2009; p. 2. 1351 Off. Gaz. Pat. Off. 212 (2010). Under broadest reasonable interpretation, "machine-readable medium" recited in claim 16 encompasses a transitory, propagating signal, which is not a process, machine, manufacture, or composition of matter. Nuijten, 500 F.3d at 1357. The claim "covers material not found in any of the four statutory categories [and thus] falls outside the plainly expressed scope of § 101." Id. at 1354. A recommended amendment is to recite “non-transitory machine-readable medium” (emphasis added). Dependent claims 17-23 are rejected based on same rationale as claim 16.
Claims 24-27, 29, and 30 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 24,
Claim 24 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 24 is directed to a computing device, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The limitation:
“perform an image processing task”
As drafted, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)) and insignificant extra-solution activity (See MPEP 2106.05(g)). The above limitation in the context of this claim encompasses performing an image processing task (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can perform an image processing task such as, for example, classification).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)) or insignificant extra-solution activity. The limitation:
“one or more processors”
As drafted, is an additional element that amounts to no more than mere instructions to apply the exception for the abstract ideas. See MPEP 2106.05(f). The limitations:
“one or more first neural networks trained based, at least in part, on inferencing output from a one or more second neural networks trained using non-synthetic images”
As drafted, are additional elements that correspond to insignificant extra-solution activity. See MPEP 2106.05(g). Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (I.e. the additional elements describe one or more processors for applying the abstract ideas) or insignificant extra-solution activity (i.e. first neural network trained based on the output of a second neural network). Furthermore, the “one or more first neural networks …” limitation is insignificant extra-solution activity that is well-understood, routine, and conventional according to MPEP 2106.05(d) as shown by Wang et al. (US 2020/0134506 A1) in specification paragraph [0032]: “In the conventional method of training a student model, knowledge distillation is deployed based on a difference between output of a teacher model and output of a student model, to train a small and quick student model.” Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.

Regarding Claim 25,
Claim 25 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 25 is directed to a computing device, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The limitation:
“wherein the image processing task comprises at least one of recognition or classification”
As drafted, is part of the abstract idea of claim 24 of performing an image processing task. The limitation of claim 25 further limits the limitation of claim 24 by further defining what the image processing task comprises. The above limitation in the context of this claim encompasses performing an image processing task comprising at least one of recognition or classification (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can perform recognition or classification of an image). 
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)) or insignificant extra-solution activity. The recitation of additional elements in claim 24 of one or more processors, as drafted, are reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. In addition, the additional element of “one or more first neural networks …” amounts to no more than insignificant extra-solution activity. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (I.e. the additional elements describe one or more processors for applying the abstract ideas) or insignificant extra-solution activity (i.e. first neural network trained based on the output of a second neural network). Furthermore, the “one or more first neural networks …” limitation is insignificant extra-solution activity that is well-understood, routine, and conventional according to MPEP 2106.05(d) as shown by Wang et al. (US 2020/0134506 A1) in specification paragraph [0032]: “In the conventional method of training a student model, knowledge distillation is deployed based on a difference between output of a teacher model and output of a student model, to train a small and quick student model.” Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.

Regarding Claim 26,
Claim 26 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 26 is directed to a computing device, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The limitation:
“compute an adjustment to a learning rate of the one or more first neural networks, based at least in part on output of the one or more second neural networks”
As drafted, under its broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)) and insignificant extra-solution activity (See MPEP 2106.05(g)). The above limitation in the context of this claim encompasses computing an adjustment for a learning rate of the first neural network based in part on the output of the second neural network (corresponds to mathematical calculation).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)) or insignificant extra-solution activity. The limitation:
“the one or more processors”
As drafted, is an additional element that amounts to no more than mere instructions to apply the exception for the abstract ideas. See MPEP 2106.05(f). Furthermore, the recitation of additional elements in claim 24 of one or more processors, as drafted, are reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. In addition, the additional element of “one or more first neural networks …” amounts to no more than insignificant extra-solution activity. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (I.e. the additional elements describe one or more processors for applying the abstract ideas) or insignificant extra-solution activity (i.e. first neural network trained based on the output of a second neural network). Furthermore, the “one or more first neural networks …” limitation is insignificant extra-solution activity that is well-understood, routine, and conventional according to MPEP 2106.05(d) as shown by Wang et al. (US 2020/0134506 A1) in specification paragraph [0032]: “In the conventional method of training a student model, knowledge distillation is deployed based on a difference between output of a teacher model and output of a student model, to train a small and quick student model.” Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.

Regarding Claim 27,
Claim 27 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 27 is directed to a computing device, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The limitation:
“wherein the adjustment to the learning rate is calculated for a region of the one or more first neural networks, the region comprising layers grouped based, at least in part, on input resolution”
As drafted, is part of the abstract idea of claim 26 of computing an adjustment to the learning rate. The limitation of claim 27 further limits the limitation of claim 26 by further defining that the adjustment to the learning rate is calculated for a region of the neural network. The above limitation in the context of this claim encompasses computing an adjustment for a learning rate of a region of the first neural network based in part on the output of the second neural network, wherein the region comprises layers grouped based on input resolution (corresponds to mathematical calculation).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)) or insignificant extra-solution activity. The recitation of additional elements in claim 26 of one or more processors, as drafted, are reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. In addition, the additional element of “one or more first neural networks …” amounts to no more than insignificant extra-solution activity. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (I.e. the additional elements describe one or more processors for applying the abstract ideas) or insignificant extra-solution activity (i.e. first neural network trained based on the output of a second neural network). Furthermore, the “one or more first neural networks …” limitation is insignificant extra-solution activity that is well-understood, routine, and conventional according to MPEP 2106.05(d) as shown by Wang et al. (US 2020/0134506 A1) in specification paragraph [0032]: “In the conventional method of training a student model, knowledge distillation is deployed based on a difference between output of a teacher model and output of a student model, to train a small and quick student model.” Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.

Regarding Claim 29,
Claim 29 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 29 is directed to a computing device, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Please see the analysis of claim 24. The limitation of claim 29 is only an additional element to the abstract ideas of claim 24. 
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)) or insignificant extra-solution activity. The limitation:
“wherein the one or more first neural networks and one or more second neural networks are each trained to perform the image processing task”
As drafted, is an additional element that is part of the insignificant extra-solution activity of claim 24. The limitation of claim 29 further limits the limitation of claim 24 by further defining what first and second neural networks were trained for. Furthermore, the recitation of additional elements in claim 24 of one or more processors, as drafted, are reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. In addition, the additional element of “one or more first neural networks …” amounts to no more than insignificant extra-solution activity. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (I.e. the additional elements describe one or more processors for applying the abstract ideas) or insignificant extra-solution activity (i.e. the first and second neural networks trained to perform an image processing task). Furthermore, the “wherein the one or more first neural networks and one or more second neural networks are each trained to perform the image processing task” limitation is insignificant extra-solution activity that is well-understood, routine, and conventional according to MPEP 2106.05(d) as shown by Meng et al. (US 2020/0334538 A1) in specification paragraph [0021]: “T/S [teacher/student] learning has been widely applied to a variety of deep learning tasks in speech, language and image processing including model compression, domain adaptation, small-footprint Natural Machine Translation (“NMT”), low-resource NMT, far-field ASR, low resource language ASR neural network pre-training, etc. T/S learning falls in the category of transfer learning, where the network of interest, as a student, is trained by mimicking the behavior of a well-trained network, as a teacher, in the presence of the same or stereo training samples.” (This teaches that it is widely adapted to use a teacher neural network to train a student neural network to have the same behavior (i.e. output), where both are used for performing the same image processing task). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.

Regarding Claim 30,
Claim 30 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 30 is directed to a computing device, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Please see the analysis of claim 24. The limitation of claim 30 is only an additional element to the abstract ideas of claim 24.
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)) or insignificant extra-solution activity. The limitation:
“wherein the one or more second neural networks are trained only on real images”
As drafted, is an additional element that is part of the insignificant extra-solution activity of claim 24. The limitation of claim 30 further limits the limitation of claim 24 by further defining the non-synthetic images used when the one or more second neural networks were trained. Furthermore, the recitation of additional elements in claim 24 of one or more processors, as drafted, are reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. In addition, the additional element of “one or more first neural networks …” amounts to no more than insignificant extra-solution activity. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (I.e. the additional elements describe one or more processors for applying the abstract ideas) or insignificant extra-solution activity (i.e. the second neural networks being trained on only real images). Furthermore, the “wherein the one or more second neural networks are trained only on real images” limitation is insignificant extra-solution activity that is well-understood, routine, and conventional according to MPEP 2106.05(d) as shown by Sridhar et al. (US 2021/0279595 A1) in specification paragraphs [0080]: “With reference to FIG. 2A, in a first training or pre-training step, the teacher network 102 is trained conventionally using a training dataset that includes a plurality of training samples, each training sample being labelled training data. The training sample in the training dataset is provided as input data 106” and [0050]: “the input data 106 comprises image data” (This teaches that a teacher network (such as in teacher/student learning or knowledge distillation) is trained conventionally by using real image data). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 6, 9, 16, 21, 23, 24, 25, 29, and 30 are rejected under both 35 U.S.C. 102(a)(1) and 35 U.S.C. 102(a)(2) as being anticipated by Li et al. (US 2016/0078339 A1).
Regarding Claim 1,
	Li et al. teaches a processor (Fig. 1; [0020]: "Turning now to FIG. 1, a block diagram is provided showing aspects of one example of a system architecture suitable for implementing an embodiment of the invention and designated generally as system 100… Various functions described herein as being performed by one or more entities may be carried out by hardware, firmware, and/or software. For instance, various functions or services may be carried out by a processor executing instructions stored in memory" teaches a system comprising a DNN model generator, where the system may be implemented by a processor), comprising: 
one or more circuits to train one or more first neural networks based, at least in part, on inferencing output from a one or more second neural networks trained using non-synthetic images (Fig. 1; [0028]: "DNN model generator 120 and its components 122, 124, 126, and 128 may be embodied as a set of compiled computer instructions or functions, program modules, computer software services, or an arrangement of processes carried out on one or more computer systems … generator 120 and/or the embodiments of the invention described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), etc." teaches that the DNN model generator may be implemented by hardware components consisting of one or more circuits. Fig. 1; Fig. 3; [0034]: "Training component 126 facilitate the learning of the student DNN through an iterative process with evaluation component 128, that provides the same un-labeled data to the teacher and student DNN models, evaluates the output distributions of the DNN models to determine the error of the student DNN's output distribution from the teacher's, performs back propagation on the student DNN model based on the error to update the student DNN model, and repeats this cycle until the output distributions converge" teaches that learning (training) of a student DNN model (first neural network) is based on the output of the teacher DNN model (second neural network). [0022]: "the data includes one or more phone sets (sounds) and may also include corresponding transcription information or senone labels that may be used for initializing the teacher DNN model … Other examples of data sources may include by way of example and not limitation … captured images (e.g. depth camera images)" teaches that the teacher DNN model (second neural network) is pre-trained using captured images (non-synthetic images)).
Regarding Claim 6,
	Li et al. teaches the processor of claim 1.
	In addition, Li et al. further teaches wherein the one or more second neural networks are trained to perform an image processing task equivalent to an image processing task performed by the one or more first neural networks ([0005]: "To learn a DNN with a smaller number of hidden nodes, a larger size (more accurate) “teacher” DNN is used to train the smaller “student” DNN … the student DNN approaches the behavior of the teacher, so that whatever the output of the teacher, the student will approximate, even where the teacher may be wrong. An embodiment of the invention is thus particularly suitable for providing accurate signal processing applications (e.g. ASR or image processing)" teaches that the teacher model (second neural network) and the student model (first neural network) are trained for the same signal processing application, such as image processing (i.e. the teacher and student will result in the same output for a given input)).
Regarding Claim 9,
Li et al. teaches a system (Fig. 1; [0020]: "Turning now to FIG. 1, a block diagram is provided showing aspects of one example of a system architecture suitable for implementing an embodiment of the invention and designated generally as system 100" teaches a system comprising a DNN model generator), comprising: 
one or more processors to train one or more first neural networks based, at least in part, on inferencing output from a one or more second neural networks trained using non-synthetic images (Fig. 1; [0020]: "Various functions described herein as being performed by one or more entities may be carried out by hardware, firmware, and/or software. For instance, various functions or services may be carried out by a processor executing instructions stored in memory" teaches that the system may be implemented by a processor. Fig. 1; [0028]: "DNN model generator 120 and its components 122, 124, 126, and 128 may be embodied as a set of compiled computer instructions or functions, program modules, computer software services, or an arrangement of processes carried out on one or more computer systems … generator 120 and/or the embodiments of the invention described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), etc." teaches that the DNN model generator may be implemented by hardware components consisting of one or more circuits. Fig. 1; Fig. 3; [0034]: "Training component 126 facilitate the learning of the student DNN through an iterative process with evaluation component 128, that provides the same un-labeled data to the teacher and student DNN models, evaluates the output distributions of the DNN models to determine the error of the student DNN's output distribution from the teacher's, performs back propagation on the student DNN model based on the error to update the student DNN model, and repeats this cycle until the output distributions converge" teaches that learning (training) of a student DNN model (first neural network) is based on the output of the teacher DNN model (second neural network). [0022]: "the data includes one or more phone sets (sounds) and may also include corresponding transcription information or senone labels that may be used for initializing the teacher DNN model … Other examples of data sources may include by way of example and not limitation … captured images (e.g. depth camera images)" teaches that the teacher DNN model (second neural network) is pre-trained using captured images (non-synthetic images)).
Regarding Claim 16,
	Li et al. teaches a machine-readable medium having stored thereon a set of instructions, which if performed by one or more processors ([0086]: "an embodiment of the invention is directed to one or more computer-readable media having computer-executable instructions embodied thereon that, when executed by a computing system having a processor and memory, cause the computing system to perform a method for generating a DNN classifier for deployment on a computing device" teaches a computer-readable media (machine readable medium) having instructions for execution by a processor), cause the one or more processors to at least: 
train one or more first neural networks based, at least in part, on inferencing output from a one or more second neural networks trained using non-synthetic images ([0086]: "The method also includes, for a number of iterations: (a) using a subset of the set of training data, determine a teacher output distribution for the teacher DNN model and a student output distribution for the student DNN model; (b) determine an evaluation of the student output distribution vs. the teacher output distribution; and (c) based on the evaluation, update the student DNN model" teaches that the instructions cause the processor to update (train) a student DNN model (first neural network) based on the output of a teacher DNN model (second neural network). [0022]: "the data includes one or more phone sets (sounds) and may also include corresponding transcription information or senone labels that may be used for initializing the teacher DNN model … Other examples of data sources may include by way of example and not limitation … captured images (e.g. depth camera images)" teaches that the teacher DNN model (second neural network) is pre-trained using captured images (non-synthetic images)).
Regarding Claim 21,
Li et al. teaches the machine-readable medium of claim 16.
In addition, Li et al. further teaches wherein the one or more second neural networks are trained to perform an image processing task equivalent to an image processing task performed by the one or more first neural networks ([0005]: "To learn a DNN with a smaller number of hidden nodes, a larger size (more accurate) “teacher” DNN is used to train the smaller “student” DNN … the student DNN approaches the behavior of the teacher, so that whatever the output of the teacher, the student will approximate, even where the teacher may be wrong. An embodiment of the invention is thus particularly suitable for providing accurate signal processing applications (e.g. ASR or image processing)" teaches that the teacher model (second neural network) and the student model (first neural network) are trained for the same signal processing application, such as image processing (i.e. the teacher and student will result in the same output for a given input)).
Regarding Claim 23,
Li et al. teaches the machine-readable medium of claim 16.
In addition, Li et al. further teaches wherein the one or more first neural networks, once trained, are usable to perform an inferencing task independently of the one or more second neural networks (Fig. 5; [0067]: "For example, the trained student DNN may be deployed on a smart phone or smart glasses. Based on the teacher DNN model and the training data, the trained student DNN may be specialized for a specific application (e.g. image processing or ASR)" teaches that once the student DNN (first neural network) is trained, it may be deployed as a DNN classifier on a computing device (i.e. independently of the teacher DNN (second neural network))).
Regarding Claim 24,
	Li et al. teaches a computing device (Fig. 7; [0078]: "With reference to FIG. 7, an exemplary computing device is provided and referred to generally as computing device 700" teaches a computing device 700), comprising: 
one or more processors to perform an image processing task based at least in part on one or more first neural networks trained based, at least in part, on inferencing output from a one or more second neural networks trained using non-synthetic images (Fig. 7; [0080]: "With reference to FIG. 7, computing device 700 includes a bus 710 that directly or indirectly couples the following devices: memory 712, one or more processors 714, one or more presentation components 716, one or more input/output (I/O) ports 618, one or more I/O components 720, and an illustrative power supply 722" teaches that computing device 700 comprises one or more processors 714. Fig. 1; Fig. 7; [0028]: "DNN model generator 120 and its components 122, 124, 126, and 128 may be embodied as a set of compiled computer instructions or functions, program modules, computer software services, or an arrangement of processes carried out on one or more computer systems, such as computing device 700, described in connection to FIG. 7" teaches that the computing device 700 may be used for implementing the DNN model generator 120 of Fig. 1. [0076]: "Accordingly, we have described various aspects of technology directed to systems and methods for providing a more accurate DNN classifier of reduced size for deployment on computing devices by “learning” the deployed DNN from a teacher DNN with larger capacity (number of hidden nodes). The DNN classifiers trained in according to some embodiments of the invention, are particularly suitable for providing accurate signal processing applications (e.g. ASR or image processing)" teaches that the student DNN model (first neural network) is used for signal processing applications, including image processing. Fig. 1; Fig. 3; [0034]: "Training component 126 facilitate the learning of the student DNN through an iterative process with evaluation component 128, that provides the same un-labeled data to the teacher and student DNN models, evaluates the output distributions of the DNN models to determine the error of the student DNN's output distribution from the teacher's, performs back propagation on the student DNN model based on the error to update the student DNN model, and repeats this cycle until the output distributions converge" teaches that learning (training) of a student DNN model (first neural network) is based on the output of the teacher DNN model (second neural network). [0022]: "the data includes one or more phone sets (sounds) and may also include corresponding transcription information or senone labels that may be used for initializing the teacher DNN model … Other examples of data sources may include by way of example and not limitation … captured images (e.g. depth camera images)" teaches that the teacher DNN model (second neural network) is pre-trained using captured images (non-synthetic images)).
Regarding Claim 25,
Li et al. teaches the computing device of claim 24.
In addition, Li et al. further teaches wherein the image processing task comprises at least one of recognition or classification ([0049]: "Once trained, the student DNN model may be deployed as a classifier on a computer system" teaches that the trained student DNN model (first neural network) may be deployed as a classifier on a computer system, meaning that the computer system performs classification based on the model).
Regarding Claim 29,
Li et al. teaches the computing device of claim 24.
In addition, Li et al. further teaches wherein the one or more first neural networks and one or more second neural networks are each trained to perform the image processing task ([0005]: "To learn a DNN with a smaller number of hidden nodes, a larger size (more accurate) “teacher” DNN is used to train the smaller “student” DNN … the student DNN approaches the behavior of the teacher, so that whatever the output of the teacher, the student will approximate, even where the teacher may be wrong. An embodiment of the invention is thus particularly suitable for providing accurate signal processing applications (e.g. ASR or image processing)" teaches that the teacher model (second neural network) and the student model (first neural network) are trained for the same signal processing application, such as image processing (i.e. the teacher and student will result in the same output for a given input)).
Regarding Claim 30,
Li et al. teaches the computing device of claim 29.
In addition, Li et al. further teaches wherein the one or more second neural networks are trained only on real images ([0022]: "the data includes one or more phone sets (sounds) and may also include corresponding transcription information or senone labels that may be used for initializing the teacher DNN model … Other examples of data sources may include by way of example and not limitation … captured images (e.g. depth camera images)" teaches that the teacher DNN model (second neural network) is pre-trained using captured images (i.e. real images and not synthetic images)).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 8, 10, 11, 14, 17, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US 2016/0078339 A1) in view of Li and Hoeim (“Learning without Forgetting”), hereinafter Hoeim et al.
Regarding Claim 2,
	Li et al. teaches the processor of claim 1.
	Li et al. does not appear to explicitly teach wherein a learning rate of the one or more first neural networks is adjusted, during training, based at least in part on output of the one or more second neural networks.
	However, Hoeim et al. teaches wherein a learning rate of the one or more first neural networks is adjusted, during training, based at least in part on output of the one or more second neural networks (Section 3.1, fourth paragraph: "We lower the learning rate once by 10x at the epoch when the held out accuracy plateaus" teaches that the learning rate is adjusted during training based on accuracy (i.e. based on the accuracy of the output when compared to the output of the original (second) neural network)).
	Li et al. and Hoeim et al. are analogous to the claimed invention because they are directed to training a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein a learning rate of the one or more first neural networks is adjusted, during training, based at least in part on output of the one or more second neural networks as taught by Hoeim et al. to the disclosed invention of Li et al.
	One of ordinary skill in the art would have been motivated to make this modification to "optimize both for high accuracy for the new task and for preservation of responses on the existing tasks from the original network" (Hoeim et al. Section 1, ninth paragraph).
Regarding Claim 8,
Li et al. teaches the processor of claim 1.
	Li et al. does not appear to explicitly teach wherein the first one or more neural networks and the second one or more neural networks have equivalent structures.
	However, Hoeim et al. teaches wherein the first one or more neural networks and the second one or more neural networks have equivalent structures (Fig. 2; Section 1, second paragraph: "In our setting, a CNN has a set of shared parameters θs (e.g., five convolutional layers and two fully connected layers for AlexNet [3] architecture), task-specific parameters for previously learned tasks θo (e.g., the output layer for ImageNet [4] classification and corresponding weights), and randomly initialized task-specific parameters for new tasks θn (e.g., scene classifiers)" teaches that the new (first) neural network and the original (second) neural network have an equivalent structure of 5 convolutional layers and 2 fully connected layers with shared parameters and a task specific output layer).
	Li et al. and Hoeim et al. are analogous to the claimed invention because they are directed to training a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the first one or more neural networks and the second one or more neural networks have equivalent structures as taught by Hoeim et al. to the disclosed invention of Li et al.
	One of ordinary skill in the art would have been motivated to make this modification to "optimize both for high accuracy for the new task and for preservation of responses on the existing tasks from the original network" (Hoeim et al. Section 1, ninth paragraph).
Regarding Claim 10,
Li et al. teaches the system of claim 9.
	Li et al. does not appear to explicitly teach wherein a learning rate of the one or more first neural networks is adjusted based, at least in part, on output of the one or more second neural networks.
	However, Hoeim et al. teaches wherein a learning rate of the one or more first neural networks is adjusted based, at least in part, on output of the one or more second neural networks (Section 3.1, fourth paragraph: "We lower the learning rate once by 10x at the epoch when the held out accuracy plateaus" teaches that the learning rate is adjusted during training based on accuracy (i.e. based on the accuracy of the output when compared to the output of the original (second) neural network)).
	Li et al. and Hoeim et al. are analogous to the claimed invention because they are directed to training a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein a learning rate of the one or more first neural networks is adjusted based, at least in part, on output of the one or more second neural networks as taught by Hoeim et al. to the disclosed invention of Li et al.
	One of ordinary skill in the art would have been motivated to make this modification to "optimize both for high accuracy for the new task and for preservation of responses on the existing tasks from the original network" (Hoeim et al. Section 1, ninth paragraph).
Regarding Claim 11,
Li et al. in view of Hoeim et al. teaches the system of claim 10.
	In addition, Hoeim et al. further teaches the one or more processors to adjust the learning rate for a region of the one or more first neural networks (Section 2.1, second paragraph: "As mentioned in Section 1, a small learning rate is often used, and sometimes part of the network is frozen to prevent overfitting" teaches that during training, part (a region) of the neural network is frozen to prevent overfitting (i.e. the learning rate for that region is changed to 0)).
	Li et al. and Hoeim et al. are analogous to the claimed invention because they are directed to training a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the one or more processors to adjust the learning rate for a region of the one or more first neural networks as taught by Hoeim et al. to the disclosed invention of Li et al.
	One of ordinary skill in the art would have been motivated to make this modification to "optimize both for high accuracy for the new task and for preservation of responses on the existing tasks from the original network" (Hoeim et al. Section 1, ninth paragraph).
Regarding Claim 14,
Li et al. teaches the system of claim 9.
	Li et al. does not appear to explicitly teach the one or more processors to compute a learning rate scaling factor based, at least in part, on a divergence factor, the divergence factor computed based, at least in part, on the output of the one or more second neural networks.
	However, Hoeim et al. teaches the one or more processors to compute a learning rate scaling factor based, at least in part, on a divergence factor (Section 3.1, fourth paragraph: "We lower the learning rate once by 10x at the epoch when the held out accuracy plateaus" teaches that the learning rate is adjusted during training based on accuracy (divergence factor) (i.e. based on the accuracy of the output when compared to the output of the original (second) neural network)), 
the divergence factor computed based, at least in part, on the output of the one or more second neural networks (Section 2.2, first paragraph: "The smaller network is trained using a modified cross-entropy loss (further described in Section 3) that encourages both large and small responses of the original and new network to be similar" teaches that the loss (accuracy) is based on the similarity of the output to the output of the original (second) network).
	Li et al. and Hoeim et al. are analogous to the claimed invention because they are directed to training a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the one or more processors to compute a learning rate scaling factor based, at least in part, on a divergence factor, the divergence factor computed based, at least in part, on the output of the one or more second neural networks as taught by Hoeim et al. to the disclosed invention of Li et al.
	One of ordinary skill in the art would have been motivated to make this modification to "optimize both for high accuracy for the new task and for preservation of responses on the existing tasks from the original network" (Hoeim et al. Section 1, ninth paragraph).
Regarding Claim 17,
Li et al. teaches the machine-readable medium of claim 16.
	Li et al. does not appear to explicitly teach compute an adjustment to a learning rate of the one or more first neural networks, based at least in part on output of the one or more second neural networks.
	However, Hoeim et al. teaches compute an adjustment to a learning rate of the one or more first neural networks, based at least in part on output of the one or more second neural networks (Section 3.1, fourth paragraph: "We lower the learning rate once by 10x at the epoch when the held out accuracy plateaus" teaches that the learning rate is adjusted during training based on accuracy (i.e. based on the accuracy of the output when compared to the output of the original (second) neural network)).
	Li et al. and Hoeim et al. are analogous to the claimed invention because they are directed to training a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate compute an adjustment to a learning rate of the one or more first neural networks, based at least in part on output of the one or more second neural networks as taught by Hoeim et al. to the disclosed invention of Li et al.
	One of ordinary skill in the art would have been motivated to make this modification to "optimize both for high accuracy for the new task and for preservation of responses on the existing tasks from the original network" (Hoeim et al. Section 1, ninth paragraph).
Regarding Claim 26,
Li et al. teaches the computing device of claim 24.
	Li et al. does not appear to explicitly teach the one or more processors to compute an adjustment to a learning rate of the one or more first neural networks, based at least in part on output of the one or more second neural networks.
	However, Hoeim et al. teaches the one or more processors to compute an adjustment to a learning rate of the one or more first neural networks, based at least in part on output of the one or more second neural networks (Section 3.1, fourth paragraph: "We lower the learning rate once by 10x at the epoch when the held out accuracy plateaus" teaches that the learning rate is adjusted during training based on accuracy (i.e. based on the accuracy of the output when compared to the output of the original (second) neural network)).
	Li et al. and Hoeim et al. are analogous to the claimed invention because they are directed to training a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the one or more processors to compute an adjustment to a learning rate of the one or more first neural networks, based at least in part on output of the one or more second neural networks as taught by Hoeim et al. to the disclosed invention of Li et al.
	One of ordinary skill in the art would have been motivated to make this modification to "optimize both for high accuracy for the new task and for preservation of responses on the existing tasks from the original network" (Hoeim et al. Section 1, ninth paragraph).

Claims 3, 12, 18, and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US 2016/0078339 A1) in view of Li and Hoeim (“Learning without Forgetting”), hereinafter Hoeim et al., and further in view of Han (US 2019/0030371 A1).
Regarding Claim 3,
	Li et al. in view of Hoeim et al. teaches the processor of claim 2.
	In addition, Hoeim et al. further teaches wherein the learning rate is adjusted for a region of the one or more first neural networks (Section 2.1, second paragraph: "As mentioned in Section 1, a small learning rate is often used, and sometimes part of the network is frozen to prevent overfitting" teaches that during training, part (a region) of the neural network is frozen to prevent overfitting (i.e. the learning rate for that region is changed to 0)).
Li et al. and Hoeim et al. are analogous to the claimed invention because they are directed to training a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the learning rate is adjusted for a region of the one or more first neural networks as taught by Hoeim et al. to the disclosed invention of Li et al.
	One of ordinary skill in the art would have been motivated to make this modification to "optimize both for high accuracy for the new task and for preservation of responses on the existing tasks from the original network" (Hoeim et al. Section 1, ninth paragraph).
	Li et al. in view of Hoeim et al. does not appear to explicitly teach the region comprising layers grouped based, at least in part, on input resolution.
	However, Han teaches the region comprising layers grouped based, at least in part, on input resolution (Fig. 6; [0085]: "The DCNN 600A of FIG. 6A may include five different resolution layers, such as a first downsampled layer 628A having an output size of 160×160×128 channels, a second downsampled layer 628B having an output size of 80×80×256 channels, a third downsampled layer 628C having an output size of 40×40×512 layers, and a bottom layer 629A having an output size of 20×20×512 channels" teaches that a neural network may comprise regions of layers grouped based on imaging resolution).
Li et al., Hoeim et al., and Han are analogous to the claimed invention because they are directed to training a neural network.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the region comprising layers grouped based, at least in part, on input resolution as taught by Han to the disclosed invention of Li et al. in view of Hoeim et al.
	One of ordinary skill in the art would have been motivated to make this modification to "allow the complete segmentation of an entire (e.g. 2D) image in a single pass instead of classifying the center pixel of a small image patch each time. In addition to being more efficient, using an entire image as input may offer much richer contextual information than a small image patch, which may lead to more reliable and more accurate segmentation results" (Han [0083]).
Regarding Claim 12,
Li et al. in view of Hoeim et al. teaches the system of claim 11.
	Li et al. in view of Hoeim et al. does not appear to explicitly teach wherein the region comprises layers grouped according to input resolution.
	However, Han teaches wherein the region comprises layers grouped according to input resolution (Fig. 6; [0085]: "The DCNN 600A of FIG. 6A may include five different resolution layers, such as a first downsampled layer 628A having an output size of 160×160×128 channels, a second downsampled layer 628B having an output size of 80×80×256 channels, a third downsampled layer 628C having an output size of 40×40×512 layers, and a bottom layer 629A having an output size of 20×20×512 channels" teaches that a neural network may comprise regions of layers grouped based on imaging resolution).
Li et al., Hoeim et al., and Han are analogous to the claimed invention because they are directed to training a neural network.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the region comprises layers grouped according to input resolution as taught by Han to the disclosed invention of Li et al. in view of Hoeim et al.
	One of ordinary skill in the art would have been motivated to make this modification to "allow the complete segmentation of an entire (e.g. 2D) image in a single pass instead of classifying the center pixel of a small image patch each time. In addition to being more efficient, using an entire image as input may offer much richer contextual information than a small image patch, which may lead to more reliable and more accurate segmentation results" (Han [0083]).
Regarding Claim 18,
Li et al. in view of Hoeim et al. teaches the machine-readable medium of claim 17.
	In addition, Hoeim et al. further teaches wherein the adjustment to the learning rate is calculated for a region of the one or more first neural networks (Section 2.1, second paragraph: "As mentioned in Section 1, a small learning rate is often used, and sometimes part of the network is frozen to prevent overfitting" teaches that during training, part (a region) of the neural network is frozen to prevent overfitting (i.e. the learning rate for that region is changed to 0)).
Li et al. and Hoeim et al. are analogous to the claimed invention because they are directed to training a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the adjustment to the learning rate is calculated for a region of the one or more first neural networks as taught by Hoeim et al. to the disclosed invention of Li et al.
	One of ordinary skill in the art would have been motivated to make this modification to "optimize both for high accuracy for the new task and for preservation of responses on the existing tasks from the original network" (Hoeim et al. Section 1, ninth paragraph).
	Li et al. in view of Hoeim et al. does not appear to explicitly teach the region comprising layers grouped based, at least in part, on input resolution.
	However, Han teaches the region comprising layers grouped based, at least in part, on input resolution (Fig. 6; [0085]: "The DCNN 600A of FIG. 6A may include five different resolution layers, such as a first downsampled layer 628A having an output size of 160×160×128 channels, a second downsampled layer 628B having an output size of 80×80×256 channels, a third downsampled layer 628C having an output size of 40×40×512 layers, and a bottom layer 629A having an output size of 20×20×512 channels" teaches that a neural network may comprise regions of layers grouped based on imaging resolution).
Li et al., Hoeim et al., and Han are analogous to the claimed invention because they are directed to training a neural network.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the region comprising layers grouped based, at least in part, on input resolution as taught by Han to the disclosed invention of Li et al. in view of Hoeim et al.
	One of ordinary skill in the art would have been motivated to make this modification to "allow the complete segmentation of an entire (e.g. 2D) image in a single pass instead of classifying the center pixel of a small image patch each time. In addition to being more efficient, using an entire image as input may offer much richer contextual information than a small image patch, which may lead to more reliable and more accurate segmentation results" (Han [0083]).
Regarding Claim 27,
Li et al. in view of Hoeim et al. teaches the computing device of claim 26.
	In addition, Hoeim et al. further teaches wherein the adjustment to the learning rate is calculated for a region of the one or more first neural networks (Section 2.1, second paragraph: "As mentioned in Section 1, a small learning rate is often used, and sometimes part of the network is frozen to prevent overfitting" teaches that during training, part (a region) of the neural network is frozen to prevent overfitting (i.e. the learning rate for that region is changed to 0)).
Li et al. and Hoeim et al. are analogous to the claimed invention because they are directed to training a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the adjustment to the learning rate is calculated for a region of the one or more first neural networks as taught by Hoeim et al. to the disclosed invention of Li et al.
	One of ordinary skill in the art would have been motivated to make this modification to "optimize both for high accuracy for the new task and for preservation of responses on the existing tasks from the original network" (Hoeim et al. Section 1, ninth paragraph).
	Li et al. in view of Hoeim et al. does not appear to explicitly teach the region comprising layers grouped based, at least in part, on input resolution.
	However, Han teaches the region comprising layers grouped based, at least in part, on input resolution (Fig. 6; [0085]: "The DCNN 600A of FIG. 6A may include five different resolution layers, such as a first downsampled layer 628A having an output size of 160×160×128 channels, a second downsampled layer 628B having an output size of 80×80×256 channels, a third downsampled layer 628C having an output size of 40×40×512 layers, and a bottom layer 629A having an output size of 20×20×512 channels" teaches that a neural network may comprise regions of layers grouped based on imaging resolution).
Li et al., Hoeim et al., and Han are analogous to the claimed invention because they are directed to training a neural network.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the region comprising layers grouped based, at least in part, on input resolution as taught by Han to the disclosed invention of Li et al. in view of Hoeim et al.
	One of ordinary skill in the art would have been motivated to make this modification to "allow the complete segmentation of an entire (e.g. 2D) image in a single pass instead of classifying the center pixel of a small image patch each time. In addition to being more efficient, using an entire image as input may offer much richer contextual information than a small image patch, which may lead to more reliable and more accurate segmentation results" (Han [0083]).
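For illustration only, the combination relied upon above (a per-region learning rate adjustment, as in Hoeim et al., applied to regions formed from layers that share a resolution, as in Han) can be sketched in Python as follows. The layer names, the scale values, and the helper functions `group_by_resolution` and `region_learning_rates` are hypothetical and do not appear in the cited references or the claims. The resolutions loosely follow the 160, 80, and 40 sizes quoted from Han [0085], and a scale of 0 stands in for a frozen region.

```python
from collections import defaultdict


def group_by_resolution(layers):
    """Group layer names into regions keyed by resolution.

    layers: iterable of (layer_name, resolution) pairs (hypothetical inputs).
    """
    regions = defaultdict(list)
    for name, resolution in layers:
        regions[resolution].append(name)
    return dict(regions)


def region_learning_rates(regions, base_lr, scale_by_resolution):
    """Assign each layer the base rate times the scale of its region.

    A scale of 0.0 freezes the region; regions with no entry keep base_lr.
    """
    rates = {}
    for resolution, names in regions.items():
        scale = scale_by_resolution.get(resolution, 1.0)
        for name in names:
            rates[name] = base_lr * scale
    return rates


# Hypothetical network: two layers at 160x160, one at 80x80, one at 40x40.
layers = [("conv1", 160), ("conv2", 160), ("conv3", 80), ("conv4", 40)]
regions = group_by_resolution(layers)
rates = region_learning_rates(regions, base_lr=0.1,
                              scale_by_resolution={160: 0.0, 80: 0.5})
```

In this sketch the 160x160 region is frozen, the 80x80 region trains at half the base rate, and the 40x40 region trains at the base rate.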

Claims 4, 7, 13, 15, 20, 22, and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US 2016/0078339 A1) in view of Bhardwaj et al. ("Dream Distillation: A Data-Independent Model Compression Framework").
Regarding Claim 4,
	Li et al. teaches the processor of claim 1.
	Li et al. does not appear to explicitly teach the one or more circuits to adjust training of the one or more first neural networks based, at least in part, on output of the one or more second neural networks in response to an input based on synthetic image data.
	However, Bhardwaj et al. teaches the one or more circuits to adjust training of the one or more first neural networks based, at least in part, on output of the one or more second neural networks in response to an input based on synthetic image data (Section 1, last paragraph: "We then use these synthetic images for KD [Knowledge Distillation] … This effective transfer of knowledge via synthetic data can make the student model learn characteristics about original classification problem without actually training on any real data" teaches that synthetic images are used as input for knowledge distillation (i.e. training of the student (first) network based on the teacher (second) network is adjusted based on synthetic image data)).
Li et al. and Bhardwaj et al. are analogous to the claimed invention because they are directed to training a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the one or more circuits to adjust training of the one or more first neural networks based, at least in part, on output of the one or more second neural networks in response to an input based on synthetic image data as taught by Bhardwaj et al. to the disclosed invention of Li et al.
	One of ordinary skill in the art would have been motivated to make this modification to "transfer significant knowledge about the real data without accessing any real or alternate datasets" in order to "greatly increase the scale of deep learning at the edge since industries can quickly deploy models without the need for proprietary datasets" (Bhardwaj et al. Section 4, last paragraph).
Regarding Claim 7,
Li et al. teaches the processor of claim 1.
	Li et al. does not appear to explicitly teach wherein parameters of the one or more second neural networks are not adjusted based on input derived from synthetic images.
	However, Bhardwaj et al. teaches wherein parameters of the one or more second neural networks are not adjusted based on input derived from synthetic images (Section 1, first paragraph: "On the other hand, KD [Knowledge Distillation] trains a significantly smaller student model to mimic the outputs of a large pretrained teacher model" teaches that in knowledge distillation, the teacher model (second neural network) is a pretrained model based on an original dataset. Section 3.2, second paragraph: "Finally, these synthetic images are used to train the student via KD [Knowledge Distillation]" teaches that the input synthetic images are only used to train the student model (first neural network) in knowledge distillation (i.e. the teacher (second neural network) is not trained (adjusted) based on the synthetic images)).
Li et al. and Bhardwaj et al. are analogous to the claimed invention because they are directed to training a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein parameters of the one or more second neural networks are not adjusted based on input derived from synthetic images as taught by Bhardwaj et al. to the disclosed invention of Li et al.
	One of ordinary skill in the art would have been motivated to make this modification to "transfer significant knowledge about the real data without accessing any real or alternate datasets" in order to "greatly increase the scale of deep learning at the edge since industries can quickly deploy models without the need for proprietary datasets" (Bhardwaj et al. Section 4, last paragraph).
Regarding Claim 13,
Li et al. teaches the system of claim 9.
	Li et al. does not appear to explicitly teach the one or more processors to adjust training of the one or more first neural networks based, at least in part, on output of the one or more second neural networks in response to an input based on synthetic image data.
	However, Bhardwaj et al. teaches the one or more processors to adjust training of the one or more first neural networks based, at least in part, on output of the one or more second neural networks in response to an input based on synthetic image data (Section 1, last paragraph: "We then use these synthetic images for KD [Knowledge Distillation] … This effective transfer of knowledge via synthetic data can make the student model learn characteristics about original classification problem without actually training on any real data" teaches that synthetic images are used as input for knowledge distillation (i.e. training of the student (first) network based on the teacher (second) network is adjusted based on synthetic image data)).
Li et al. and Bhardwaj et al. are analogous to the claimed invention because they are directed to training a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the one or more processors to adjust training of the one or more first neural networks based, at least in part, on output of the one or more second neural networks in response to an input based on synthetic image data as taught by Bhardwaj et al. to the disclosed invention of Li et al.
	One of ordinary skill in the art would have been motivated to make this modification to "transfer significant knowledge about the real data without accessing any real or alternate datasets" in order to "greatly increase the scale of deep learning at the edge since industries can quickly deploy models without the need for proprietary datasets" (Bhardwaj et al. Section 4, last paragraph).
Regarding Claim 15,
Li et al. teaches the system of claim 9.
	Li et al. does not appear to explicitly teach wherein the one or more second neural networks are frozen during training of the one or more first neural networks.
	However, Bhardwaj et al. teaches wherein the one or more second neural networks are frozen during training of the one or more first neural networks (Section 1, first paragraph: "On the other hand, KD [Knowledge Distillation] trains a significantly smaller student model to mimic the outputs of a large pretrained teacher model" teaches that in knowledge distillation, the teacher model (second neural network) is a pretrained model based on an original dataset. Section 3.2, second paragraph: "Finally, these synthetic images are used to train the student via KD [Knowledge Distillation]" teaches that the input synthetic images are only used to train the student model (first neural network) in knowledge distillation (i.e. the teacher (second neural network) is not trained (is frozen) during training of the student (first neural network))).
Li et al. and Bhardwaj et al. are analogous to the claimed invention because they are directed to training a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the one or more second neural networks are frozen during training of the one or more first neural networks as taught by Bhardwaj et al. to the disclosed invention of Li et al.
	One of ordinary skill in the art would have been motivated to make this modification to "transfer significant knowledge about the real data without accessing any real or alternate datasets" in order to "greatly increase the scale of deep learning at the edge since industries can quickly deploy models without the need for proprietary datasets" (Bhardwaj et al. Section 4, last paragraph).
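For illustration only, the frozen-teacher arrangement discussed for claims 7 and 15 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the method of Bhardwaj et al.: the one-parameter logistic teacher and student, the uniformly sampled stand-in for synthetic inputs, and the function names `distill` and `mean_gap` are all hypothetical. The point shown is only that the teacher is used for forward passes while gradient updates are applied to the student alone.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def distill(teacher, synthetic_inputs, epochs=200, lr=0.5):
    """Fit a student to a frozen teacher's outputs on synthetic inputs.

    teacher: (weight, bias) tuple; it is only read, never updated.
    Returns the trained student's (weight, bias).
    """
    w_t, b_t = teacher
    w_s, b_s = 0.0, 0.0  # student starts untrained
    n = len(synthetic_inputs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x in synthetic_inputs:
            target = sigmoid(w_t * x + b_t)  # teacher forward pass only
            pred = sigmoid(w_s * x + b_s)
            # Gradient of cross-entropy against the teacher's soft target.
            grad_w += (pred - target) * x
            grad_b += pred - target
        # Only the student's parameters are adjusted.
        w_s -= lr * grad_w / n
        b_s -= lr * grad_b / n
    return w_s, b_s


def mean_gap(teacher, student, inputs):
    """Mean absolute difference between teacher and student outputs."""
    w_t, b_t = teacher
    w_s, b_s = student
    return sum(abs(sigmoid(w_t * x + b_t) - sigmoid(w_s * x + b_s))
               for x in inputs) / len(inputs)


rng = random.Random(0)
synthetic = [rng.uniform(-3.0, 3.0) for _ in range(50)]  # stand-in data
teacher = (2.0, -1.0)
student = distill(teacher, synthetic)
```

After training, the student's outputs on the synthetic inputs are closer to the teacher's than those of the untrained student, while the teacher's parameters are unchanged.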
Regarding Claim 20,
Li et al. teaches the machine-readable medium of claim 16.
	Li et al. does not appear to explicitly teach wherein the output of the one or more second neural networks is based, at least in part, on input based on synthetic image data.
	However, Bhardwaj et al. teaches wherein the output of the one or more second neural networks is based, at least in part, on input based on synthetic image data (Fig. 1; Section 1, last paragraph: "We then use these synthetic images for KD [Knowledge Distillation] … This effective transfer of knowledge via synthetic data can make the student model learn characteristics about original classification problem without actually training on any real data" teaches that synthetic images are used for knowledge distillation between the teacher (second neural network) and student (first neural network), meaning that the output of the teacher (second neural network) is based on the input synthetic images during the knowledge distillation process).
Li et al. and Bhardwaj et al. are analogous to the claimed invention because they are directed to training a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the output of the one or more second neural networks is based, at least in part, on input based on synthetic image data as taught by Bhardwaj et al. to the disclosed invention of Li et al.
	One of ordinary skill in the art would have been motivated to make this modification to "transfer significant knowledge about the real data without accessing any real or alternate datasets" in order to "greatly increase the scale of deep learning at the edge since industries can quickly deploy models without the need for proprietary datasets" (Bhardwaj et al. Section 4, last paragraph).
Regarding Claim 22,
Li et al. teaches the machine-readable medium of claim 16.
	Li et al. does not appear to explicitly teach wherein parameters of the one or more second neural networks are not adjusted based on input derived from synthetic images.
	However, Bhardwaj et al. teaches wherein parameters of the one or more second neural networks are not adjusted based on input derived from synthetic images (Section 1, first paragraph: "On the other hand, KD [Knowledge Distillation] trains a significantly smaller student model to mimic the outputs of a large pretrained teacher model" teaches that in knowledge distillation, the teacher model (second neural network) is a pretrained model based on an original dataset. Section 3.2, second paragraph: "Finally, these synthetic images are used to train the student via KD [Knowledge Distillation]" teaches that the input synthetic images are only used to train the student model (first neural network) in knowledge distillation (i.e. the teacher (second neural network) is not trained (adjusted) based on the synthetic images)).
Li et al. and Bhardwaj et al. are analogous to the claimed invention because they are directed to training a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein parameters of the one or more second neural networks are not adjusted based on input derived from synthetic images as taught by Bhardwaj et al. to the disclosed invention of Li et al.
	One of ordinary skill in the art would have been motivated to make this modification to "transfer significant knowledge about the real data without accessing any real or alternate datasets" in order to "greatly increase the scale of deep learning at the edge since industries can quickly deploy models without the need for proprietary datasets" (Bhardwaj et al. Section 4, last paragraph).
Regarding Claim 28,
Li et al. teaches the computing device of claim 24.
	Li et al. does not appear to explicitly teach wherein the output of the one or more second neural networks is based, at least in part, on input based on synthetic image data.
	However, Bhardwaj et al. teaches wherein the output of the one or more second neural networks is based, at least in part, on input based on synthetic image data (Fig. 1; Section 1, last paragraph: "We then use these synthetic images for KD [Knowledge Distillation] … This effective transfer of knowledge via synthetic data can make the student model learn characteristics about original classification problem without actually training on any real data" teaches that synthetic images are used for knowledge distillation between the teacher (second neural network) and student (first neural network), meaning that the output of the teacher (second neural network) is based on the input synthetic images during the knowledge distillation process).
Li et al. and Bhardwaj et al. are analogous to the claimed invention because they are directed to training a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the output of the one or more second neural networks is based, at least in part, on input based on synthetic image data as taught by Bhardwaj et al. to the disclosed invention of Li et al.
	One of ordinary skill in the art would have been motivated to make this modification to "transfer significant knowledge about the real data without accessing any real or alternate datasets" in order to "greatly increase the scale of deep learning at the edge since industries can quickly deploy models without the need for proprietary datasets" (Bhardwaj et al. Section 4, last paragraph).

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US 2016/0078339 A1) in view of Bhardwaj et al. ("Dream Distillation: A Data-Independent Model Compression Framework") and further in view of Li and Hoeim (“Learning without Forgetting”), hereinafter Hoeim et al.
Regarding Claim 5,
	Li et al. in view of Bhardwaj et al. teaches the processor of claim 4.
	Li et al. in view of Bhardwaj et al. does not appear to explicitly teach the one or more circuits to compute a learning rate scaling factor and adjust a learning rate of the one or more first neural networks based, at least in part, on the learning rate scaling factor.
	However, Hoeim et al. teaches the one or more circuits to compute a learning rate scaling factor and adjust a learning rate of the one or more first neural networks based, at least in part, on the learning rate scaling factor (Section 3.1, second paragraph: "When training networks, we follow the standard practices for fine-tuning existing networks. The selection of hyperparameters, mainly the number of epochs, warm-up period, and the learning rate schedule, are chosen using the new task performance on a held-out set" teaches that the learning rate schedule (i.e. when and by how much the learning rate changes) is determined based on new task performance (e.g. based on output accuracy), meaning that the learning rate is adjusted based on performance. Section 3.1, fourth paragraph: "We lower the learning rate once by 10x at the epoch when the held out accuracy plateaus" teaches that the learning rate is adjusted during training by a factor (scaling factor) of 10 when accuracy plateaus (i.e. the scaling factor is applied based on performance)).
Li et al., Bhardwaj et al., and Hoeim et al. are analogous to the claimed invention because they are directed to training a neural network.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the one or more circuits to compute a learning rate scaling factor and adjust a learning rate of the one or more first neural networks based, at least in part, on the learning rate scaling factor as taught by Hoeim et al. to the disclosed invention of Li et al. in view of Bhardwaj et al.
One of ordinary skill in the art would have been motivated to make this modification to "optimize both for high accuracy for the new task and for preservation of responses on the existing tasks from the original network" (Hoeim et al. Section 1, ninth paragraph).
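For illustration only, the cited practice of lowering the learning rate once by 10x when held-out accuracy plateaus can be sketched in Python as follows. The plateau test used here (no improvement greater than `min_delta` for `patience` consecutive epochs), the default values, and the function name `scale_on_plateau` are assumptions made for this sketch; the cited passage of Hoeim et al. does not specify how the plateau is detected.

```python
def scale_on_plateau(accuracies, base_lr=0.01, factor=0.1,
                     patience=2, min_delta=1e-3):
    """Return the learning rate in effect at each epoch.

    accuracies: held-out accuracy observed at the end of each epoch.
    The rate is multiplied by `factor` once, after `patience` consecutive
    epochs without an improvement larger than `min_delta`.
    """
    lr = base_lr
    best = float("-inf")
    stale = 0          # epochs since the last meaningful improvement
    lowered = False    # the rate is lowered at most once
    schedule = []
    for acc in accuracies:
        schedule.append(lr)
        if acc > best + min_delta:
            best = acc
            stale = 0
        else:
            stale += 1
        if not lowered and stale >= patience:
            lr *= factor  # apply the scaling factor
            lowered = True
    return schedule


# Hypothetical held-out accuracies that flatten out after the third epoch.
schedule = scale_on_plateau([0.50, 0.60, 0.65, 0.65, 0.65, 0.66])
```

With these hypothetical accuracies, the first five epochs run at the base rate and the sixth runs at one tenth of it.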

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US 2016/0078339 A1) in view of Li and Hoeim (“Learning without Forgetting”), hereinafter Hoeim et al., in view of Han (US 2019/0030371 A1), and further in view of Meng et al. ("Conditional Teacher-student Learning").
Regarding Claim 19,
	Li et al. in view of Hoeim et al. and further in view of Han teaches the machine-readable medium of claim 18.
	Li et al. in view of Hoeim et al. and further in view of Han does not appear to explicitly teach wherein the adjustment to the learning rate is calculated based, at least in part, by a long short-term memory ("LSTM") module.
	However, Meng et al. teaches wherein the adjustment to the learning rate is calculated based, at least in part, by a long short-term memory ("LSTM") module (Section 5.1, second paragraph: "As a source-domain acoustic model, a clean long short-term memory (LSTM)- recurrent neural networks (RNN) [37, 38, 39] is trained … The clean LSTM acoustic model serves as the teacher network in the subsequent T/S learning methods" teaches that an LSTM can be used to implement the teacher (second) network, meaning that the learning rate is adjusted based on the output calculated in the LSTM).
Li et al., Hoeim et al., Han, and Meng et al. are analogous to the claimed invention because they are directed to training a neural network.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the adjustment to the learning rate is calculated based, at least in part, by a long short-term memory ("LSTM") module as taught by Meng et al. to the disclosed invention of Li et al. in view of Hoeim et al. and further in view of Han.
One of ordinary skill in the art would have been motivated to make this modification "to minimize the frame-level cross-entropy criterion" (Meng et al. Section 5.2, first paragraph).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN J HALES whose telephone number is (571)272-0878. The examiner can normally be reached M-Th 8:00am - 5:00pm and F 8:00am - 2:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BRIAN J HALES/Examiner, Art Unit 2125
/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125