DETAILED ACTION
This Office Action is in response to the remarks entered on 5/16/2022. Claims 1-2, 5-9, 12-16, 19-20 are amended. Claims 1-20 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al (US Pub. No. 2019/0051290- hereinafter Li) in view of Xu et al (US Pub. 2008/0262984- hereinafter Xu) and further in view of Volkonsky et al (US Patent No. 6,594,824- hereinafter Volkonsky).
Referring to Claim 1, Li teaches a system to improve data training of a neural network, the system comprising: 
one or more processors, the one or more processors respectively associated with one or more corresponding users, the one or more processors to train first neural networks based on data associated with the corresponding users (see Li at [0018]: “[t]he model trainer 120 receives source domain data 130 and a target domain data 140 of various utterances from different domains”. Also, at [0020]: “the source domain data 130 are stored on the user device 110, within the model trainer 120, or in a database or other computing accessible by the model trainer 120. In some aspects, the target domain data 140 are part of a pre-existing dataset of a different domain than the source domain data 130 having parallel content”. Further, at [0021]: “[t]he source domain data 130 are fed to the teacher model 150 and the target domain data 140 are fed to the student model 160 to train the student model 160 to evaluate utterances in the target domain accurately. At initiation, the teacher model 150 is fully trained for the source domain, and is cloned (i.e., copied as a new instance) to create the initial student model 160”. In addition, at [0022]: “the teacher model 150 is a speech recognition model trained for a baseline domain”. Therefore, the teacher model corresponds to first neural network).
configure a second neural network based on a first set of parameters from the one or more processors, the first set of parameters associated with the first neural networks (see Li at [0042]: “an initial student model 160 is generated based on the teacher model 150. In various aspects, the initial student model 160 is a clone of the teacher model 150, wherein the weightings, Neural Networks are set exactly like those of the teacher model 150”. Therefore, the student model corresponds to the second neural network);
execute the second neural network to determine a difference between a first output associated with one or more of the first neural networks and a second output of the second neural network (see Li at Abstract: “[t]he outputs from the teacher model are compared with the outputs of the student model and the differences are used to adjust the parameters of the student model to better recognize speech in the second domain”).
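For illustration only, the teacher/student arrangement quoted from Li above can be sketched as follows. This is a minimal sketch under stated assumptions and is not Li's implementation: the linear model, the feature values, the learning rate, and the function names (forward, clone_student, update_student) are hypothetical and appear in neither the claims nor the cited references. It shows only the three quoted steps: the student starts as a copy of the teacher's weights, the two outputs are compared, and the difference is used to adjust the student's parameters.

```python
import copy


def forward(weights, features):
    """Output of a toy linear model: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def clone_student(teacher_weights):
    """Create the initial student as a copy of the teacher's weights."""
    return copy.deepcopy(teacher_weights)


def update_student(student_weights, teacher_out, student_out, features, lr=0.1):
    """One gradient step on 0.5 * (student_out - teacher_out) ** 2.

    Only the student is adjusted; the teacher's output is the fixed target.
    """
    diff = student_out - teacher_out
    return [w - lr * diff * x for w, x in zip(student_weights, features)]


# Hypothetical values chosen for illustration.
teacher = [0.5, -1.0]            # stands in for a trained "first" network
student = clone_student(teacher)  # "second" network configured from the first
source = [1.0, 2.0]              # source-domain features fed to the teacher
target = [1.5, 2.5]              # parallel target-domain features fed to the student

teacher_out = forward(teacher, source)
before = abs(forward(student, target) - teacher_out)
for _ in range(20):
    student_out = forward(student, target)
    student = update_student(student, teacher_out, student_out, target)
after = abs(forward(student, target) - teacher_out)
```

In this sketch the output difference shrinks with each step while the teacher's weights remain unchanged, which is the one-directional (teacher to student) update described in the quoted passages.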
However, Li fails to teach:
a field-programmable gate array (FPGA) to:
HANLEY, FLIGHT & ZIMMERMAN, LLC Page 2 of 16 Attorney Docket No. AB0989-US Application No. 16/147,037 Response to the Office action dated February 16, 2022
generate a second set of parameters based on the difference between the first output and the second output; and
cause transmission of the second set of parameters to at least some of the first neural networks to cause an update of the at least some of the first neural networks.  
Xu teaches, in analogous system, a field-programmable gate array (FPGA) (see Xu at [0017]- “An FPGA-based accelerator system for machine learning as described and claimed herein accelerates selected algorithms by providing better processing parallelism and memory access”. Further, at [0031]- “[t]emporary data structures, such as intermediate variables, parameters, and so forth, and results, e.g., the learned model, could be stored in distributed memory or registers inside the FPGA, which would act as high bandwidth, low latency cache”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with the above teachings of Xu by configuring first and second neural networks and comparing their outputs to determine differences, as taught by Li, and using an FPGA, as taught by Xu. The modification would have been obvious because one of ordinary skill in the art would be motivated to store the learned model in distributed memory or registers inside the FPGA, which would act as a high bandwidth, low latency cache (see Xu at [0031]).

Volkonsky teaches, in an analogous system:
generate a second set of parameters based on the difference between the first output and the second output (see Volkonsky at Col. 3: 18-21: “[a] goal function, which is designed as an appropriate measure of the degree of optimization of the program according to its intermediate representation, is then calculated”. Further, at Col. 3: 33-40: “[t]he first instruction of the current basic block is then moved to all of its predecessors to create a test intermediate representation. A new goal function is calculated based on the test intermediate representation. If the difference between the previous goal function and the new goal function exceeds the stored profit value, the test intermediate representation is adopted, but is not adopted if the difference is less than or equal to the stored profit value”); and
cause transmission of the second set of parameters to at least some of the first neural networks to cause an update of the at least some of the first neural networks (see Volkonsky at Col. 3: 18-21: “[a] goal function, which is designed as an appropriate measure of the degree of optimization of the program according to its intermediate representation, is then calculated”. Further, at Col. 3: 33-40: “[t]he first instruction of the current basic block is then moved to all of its predecessors to create a test intermediate representation. A new goal function is calculated based on the test intermediate representation. If the difference between the previous goal function and the new goal function exceeds the stored profit value, the test intermediate representation is adopted, but is not adopted if the difference is less than or equal to the stored profit value”. The adoption of the test intermediate representation corresponds to the claimed “cause an update of the at least some of the first neural networks”).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Li and Xu with the above teachings of Volkonsky by configuring first and second neural networks and comparing their outputs to determine differences using an FPGA, as taught by Li and Xu, and generating a second set of parameters based on the difference between the first output and the second output, as taught by Volkonsky. The modification would have been obvious because one of ordinary skill in the art would be motivated to generate an optimized intermediate representation (IR) and measure the degree of optimization of models (see Volkonsky at Abstract).
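For illustration only, the adoption rule quoted from Volkonsky at Col. 3: 33-40 can be sketched as a single comparison. This is a minimal sketch under stated assumptions, not Volkonsky's implementation: the function name adopt_test_representation and the numeric values are hypothetical, and the "difference between the previous goal function and the new goal function" is assumed here to mean the previous value minus the new value.

```python
def adopt_test_representation(previous_goal, new_goal, profit_value):
    """Return True when the test intermediate representation is adopted.

    Per the quoted passage, the test representation is adopted only if the
    difference between the previous and new goal functions exceeds the stored
    profit value; a difference less than or equal to it is not adopted.
    Assumption: "difference" is computed as previous_goal - new_goal.
    """
    return (previous_goal - new_goal) > profit_value


# Hypothetical values chosen for illustration.
exceeds = adopt_test_representation(10.0, 6.0, 3.0)   # difference 4.0 > 3.0
equal = adopt_test_representation(10.0, 7.0, 3.0)     # difference 3.0 == 3.0
smaller = adopt_test_representation(10.0, 9.0, 3.0)   # difference 1.0 < 3.0
```

The strict inequality reflects the quoted text, under which a difference exactly equal to the stored profit value does not result in adoption.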

Referring to Claim 2, the combination of Li, Xu and Volkonsky teaches the system of claim 1, wherein a first one of the one or more processors is associated with a headset, a mobile device, or a wearable device (see Li at [0018]: “a user device 110 is in communication with a model trainer 120 to develop speech recognition models for use in particular domains”. Further, at Fig. 1: user device 110 shows mobile devices).  

Referring to Claim 3, the combination of Li, Xu and Volkonsky teaches the system of claim 1, wherein the data is audio data, visual data, or text data (see Li at [0024]: “[t]he student model 160 is trained under the supervision of the teacher model 150, wherein each model 150, 160 receives utterances in their respective domains in parallel. Parallel utterances contain the same words, but have different audio features. For example, a child saying a given word will generally use a higher mean vocal frequency than an adult saying the same word, due to adults generally having deeper voices than children”).

Referring to Claim 4, the combination of Li, Xu and Volkonsky teaches the system of claim 1, wherein at least one of the first set of parameters or the second set of parameters includes at least one of an artificial neuron weight, a bias value, a network topology, a quantity of activation layers, or a quantity of pooling layers (see Li at [0042]: “an initial student model 160 is generated based on the teacher model 150. In various aspects, the initial student model 160 is a clone of the teacher model 150, wherein the weightings, Neural Networks are set exactly like those of the teacher model 150”).

Referring to Claim 5, the combination of Li, Xu and Volkonsky teaches the system of claim 1, wherein a first one of the one or more processors include storage to store the data, the first one of the processors to execute: 
collection engine software to obtain the data (see Li at Fig. 1 loop 260, which feeds back data using backpropagation to update the student model and then forward propagates parallel data to the teacher and student models); and
network configuration software to train the first neural networks based on the second set of parameters (see Li at Fig. 1 loop 260, which feeds back data using backpropagation to update the student model and then forward propagates parallel data to the teacher and student models. This loop going back to the teacher model teaches the training. Further, at [0046]: “[o]nce the student model is updated, method 200 returns to OPERATION 230 to feed the teacher model 150 and the updated student model 160 parallel data from their associated domains”. Therefore, the operation 230 corresponds to the network configuration software).
 
Referring to Claim 6, the combination of Li, Xu and Volkonsky teaches the system of claim 1, wherein the FPGA is to implement:
model optimizer circuitry to generate a first intermediate representation based on one of the first neural networks (see Volkonsky at Abstract- “[a]n initial intermediate representation is extracted from the source code by organizing it as a plurality of basic blocks that each contain at least one program instruction ordered according to respective estimated profit values”); 
inference engine circuitry to cause an adjustment of the first intermediate representation after a test of the first intermediate representation, the test based on the data (see Volkonsky at Col. 3: 33-40: “[t]he first instruction of the current basic block is then moved to all of its predecessors to create a test intermediate representation. A new goal function is calculated based on the test intermediate representation. If the difference between the previous goal function and the new goal function exceeds the stored profit value, the test intermediate representation is adopted, but is not adopted if the difference is less than or equal to the stored profit value”); 
high-graph compiler circuitry to generate a second intermediate representation based on the adjustment (see Volkonsky at Col. 3:30-36 “the iterative process proceeds by selecting a current basic block and storing the profit value of the basic block subsequent to it in the ordered representation of basic blocks. The first instruction of the current basic block is then moved to all of its predecessors to create a test intermediate representation. A new goal function is calculated based on the test intermediate representation”); and  
assembler circuitry to generate an executable file based on the second intermediate representation, the executable file to be executed at runtime (see Volkonsky at Col. 4: 44-53 “The intermediate representation 120 is acted on by the optimizer 130 to generate an optimized intermediate representation 140, which also comprises a plurality of basic blocks. The optimized intermediate output 140 is then acted on by a code generator 150, which generates the object code 160 to be used by a target machine. In generating the object code 160 from the optimized intermediate representation 140, the code generator 150 determines when each instruction will execute relative to each of the other instructions”).  

Referring to Claim 7, the combination of Li, Xu and Volkonsky teaches the system of claim 6, wherein the executable file is a hardware configuration of the FPGA or machine readable instructions (see Volkonsky at Col. 4: 44-53 “The intermediate representation 120 is acted on by the optimizer 130 to generate an optimized intermediate representation 140, which also comprises a plurality of basic blocks. The optimized intermediate output 140 is then acted on by a code generator 150, which generates the object code 160 to be used by a target machine. In generating the object code 160 from the optimized intermediate representation 140, the code generator 150 determines when each instruction will execute relative to each of the other instructions”).
Referring to independent Claim 8 and Claim 15, they are rejected on the same basis as independent claim 1 since they are analogous claims.
Referring to dependent Claim 9 and Claim 16, they are rejected on the same basis as dependent claim 2 since they are analogous claims.
Referring to dependent Claim 10 and Claim 17, they are rejected on the same basis as dependent claim 3 since they are analogous claims.
Referring to dependent Claim 11 and Claim 18, they are rejected on the same basis as dependent claim 4 since they are analogous claims.
Referring to dependent Claim 12 and Claim 19, they are rejected on the same basis as dependent claim 5 since they are analogous claims.
Referring to dependent Claim 13 and Claim 20, they are rejected on the same basis as dependent claim 6 since they are analogous claims.
Referring to dependent Claim 14, it is rejected on the same basis as dependent claim 7 since they are analogous claims.
                                	
Response to Arguments
The Applicant’s arguments regarding the rejection of above claims have been fully considered.
In reference to Applicant’s arguments about:
35 USC 112 rejections.
Examiner’s response:
Rejections are withdrawn.
In reference to Applicant’s arguments about:
Rejections under 35 USC 103.
Examiner’s response:
Regarding the 35 USC 103 rejections, arguments are moot in view of the new grounds of rejection.



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LUIS A SITIRICHE whose telephone number is (571)270-1316. The examiner can normally be reached M-F 9am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126