DETAILED ACTION
 
Notice of Pre-AIA or AIA Status
1.            The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
 
Status of Claims
2.            The following claim(s) is/are pending in this Office action: 1-20.
3.            Claim(s) 1-20 are rejected.  This rejection is NON-FINAL.
 
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
 
The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
 
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked.

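As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)       the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;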
(B)       the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C)       the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
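Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.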
Claim limitations in this application that use the word “means” (or “step”) and a generic placeholder are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
4.       This application includes one or more claim limitations that use the word “means” and are thus interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses the word “means” that is coupled with functional language without reciting sufficient structure to perform the recited function and the “means” is not preceded by a structural modifier.  Such claim limitation(s) is/are: “means to train …,” “means to obtain …,” “means to configure …,” “means to execute …,” and “means to transmit …” in claims 15-20.
          In addition, this application includes one or more claim limitations that use a generic placeholder and are thus interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph,
More specifically, claim 15 recites the following bolded limitations that are interpreted under 35 U.S.C. § 112(f).
one or more computation modules, each of the one or more computation modules associated with a corresponding user, the one or more computation modules training first neural networks using data associated with the corresponding users; and
means to train first neural networks, each of the means associated with a corresponding user, the means to train the first neural networks is to use data associated with the corresponding users;
means to obtain a first set of parameters associated with the first neural networks;
means to configure a second neural network based on the first set of parameters;
means to execute the second neural network to generate a second set of parameters; and
means to transmit the second set of parameters to the first neural networks to update the first neural networks.
 
These “means” limitations and the generic placeholder (“one or more computation modules”) are each followed by functional limitations without reciting sufficient structure, material, or acts.  Therefore, claims 15-20 recite means to perform the respectively claimed functions, and a generic placeholder to perform the recited function(s), without being modified by sufficient structure, material, or acts, and thus invoke interpretation under 35 U.S.C. 112(f).
(a)      The corresponding structure described in the disclosure for the claimed “one or more computation modules” appears to be a software implementation as described in ¶ [0018] where a “computation module” is the subject of training or the rectangular boxes of reference numerals 108 of FIG. 1 as well as 202, 204, and 206 of FIG. 2.
          In addition, the corresponding structure described in the disclosure for the claimed “one or more computation modules” further includes “computing systems such as a personal computer, a mobile device (e.g., a cell phone, a smart phone, a tablet such as an iPad™), a headset, or other wearable device, or any other type of computing device”, 
(b)      The corresponding structure described in the disclosure for the claimed “means to train first neural networks” appears to be a software implementation - “machine learning system 100” - illustrated in FIG. 1 and described in the description of FIG. 1 in ¶ [0016].  Moreover, the corresponding structure described in the disclosure for the claimed “means to train first neural networks” appears to be one or more software implementations including a neural network itself as described (¶ [0038] “the first neural network 212 may continuously update and train the first neural network 212” and ¶ [0079] “the first neural network 212 may train and/or otherwise update the first neural network 212 using the parameters determined at block 612”) and a “network configurator” (see e.g., ¶¶ [0115] and [0122]).
In addition, the corresponding structure described in the disclosure for the claimed “means to train first neural networks” further includes the “FPGA 208” and “computation modules 202, 204, 206” illustrated in FIG. 2 and described in ¶¶ [0022]-[0028] where “computation modules 202, 204, 206 are computing systems such as a personal computer, a mobile device (e.g., a cell phone, a smart phone, a tablet such as an iPad™), a headset, or other wearable device, or any other type of computing device.”  Further, the corresponding structure described in the disclosure for the claimed “means to train first neural networks” includes “a headset, a mobile device, or a wearable device” (e.g., ¶ [0126]) and “an example processor platform 800” (e.g., FIG. 8 and ¶ [0091]).

          Moreover, the corresponding structure described in the disclosure for the claimed “means to obtain a first set of parameters associated with the first neural networks” appears to be one or more software implementations including “a collection engine” (e.g., FIG. 2 and ¶¶ [0025] and [0051]-[0052]), a “neural network” (e.g., ¶¶ [0036], [0038], and [0082]), “one or more learning models” (e.g., ¶ [0040]), a “computation module” (e.g., ¶¶ [0047] and [0058]), an “artificial neuron” (e.g., ¶ [0056]), and an “interconnect and width adapters 408” (e.g., ¶ [0061]).
(d)      The corresponding structure described in the disclosure for the claimed “means to configure a second neural network based on the first set of parameters” includes “an FPGA” (e.g., ¶¶ [0111] and [0118]).
          Moreover, the corresponding structure described in the disclosure for the claimed “means to configure a second neural network based on the first set of parameters” appears to be one or more software implementations including a “network configurator 218” (e.g., FIG. 2 and ¶¶ [0048] and [0054], [0090], [0115], and [0122]), “one or more machine learning model” (e.g., ¶ [0035]), “interconnect and width adapters 408” (e.g., ¶ [0061]), and a “computation module” (e.g., ¶ [0073]).
(e)      The corresponding structure described in the disclosure for the claimed “means to execute the second neural network to generate a second set of parameters” includes “computation modules 202, 204, 206” (e.g., FIG. 2 and ¶¶ [0022] and [0043], [0069], and 
(f)       The corresponding structure described in the disclosure for the claimed “means to transmit the second set of parameters to the first neural networks to update the first neural networks” includes an “FPGA” (e.g., ¶¶ [0090], [0111], and [0118]).
          Moreover, the corresponding structure described in the disclosure for the claimed “means to transmit the second set of parameters to the first neural networks to update the first neural networks” appears to be one or more software implementations including a “network configurator 218” (e.g., FIG. 2 and ¶ [0049]).
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

 
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
 
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
 
 
5.            Claims 1-20 stand rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
(a)	Claims 1 and 8: The claimed limitation “one or more computation modules” of claims 1 and 8 as well as the claimed “means to train the first neural networks” invokes 35 USC § 112(f).  Nonetheless, according to the present disclosure, the “one or more computation modules” and “means to train” can be a software implementation (e.g., ¶ [0018] and FIGS. 1-2 of the disclosure) or a hardware implementation (see e.g., ¶¶ [0024], [0111], and [0119] of the disclosure). The structure of the claimed “one or more 
(b)	Dependent claims 2-7, 9-14, and 16-20 respectively depend from independent claims 1, 8, and 15 and thus inherit the defects of their respective independent claims.  Claims 2-7, 9-14, and 16-20 are thus rejected under 35 U.S.C. § 112(b) under the same rationale.
(c)      Claims 2, 9, and 16: The claimed limitation “wherein each of the one or more computation modules is a headset, a mobile device, or a wearable device” in claims 2 and 9, as well as the claimed limitation “the means to train the first neural networks include a headset, a mobile device, or a wearable device” in claim 16, are indefinite because it is unclear how a headset, a mobile device, or a wearable device trains neural networks.  Clarification is required.
(d)      Claims 6, 13, and 20: The limitation “the intermediate representation” lacks proper antecedent basis because these claims first respectively recite “a first intermediate representation” that does not provide adequate antecedent basis for the recited “the intermediate representation”.  For the purpose of examination, the recited “the intermediate representation” is interpreted as “the first intermediate representation”.
 
Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

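This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.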
 
6.            Claim(s) 1-5, 7, 9-11, and 15-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al., Towards Efficient Convolutional Neural Network for Domain-Specific Applications on FPGA (September 4, 2018) (hereinafter Zhao) in view of Chaplot et al., Personalized Adaptive Learning using Neural Networks (April 2016) (hereinafter Chaplot).
          With respect to claim 1, Zhao teaches:  
          A system to improve data training of a neural network, the system comprising: one or more computation modules, (Zhao at Abstract: “Most CNN applications on FPGA are domain-specific, e.g., detecting objects from specific categories, in which commonly-used CNN models pre-trained on general datasets may not be efficient enough. This paper presents TuRF, an end-to-end CNN acceleration framework to efficiently deploy domain-specific applications on FPGA by transfer learning that adapts pre-trained models to specific domains, replacing standard convolution layers with efficient convolution blocks, and applying layer fusion to enhance hardware design performance.” The examiner notes that Zhao’s “domain-specific applications on FPGA” teach one or more computation modules.)
the one or more computation modules training first neural networks using data; and (Zhao at ¶ 1, right-hand column, § I, p. 1: “TuRF accepts a CNN model pre-trained from a large-scale dataset, replaces its selected standard convolution layers with various convolution blocks, fine-tunes and evaluates the layer-replaced model, and outputs an efficient FPGA design in the end.” ¶ 1, § V-A-(2): “we devise the following optimisation approach, inspired by the principles of transfer learning. The input to our exploration procedure can be any models pre-trained based on ImageNet, which supposedly is general and consists of removable redundancies regarding the targeting application. We intend to achieve the required accuracy by fine-tuning the input model, in which only top layers are trained and others are fixed.” The examiner notes that Zhao’s training layers of a convolutional neural network in its “domain specific applications on FPGA” teaches one or more computing modules training a first neural network using data.)
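For illustration only, and not as a characterization of Zhao's actual implementation, the fine-tuning approach quoted above (only the top layers are trained while the remaining layers stay fixed) can be sketched as follows. The two-layer linear model, the toy data, the learning rate, and every name in the sketch are hypothetical assumptions chosen to keep the example small.

```python
# Illustrative sketch only: a two-layer linear model in which the lower
# ("pre-trained") layer is frozen and only the top layer is fine-tuned.
# All names and numbers are hypothetical; none are taken from Zhao.

def forward(x, w_base, w_top):
    """Return (hidden, output) for a scalar input x."""
    hidden = [w * x for w in w_base]                      # frozen feature extractor
    output = sum(wt * h for wt, h in zip(w_top, hidden))  # trainable top layer
    return hidden, output

def fine_tune(data, w_base, w_top, lr=0.01, epochs=200):
    """Gradient descent on squared error, updating only the top layer."""
    w_top = list(w_top)
    for _ in range(epochs):
        for x, y in data:
            hidden, out = forward(x, w_base, w_top)
            err = out - y
            # d(0.5 * err^2) / d(w_top[i]) = err * hidden[i]; w_base is never updated
            w_top = [wt - lr * err * h for wt, h in zip(w_top, hidden)]
    return w_top

def mean_squared_error(data, w_base, w_top):
    return sum((forward(x, w_base, w_top)[1] - y) ** 2 for x, y in data) / len(data)

w_base = [0.5, -1.0]        # stands in for the fixed, pre-trained lower layers
w_top_init = [0.0, 0.0]     # top layer before fine-tuning
domain_data = [(x, 3.0 * x) for x in (-1.0, 0.5, 1.0, 2.0)]  # toy domain-specific task

w_top_tuned = fine_tune(domain_data, w_base, w_top_init)
loss_before = mean_squared_error(domain_data, w_base, w_top_init)
loss_after = mean_squared_error(domain_data, w_base, w_top_tuned)
```

In this sketch the error on the toy task falls while `w_base` is left exactly as supplied, which is the sense in which the lower layers are "fixed".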
a field-programmable gate array (FPGA) to: (Zhao at ¶ 1, right-hand column, § I, p. 1: “TuRF accepts a CNN model pre-trained from a large-scale dataset, replaces its selected standard convolution layers with various convolution blocks, fine-tunes and evaluates the layer-replaced model, and outputs an efficient FPGA design in the end.” Abstract: “We evaluate TuRF by deploying a pre-trained VGG-16 model for a domain-specific image recognition task onto a Stratix V FPGA.” The examiner notes that an FPGA implementation of the aforementioned FPGA design teaches this limitation.)
obtain a first set of parameters from each of the one or more computation modules, the first set of parameters associated with the first neural networks; (Zhao at FIG. 4; § III-B-(3): “Winograd Transformation”; and Equations (7)-(9): “Let Tk be the Winograd tile size (m + K − 1), (7), (8), (9) illustrate the configurations and interfaces of the transformation modules for input feature map BᵀdB, weights GdGᵀ, and output AᵀXA respectively.”

    [Image: media_image1.png]
 
¶ 1, § II-B: “where ⊙ is the Hadamard product. G, B, A are three transformation matrices with (m + r - 1) x r, (m + r - 1) x (m + r - 1), (m + r - 1) x m in shape, g is an r² filter kernel and d with the size (m + r - 1) x (m + r - 1) is a tile of the input feature map.” The examiner notes that Zhao’s input feature map (d) and/or weights (g), received for respective transformation into BᵀdB and GdGᵀ for its FPGA implementation, teach a first set of parameters associated with the CNN model for FPGA implementation and thus teach this limitation.)
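For illustration only, the transformation quoted above can be sketched for the common case m = 2, r = 3, where each input tile is (m + r - 1) x (m + r - 1) = 4 x 4. The sketch assumes the standard Winograd minimal-filtering form Y = Aᵀ[(G g Gᵀ) ⊙ (Bᵀ d B)]A and the standard F(2x2, 3x3) matrices from the general Winograd convolution literature; neither the matrices nor the example tile and kernel are taken from Zhao, and the function names are hypothetical.

```python
# Illustrative sketch only: 2-D Winograd minimal filtering F(2x2, 3x3).
# Matrix shapes follow the quoted passage: G is (m+r-1) x r, B is
# (m+r-1) x (m+r-1), and A is (m+r-1) x m, with m = 2 and r = 3.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def hadamard(a, b):
    """Element-wise (Hadamard) product of two equally sized matrices."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

B_T = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
A_T = [[1, 1, 1, 0], [0, 1, -1, -1]]

def winograd_tile(d, g):
    """Compute the 2x2 output tile from a 4x4 input tile d and a 3x3 kernel g."""
    u = matmul(matmul(G, g), transpose(G))        # transformed weights, 4x4
    v = matmul(matmul(B_T, d), transpose(B_T))    # transformed input tile, 4x4
    return matmul(matmul(A_T, hadamard(u, v)), transpose(A_T))

def direct_tile(d, g):
    """Reference result: plain sliding-window correlation over the same tile."""
    return [[sum(d[i + p][j + q] * g[p][q] for p in range(3) for q in range(3))
             for j in range(2)] for i in range(2)]

tile = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

y_winograd = winograd_tile(tile, kernel)
y_direct = direct_tile(tile, kernel)
```

Comparing `y_winograd` with `y_direct` shows that the transformed-domain element-wise product reproduces the direct result for this tile, which is why the transformed feature map and transformed weights can stand in for the original parameters.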
configure a second neural network based on the first set of parameters; (Zhao at ¶ 1, right-hand column, § I, p. 1: “TuRF accepts a CNN model pre-trained from a large-scale dataset, replaces its selected standard convolution layers with various convolution blocks, fine-tunes and evaluates the layer-replaced model, and outputs an efficient FPGA design in the end.” The examiner notes that Zhao’s “various convolution blocks” in its FPGA implementation teaches a second neural network.  The examiner further notes that Zhao’s receiving a pre-trained CNN model, replacing standard convolution layers with convolution blocks, and fine-tuning and evaluating the layer-replaced model in its FPGA implementation in order to output an efficient FPGA design teaches the above limitation.)
execute the second neural network to generate a second set of parameters; and (Zhao at ¶ 1, § VI: “we evaluate TuRF in terms of model transformation and optimisation by accepting conventional model VGG-16 pretrained based on a large dataset and generating a set of smaller models with different number of groups replaced.”  ¶ 1, § VI-A: “All the CNN models evaluated here are built, trained and evaluated using the latest TensorFlow (v1.6). Pre-trained models are downloaded directly from TF-Slim. The experimental FPGA platform is Stratix-V 5SGSD8”. ¶ 1, § VI-B: “We first evaluate the performance of three popular efficient CNN models: ResNet-50, MobileNet V1 and V2 generated by our framework”.  § III-B-(3): “Winograd Transformation”; and Equations (7)-(9): “Let Tk be the Winograd tile size (m + K − 1), (7), (8), (9) illustrate the configurations and interfaces of the transformation modules for input feature map BᵀdB, weights GdGᵀ, and output AᵀXA respectively.”
The examiner notes that Zhao’s building an FPGA for a domain-specific application with Zhao’s TuRF and evaluating the FPGA in § VI cited above teaches executing its FPGA design.  The examiner further notes that Zhao’s transformed parameters (e.g., the input feature map BᵀdB and weights GdGᵀ obtained from the input feature map (d) and the r² kernel in § III-B-(3) cited above) teach generating a second set of parameters for FPGA implementation.)
transmit the second set of parameters to the first neural networks to update the first neural networks. (Zhao at FIG. 1: 

    [Image: media_image2.png (Zhao, FIG. 1)]

¶ 1, § VI: “In this evaluation, we look at the capability of TuRF by generating hardware design for typical efficient CNN models. Then, we evaluate TuRF in terms of model transformation and optimisation by accepting conventional model VGG-16 pretrained based on a large dataset and generating a set of smaller models with different number of groups replaced.” ¶ 1, § VI-(C): “The accuracy requirement supplied to the framework is gradually adjusted to generate implementations with different number of groups replaced. This enables us to understand the implications of replacing the standard convolution layer with various types of convolution block in conventional CNN model.”
The examiner notes that FIG. 1 of Zhao explicitly shows iterative updating between “Hardware Generation” (§ IV) and “Layer-wise Model Optimization” (§ V) to output an “optimized model” for the originally pre-trained CNN model and the “FPGA design” with “transfer learning”, which teaches transmitting one or more sets of parameters (e.g., the second set of parameters) from the FPGA implementation design to update the first neural networks.  The examiner further notes that Zhao’s iteratively replacing a convolutional layer with various types of convolution block for performance comparisons also teaches transmitting the second set of parameters to update the first neural networks as claimed.)
Zhao does not appear to explicitly teach:
each of the one or more computation modules associated with a corresponding user,
the one or more computation modules training first neural networks using data associated with the corresponding users; and 
 
In the same field of endeavor, Chaplot does, however, teach:
each of the one or more computation modules associated with a corresponding user, (Chaplot at Abstract: “Adaptive learning is the core technology behind intelligent tutoring systems, which are responsible for estimating student knowledge and providing personalized instruction to students based on their skill level. In this paper, we present a new adaptive learning system architecture, which uses Artificial Neural Network to construct the Learner Model, which automatically models relationship between different concepts in the curriculum and beats Knowledge Tracing in predicting student performance.”; p. 1, ¶ 1, Introduction: “Adaptive learning refers broadly to a learning process where the content taught or the way such content is presented changes, or “adapts,” based on the responses of the individual student [5]. It is the core technology for intelligent tutoring systems having 3 major components: model of content to be learned (Content Model), model to estimate student proficiency (Learner Model) and a model to present content to the student in a personalized fashion based on his proficiency (Instructional Model).”)
the one or more computation modules training first neural networks using data associated with the corresponding users; and (Chaplot at ¶ 1, right-hand column, p. 2: “Learner Model” ¶ 1, left-hand column, p. 3: “ESTIMATING STUDENT LEARNING RATE”: “We create an individualized neural network for each student Tk, which is trained only on transactions by that student.” ¶ 1, right-hand column, p. 2: “Learner Model”: “Student performance data contains student-item transactions, each containing Student ID T = Tk, Item ID M = Mj set of concepts involved in item Mj (denoted by Sj), Current Opportunity Count(s) (OC) [2] of concept(s) in set Sj and student response Xt (1 for correct, 0 for incorrect). The OC(s) of concept(s) involved in the item and corresponding student response are used as input and output, respectively, for training the Neural Network as shown in Figure 2.”

    [Image: media_image3.png]

The examiner notes that Chaplot’s “learner model” teaches “one or more computation modules”, that Chaplot’s “transactions by that student” and/or “corresponding student response” teach “data associated with the corresponding users”, and that Chaplot’s Learner Model’s training its neural network with the aforementioned data teaches the above limitation in its entirety.)
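For illustration only, the per-student training arrangement quoted above (an individualized network for each student, trained only on that student's transactions, with opportunity count as input and the response as output) can be sketched as follows. A single logistic unit stands in for the individualized neural network; the transaction log, student identifiers, learning rate, and function names are hypothetical assumptions and are not drawn from Chaplot.

```python
import math

# Illustrative sketch only: one small model per student, each trained solely
# on that student's own transactions. All data and names are hypothetical.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_student_model(transactions, lr=0.1, epochs=500):
    """transactions: list of (opportunity_count, response), response in {0, 1}."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for oc, response in transactions:
            p = sigmoid(w * oc + b)
            grad = p - response      # gradient of the log loss w.r.t. the pre-activation
            w -= lr * grad * oc
            b -= lr * grad
    return w, b

# Each entry: (student_id, opportunity_count, response)
log = [
    ("T1", 0, 0), ("T1", 1, 0), ("T1", 2, 1), ("T1", 3, 1),
    ("T2", 0, 0), ("T2", 1, 0), ("T2", 2, 0), ("T2", 3, 0), ("T2", 4, 1),
]

# Partition the log so that each model sees only its own student's data.
by_student = {}
for student_id, oc, response in log:
    by_student.setdefault(student_id, []).append((oc, response))

models = {sid: train_student_model(tx) for sid, tx in by_student.items()}

def predict(student_id, oc):
    """Estimated probability of a correct response for the given student."""
    w, b = models[student_id]
    return sigmoid(w * oc + b)
```

Because the log is partitioned before training, the two models end up with different parameters, reflecting that each is associated with, and fitted to, a single corresponding user.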
          Zhao and Chaplot are analogous art because both references pertain to training neural networks for domain specific applications.  
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Zhao’s system for training a first neural network with parameters generated by a second neural network executing on an FPGA for a domain-specific application (Zhao at Abstract and §§ I-II) with Chaplot’s personalized adaptive learner model optimized by user-specific data and responses (Chaplot at Abstract; ¶ 1, Introduction, p. 1; ¶ 1, right-hand column, p. 2; ¶ 1, left-hand column, p. 3, supra).  The modification not only overcomes shortcomings of conventional approaches that provide only predefined, non-user-specific applications that can be inappropriate for different individual users (Chaplot at p. 4, left-hand column, ¶ 2: “The proposed adaptive learning system overcomes two important shortcomings of existing adaptive learning systems: (1) inability of Learner Model to handle multi-concept problems and (2) inability of Instructional Model to systematically select problems of appropriate difficulty for the student to maximize learning gain. We propose a new adaptive learning system architecture as shown in Figure 1, based on Artificial Neural Networks, which overcomes these shortcomings”) but also maximizes the gain of the domain-specific application for each individual user (Chaplot, ¶ 2, left-hand column, p. 4: “we use Maximizing Learning Gain and Maximizing Personalized Learning Gain in idealized setting using real values of parameters (item difficulty and student learning rate), which were used to generate data. As the real values will not be available in practice, we also maximize LG and PLG using parameter estimates from Neural Networks. Results in Table 3 show that maximizing PLG (ideal) reduces items required to achieve mastery by 26.5% over pre-defined curriculum sequence policy.  Max PLG (NN) is able to achieve learning efficiency comparable to the ideal scenario.”).
 
          With respect to claim 2, Zhao modified by Chaplot teaches a system of claim 1, and Zhao further teaches:
wherein each of the one or more computation modules is a headset, a mobile device, or a wearable device. (Zhao at Last paragraph, § II-C, “Efficient CNN Models”, p. 2: “In this paper, we mainly study three efficient CNN models: ResNet-50 [4], MobileNet V1 [5] and V2 [6], to gain research insight for our hardware template and to make a comparison to our generated CNN model for domain-specific application.” The examiner notes that MobileNet is known to execute on mobile devices (see [5] Howard et al. cited in Zhao).  Therefore, Zhao teaches this limitation.)
 
          With respect to claim 3, Zhao modified by Chaplot teaches a system of claim 1, and Zhao further teaches:
wherein the data is audio data, visual data, or text data. (Zhao at Abstract: “We evaluate TuRF by deploying a pre-trained VGG-16 model for a domain-specific image recognition task onto a Stratix V FPGA.”)
 
          With respect to claim 4, Zhao modified by Chaplot teaches a system of claim 1, and Zhao further teaches:
(Zhao at FIG. 4; § III-B-(3) “Winograd Transformation”; and Equations (7)-(9): “Let Tk be the Winograd tile size (m + K − 1), (7), (8), (9) illustrate the configurations and interfaces of the transformation modules for input feature map BᵀdB, weights GdGᵀ, and output AᵀXA respectively.”  § V “Layer-wise Model Optimisation”, p. 6 and ¶ 3, right-hand column, § I, p. 1: “2) Characterisation of the design space of CNN model regarding domain-specific applications and a transfer learning inspired layer-wise optimisation that replaces standard convolution layers by blocks with fine-tuning (Section V).” § III-B-(5): “5) Other Design Modules: An element-wise addition module performs addition of two identically sized feature maps. An activation module implements non-linear activation functions.” FIG. 4: “Max or Average Pooling”.)
 
          With respect to claim 5, Zhao modified by Chaplot teaches a system of claim 1, and Zhao further teaches:
          wherein each of the one or more computation modules includes: a collection engine to obtain the data; (Zhao at FIG. 1: “Inputs: - Domain knowledge – Pre-trained models – platform spec – Requirements”.  The examiner notes that the input module in Zhao’s system that receives the aforementioned “inputs” teaches the claimed collection engine.)
(Zhao at § III-A-2: “An input buffer caches input feature map to be reused throughout the computation, and an output buffer stores and accumulates temporary results.” The examiner notes that Zhao’s buffer stores the structured data feature map and thus teaches a database.)
a network configurator to train the first neural networks using the second set of parameters. (Zhao at “TuRF design flow” in FIG. 1, p. 1:

    [Image: media_image4.png (Zhao, FIG. 1, “TuRF design flow”)]

The examiner notes that Zhao’s iterative exploration & evaluation (§ VI) between the layer-wise model optimization (§ V) and hardware generation (§ IV) teaches providing the second set of parameters from Zhao’s FPGA design to Zhao’s CNN model for layer-wise model optimization and thus teaches the above limitation.)

With respect to claim 13, it is substantially similar to claim 6 and is rejected in the same manner, the same art and reasoning applying. 

With respect to claim 15, Zhao teaches: 
(Zhao at Abstract: “Most CNN applications on FPGA are domain-specific, e.g., detecting objects from specific categories, in which commonly-used CNN models pre-trained on general datasets may not be efficient enough. This paper presents TuRF, an end-to-end CNN acceleration framework to efficiently deploy domain-specific applications on FPGA by transfer learning that adapts pre-trained models to specific domains, replacing standard convolution layers with efficient convolution blocks, and applying layer fusion to enhance hardware design performance.” ¶ 1, § V-A-(2): “we devise the following optimisation approach, inspired by the principles of transfer learning. The input to our exploration procedure can be any models pre-trained based on ImageNet, which supposedly is general and consists of removable redundancies regarding the targeting application. We intend to achieve the required accuracy by fine-tuning the input model, in which only top layers are trained and others are fixed.” The examiner notes that Zhao’s training layers of a convolutional neural network in its “domain specific applications on FPGA” teaches one or more computing modules training a first neural network using data.)
means to obtain a first set of parameters associated with the first neural networks; (Zhao at Abstract: “We evaluate TuRF by deploying a pre-trained VGG-16 model for a domain-specific image recognition task onto a Stratix V FPGA.” § III-B-(3): “Winograd Transformation”; and Equations (7)-(9): “Let Tk be the Winograd tile size (m + K −1), (7), (8), (9) illustrate the configurations and interfaces of the transformation modules for input feature map BTdB, weights GdGT, and output ATXA respectively.”

    [Image: Zhao Equations (7)-(9)]
 
¶ 1, § II-B: “where ⊙ is the Hadamard product. G, B, A are three transformation matrices with (m + r - 1) x r, (m + r - 1) x (m + r - 1), (m + r - 1) x m in shape, g is an r2 filter kernel and d with the size (m + r - 1) x (m + r - 1) is a tile of the input feature map.” The examiner first notes that Zhao’s FPGA for a domain specific application teaches the claimed means to obtain.  The examiner further notes that Zhao’s input feature map (d) and/or the weights (g) received for respective transformation into BTdB, weights GdGT for its FPGA implementation teach a first set of parameters associated with the CNN model for FPGA implementation and thus teach this limitation.)
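For reference, the Winograd relation underlying the passage quoted from Zhao § II-B is commonly written as Y = A^T[(G g G^T) ⊙ (B^T d B)]A. The following is a minimal sketch of that relation, assuming the F(2x2, 3x3) case (m = 2, r = 3) and the transformation matrices customarily used for that case; the matrix values and the function names are assumptions for illustration and are not taken from the quoted passages of Zhao.

```python
# Illustrative sketch only: Winograd F(2x2, 3x3) on one input tile.
# Assumed: m = 2, r = 3, and the customary B^T, G, A^T matrices below.

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

B_T = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]   # (m+r-1) x (m+r-1)
G = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]       # (m+r-1) x r
A_T = [[1, 1, 1, 0], [0, 1, -1, -1]]                                # m x (m+r-1)

def winograd_tile(d, g):
    """Return the 2x2 output tile for a 4x4 input tile d and a 3x3 kernel g."""
    u = matmul(matmul(G, g), transpose(G))        # transformed weights, 4x4
    v = matmul(matmul(B_T, d), transpose(B_T))    # transformed input tile, 4x4
    prod = [[u[i][j] * v[i][j] for j in range(4)] for i in range(4)]  # Hadamard product
    return matmul(matmul(A_T, prod), transpose(A_T))

def direct_tile(d, g):
    """Reference result: sliding-window sum of products over the same tile."""
    return [[sum(d[i + p][j + q] * g[p][q] for p in range(3) for q in range(3))
             for j in range(2)] for i in range(2)]

d = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
g = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
```

Under these assumptions, winograd_tile(d, g) and direct_tile(d, g) agree on the same 2x2 output, which is the sense in which the input tile (d) and the kernel (g) are the quantities supplied to the transformation.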
means to configure a second neural network based on the first set of parameters; (Zhao at ¶ 1, right-hand column, § I, p. 1: “TuRF accepts a CNN model pre-trained from a large-scale dataset, replaces its selected standard convolution layers with various convolution blocks, fine-tunes and evaluates the layer-replaced model, and outputs an efficient FPGA design in the end.” The examiner notes that Zhao’s “various convolution blocks” in its FPGA implementation teaches a second neural network.  The examiner further notes that Zhao’s receiving a pre-trained CNN model, replacing standard convolution layers with convolution blocks, and fine-tuning and evaluating the layer-replaced model in its FPGA implementation in order to output an efficient FPGA design teaches the above limitation.)
means to execute the second neural network to generate a second set of parameters; and
(Zhao at ¶ 1, § VI: “we evaluate TuRF in terms of model transformation and optimisation by accepting conventional model VGG-16 pretrained based on a large dataset and generating a set of smaller models with different number of groups replaced.”  ¶ 1, § VI-A: “All the CNN models evaluated here are built, trained and evaluated using the latest TensorFlow (v1.6). Pre-trained models are downloaded directly from TF-Slim. The experimental FPGA platform is Stratix-V 5SGSD8”. ¶ 1, § VI-B: “We first evaluate the performance of three popular efficient CNN models: ResNet-50, MobileNet V1 and V2 generated by our framework”.  § III-B-(3): “Winograd Transformation”; and Equations (7)-(9): “Let Tk be the Winograd tile size (m + K −1), (7), (8), (9) illustrate the configurations and interfaces of the transformation modules for input feature map BTdB, weights GdGT, and output ATXA respectively.”
The examiner first notes that Zhao’s TuRF and/or the FPGA teaches the claimed means to execute.  The examiner also notes that Zhao’s building and evaluating the FPGA in § VI cited above teaches executing its FPGA design.  The examiner further notes that Zhao’s transformed parameters (e.g., the input feature map BTdB and weights GdGT from the input feature map (d) and r2 kernel in § III-B-(3) cited above) that are generated by Zhao’s TuRF and further fine-tuned teach a second set of parameters for Zhao’s FPGA implementation.)
means to transmit the second set of parameters to the first neural networks to update the first neural networks. (Zhao at FIG. 1: 

    [Image: Zhao FIG. 1]

¶ 1, § VI: “In this evaluation, we look at the capability of TuRF by generating hardware design for typical efficient CNN models. Then, we evaluate TuRF in terms of model transformation and optimisation by accepting conventional model VGG-16 pretrained based on a large dataset and generating a set of smaller models with different number of groups replaced.” ¶ 1, § VI-(C): “The accuracy requirement supplied to the framework is gradually adjusted to generate implementations with different number of groups replaced. This enables us to understand the implications of replacing the standard convolution layer with various types of convolution block in conventional CNN model.”
The examiner notes that FIG. 1 of Zhao explicitly shows iteration between “Hardware Generation” (§ IV) and “Layer-wise Model Optimization” (§ V) to output an “optimized model” for the originally pre-trained CNN model and the “FPGA design” with “transfer learning,” and thus teaches transmitting one or more sets of parameters (e.g., the second set of parameters) from the FPGA implementation design to update the first neural networks.  The examiner further notes that Zhao’s iteratively replacing a convolutional layer with various types of convolution block for performance comparisons also teaches transmitting the second set of parameters to update the first neural network as claimed.)
Zhao does not appear to explicitly teach:
each of the means associated with a corresponding user, 
the means to train the first neural networks is to use data associated with the corresponding users;
 
In the same field of endeavor, Chaplot does, however, teach:
each of the means associated with a corresponding user, (Chaplot at Abstract: “Adaptive learning is the core technology behind intelligent tutoring systems, which are responsible for estimating student knowledge and providing personalized instruction to students based on their skill level. In this paper, we present a new adaptive learning system architecture, which uses Artificial Neural Network to construct the Learner Model, which automatically models relationship between different concepts in the curriculum and beats Knowledge Tracing in predicting student performance.”) (¶ 1, Introduction: “Adaptive learning refers broadly to a learning process where the content taught or the way such content is presented changes, or “adapts,” based on the responses of the individual student [5]. It is the core technology for intelligent tutoring systems having 3 major components: model of content to be learned (Content Model), model to estimate student proficiency (Learner Model) and a model to present content to the student in a personalized fashion based on his proficiency (Instructional Model).”)
the means to train the first neural networks is to use data associated with the corresponding users;
(Chaplot at ¶ 1, right-hand column, p. 2: “Learner Model” ¶ 1, left-hand column, p. 3: “ESTIMATING STUDENT LEARNING RATE”: “We create an individualized neural network for each student Tk, which is trained only on transactions by that student.” ¶ 1, right-hand column, p. 2: “Learner Model”: “Student performance data contains student-item transactions, each containing Student ID T = Tk, Item ID M = Mj set of concepts involved in item Mj (denoted by Sj), Current Opportunity Count(s) (OC) [2] of concept(s) in set Sj and student response Xt (1 for correct, 0 for incorrect). The OC(s) of concept(s) involved in the item and corresponding student response are used as input and output, respectively, for training the Neural Network as shown in Figure 2.”

    [Image: figure reproduced from Chaplot]

The examiner notes that Chaplot’s “learner model” teaches “one or more computation modules”, that Chaplot’s “transactions by that student” and/or “corresponding student response” teaches “data associated with the corresponding users”, and that Chaplot’s Learner Model’s training its neural network with the aforementioned data teaches the above limitation in its entirety.)
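The per-student training relied upon from Chaplot can be sketched as follows. This is a simplified illustration under stated assumptions: a single logistic unit stands in for Chaplot’s neural network, each transaction is reduced to a hypothetical (student ID, opportunity count, response) tuple for one concept, and the function names and hyperparameters are not taken from Chaplot.

```python
# Illustrative sketch only: one model per student, trained only on that
# student's transactions (opportunity count as input, response as output).
# Assumed: a single logistic unit replaces the neural network of Chaplot.
import math
from collections import defaultdict

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_student_model(samples, epochs=500, learning_rate=0.1):
    """Fit (weight, bias) by stochastic gradient descent on one student's data."""
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        for opportunity_count, response in samples:
            error = sigmoid(weight * opportunity_count + bias) - response
            weight -= learning_rate * error * opportunity_count
            bias -= learning_rate * error
    return weight, bias

def train_per_student(transactions):
    """Group transactions by student ID and train a separate model for each."""
    by_student = defaultdict(list)
    for student_id, opportunity_count, response in transactions:
        by_student[student_id].append((opportunity_count, response))
    return {sid: train_student_model(rows) for sid, rows in by_student.items()}

def predict(model, opportunity_count):
    weight, bias = model
    return sigmoid(weight * opportunity_count + bias)

# Hypothetical transactions: (student ID, opportunity count, response 1/0).
transactions = [
    ("T1", 0, 0), ("T1", 1, 0), ("T1", 2, 1), ("T1", 3, 1),
    ("T2", 0, 0), ("T2", 1, 1), ("T2", 2, 1),
]
models = train_per_student(transactions)
```

Because each entry of models is fitted only on rows carrying that student’s ID, the sketch mirrors the quoted statement that the individualized network “is trained only on transactions by that student.”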
Zhao and Chaplot are analogous art because both references pertain to training neural networks for domain specific applications.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to combine Zhao’s system for training a first neural network with parameters generated by a second neural network executing on an FPGA for a domain-specific application (Zhao at Abstract and §§ I-II) with Chaplot’s personalized adaptive learner model optimized by user-specific data and responses (Chaplot at Abstract; ¶ 1, Introduction, p. 1; ¶ 1, right-hand column, p. 2, ¶ 1, left-hand column, p. 3; p. 2, right-hand column, ¶ 1, supra). The modification not only overcomes shortcomings of conventional approaches (Chaplot at p. 4, left-hand column, ¶ 2:  “The proposed adaptive learning system overcomes two important shortcomings of existing adaptive learning systems: (1) inability of Learner Model to handle multi-concept problems and (2) inability of Instructional Model to systematically select problems of appropriate difficulty for the student to maximize learning gain. We propose a new adaptive learning system architecture as shown in Figure 1, based on Artificial Neural Networks, which overcomes these shortcomings”) but also maximizes the gain of the domain-specific application for each individual user (Chaplot, ¶ 2, left-hand column, p. 4: “we use Maximizing Learning Gain and Maximizing Personalized Learning Gain in idealized setting using real values of parameters (item difficulty and student learning rate), which were used to generate data. As the real values will not be available in practice, we also maximize LG and PLG using parameter estimates from Neural Networks. Results in Table 3 show that maximizing PLG (ideal) reduces items required to achieve mastery by 26.5% over pre-defined curriculum sequence policy.  Max PLG (NN) is able to achieve learning efficiency comparable to the ideal scenario.”)

With respect to claim 16, it is substantially similar to claim 2 and is rejected in the same manner, the same art and reasoning applying. 
 
With respect to claim 17, it is substantially similar to claim 3 and is rejected in the same manner, the same art and reasoning applying. 
 
With respect to claim 18, it is substantially similar to claim 4 and is rejected in the same manner, the same art and reasoning applying. 
 
With respect to claim 19, it is substantially similar to claim 5 and is rejected in the same manner, the same art and reasoning applying. 

7.            Claims 8-12 and 14 stand rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al., Towards Efficient Convolutional Neural Network for Domain-Specific Applications on FPGA (September 4, 2018) (hereinafter Zhao) in view of Chaplot et al., Personalized Adaptive Learning using Neural Networks (April 2016) (hereinafter Chaplot) and further in view of Wierzynski, C. US PGPub 2017/0024641 published on Jan. 26, 2017 (hereinafter Wierzynski).
          With respect to claim 8, Zhao teaches:
A memory comprising instructions which, when executed, cause a machine to at least: (Zhao at ¶ 1, § VI-A: “The experimental FPGA platform is Stratix-V 5SGSD8 on a Maxeler MPC-X node,  which contains 262.4K adaptive logic modules (ALM), 1963 variable-precision DSP blocks, and 2567 BRAM (M20K)”)
train first neural networks using one or more computation modules, (Zhao at ¶ 1, right-hand column, § I, p. 1: “TuRF accepts a CNN model pre-trained from a large-scale dataset, replaces its selected standard convolution layers with various convolution blocks, fine-tunes and evaluates the layer-replaced model, and outputs an efficient FPGA design in the end.” 
“TuRF design flow” in FIG. 1, p. 1:

    [Image: Zhao FIG. 1, “TuRF design flow”]

The examiner first notes that Zhao’s system or “CNN application” that generates the aforementioned “pre-trained model” for further processing by its TuRF (an “end-to-end CNN acceleration framework”) teaches one or more computation modules. The examiner further notes that Zhao’s accepting a pre-trained CNN model teaches training, prior to FPGA implementation, a first neural network using data.)
the one or more computation modules to train the first neural networks using data associated with the corresponding users; (Zhao at ¶ 1, right-hand column, § I, p. 1: “TuRF accepts a CNN model pre-trained from a large-scale dataset, replaces its selected standard convolution layers with various convolution blocks, fine-tunes and evaluates the layer-replaced model, and outputs an efficient FPGA design in the end.” The examiner notes that Zhao’s accepting a pre-trained CNN model teaches training, prior to FPGA implementation, a first neural network using data.)
obtain, with a field-programmable gate array (FPGA), a first set of parameters from each of the one or more computation modules, the first set of parameters associated with the first neural networks; (Zhao at ¶ 1, right-hand column, § I, p. 1: “TuRF accepts a CNN model pre-trained from a large-scale dataset, replaces its selected standard convolution layers with various convolution blocks, fine-tunes and evaluates the layer-replaced model, and outputs an efficient FPGA design in the end.”
§ III-B-(3): “Winograd Transformation”; and Equations (7)-(9): “Let Tk be the Winograd tile size (m + K −1), (7), (8), (9) illustrate the configurations and interfaces of the transformation modules for input feature map BTdB, weights GdGT, and output ATXA respectively.”

    [Image: Zhao Equations (7)-(9)]
 
¶ 1, § II-B: “where ⊙ is the Hadamard product. G, B, A are three transformation matrices with (m + r - 1) x r, (m + r - 1) x (m + r - 1), (m + r - 1) x m in shape, g is an r2 filter kernel and d with the size (m + r - 1) x (m + r - 1) is a tile of the input feature map.” The examiner first notes that Zhao’s input feature map (d) and/or the weights (g) received for respective transformation into BTdB, weights GdGT for its “efficient FPGA design” teach a first set of parameters associated with the first neural network.  The examiner further notes that Zhao’s receiving the aforementioned input feature map (d) and weights (g) for transformation for the “efficient FPGA design” teaches obtaining a first set of parameters with an FPGA as claimed.)
configure, with the FPGA, a second neural network based on the first set of parameters; (Zhao at ¶ 1, right-hand column, § I, p. 1: “TuRF accepts a CNN model pre-trained from a large-scale dataset, replaces its selected standard convolution layers with various convolution blocks, fine-tunes and evaluates the layer-replaced model, and outputs an efficient FPGA design in the end.” The examiner notes that Zhao’s “various convolution blocks” in its FPGA implementation teaches a second neural network.  The examiner further notes that Zhao’s receiving a pre-trained CNN model, replacing standard convolution layers with convolution blocks, and fine-tuning and evaluating the layer-replaced model in its FPGA implementation in order to output an efficient FPGA design teaches the above limitation.)
execute, with the FPGA, the second neural network to generate a second set of parameters; and (Zhao at ¶ 1, § VI: “we evaluate TuRF in terms of model transformation and optimisation by accepting conventional model VGG-16 pretrained based on a large dataset and generating a set of smaller models with different number of groups replaced.”  ¶ 1, § VI-A: “All the CNN models evaluated here are built, trained and evaluated using the latest TensorFlow (v1.6). Pre-trained models are downloaded directly from TF-Slim. The experimental FPGA platform is Stratix-V 5SGSD8”. ¶ 1, § VI-B: “We first evaluate the performance of three popular efficient CNN models: ResNet-50, MobileNet V1 and V2 generated by our framework”.  § III-B-(3): “Winograd Transformation”; and Equations (7)-(9): “Let Tk be the Winograd tile size (m + K −1), (7), (8), (9) illustrate the configurations and interfaces of the transformation modules for input feature map BTdB, weights GdGT, and output ATXA respectively.”
The examiner first notes that Zhao’s TuRF and/or the FPGA teaches the claimed means to execute.  The examiner also notes that Zhao’s building and evaluating the FPGA in § VI cited above teaches executing its FPGA design.  The examiner further notes that Zhao’s transformed parameters (e.g., the input feature map BTdB and weights GdGT from the input feature map (d) and r2 kernel in § III-B-(3) cited above) that are generated by Zhao’s TuRF and further fine-tuned teach a second set of parameters for Zhao’s FPGA implementation.)
transmit, with the FPGA, the second set of parameters to the first neural networks to update the first neural networks. (Zhao at Abstract: “Most CNN applications on FPGA are domain-specific, e.g., detecting objects from specific categories, in which commonly-used CNN models pre-trained on general datasets may not be efficient enough. This paper presents TuRF, an end-to-end CNN acceleration framework to efficiently deploy domain-specific applications on FPGA by transfer learning that adapts pre-trained models to specific domains, replacing standard convolution layers with efficient convolution blocks, and applying layer fusion to enhance hardware design performance.” FIG. 1 (see FIG. 1 reproduced above) and §§ IV-VI.
The examiner notes that FIG. 1 of Zhao explicitly shows iteration between “Hardware Generation” (§ IV) and “Layer-wise Model Optimization” (§ V) to output an “optimized model” for the originally pre-trained CNN model and the “FPGA design” with “transfer learning,” and thus teaches transmitting one or more sets of parameters (e.g., the second set of parameters) from the FPGA implementation design to update the first neural networks.)
Zhao does not appear to explicitly teach:
A non-transitory computer readable storage medium comprising instructions which, when executed, cause a machine to at least: 
the one or more computation modules to train the first neural networks using data associated with the corresponding users;
In the same field of endeavor, Chaplot does, however, teach:
each of the one or more computation modules associated with a corresponding user, (Chaplot: at Abstract: “Adaptive learning is the core technology behind intelligent tutoring systems, which are responsible for estimating student knowledge and providing personalized instruction to students based on their skill level. In this paper, we present a new adaptive learning system architecture, which uses Artificial Neural Network to construct the Learner Model, which automatically models relationship between different concepts in the curriculum and beats Knowledge Tracing in predicting student performance.”) (¶ 1, Introduction: “Adaptive learning refers broadly to a learning process where the content taught or the way such content is presented changes, or “adapts,” based on the responses of the individual student [5]. It is the core technology for intelligent tutoring systems having 3 major components: model of content to be learned (Content Model), model to estimate student proficiency (Learner Model) and a model to present content to the student in a personalized fashion based on his proficiency (Instructional Model).”)
the one or more computation modules to train the first neural networks using data associated with the corresponding users; (Chaplot at ¶ 1, right-hand column, p. 2: “Learner Model” ¶ 1, left-hand column, p. 3: “ESTIMATING STUDENT LEARNING RATE”: “We create an individualized neural network for each student Tk, which is trained only on transactions by that student.” ¶ 1, right-hand column, p. 2: “Learner Model”: “Student performance data contains student-item transactions, each containing Student ID T = Tk, Item ID M = Mj set of concepts involved in item Mj (denoted by Sj), Current Opportunity Count(s) (OC) [2] of concept(s) in set Sj and student response Xt (1 for correct, 0 for incorrect). The OC(s) of concept(s) involved in the item and corresponding student response are used as input and output, respectively, for training the Neural Network as shown in Figure 2.”

    [Image: figure reproduced from Chaplot]

The examiner notes that Chaplot’s “learner model” teaches “one or more computation modules”, that Chaplot’s “transactions by that student” and/or “corresponding student response” teaches “data associated with the corresponding users”, and that Chaplot’s Learner Model’s training its neural network with the aforementioned data teaches the above limitation in its entirety.)
Zhao and Chaplot are analogous art because both references pertain to training neural networks for domain specific applications.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to combine Zhao’s system for training a first neural network with parameters generated by a second neural network executing on an FPGA for a domain-specific application (Zhao at Abstract and §§ I-II) with Chaplot’s personalized adaptive learner model optimized by user-specific data and responses (Chaplot at Abstract; ¶ 1, Introduction, p. 1; ¶ 1, right-hand column, p. 2, ¶ 1, left-hand column, p. 3; p. 2, right-hand column, ¶ 1, supra).  The modification not only overcomes shortcomings of conventional approaches that only provide predefined, non-user-specific applications that can be inappropriate for individual, different users (Chaplot at p. 4, left-hand column, ¶ 2: “The proposed adaptive learning system overcomes two important shortcomings of existing adaptive learning systems: (1) inability of Learner Model to handle multi-concept problems and (2) inability of Instructional Model to systematically select problems of appropriate difficulty for the student to maximize learning gain. We propose a new adaptive learning system architecture as shown in Figure 1, based on Artificial Neural Networks, which overcomes these shortcomings”) but also maximizes the gain of the domain-specific application for each individual user (Chaplot at ¶ 2, left-hand column, p. 4: “we use Maximizing Learning Gain and Maximizing Personalized Learning Gain in idealized setting using real values of parameters (item difficulty and student learning rate), which were used to generate data. As the real values will not be available in practice, we also maximize LG and PLG using parameter estimates from Neural Networks. Results in Table 3 show that maximizing PLG (ideal) reduces items required to achieve mastery by 26.5% over pre-defined curriculum sequence policy.  Max PLG (NN) is able to achieve learning efficiency comparable to the ideal scenario.”)
Zhao modified by Chaplot does not appear to explicitly teach a non-transitory computer readable storage medium comprising instructions which, when executed, cause a machine to at least. 
In the same field of endeavor, Wierzynski does, however, teach:  
A non-transitory computer readable storage medium comprising instructions which, when executed, cause a machine to at least: (Wierzynski at ¶ [0012]: “a non-transitory computer readable medium with non-transitory program code recorded thereon. The program code is executed by a processor and includes program code to receive second data.”)
Zhao, Chaplot, and Wierzynski are analogous art because all three references pertain to training neural networks for machine learning.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to combine Zhao’s CNN framework to deploy neural network domain-specific applications for FPGAs (Zhao at Title, Abstract, and §§ I-II) modified by Chaplot’s personalized adaptive learner model optimized by user-specific data and responses (Chaplot at Abstract; ¶ 1, Introduction, p. 1; ¶ 1, right-hand column, p. 2, ¶ 1, left-hand column, p. 3; p. 2, right-hand column, ¶ 1, supra) with Wierzynski’s “non-transitory computer readable medium” (Wierzynski at ¶ [0012], supra). The modification enables the storage of software modules (e.g., Zhao’s CNN framework to deploy neural network domain-specific applications modified by Chaplot’s personalized adaptive learner model) or instructions therefor to facilitate storage and execution of the software modules (Wierzynski at ¶ [0012]: “The computer program product has a non-transitory computer-readable medium with non-transitory program code recorded thereon. The program code is executed by a processor and includes program code to receive second data. The program code also includes program code to generate, via a first network, second labels for the second data. In one configuration, the first network has been previously trained on first labels for first data.”)

With respect to claim 9, it is substantially similar to claim 2 and is rejected in the same manner, the same art and reasoning applying. 

With respect to claim 10, it is substantially similar to claim 3 and is rejected in the same manner, the same art and reasoning applying. 

With respect to claim 11, it is substantially similar to claim 4 and is rejected in the same manner, the same art and reasoning applying. 



With respect to claim 14, it is substantially similar to claim 7 and is rejected in the same manner, the same art and reasoning applying. 

8.            Claims 6-7 and 20 stand rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al., Towards Efficient Convolutional Neural Network for Domain-Specific Applications on FPGA (September 4, 2018) (hereinafter Zhao) in view of Chaplot et al., Personalized Adaptive Learning using Neural Networks (April 2016) (hereinafter Chaplot) and further in view of Abdelfattah et al., DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration (July 13, 2018) (hereinafter Abdelfattah).
          With respect to claim 6, Zhao modified by Chaplot teaches a system of claim 1, and Zhao further teaches:
wherein the FPGA includes: a model optimizer to generate a first intermediate representation based on one of the first neural networks; (Zhao at FIG. 4:

    [Image: Zhao FIG. 4]

¶ 2, § V-A, right-hand column: “As such, we propose a heuristic, greedy algorithm to explore model design space. It starts with a pre-trained model and tries to replace layers from the top. In each iteration, this algorithm fine-tunes the model candidate for a fixed number of steps.” 
The examiner notes that Zhao’s receiving a pre-trained CNN model and Inputs (see FIG. 1) at its Input Buffers for the pre-trained CNN model and processing the inputs by its “Winograd weights transfer” block, “Winograd input transfer” block, the “arithmetic module”, and/or the “Winograd output transfer” block teaches a model optimizer.  The examiner further notes that Zhao’s iteratively replacing a layer in the input pre-trained CNN model in each iteration teaches a first intermediate representation.  Therefore, Zhao’s § V-A and FIG. 4 teach the above limitation.)
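The greedy exploration quoted from Zhao § V-A can be sketched as follows. This is a minimal illustration under stated assumptions: layers are represented as hypothetical string labels, “the top” is taken to mean the layers nearest the output, a candidate is kept only if it meets a supplied accuracy requirement, and the search stops at the first candidate that does not. The function names and the stopping rule are assumptions, not details confirmed by the quoted passages.

```python
# Illustrative sketch only: greedy layer replacement starting from the top.
# Assumed: "conv"/"block" labels, acceptance by an accuracy threshold, and
# stopping at the first rejected candidate.
from typing import Callable, List

def greedy_layer_replacement(
    layers: List[str],
    fine_tune: Callable[[List[str], int], None],
    evaluate: Callable[[List[str]], float],
    required_accuracy: float,
    fine_tune_steps: int,
) -> List[str]:
    model = list(layers)
    # Walk from the last layer (assumed "top") toward the input.
    for index in reversed(range(len(model))):
        if model[index] != "conv":
            continue
        candidate = list(model)
        candidate[index] = "block"           # replace a standard convolution layer
        fine_tune(candidate, fine_tune_steps)  # fixed number of steps per iteration
        if evaluate(candidate) < required_accuracy:
            break                            # assumed stopping rule
        model = candidate
    return model

def toy_fine_tune(candidate: List[str], steps: int) -> None:
    """Placeholder: a real flow would retrain the candidate here."""

def toy_evaluate(candidate: List[str]) -> float:
    """Hypothetical accuracy (percent) that drops 2 points per replaced layer."""
    return 90.0 - 2.0 * candidate.count("block")

result = greedy_layer_replacement(
    ["conv", "conv", "conv", "conv"], toy_fine_tune, toy_evaluate, 85.0, 100
)
```

With the toy accuracy function, the two layers nearest the output are replaced and the third replacement is rejected, so each accepted candidate corresponds to one intermediate version of the input model.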
an assembler to generate an output based on the second intermediate representation, the output to be executed at runtime. (Zhao at FIG. 1 “Outputs: - Optimised model – FPGA design”.  § III and FIG. 4 reproduced immediately above.  The examiner notes that Zhao’s “outputs” include the “Optimised model” that denotes an optimized version of the input, pre-trained CNN model and is thus to be executed at runtime.)
          Zhao does not appear to explicitly teach:
an inference engine to adjust the intermediate representation; 
a high-graph compiler to generate a second intermediate representation; and 
          In the same field of endeavor, Abdelfattah does, however, teach:
an inference engine to adjust the intermediate representation; (Abdelfattah at § II and FIG. 1: “neural network inference accelerator (DLA)”.  ¶1,  § II-A: “To implement a NN on DLA, our graph compiler breaks it into units called “subgraphs” that fit within the overlay’s buffers and compute elements. For example, with convolutional neural networks (CNNs), a subgraph is typically a single convolution with an optional pooling layer afterwards. We deliver new VLIW instructions for each subgraph to program DLA correctly for the subgraph execution.” The examiner notes that Abdelfattah’s DLA that performs inferences on input teaches an inference engine, and that Abdelfattah’s neural network inference accelerator’s (DLA) processing input neural networks (e.g., the aforementioned first intermediate representation) teaches the above limitation.)
a high-graph compiler to generate a second intermediate representation; and (Abdelfattah at ¶ 2, right-hand column, § I, p. 1: “On the software side, we introduce an architecture-aware graph compiler that efﬁciently maps a NN to the overlay. This both maximizes the hardware efﬁciency when running the design and simpliﬁes the usability of the end application, where users are only required to enter domain speciﬁc deep learning languages, such as Caffe or Tensorﬂow, to program the overlay.” The examiner notes that Abdelfattah’s graph compiler teaches the claimed high-graph compiler, and that Abdelfattah’s graph compiler’s mapping a neural network (e.g., the aforementioned first intermediate representation) to an overlay teaches the above limitation.)
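The subgraph partitioning quoted from Abdelfattah § II-A can be sketched as follows. This is a minimal illustration, assuming layers are given as a flat list of hypothetical string labels and that any layer other than a pooling layer directly following a convolution starts its own subgraph; the function name and that fallback rule are assumptions and do not reflect Abdelfattah’s actual compiler.

```python
# Illustrative sketch only: split a layer list into "subgraphs", each a
# single convolution with an optional pooling layer afterwards.
from typing import List

def partition_into_subgraphs(layers: List[str]) -> List[List[str]]:
    subgraphs: List[List[str]] = []
    for layer in layers:
        # Attach a pooling layer to the convolution immediately before it.
        if layer == "pool" and subgraphs and subgraphs[-1] == ["conv"]:
            subgraphs[-1].append(layer)
        else:
            subgraphs.append([layer])  # assumed: every other layer starts a new subgraph
    return subgraphs

subgraphs = partition_into_subgraphs(["conv", "pool", "conv", "conv", "pool"])
```

Each resulting subgraph is a unit for which, per the quoted passage, a new set of instructions would be delivered to program the overlay.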
          Zhao, Chaplot, and Abdelfattah are analogous art because all three references pertain to using neural networks for domain-specific applications.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to combine Zhao’s CNN framework to deploy neural network domain-specific applications for FPGAs (Zhao at Abstract, and §§ I-II, supra) modified by Chaplot’s adaptive learner model optimized by user-specific data (Chaplot at Abstract; ¶ 1, Introduction, p. 1; ¶ 1, right-hand column, p. 2, ¶ 1, left-hand column, p. 3; p. 2, right-hand column, ¶ 1, supra) with Abdelfattah’s inference engine and graph compiler that respectively adjusts an input neural network and generates a second intermediate representation by breaking a first intermediate representation into subgraphs for FPGAs (see Abdelfattah, supra). The modification optimizes and significantly boosts the performance of such domain-specific, user-specific neural network applications on FPGAs (Abdelfattah at Abstract: “In this paper, we tailor an overlay to a specific application domain, and we show how we maintain its full programmability without paying for the performance overhead traditionally associated with overlays. Specifically, we introduce an overlay targeted for deep neural network inference with only ~1% overhead to support the control and reprogramming logic using a lightweight very-long instruction word (VLIW) network. Additionally, we implement a sophisticated domain specific graph compiler that compiles deep learning languages such as Caffe or Tensorflow to easily target our overlay. We show how our graph compiler performs architecture-driven software optimizations to significantly boost performance of both convolutional and recurrent neural networks (CNNs/RNNs) – we demonstrate a 3X improvement on ResNet-101 and a 12X improvement for long short-term memory (LSTM) cells, compared to naïve implementations. Finally, we describe how we can tailor our hardware overlay, and use our graph compiler to achieve ~900 fps on GoogLeNet on an Intel Arria 10 1150 – the fastest ever reported on comparable FPGAs.”)

          With respect to claim 7, Zhao modified by Chaplot and Abdelfattah teaches a system of claim 6, and Zhao further teaches:
wherein the output is a hardware configuration of the FPGA or machine readable instructions. (Zhao at FIG. 1 “Outputs: - Optimised model – FPGA design”

    [Image: Zhao FIG. 1, “TuRF design flow”]

FIG. 4 and § V-A reproduced for claim 6 immediately above.   ¶ 1, § VI-B: “We first evaluate the performance of three popular efficient CNN models: ResNet-50, MobileNet V1 and V2 generated by our framework and the results are shown in Table III.” The examiner notes that Zhao’s “Optimised model” included in Zhao’s “outputs” (as shown in FIG. 1) for performance evaluation in § VI teaches machine readable instructions.  The examiner further notes that Zhao’s outputs that include “FPGA design” (also as shown in FIG. 1 above) teach a hardware configuration of the FPGA.)

With respect to claim 20, it is substantially similar to claim 6 and is rejected in the same manner, the same art and reasoning applying. 

Conclusion
9.         The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
(a)            Y. Ma, Y. Cao, S. Vrudhula and J.-s. Seo, "An automatic RTL compiler for high-throughput FPGA implementation of diverse deep convolutional neural networks," 2017 27th International Conference on Field Programmable Logic and Applications (FPL), 2017, pp. 1-8, doi: 10.23919/FPL.2017.8056824 teaches an RTL-level CNN compiler that automatically generates customized FPGA hardware for the inference tasks of various CNNs, in order to enable high-level fast prototyping of CNNs from software to FPGA and still keep the benefits of low-level hardware optimization.
(b)            Dey S., Shao Y., Chugg K.M., Beerel P.A. (24 October 2017) Accelerating Training of Deep Neural Networks via Sparse Edge Processing. In: Lintas A., Rovetta S., Verschure P., Villa A. (eds) Artificial Neural Networks and Machine Learning – ICANN 2017. ICANN 2017. Lecture Notes in Computer Science, vol 10613. Springer, Cham. teaches a reconfigurable hardware architecture for deep neural networks (DNNs) capable of online training and inference, which uses algorithmically pre-determined, structured sparsity to significantly lower memory and computational requirements. The overall effect is to reduce network complexity by factors up to 30x and training time by up to 35x relative to GPUs, while maintaining high fidelity of inference results.
(c)            Lang et al., US PGPub 2019/0258953, with the effective filing date of Jan. 23, 2018, teaches adaptive hardware using FPGA and Systems on a Chip (SOC) technologies, and a standard software architecture, to quickly build a broad range of testing systems, ranging from a small, inexpensive device for vulnerabilities assessment by non-experts to laboratory systems for 
10.         Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERICH C. TZOU whose telephone number is (571)272-9852.  The examiner can normally be reached Monday-Friday, 7:30AM-5:00PM EST, with alternate Fridays off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann J. Lo, can be reached at 571-272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer 
 
 
/E.C.T./Examiner, Art Unit 2126  
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126