DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner notes the entry of the following papers:
Amended claims filed 9/28/2022.
Applicant arguments/remarks made in amendment filed 9/28/2022.

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 9/28/2022 has been entered.
 
Claims 1, 11, and 18 are amended.
Claims 8 and 10 are cancelled.
Claims 1-5, 7, 11-15, and 17-19 are presented for examination. 
Response to Arguments
Applicant’s arguments that the cited prior art fails to disclose the amended limitations are moot in view of the new grounds of rejection necessitated by amendment. 
Applicant argues that “the claim language … clearly refers to ‘trained model parameters’ (referring to various model kernels, weights, biases, and the like as indicated by in ¶[0042] of the published specification as ‘operation parameter’..” (Remarks, page 10, paragraph 2, line 6.)  Though the argument is made moot by the amendment, it is relevant to the interpretation of the new amendment.  Therefore, Examiner addresses the interpretation.  Examiner notes there is no support in the specification for “converting the trained model parameters of each network layer into a preset format different from their original format” of claim 1, as interpreted by Applicant.  Nowhere is “converting trained model parameters” mentioned in the specification.  Nowhere is “converting model parameters” mentioned in the specification.  Paragraph [0042] recites “The operation parameter of each network layer of the initial network model may be considered as a weight of each network layer of the initial neural network model. The operation parameter of each network layer of the initial neural network model determines functions of the initial neural network model. During the process of training the neural network model, what is learned and adjusted is mainly the operation parameters of each network layer of the neural network model.” (Specification, paragraph [0042], line 2.)  It should be noted that “operation parameter” has two meanings in this paragraph.  One meaning is singular and is a summary parameter of each network layer.  The other mention, which could also be interpreted as singular, describes the “operation parameters” that are adjusted during training.  However, the claimed invention is directed to deploying a trained neural network, not training a neural network.  (See claim 1, line 1 “A neural network model deployment method…”.) 
This is further supported by the fact that the only other mentions in the specification, such as “convert the operation parameter of each network layer into a predetermined format”, are singular. (Specification of the instant application, paragraph [0050].) Therefore, “operation parameter” is interpreted as a summary parameter. 
	Since there is no mention of “converting trained model parameters” in the specification, Examiner interprets “converting trained model parameters” as “convert the operation parameter”, as recited in the specification.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


Claim 1 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
	The amended limitation of “reducing a quantity of looping in convolution calculation from six or more to three” in claim 1 is indefinite because the claim appears to recite an active step of reducing the quantity of loops, but the claimed invention is directed to deploying a neural network with a defined algorithm, not a method of transforming one algorithm to another.  
	The specification states: “For example, original six loop layers in implementation of the convolution layer may be reduced to three loop layers.” (Paragraph [0085], line 4.)  The specification then explains how the algorithm of FIG. 7 improves on the algorithm of FIG. 6: “Using stride = 1 as an example, six loop layers are required if output is calculated by using a conventional convolution calculation method, and pseudocode may be as shown in FIG. 6. In the embodiment of this application, kernel function pseudocode of the GPU may be as shown in FIG. 7.  Original three loop layers: loop 1, loop 2, and loop 3, are calculated and scheduled in parallel by the GPU, leaving only three loop layers of calc_output (oc, y, and x).” (Paragraph [0087], line 3.)  Examiner notes that this is a comparison between two predefined algorithms (as depicted in FIGS. 6 and 7, respectively) and is not an active transformation from one algorithm to another as part of the deployment method.  Therefore, it is unclear whether the claim should be interpreted as transforming an algorithm, as could be interpreted from the claims, or as using a predefined algorithm, as described in the specification. (See MPEP 2173.03 Correspondence Between Specification and Claims [R-08.2017].)  Further, the reduction is shown in pseudocode and is for illustration purposes only: “For example, original six loop layers in implementation of the convolution layer may be reduced to three loop layers.” (Paragraph [0085], line 4.)
	For purposes of examination, this limitation has been interpreted as only requiring an algorithm that reduces the typical number of loops required for a convolution from 6 to 3, through parallel scheduling, or through the use of a convolution kernel algorithm as depicted in FIG. 7, and as recited in the example of paragraph [0087].
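	By way of illustration, the interpretation above can be sketched as follows. This is a minimal sketch, assuming stride 1 by default, no padding, and nested Python lists in place of device memory. FIGS. 6 and 7 of the application are not reproduced here, so the function names conv_six_loops and conv_three_loops are hypothetical, and only calc_output and the indices oc, y, and x follow the wording of paragraph [0087]. The sequential iteration over (oc, y, x) merely stands in for the parallel scheduling that a GPU would perform.

```python
from itertools import product


def output_shape(inp, weights, stride):
    """Return (out_ch, out_h, out_w) for a [C][H][W] input and [OC][C][KH][KW] weights."""
    in_h, in_w = len(inp[0]), len(inp[0][0])
    k_h, k_w = len(weights[0][0]), len(weights[0][0][0])
    return len(weights), (in_h - k_h) // stride + 1, (in_w - k_w) // stride + 1


def conv_six_loops(inp, weights, stride=1):
    """Conventional direct convolution: six nested loops."""
    out_ch, out_h, out_w = output_shape(inp, weights, stride)
    k_h, k_w = len(weights[0][0]), len(weights[0][0][0])
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(out_ch)]
    for oc in range(out_ch):                       # loop 1: output channel
        for y in range(out_h):                     # loop 2: output row
            for x in range(out_w):                 # loop 3: output column
                for ic in range(len(inp)):         # loop 4: input channel
                    for ky in range(k_h):          # loop 5: kernel row
                        for kx in range(k_w):      # loop 6: kernel column
                            out[oc][y][x] += (
                                inp[ic][y * stride + ky][x * stride + kx]
                                * weights[oc][ic][ky][kx]
                            )
    return out


def calc_output(inp, weights, oc, y, x, stride=1):
    """Work of one parallel work item: only three loops remain."""
    acc = 0.0
    for ic in range(len(inp)):
        for ky in range(len(weights[0][0])):
            for kx in range(len(weights[0][0][0])):
                acc += inp[ic][y * stride + ky][x * stride + kx] * weights[oc][ic][ky][kx]
    return acc


def conv_three_loops(inp, weights, stride=1):
    """Same result; (oc, y, x) are treated as independently schedulable work items."""
    out_ch, out_h, out_w = output_shape(inp, weights, stride)
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(out_ch)]
    # Assumption: on a GPU each (oc, y, x) triple would map to its own thread.
    # Here the triples are simply visited one after another.
    for oc, y, x in product(range(out_ch), range(out_h), range(out_w)):
        out[oc][y][x] = calc_output(inp, weights, oc, y, x, stride)
    return out
```

	Under this reading, both functions compute the same output; what differs is only whether the width, height, and output-channel dimensions are written as explicit loops or left to the scheduler.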
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 7, 11-15, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Sharma et al. (From High-Level Deep Neural Models to FPGAs, herein Sharma), Jia et al. (Caffe: Convolutional Architecture for Fast Feature Embedding, herein Jia), Wang et al. (PipeCNN: An OpenCL-Based FPGA Accelerator for Large-Scale Convolution Neuron Networks, herein Wang), and Kim et al. (Performance Analysis of CNN Frameworks for GPUs, herein Kim).
Regarding claim 1,
	Sharma teaches a neural network model deployment method, applied to a terminal device, the method comprising (Sharma, page 1, column 1, paragraph 2, line 1 “We use DNNWEAVER to generate accelerators for a set of eight different DNN models and three different FPGAs, Xilinx Zynq, Altera Stratix V, and Altera Arria 10.” In other words, DNNWEAVER is a neural network model deployment method, and FPGA is a terminal device.):
	reading an initial neural network model (Sharma, page 2, column 2, paragraph 1, line 12 “The input to DNNWEAVER is a high-level specification of the DNN in Berkeley Caffe Format [1].” In other words, input is reading, and high-level specification of the DNN is an initial neural network model.), wherein
	the initial neural network model comprises a convolutional neural network (CNN) model (Sharma, Figure 2. “Type: CONVOLUTION”.  In other words, the convolution layer shows that the initial neural network model can be a convolutional neural network.);
	obtaining a layer definition of each network layer of the initial neural network model (Sharma, see above mapping, and Figure 2. 

[Image: media_image1.png]

In other words, specification is initial neural network model, and from Figure 2, DNN programming interface shows a layer definition of a network layer of the specification.);
	obtaining trained model parameters of each network layer of the initial neural network model (Sharma, Figure 2. “convolution_param{ num_output: 20” and “pooling_param{ pool: MAX”.  Examiner notes that there is no mention of a “trained model parameter” in the specification of the instant application.  The specification recites “The Net class is preset with a Layer class identifier (var Layers, configured to indicate a connected Layer class), a layer loading method (add Layer), a model parameter loading method (load model),…” (Specification, paragraph [0068], line 6.)  Later, the specification recites “..the initial neural network model trained by using the Tensorflow learning framework may be read, to obtain a model definition file (model.para) and a model parameter file (model.bin).” (Specification, paragraph [0070], line 2.)  However, there is no description that distinguishes a “trained model parameter” from a parameter.  In addition, Sharma cites Caffe, which provides pretrained models that include parameters resulting from training.  For the purpose of examination, Examiner is interpreting “trained model parameter” as a parameter.  In other words, 20 and MAX are examples of model parameters of each network layer of the initial neural network model.);
	executing, by using a [layer class], a target network layer corresponding to each network layer in the terminal device separately according to the layer definition of each network layer, so that each target network layer is inherited from the [layer class] (Sharma, Figure 4, and page 2, column 2, paragraph 2, line 6 “The Translator converts the DNN’s specification to our macro dataflow instruction set architecture (ISA).  Each instruction in the ISA represents a node in the macro dataflow graph of the DNN model.”

[Image: media_image2.png]

In other words, ISA instructions of the macro dataflow are target network layers corresponding to each network layer, and instructions are a layer class that each specific creation of a target network layer inherits from.);
	applying relational connections amongst the target network layers using a [net class] (Sharma, page 2, column 2, paragraph 3, line 1 “Design Planner accepts the instructions representing the macro dataflow graph of the DNN…,” and line 4 “The Design Planner then partitions the computation of each layer to groups of operations that share and reuse data.  We refer to each group’s output as a slice.  The slice is spilled to memory after computation and the accelerator proceeds to compute the next slice.” In other words, partitions the computation of each layer… output as a slice… the slice is spilled to memory after computation and the accelerator proceeds to compute the next slice is applying relational connections amongst the target network layers, and groups of operations is a net class.);
	converting the trained model parameters of each network layer into a preset format different from their original format (As described above, there is no mention of “trained model parameters” in the specification.  As such, there is no mention of “converting trained model parameters”.  The specification does recite converting the operation parameter (singular).  Examiner interprets “converting the trained model parameters” as “converting the operation parameter”. (See paragraph 7.a. above.)  Sharma, Figure 4, and page 2, column 2, paragraph 2, line 6 “The Translator converts the DNN’s specification to our macro dataflow instruction set architecture (ISA).” In other words, Translator converts is converting the operation parameter of each network layer, macro dataflow instruction set architecture is a preset format different from the format of the original DNN specification, and the breakdown of the converted parameters by bit position shown in Figure 4 is a preset format different from their original format.);
	obtaining target model parameters of each network layer based on the preset format (Sharma, Figure 4, and page 4, column 1, paragraph 3, line 1 “This instruction type performs the convolution operation by sliding a window over its inputs.  The dimensions of the window and the sliding stride are encoded in bits 15-0, with six bits for the width and the height of the window, and four bits for the sliding stride.  If these bits are not enough for specifying the window dimensions, bit 16 is set to 1 and IWORD2 is used to specify the structure of the window.  The other fields in the IWORD1 are similar to the input instruction.  After the IWORDs, an array of immediate values is used to specify the weights for the convolution operation.” In other words, dimensions of windows, sliding stride, and weights are target model parameters; from the above mapping, Translator converts the DNN specification to our macro dataflow instruction set architecture is obtaining a target parameter of each network layer; and the breakdown of the converted parameters by bit position shown in Figure 4 is a preset binary format.);
	loading corresponding target model parameters in the target network layer corresponding to each network layer separately according to the target model parameters of each network layer (Sharma, from above mapping, dimensions of the window and the sliding stride are encoded in bits 15-0 is loading a corresponding target model parameter in the target network layer corresponding to each network layer according to the target model parameter of each network layer.);
	obtaining a target neural network model deployed in the terminal device based on the target model parameters of each network layer (Sharma, Figure 3. 

[Image: media_image3.png]

In other words, loading the Synthesizable Accelerator into the FPGA and the DNN Model Layout into the memory is obtaining a target neural network model deployed in the terminal device based on the target operation parameter of each network layer.); and
	[reducing a quantity of looping in convolution calculation from six or more to three by scheduling],
	[during execution of the target neural network model in a graphics processing unit (GPU) of the terminal device, parallel processing of three dimensions of width, height, and output channel of the target neural network model.]
Sharma performs all of the recited method steps, but is silent regarding performing the method steps by “using a layer class” or “using a net class”.  However, Sharma teaches using Caffe (page 1, column 1, paragraph 1, line 11 “DNNWEAVER, a framework that automatically generates a synthesizable accelerator for a given (DNN, FPGA) pair from a high-level specification in Caffe [1]”) as does the claimed invention (specification of the instant application, paragraph [0069] “In the embodiment of this application, by using a framework body shown in FIG. 4, for an initial neural network model trained by using a Tensorflow learning framework, an initial network model trained by using Caffe learning framework (which is a convolutional neural network framework), and the like, general deployment of the neural network model into the terminal device can be implemented.”).  Berkeley Caffe is written in C++, which is an object-oriented language.  As such, everything written in Caffe is a class or a method of a class.  Therefore, it would appear that Sharma, which uses Caffe, teaches classes.  Nevertheless, Jia provides a more complete description of Berkeley Caffe Format and is combined into Sharma.
	Jia teaches Berkeley Caffe Format (Jia, page 675, column 1, paragraph 1, line 2 “The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.” And, Figure 1, and page 677, column 1, paragraph 3, line 1 “Caffe does all the bookkeeping for any directed acyclic graph of layers, ensuring correctness of the forward and backward passes.”).

[Image: media_image4.png]

	Both Jia and Sharma are directed to simplifying the deployment of deep neural networks.  In view of the teaching of Sharma, it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Jia into Sharma.  This would result in being able to create, train, and deploy general-purpose deep neural networks using a common object-oriented programming interface.
	One of ordinary skill in the art would be motivated to do this in order to have an easy-to-use, open-source framework for developing deep neural networks. (Jia, page 1, column 2, paragraph 3, line 1 “While deep neural networks have attracted enthusiastic interest within computer vision and beyond, replication of published results can involve months of work by a researcher or engineer. Sometimes researchers deem it worthwhile to release trained models along with the paper advertising their performance. But trained models alone are not sufficient for rapid research progress and emerging commercial applications, and few toolboxes offer truly off-the-shelf deployment of state-of-the-art models – and those that do are often not computationally efficient and thus unsuitable for commercial deployment.  To address such problems, we present Caffe, a fully open-source framework that affords clear access to deep architectures.”). 
	Thus far, the combination of Sharma and Jia does not explicitly teach reducing a quantity of looping in convolution calculation from six or more to three by scheduling.
	Wang teaches reducing a quantity of looping in convolution calculation from six or more to three by scheduling (Wang, Fig. 3. See paragraph 9. Examiner is interpreting “reducing a quantity of looping…” as using an algorithm that performs in 3 loops per convolution what the typical algorithm requires 6 loops to accomplish, such as the convolution kernel depicted in FIG. 7 of the claimed invention in the example described in paragraph [0087] of the specification.  

[Image: media_image5.png]

In other words, Fig. 3 depicts pseudo-code for a convolution kernel that uses only 3 loops to perform a convolution.)
	Both Wang and the combination of Sharma and Jia are directed to accelerating neural network inference, among other things.  The combination of Sharma and Jia discloses a neural network deployment model but does not explicitly teach a convolution kernel algorithm that can perform a convolution in 3 loops.  Wang discloses a convolution kernel algorithm that performs a convolution in 3 loops.  In view of the teaching of the combination of Sharma and Jia, it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Wang into the combination of Sharma and Jia.  This would result in a neural network model deployment method that has a convolution kernel algorithm.
	One of ordinary skill in the art would be motivated to do this in order to improve performance and reduce resource requirements. (Wang, page 1, Abstract, line 14 “In this work, we propose an FPGA accelerator with a new architecture of deeply pipelined OpenCL kernels.  Data reuse and task mapping techniques are also presented to improve design efficiency. The proposed schemes are verified by implementing two representative large-scale CNNs, AlexNet and VGG on Altera Stratix-V A7 FPGA.  We have achieved a similar peak performance of 33.9 GOPS with a 34% resource reduction on DSP blocks compared to previous work.”).
	Thus far, the combination of Sharma, Jia, and Wang does not explicitly teach execution of the target neural network model in a graphics processing unit (GPU) of the terminal device, parallel processing of three dimensions of width, height, and output channel of the target neural network model.
	Kim teaches execution of the target neural network model in a graphics processing unit (GPU) of the terminal device, parallel processing of three dimensions of width, height, and output channel of the target neural network model (Kim, page 59, column 1, paragraph 2, line 1 “Interestingly, the computational complexity of this matrix multiplication method is the same as that of the direct convolution.  However, matrix multiplication can be easily parallelized using highly efficient BLAS libraries [31]. Moreover, it enables exploiting the GPU local memory that has low latency and high bandwidth. cuDNN performs matrix multiplication by applying tiling to the matrices in the GPU local memory.  This method scales well with a small batch size and can be used on all types of convolution layers.” And, page 59, column 1, paragraph 4, line 1 “Winograd convolution is based on GEMM [31]. It reduces the complexity using Winograd’s minimal filtering algorithm [17].  It is similar to the well-known Strassen’s algorithm [32] for matrix multiplication.  It reduces multiplication operations significantly when the kernel size is fixed.  Thus, a different kernel size requires a different minimal filtering algorithm.  The minimal filtering algorithm for 4X4 tiled matrix can reduce 12 multiplications to 6.  
Nesting the minimal filtering algorithm twice would reduce the algorithm complexity by a factor of 4 [17].” And, page 58, column 1, paragraph 3 line 1 “Since multiple 2D feature maps (or images) are typically processed in a layer at a time, the feature map data in a CNN can be treated as four-dimensional tensors < N, C, H, W >, where N, C, H, and W are the number of images in a batch, the number of channels (i.e., feature maps), the height of a feature map (or an image), and the width of a feature map (or an image), respectively.” And, specification, paragraph [0097], line 12 “Further, the embodiment of this application can optimize the implementation of the convolution layer based on GPU parallel operating, thus improving the speed of forward prediction of a CNN model.” In other words, matrix multiplication can be easily parallelized (e.g. by Winograd’s algorithm) is reducing a quantity of loop layers by scheduling, exploiting the GPU local memory is during execution of the target neural network model in a graphics processing unit (GPU), and matrix multiplication can be easily parallelized is parallel processing, and four-dimensional tensors is three dimensions of width, height, and output channel.)
	Both Kim and the combination of Sharma, Jia, and Wang are directed to improving the performance of convolutional neural networks (CNNs), among other things.  In view of the teaching of the combination of Sharma, Jia, and Wang, it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Kim into the combination of Sharma, Jia, and Wang.  This would result in increasing the speed of inference by making use of parallel processing through the use of scheduling and graphics processing units (GPUs).
	One of ordinary skill in the art would be motivated to do this to improve performance by improving the implementation of the CNN model. (Kim, page 55, column 2, paragraph 2, line 1 “Users choose a deep learning framework to build their CNN models based on language interfaces, operating system support, performance, ease of use, etc.  Ideally, the execution time of the same CNN model should be the same across all the frameworks when the input is the same.  However, in reality, this is not true; a CNN model built with a framework delivers more than twice the performance compared to the same model built with a different framework [19], [20].”).
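	The matrix multiplication method that Kim describes can be illustrated by the following minimal sketch, assuming a single image, stride 1, no padding, and nested Python lists. The helper names im2col, matmul, conv_as_gemm, and conv_direct are hypothetical. This is not the cuDNN implementation and applies no tiling; it only shows that unrolling each receptive field into a matrix column turns the convolution into one matrix product with the same result as direct convolution.

```python
def im2col(inp, k_h, k_w):
    """Unroll every receptive field of a [C][H][W] input into one matrix column."""
    in_h, in_w = len(inp[0]), len(inp[0][0])
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1
    cols = []  # one row per (ic, ky, kx); one column per output position (y, x)
    for ic in range(len(inp)):
        for ky in range(k_h):
            for kx in range(k_w):
                cols.append([inp[ic][y + ky][x + kx]
                             for y in range(out_h) for x in range(out_w)])
    return cols, out_h, out_w


def matmul(a, b):
    """Plain matrix product; a BLAS GEMM routine would take this role in practice."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def conv_as_gemm(inp, weights):
    """Convolution of [C][H][W] input with [OC][C][KH][KW] weights via one matrix product."""
    k_h, k_w = len(weights[0][0]), len(weights[0][0][0])
    cols, out_h, out_w = im2col(inp, k_h, k_w)
    # Flatten each filter in the same (ic, ky, kx) order used for the rows of cols.
    w_mat = [[w for channel in filt for row in channel for w in row] for filt in weights]
    flat = matmul(w_mat, cols)  # shape: OC x (out_h * out_w)
    return [[row[y * out_w:(y + 1) * out_w] for y in range(out_h)] for row in flat]


def conv_direct(inp, weights):
    """Reference direct convolution used only to check the result."""
    k_h, k_w = len(weights[0][0]), len(weights[0][0][0])
    out_h, out_w = len(inp[0]) - k_h + 1, len(inp[0][0]) - k_w + 1
    return [[[sum(inp[ic][y + ky][x + kx] * weights[oc][ic][ky][kx]
                  for ic in range(len(inp))
                  for ky in range(k_h)
                  for kx in range(k_w))
              for x in range(out_w)]
             for y in range(out_h)]
            for oc in range(len(weights))]
```

	Every entry of the product matrix is independent of the others, which is why the output-channel, height, and width dimensions lend themselves to parallel processing.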
Regarding claim 2,
the combination of Sharma, Jia, Wang, and Kim teaches the neural network model deployment method according to claim 1, wherein
	a layer definition of a network layer comprises: name information, class information, and initialization information of the network layer; and (Sharma, Figure 2. In other words, name is name, type is class information, and bottom, top, and params are initialization information.)
	wherein the executing, by using the layer class, the target network layer corresponding to each network layer in the terminal device separately according to the layer definition of each network layer, so that each target network layer is inherited from the layer class comprises: (Sharma, Figure 4.  Instructions represent layers, these instructions can be considered classes because they are generic until the specific target parameters are loaded into the appropriate bits, thus customizing the layer for the specific DNN.  Therefore, the specific target layer of the DNN inherits its structure from the predefined instruction.)
	adjusting, for any network layer of the initial neural network model and by using the layer class as a base class in the terminal device, a preset name attribute of the layer class according to the name information of the network layer; (Sharma, Figure 4.  And, page 3, column 2, paragraph 4, line 6 “Since the ISA is a dataflow architecture, each instruction is assigned a unique 24-bit static ID and none of the instructions include source operands.”
And, page 4, column 1, paragraph 2, line 2 “Translation from Caffe to this ISA is straightforward since the instructions match the DNN layers.” In other words, conv is a preset name attribute of the layer class, convolution is name information of the network layer, and translation of a convolutional layer in Caffe to the conv instruction of the ISA is adjusting a preset name attribute of the layer class according to the name information of the network layer.)
	adjusting a preset layer class attribute of the layer class according to the class information of the network layer; and (Sharma, page 2, column 2, paragraph 2, line 5 “The Translator converts the DNN’s specification to our macro dataflow instruction set architecture (ISA).”  In other words, Translator converts is adjusting the operation parameter of each network layer, and our macro dataflow instruction set architecture has preset layer class attributes.)
	performing, according to the initialization information of the network layer, an initialization operation by using a preset initialization method of the layer class, to obtain the target network layer corresponding to each network layer, so that each obtained target network layer is a derived class of the layer class.  (Sharma, Figure 2, and page 2, column 2, paragraph 1, line 12 “The input to DNNWeaver is a high-level specification of the DNN in Berkeley Caffe format[1].  Caffe is a widely used open-source deep learning framework that takes the DNN specification as input and computes the given model on CPUs and GPUs.  The code snippet in Figure 2 shows how two DNN layers, convolution and pooling, are described and connected in Caffe.  Section 3 describes the functionality of DNN layers in detail.” page 2, column 2, paragraph 2, line 6 “The Translator converts the DNN’s specification to our macro dataflow instruction set architecture (ISA).  Each instruction in the ISA represents a node in the macro dataflow graph of the DNN model.  Note, that the accelerator does not directly execute these instructions. DNNWEAVER compiler statically maps these instructions to control signals in the accelerator and creates an execution schedule.  We choose this abstraction to provide a unified hardware-software interface and enable layer-specific optimizations in the accelerator microarchitecture without exposing them to the software.  As Figure 3 illustrates, DNNWEAVER automatically transforms the programmer-provided DNN model to an accelerator by generating FPGA synthesizable Verilog code.”

[Image: media_image6.png]

In other words, converts the DNN’s specification is performing, according to the initialization information, an initialization operation, converting the operation parameters of each network layer (see Figure 2) to the target network layer is using a preset initialization method of the layer class, macro dataflow instruction set architecture is a preset format, and the architecture is a class from which each network layer instruction is derived.)
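The layer class arrangement recited in claim 2 can be illustrated by the following minimal sketch. The class names Layer, ConvLayer, and PoolLayer, the REGISTRY table, the build_layer function, and the dictionary form of the layer definition are all hypothetical; they are not taken from the specification, from Sharma, or from the Caffe source code. Only the values num_output: 20 and pool: MAX come from the portion of Sharma's Figure 2 quoted above, and the layer names conv1 and pool1 in the usage note are likewise assumed.

```python
class Layer:
    """Base layer class with a preset name attribute, class attribute, and init method."""

    name = "layer"          # preset name attribute
    layer_class = "Layer"   # preset layer class attribute

    def init(self, **params):
        # Preset initialization method: store the initialization information.
        self.params = dict(params)


class ConvLayer(Layer):
    """Derived class for a convolution layer."""

    def init(self, num_output):
        super().init(num_output=num_output)


class PoolLayer(Layer):
    """Derived class for a pooling layer."""

    def init(self, pool):
        super().init(pool=pool)


# Hypothetical lookup from the type named in a layer definition to a derived class.
REGISTRY = {"CONVOLUTION": ConvLayer, "POOLING": PoolLayer}


def build_layer(definition):
    """Create a target layer from a definition holding name, class, and init information."""
    layer = REGISTRY[definition["type"]]()
    layer.name = definition["name"]          # adjust the preset name attribute
    layer.layer_class = definition["type"]   # adjust the preset layer class attribute
    layer.init(**definition["init"])         # run the preset initialization method
    return layer
```

For example, build_layer({"name": "conv1", "type": "CONVOLUTION", "init": {"num_output": 20}}) returns an object that is an instance of both ConvLayer and Layer, which is the sense in which each target layer is a derived class of the layer class.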
Regarding claim 3,
the combination of Sharma, Jia, Wang, and Kim teaches the neural network model deployment method according to claim 1, wherein
the loading the corresponding target model parameters in the target network layer corresponding to each network layer separately according to the target model parameters of each network layer comprises (Sharma, Figure 4. See mapping in claim 1.):
	loading the corresponding target model parameters in the target network layer corresponding to each network layer by using a preset parameter loading method of the layer class separately according to the target model parameters of each network layer  (Sharma, Figure 3. In other words, loading the Synthesizable Accelerator into the FPGA and the DNN Model Layout into the memory is loading the corresponding target model parameters in the target network layers, for each layer according to the target model parameter of each network layer.). 
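The bit-position encoding that Sharma describes for the convolution instruction (six bits each for window width and height and four bits for the sliding stride, in bits 15-0) can be illustrated by the following minimal sketch. The order of the three fields within bits 15-0 is an assumption, since the quoted passage does not state it, and the function names pack_window_fields and unpack_window_fields are hypothetical. The sketch does not model bit 16, IWORD2, or the immediate values that carry the weights.

```python
WIDTH_BITS, HEIGHT_BITS, STRIDE_BITS = 6, 6, 4


def pack_window_fields(width, height, stride):
    """Pack window width, height, and stride into a 16-bit field."""
    for value, bits in ((width, WIDTH_BITS), (height, HEIGHT_BITS), (stride, STRIDE_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
    # Assumed layout: width in bits 15-10, height in bits 9-4, stride in bits 3-0.
    return (width << (HEIGHT_BITS + STRIDE_BITS)) | (height << STRIDE_BITS) | stride


def unpack_window_fields(word):
    """Recover (width, height, stride) from the packed 16-bit field."""
    width = (word >> (HEIGHT_BITS + STRIDE_BITS)) & ((1 << WIDTH_BITS) - 1)
    height = (word >> STRIDE_BITS) & ((1 << HEIGHT_BITS) - 1)
    stride = word & ((1 << STRIDE_BITS) - 1)
    return width, height, stride
```

Loading a target model parameter in this sense amounts to writing the converted values into their fixed bit positions, and reading them back recovers the same values.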
Regarding claim 4,
the combination of Sharma, Jia, Wang, and Kim teaches the neural network model deployment method according to claim 1, wherein
the connecting the target network layers by using the net class comprises: connecting the target network layers by using the net class according to a connecting structure of the network layers of the initial neural network model, so that a connecting structure of the target network layers corresponds to the connecting structure of the network layers of the initial neural network model (Sharma, page 2, column 2, paragraph 2, line 6 “The Translator converts the DNN’s specification to our macro dataflow instruction set architecture (ISA).  Each instruction in the ISA represents a node in the macro dataflow graph of the DNN model.”  In other words, DNN specification is the initial neural network model, the converted DNN specification in the macro dataflow instruction set architecture is the target network layers, and converts the DNN’s specification to the macro dataflow instruction set architecture is connecting structure of the target network layers corresponds to the connecting structure of the network layers of the initial neural network model.).
Regarding claim 5,
the combination of Sharma, Jia, Wang, and Kim teaches the neural network model deployment method according to claim 4, wherein,
adding a network layer to the target neural network model by using a preset network layer adding method of the net class, wherein the added network layer is inherited from the layer class; and/or reading and loading, by using a preset parameter reading method of the net class, a parameter required for operating the target neural network model; and/or operating the target neural network model by using a preset prediction method, to perform forward prediction (Sharma, Figure 3, Figure 4, and page 2, column 2, paragraph 2, line 1 “The Translator converts the DNN’s specification to our macro dataflow instruction set architecture (ISA). Each instruction in the ISA represents a node in the macro dataflow graph of the DNN model.  Note that the accelerator does not directly execute these instructions.  DNNWEAVER compiler statically maps these instructions to control signals in the accelerator and creates an execution schedule.” And, page 4 column 1, paragraph 1, line 2 “Instead, a part of instruction opcode encodes the unique static ID of the destination instruction that will receive the results.”  And, page 4, column 1, paragraph 9, line 1 “The output instruction does not have a destination instruction.  It writes the outputs of the DNN to the memory address specified in the immediate values.” In other words, instruction in the ISA is network layer, converts the DNN specification is adding network layer, from Figure 4, instruction inherits from the eight predefined instructions, and, from Figure 3, Execution Schedule and Resource Allocation is preset prediction method to perform forward prediction.).
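The net class operations recited in claim 5 (adding a layer, loading parameters, and forward prediction) can be illustrated by the following minimal sketch. The method names add_layer, load_model, and predict loosely follow the “add Layer” and “load model” methods mentioned in paragraph [0068] of the specification, but their signatures, the ScaleLayer class, and the dictionary of parameters are assumptions made only for illustration.

```python
class Layer:
    """Base layer class; by default it passes its input through unchanged."""

    def __init__(self, name):
        self.name = name
        self.weight = 1.0

    def load(self, params):
        # Pick this layer's parameter out of the model-wide parameter table.
        self.weight = params[self.name]

    def forward(self, values):
        return values


class ScaleLayer(Layer):
    """Hypothetical derived layer that multiplies every input by its weight."""

    def forward(self, values):
        return [v * self.weight for v in values]


class Net:
    """Holds the layers in connection order and runs them in sequence."""

    def __init__(self):
        self.layers = []

    def add_layer(self, layer):
        # Only objects inherited from the layer class may be added.
        if not isinstance(layer, Layer):
            raise TypeError("layer must inherit from Layer")
        self.layers.append(layer)

    def load_model(self, params):
        for layer in self.layers:
            layer.load(params)

    def predict(self, values):
        # Forward prediction: the output of each layer feeds the next layer.
        for layer in self.layers:
            values = layer.forward(values)
        return values
```

In this sketch the order in which layers are added is the connecting structure, so a net built layer by layer from the initial model reproduces that model's layer ordering.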
Regarding claim 7,
the combination of Sharma, Jia, Wang, and Kim teaches the neural network model deployment method according to claim 1, further comprising:
increasing a quantity of target points continuously calculated in a loop (Sharma, page 5, column 2, paragraph 3, line 1 “Convolution operations within an output feature map have no data dependencies on each other, and can be executed in parallel. These parallel calculations are performed by the PEs within a PU.”  In other words, the outputs within an output feature map are the target points, and executing the convolution operations in parallel is increasing a quantity of target points continuously calculated in a loop.);
reading, during calculation of a first target point, all pixels corresponding to the first target point (Sharma, page 5, column 1, paragraph 1, line 3 “The Design Planner also generates a static schedule for the Data Buffer to fetch data from the external memory and feed the PUs through the inter-PU bus.”  In other words, the data is the pixels, and fetching data from the external memory is reading all pixels corresponding to the first target point.); and
reusing pixels the same as those in a previously calculated target point and re-reading pixels different from those in the previously calculated target point during calculation of a non-first target point  (Sharma, page 5, column 2, paragraph 5, line 1 “Figure 11 shows another optimization that reduces remote data transfer through re-use.  Convolution windows that produce adjacent outputs share input elements.  Therefore, PEs computing adjacent outputs elements use partially shared data.  We add a dedicated unidirectional link between adjacent PEs to forward these shared input elements.  The arrows in Figure 11 show this data forwarding to re-use data for convolution.” 

[Embedded image: media_image7.png]

In other words, the data is the pixels, and re-using data for convolution is reusing pixels the same as those in the previously calculated target point.).
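For illustration only, the reuse limitation of claim 7 can be sketched in one dimension: the first target point reads every pixel in its window, and each later target point keeps the pixels it shares with the previous window and reads only the one new pixel. The function name conv1d_with_reuse and the read counter are hypothetical; this is neither the applicant's implementation nor Sharma's hardware data forwarding.

```python
def conv1d_with_reuse(pixels, kernel):
    """Slide `kernel` over `pixels`, reusing the overlapping window contents.

    Returns (outputs, reads), where `reads` counts pixel fetches from `pixels`.
    """
    k = len(kernel)
    n_out = len(pixels) - k + 1
    if n_out <= 0:
        return [], 0

    reads = 0
    window = []
    outputs = []
    for i in range(n_out):
        if i == 0:
            # First target point: read all pixels of its window.
            window = list(pixels[:k])
            reads += k
        else:
            # Later target points: keep the k-1 shared pixels, read one new pixel.
            window = window[1:] + [pixels[i + k - 1]]
            reads += 1
        outputs.append(sum(w * p for w, p in zip(kernel, window)))
    return outputs, reads
```

For a 5-pixel input and a 3-tap kernel, this sketch performs 5 pixel reads, whereas re-reading every window in full would take 9.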
Claims 11-15 and 17 are neural network deployment apparatus claims corresponding to neural network model deployment method claims 1-5 and 7, respectively.  Otherwise, they are the same.  It is implicit that a neural network model deployment method requires a neural network deployment apparatus in order to be deployed.  Therefore, claims 11-15 and 17 are rejected for the same reasons as claims 1-5 and 7, respectively.
Claim 18 is a non-transitory computer-readable medium claim corresponding to the neural network model deployment method of claim 1.  Otherwise, they are the same.  It is implicit that a neural network model deployment method requires at least one non-transitory computer-readable medium in order to be deployed.  Therefore, claim 18 is rejected for the same reasons as claim 1.
Regarding claim 19,
the combination of Sharma, Jia, Wang, and Kim teaches the non-transitory computer readable medium of claim 18, wherein 
a layer definition of a network layer comprises: name information, class information, and initialization information of the network layer (Sharma, Figure 2.  In other words, name is the name information, type is the class information, and convolution_param and pooling_param are the initialization information of the network layers.).
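For illustration only, a layer definition carrying the three claimed items (name information, class information, and initialization information) can be sketched as a small record. The class LayerDefinition, its field names, and the example values are hypothetical; only the keys name, type, convolution_param, and pooling_param are taken from the mapping to Sharma, Figure 2 above.

```python
from dataclasses import dataclass, field


@dataclass
class LayerDefinition:
    name: str                  # name information
    layer_class: str           # class information (the "type" entry)
    init: dict = field(default_factory=dict)  # initialization information


def from_spec(spec):
    """Build a LayerDefinition from a dict-shaped layer entry.

    Any key ending in "_param" is treated as initialization information.
    """
    init = {k: v for k, v in spec.items() if k.endswith("_param")}
    return LayerDefinition(name=spec["name"], layer_class=spec["type"], init=init)


# Example values below are made up for illustration.
conv = from_spec({
    "name": "conv1",
    "type": "Convolution",
    "convolution_param": {"num_output": 20, "kernel_size": 5},
})
```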
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BART RYLANDER whose telephone number is (571)272-8359. The examiner can normally be reached Monday - Thursday 8:00 to 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/B.I.R./Examiner, Art Unit 2124

/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124