DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1-20 are pending.
Claim 11 is objected to for minor informalities.
Claims 1-20 are rejected under 35 USC 103.

Information Disclosure Statement
The information disclosure statements (IDS) were submitted on 10/17/2017 and 02/20/2019. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Objections
Claim 11 is objected to because of the following informalities:  
Claim 11 recites “processing completing”. This appears to be a typographical error for “. 
Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
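The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

	A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
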
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-2, 4-6, 9-11, 13-14, 16-17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over “Temam” (US 2019/0050717 A1) in view of “Woo” (US 2017/0220352 A1, incorporated into Temam by reference), further in view of “Chakradhar” (A Dynamically Configurable Coprocessor for Convolutional Neural Networks).

	Regarding claim 1, Temam teaches
	A method comprising: writing by a host computer system, a plurality of weight matrices associated with a plurality of layers of a neural network to a memory shared with a neural network accelerator; (Abstract describes a neural network accelerator comprising memory banks. Figure 1, element 102 shows a controller comprising memories 104 and 106 which receives data from a host via an interface 108. Elements 104 and 106, taken together, are a “memory shared with a neural network accelerator” or a “shared memory” since the host may dictate what is written there and the neural network accelerator may access it. [0035] indicates that the host may provide instructions which are written to memory. This is further described at [0056]. In particular, the host may provide multidimensional arrays (i.e., matrices) of data including weights.)
	assembling … instructions into an instruction package by the host computer system, ([0056] goes on to indicate that the host may provide instructions for executing the neural network.)
	…respective offsets of weight matrices in a shared memory; ([0076] indicates that the tensor data (e.g., weight matrices) may be stored by determining an offset of the tensors. This is further described in Woo (US 2017/0220352 A1), incorporated by reference in Temam. Woo, [0004] describes using an offset to identify a memory location. [0007-0009] describes storing a value at a location. [0038-0039] indicates that the host may control the memory addresses.)
	writing input data and the instruction package by the host computer system to the shared memory; ([0035, 0056] describe writing instructions and input activations (i.e., input) to the memory. [0007] indicates that the first memory bank 104 may store input activations (i.e., input data). The inputs are further described at [0028-0029].)
	reading the instruction package from the shared memory by the neural network accelerator; and processing the plurality of per-layer instructions of the instruction package by the neural network accelerator (Figure 9, element 906 shows the accelerator processing the inputs to determine an output. [0034-0035] indicates that the accelerator may receive and execute instructions received from the host. The instructions are further described at [0040-0041]. The controller necessarily reads the instructions from the memory to execute them.) 
	Temam does not appear to explicitly teach 
	assembling a plurality of per-layer instructions into an instruction package by the host computer system, each per-layer instruction specifying processing of a respective layer of the plurality of layers of the neural network, and
	However, Chakradhar—directed to analogous art—teaches
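	assembling a plurality of per-layer instructions into an instruction package by the host computer system, each per-layer instruction specifying processing of a respective layer of the plurality of layers of the neural network, and (Abstract describes implementing a neural network using dynamically configurable hardware. Figure 3 provides an overview of the system, including a host in communication with the neural network accelerator. Section 5.1 gives an overview of the system. In particular, the second paragraph indicates that the CNN compiler runs on the host and translates the neural network into a parallel microprogram (i.e., a sequence of low-level VLIW instructions) which are then to be executed by the coprocessor (i.e., accelerator). The configurations are described in section 6.2. In particular, the second to last paragraph indicates that the instructions may be configured for each layer of the neural network.)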
	It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which the invention pertains to modify Temam and Woo to include the neural network acceleration processing and architecture taught by Chakradhar as described above because this system performs between 1.2x and 3.5x faster than similar architectures not employing the techniques taught by Chakradhar as described at the end of the Conclusion, section 8.
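	For illustration only, the mapping above (a host that writes weights, assembles one instruction per layer with a weight offset, and places the package in a memory that the accelerator then reads) can be pictured with a minimal sketch. The names LayerInstruction, assemble_package, and run_accelerator are hypothetical and appear in neither the claims nor the cited references; the dictionary merely stands in for the shared memory.

```python
from dataclasses import dataclass


@dataclass
class LayerInstruction:
    """Hypothetical per-layer instruction; not taken from any cited reference."""
    op: str             # operation for this layer, e.g. "conv" or "fc"
    weight_offset: int  # offset of this layer's weight matrix in shared memory


def assemble_package(ops, weight_sizes):
    """Host side: build one instruction per layer and record each weight offset."""
    package, offset = [], 0
    for op, size in zip(ops, weight_sizes):
        package.append(LayerInstruction(op, offset))
        offset += size
    return package


def run_accelerator(shared_memory):
    """Accelerator side: read the package from shared memory, process it in order."""
    return [(ins.op, ins.weight_offset) for ins in shared_memory["package"]]


# The dict is an assumed stand-in for the memory shared by host and accelerator.
shared_memory = {"weights": bytearray(48), "input": [1.0, 2.0]}
shared_memory["package"] = assemble_package(["conv", "fc"], [32, 16])
trace = run_accelerator(shared_memory)
```

	In this sketch the second instruction carries offset 32 because the first layer's weights are assumed to occupy 32 bytes.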
	
	Regarding claim 2, the rejection of claim 1 is incorporated herein. Furthermore, Temam teaches
	wherein the writing of the plurality of weight matrices includes writing all of the plurality of weight matrices to the shared memory before the processing of the plurality of per-layer instructions. (Figure 9, element 902 and [0117] indicate that the weights are loaded (as described above) prior to the NN being executed (i.e., before the instructions are executed). Chakradhar teaches the plurality of per-layer instructions as described in the rejection of claim 1.)

	Regarding claim 4, the rejection of claim 1 is incorporated herein. Furthermore, Temam teaches
	further comprising communicating from the host computer system to the neural network accelerator a parameter indicative of a base address in the shared memory of the weight matrices. ([0038-0039] indicates that the host may control addressing of the memory. Moreover, [0076] indicates that determining an address may include determining a base address. In the embodiment described at [0038-0039] in which this is controlled by the host, the host communicates the address, including the base and offset.)

	Regarding claim 5, the rejection of claim 1 is incorporated herein. Temam does not appear to explicitly teach, but Chakradhar teaches
	wherein the processing the plurality of per-layer instructions includes: processing a first per-layer instruction followed in succession by processing a second per-layer instruction of the instruction package; (Abstract describes implementing a neural network using dynamically configurable hardware. Figure 3 provides an overview of the system, including a host in communication with the neural network accelerator. Section 5.1 gives an overview of the system. In particular, the second paragraph indicates that the CNN compiler runs on the host and translates the neural network into a parallel microprogram (i.e., a sequence of low-level VLIW instructions) which are then to be executed by the coprocessor (i.e., accelerator). The configurations are described in section 6.2. In particular, the second to last paragraph indicates that the instructions may be configured for each layer of the neural network. Section 2 indicates that the layers may be performed sequentially.)
	reading input data from a first portion of the shared memory and writing output data to a second portion of the shared memory in processing the first per-layer instruction; and reading input data from the second portion of the shared memory and writing output data to the first portion of the shared memory in processing the second per-layer instruction. (Section 5.3. describes the memory subsystem. In particular, the last paragraph indicates that an input may be read from a first portion (e.g., Bank 1) when processing a first instruction and an output written to a second portion (e.g., Bank 2). The second portion (e.g., Bank 2) may then be read in a subsequent instruction and an output written to the first portion (e.g., Bank 1). )
	It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention to have performed this combination for the reasons given above with respect to claim 1. Moreover, Section 5.3., first paragraph indicates that the memory sub-system selected by Chakradhar may outperform similar systems with higher aggregate bandwidth and storage.
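	The alternating use of two memory portions described above can be sketched as follows, under the assumption that each layer is a simple element-wise function. The function run_layers and the two-element list of banks are hypothetical stand-ins and are not drawn from Chakradhar or the claims.

```python
def run_layers(layers, first_input):
    """Alternate between two buffer portions, one layer at a time."""
    banks = [list(first_input), []]  # assumed stand-ins for Bank 1 and Bank 2
    src = 0                          # index of the portion holding the current input
    for layer in layers:
        dst = 1 - src                # the other portion receives this layer's output
        banks[dst] = [layer(x) for x in banks[src]]
        src = dst                    # the next layer reads what was just written
    return banks[src]


result = run_layers([lambda x: x + 1, lambda x: x * 2], [1, 2, 3])
```

	Here the first layer reads the first portion and writes the second, and the second layer reads the second portion and writes back to the first.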

	Regarding claim 6, the rejection of claim 1 is incorporated herein. Furthermore, Temam teaches
	communicating from the host computer system to the neural network accelerator a first parameter indicative of an address in the shared memory of a first portion of a shared buffer and a second parameter indicative of an offset in the shared buffer of a second portion of the shared buffer; ([0038-0039] indicates that the host may control addressing of the memory. Moreover, [0076] indicates that determining an address may include determining a base address. In the embodiment described at [0038-0039] in which this is controlled by the host, the host communicates the address, including the base and offset.)
	Temam does not appear to explicitly teach, but Chakradhar—directed to analogous art—teaches
	wherein the processing the plurality of per-layer instructions includes: 
processing a first per-layer instruction followed in succession by processing a second per-layer instruction of the instruction package; (Abstract describes implementing a neural network using dynamically configurable hardware. Figure 3 provides an overview of the system, including a host in communication with the neural network accelerator. Section 5.1 gives an overview of the system. In particular, the second paragraph indicates that the CNN compiler runs on the host and translates the neural network into a parallel microprogram (i.e., a sequence of low-level VLIW instructions) which are then to be executed by the coprocessor (i.e., accelerator). The configurations are described in section 6.2. In particular, the second to last paragraph indicates that the instructions may be configured for each layer of the neural network. Section 2 indicates that the layers may be performed sequentially.)
	reading input data from the first portion of the shared buffer and writing output data to the second portion of the shared buffer in processing the first per-layer instruction; and reading input data from the second portion of the shared buffer and writing output data to the first portion of the shared buffer in processing the second per-layer instruction. (Section 5.3. describes the memory subsystem. In particular, the last paragraph indicates that an input may be read from a first portion (e.g., Bank 1) when processing a first instruction and an output written to a second portion (e.g., Bank 2). The second portion (e.g., Bank 2) may then be read in a subsequent instruction and an output written to the first portion (e.g., Bank 1). )
	It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention to have performed this combination for the reasons given above with respect to claim 1.

	Regarding claim 9, the rejection of claim 1 is incorporated herein. Furthermore, Temam teaches
	wherein a first per-layer instruction and a second per-layer instruction of the plurality of per-layer instructions specify different sets of neural network operations.  ([0013, 0060, 0085] indicate that the neural network layers may differ, requiring different operations to be performed.)

	Regarding claim 10, the rejection of claim 1 is incorporated herein. Furthermore, Temam teaches
	includes processing the plurality of … instructions in the instruction package in order of appearance in the instruction package. ([0044] indicates that the instructions are executed in the same order in which they arrived on the instruction bus (i.e., the order in which they appeared in the instruction package). In the combination with Chakradhar described above, the instructions may be per-layer instructions. Examiner notes that the claim does not appear to specify any particular order of appearance to be used. The BRI of appearance includes an order in which they were added to the package, an order in which they appear from the package (i.e., are implemented), or an order related to a particular (unclaimed) format of the package.)
	Temam does not appear to explicitly teach, but Chakradhar teaches
	wherein the processing the plurality of per-layer instructions includes processing the plurality of per-layer instructions (Abstract describes implementing a neural network using dynamically configurable hardware. Figure 3 provides an overview of the system, including a host in communication with the neural network accelerator. Section 5.1 gives an overview of the system. In particular, the second paragraph indicates that the CNN compiler runs on the host and translates the neural network into a parallel microprogram (i.e., a sequence of low-level VLIW instructions) which are then to be executed by the coprocessor (i.e., accelerator). The configurations are described in section 6.2. In particular, the second to last paragraph indicates that the instructions may be configured for each layer of the neural network.)
	It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention to have performed this combination for the reasons given above with respect to claim 1.

	Regarding claim 11, the rejection of claim 1 is incorporated herein. Furthermore, Temam teaches
	processing completing execution of instruction i before commencing execution of instruction i+1 for n instructions in instruction package and 1 <= i <= n. ([0044] indicates that the instructions are executed in the same order in which they arrived on the instruction bus (i.e., the order in which they appeared in the instruction package). In the combination with Chakradhar described above, the instructions may be per-layer instructions. There are necessarily a finite number of instructions (i.e., n instructions). We may take one of these to be an ith instruction which is executed before the next instruction (i.e., an i+1th instruction).)
	Temam does not appear to explicitly teach
	wherein the processing the plurality of per-layer instructions includes
	However, Chakradhar teaches
wherein the processing the plurality of per-layer instructions includes processing completing execution of instruction i before commencing execution of instruction i+1 for n instructions in instruction package and 1 <= i <= n. (Abstract describes implementing a neural network using dynamically configurable hardware. Figure 3 provides an overview of the system, including a host in communication with the neural network accelerator. Section 5.1 gives an overview of the system. In particular, the second paragraph indicates that the CNN compiler runs on the host and translates the neural network into a parallel microprogram (i.e., a sequence of low-level VLIW instructions) which are then to be executed by the coprocessor (i.e., accelerator). The configurations are described in section 6.2. In particular, the second to last paragraph indicates that the instructions may be configured for each layer of the neural network. Section 2, second paragraph indicates that the output of a previous layer may be required for the performance of the next layer, in which case the preceding layer is completed before the following layer proceeds.)
	It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention to have performed this combination for the reasons given above with respect to claim 1.

	Claims 13-14, 16-17, and 20 are substantially similar to claims 1-2, 4-5, and 10, and are rejected with the same rationale in view of Temam teaching, from claim 13:
	A neural network processing system, comprising: a shared memory; a host computer system coupled to the shared memory, wherein the host computer system is configured with instructions that when executed cause the host computer system to: (Figure 1 provides an overview of the system, including shared memory elements 104 and 106 and a host computer (not shown, but the system includes an interface to the host). [0118] indicates that functions may be performed via computer program instructions.)

	Claims 3 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over “Temam” (US 2019/0050717 A1) in view of “Woo” (US 2017/0220352 A1, incorporated into Temam by reference), further in view of “Chakradhar” (A Dynamically Configurable Coprocessor for Convolutional Neural Networks), further in view of “Paul” (US 2018/0107919 A1).

	Regarding claim 3, the rejection of claim 1 is incorporated herein. Furthermore, Woo (incorporated into Temam by reference) teaches
	wherein the writing of the plurality of weight matrices… before the processing of the plurality of per-layer instructions. (Figure 9, element 902 and [0117] indicate that the weights are loaded (as described above) prior to the NN being executed (i.e., before the instructions are executed). Chakradhar teaches the plurality of per-layer instructions as described in the rejection of claim 1.)
	The combination of Temam and Woo does not appear to explicitly teach
	includes writing all of the plurality of weight matrices to contiguous address space in the shared memory
	However, Paul—directed to analogous art—teaches
	includes writing all of the plurality of weight matrices to contiguous address space in the shared memory ([0031] describes storing weights for a neural network contiguously in memory.)
	It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which the invention pertains to modify the combination above to store the weights in contiguous address spaces in the memory as taught by Paul because this improves the compactness in size of the memory as described by Paul at [0031].

	Claim 15 is substantially similar to claim 3 and is rejected with the same rationale in view of the rejection of claim 13.

	Claims 7 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over “Temam” (US 2019/0050717 A1) in view of “Woo” (US 2017/0220352 A1, incorporated into Temam by reference), further in view of “Chakradhar” (A Dynamically Configurable Coprocessor for Convolutional Neural Networks), further in view of “Tucker” (US 2017/0124452 A1).
	
	Regarding claim 7, the rejection of claim 6 is incorporated herein. The combination of Temam, Woo, and Chakradhar does not appear to explicitly teach, but Tucker—directed to analogous art—teaches
	determining by the host computer system from a specification of the neural network, a size of the shared buffer based on a maximum of sizes of input matrices and output matrices referenced in the plurality of layers of the neural network. (Figure 2, elements 208 and 210, described at [0041-0045], show partitioning the graph representing the network into subgraphs and assigning these to processing devices. [0045] indicates that the dimension of the tensors (i.e., input and output matrices) on each directed edge to and from each node of a subgraph is determined to determine a size of memory necessary to perform the operation. [0045] describes assigning each subgraph to a device that has memory capable of storing the largest tensor. Assigning a subgraph to a device is the same as allocating that device’s memory to the subgraph.)
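	As a minimal sketch of the claimed sizing step, and assuming the network specification is available as a list of per-layer input and output shapes, the buffer size may be taken as the largest element count over all input and output matrices. The function shared_buffer_size and the example shapes are hypothetical and are not taken from Tucker.

```python
def shared_buffer_size(layer_shapes):
    """Return the largest element count over all layer inputs and outputs.

    layer_shapes: list of (input_shape, output_shape) tuples, one per layer
    (an assumed form for the network specification).
    """
    def elems(shape):
        n = 1
        for d in shape:
            n *= d
        return n

    return max(max(elems(i), elems(o)) for i, o in layer_shapes)


# Two hypothetical layers: 1x28x28 -> 8x24x24, then 8x24x24 -> 10.
spec = [((1, 28, 28), (8, 24, 24)), ((8, 24, 24), (10,))]
size = shared_buffer_size(spec)
```

	With these assumed shapes the largest tensor is the 8x24x24 intermediate result, so a buffer sized for it can hold every layer's input or output.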


	Claim 18 is substantially similar to claims 6 and 7, and is rejected with the same rationale in view of the rejection of claim 13.

	Claims 8, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over “Temam” (US 2019/0050717 A1) in view of “Woo” (US 2017/0220352 A1, incorporated into Temam by reference), further in view of “Chakradhar” (A Dynamically Configurable Coprocessor for Convolutional Neural Networks), further in view of “Deisher” (US 2018/0121796 A1).

	Regarding claim 8, the rejection of claim 1 is incorporated herein. Furthermore, Temam teaches
	wherein the assembling the plurality of per-layer instructions includes specifying in one or more of the per-layer instructions, configuration parameters for…an activation function. ([0068] indicates that the information necessary for processing the neural network via the neural network accelerator may include data specifying an activation function (i.e., configuration parameters for the activation function).)
	Temam does not appear to explicitly teach, but Deisher—directed to analogous art—teaches 
	wherein the assembling the plurality of per-layer instructions includes specifying in one or more of the per-layer instructions, configuration parameters for scaling, maxpool dimensions, and an activation function. (Abstract describes a neural network accelerator. [0048] indicates that in processing each layer of a neural network, data indicative of scale factors, weights, and other data may be retrieved. [0122-0133] provides further examples of the parameters to be specified, including NPoolElements (i.e., the dimensions of a pool window) for a maxpool (see [0131]) along with activation value parameters (see [0127-0128]).)


	Regarding claim 12, the rejection of claim 1 is incorporated herein. The combination of Temam, Woo, and Chakradhar does not appear to explicitly teach, but Deisher teaches
	wherein the processing the plurality of per-layer instructions includes evaluating a finite state machine transition table for a state machine defined by the instruction package. ([0050] indicates that the registers (which may be considered a state machine) include the layer description including the data for executing the layer. This data is consulted when executing a layer and may control when execution proceeds to a next layer. This data may be included in a single external memory. That is, the instructions may all be part of the same package. The evaluation of the state machine/register is understood to be an evaluation of the state machine table (i.e., a logical embodiment of the state data).)
	It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which the invention pertains to modify the combination described above to include configuration parameters as taught by Deisher and described above because implementing the parallel logic structure (encompassing the finite state machine) may increase the efficiency of processing the network as described by Deisher at [0033] and [0044].

	Regarding claim 19, the rejection of claim 13 is incorporated herein. Furthermore, Temam teaches
	wherein the assembling the plurality of per-layer instructions includes specifying in one or more of the per-layer instructions, configuration parameters for…an activation function. ([0068] indicates that the information necessary for processing the neural network via the neural network accelerator may include data specifying an activation function (i.e., configuration parameters for the activation function).)
	Temam does not appear to explicitly teach, but Deisher, directed to analogous art, teaches
	wherein the instructions that cause the host computer system to assemble the plurality of per-layer instructions include instruction that cause the host computer system to specify in one or more of the per-layer instructions, configuration parameters for convolution, matrix multiplication, scaling, maxpool dimensions, and an activation function. (Abstract describes a neural network accelerator. [0048] indicates that in processing each layer of a neural network, data indicative of scale factors, weights, and other data may be retrieved. [0122-0133] provides further examples of the parameter to be specified including NPoolElements (i.e., the dimensions of a pool window) for a maxpool (see [0131]) along with activation value parameters (see [0127-0128]). [0133] includes convolution configuration (e.g., NConvFilterElements) and matrix multiplication configuration (e.g., WeightArrayPtr, the weight array to use, and also [0124-0125], the elements to be used in the weight arrays).)
	It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which the invention pertains to modify the combination described above to include configuration parameters as taught by Deisher and described above because implementing the parallel logic structure (encompassing the particular configuration parameters) may increase the efficiency of processing the network as described by Deisher at [0033] and [0044].

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Yu (US 2018/0046913 A1) – teaches optimizing a neural network using an accelerator in which an external memory is in communication with a CPU (i.e., a host) and the accelerator as shown in Figure 2. Could likely be used as an alternative primary reference.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Markus A Vasquez whose telephone number is (303)297-4432.  The examiner can normally be reached on Monday to Friday 9AM to 4PM MT.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen can be reached on (571) 272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/M.A.V./Examiner, Art Unit 2121


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121