DETAILED ACTION

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-4, 6-8, and 13-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Dally et al. (US Pub. No. 2018/0046900 A1).
Regarding claim 1, Dally discloses, an inference apparatus, comprising: a plurality of PEs (Processing Elements); (See Dally ¶39, “At step 105, a first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a three-dimensional (3D) space are received. In one embodiment, the first vector is received from a memory. In one embodiment, the first vector is received by a processing element (PE) within a SCNN accelerator, such as the SCNN accelerator 200 described in conjunction with FIG. 2A.”)
and a control part that operates a convolution operation in a convolutional neural network using each of a plurality of pieces of input data and a weight group including a plurality of weights corresponding to each of the plurality of pieces of input data by controlling the plurality of PEs; (See Dally ¶55, “The core operation in a CNN layer is a two-dimensional sliding-window convolution of an R×S element filter over a W×H element input activation plane to produce a W×H element output activation plane. There can be multiple (C) input activation planes, which are referred to as input channels. A distinct filter is applied to each input activation channel, and the filter output for each of the C channels are accumulated together element-wise into a single output activation plane.  Multiple filters (K) can be applied to the same body of input activations to produce K output channels of output activations. Finally, a batch of length N of groups of C channels of input activation planes can be applied to the same volume of filter weights.”)
wherein each of the plurality of PEs executes a computation including multiplication of a single piece of the input data by a single weight and also multiplication included in the convolution operation using an element with a non-zero value included in each of the plurality of pieces of input data.  (See Dally ¶41, “At step 115, each one of the non-zero weight values is multiplied with every one of the non-zero input activation values, within a multiplier array, to produce a third vector of products.”)
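For illustration only, and not as part of what Dally discloses, the multiplication quoted from ¶41 can be sketched as follows. The sketch assumes a single input channel, a single filter, and a "valid" output window (no padding). The function names and the sample data are hypothetical. Each non-zero weight is multiplied by every non-zero activation, and each product is added at the output position derived from the two stored positions.

```python
# Illustrative sketch; names, data layout, and boundary handling are assumptions.

def dense_conv2d(act, filt):
    """Reference result: sliding-window sum of products, no padding."""
    out_h = len(act) - len(filt) + 1
    out_w = len(act[0]) - len(filt[0]) + 1
    return [[sum(act[y + r][x + s] * filt[r][s]
                 for r in range(len(filt)) for s in range(len(filt[0])))
             for x in range(out_w)] for y in range(out_h)]

def compress(plane):
    """Keep only non-zero values together with their (row, col) positions."""
    return [(v, (i, j)) for i, row in enumerate(plane)
            for j, v in enumerate(row) if v != 0]

def sparse_conv2d(act, filt):
    """Multiply every non-zero weight by every non-zero activation."""
    out_h = len(act) - len(filt) + 1
    out_w = len(act[0]) - len(filt[0]) + 1
    out = [[0] * out_w for _ in range(out_h)]
    weights, activations = compress(filt), compress(act)
    for w, (r, s) in weights:
        for a, (y, x) in activations:
            oy, ox = y - r, x - s          # output position of this product
            if 0 <= oy < out_h and 0 <= ox < out_w:
                out[oy][ox] += w * a       # products outside the window are dropped
    return out, len(weights) * len(activations)

act = [[0, 2, 0, 0],
       [0, 0, 0, 3],
       [1, 0, 0, 0],
       [0, 0, 4, 0]]
filt = [[1, 0],
        [0, -1]]
sparse_out, multiplies = sparse_conv2d(act, filt)
```

With this sample data the sparse path performs 8 multiplications (2 non-zero weights by 4 non-zero activations), whereas the dense reference performs 36, and both produce the same output plane.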

Regarding claim 2, Dally discloses, the inference apparatus according to Claim 1, wherein the control part controls the plurality of PEs so as to perform in parallel some of multiplications between each of the plurality of pieces of input data and each weight corresponding to each of the plurality of pieces of input data.  (See Dally ¶44, “In one embodiment, the SCNN 200 is a processor and the PEs 210 are parallel processing units.” See also Dally ¶112, “To increase parallelism beyond a single PE 210, multiple PEs 210 can be operated in parallel with each working on a disjoint three-dimensional tile of input activations.”)

Regarding claim 3, Dally discloses, the inference apparatus according to Claim 2, wherein the control part controls the plurality of PEs so as to execute in parallel some of multiplications between each of the plurality of pieces of input data and each of the plurality of weights included in a plurality of the weight groups, (See Dally Fig. 2B, where a plurality of weight groups 230(1) … 230(K) are multiplied by Input Activation 235(1).
See Dally ¶112, “To increase parallelism beyond a single PE 210, multiple PEs 210 can be operated in parallel with each working on a disjoint three-dimensional tile of input activations.”)
where the multiplications use weights belonging to different weight groups, but corresponding to the same input data.  (See Dally ¶55, “A distinct filter is applied to each input activation channel, and the filter output for each of the C channels are accumulated together element-wise into a single output activation plane.  Multiple filters (K) can be applied to the same body of input activations to produce K output channels of output activations. Finally, a batch of length N of groups of C channels of input activation planes can be applied to the same volume of filter weights.”)

Regarding claim 4, Dally discloses, the inference apparatus according to Claim 3, further comprising an input data division section that divides the plurality of pieces of input data and supplies the divided pieces of input data to the PEs that perform the some of multiplications in parallel.  (See Dally ¶55, “The core operation in a CNN layer is a two-dimensional sliding-window convolution of an R×S element filter over a W×H element input activation plane to produce a W×H element output activation plane.”)
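For illustration only, dividing the input data and supplying the divided pieces to PEs that multiply in parallel can be sketched as follows. The row-band partition, the thread pool standing in for the PEs, and all names are assumptions made for this sketch. They are not taken from Dally or from the application. Each hypothetical PE receives a disjoint set of non-zero activations, produces partial sums, and the partial sums are added element-wise.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_row_bands(activations, height, num_pes):
    """Assign each (value, (row, col)) activation to one PE by row band."""
    band = -(-height // num_pes)  # ceiling division
    tiles = [[] for _ in range(num_pes)]
    for value, (y, x) in activations:
        tiles[y // band].append((value, (y, x)))
    return tiles

def pe_partial_sums(tile, weights, out_h, out_w):
    """Work done by one PE: multiply its activations by every weight."""
    partial = [[0] * out_w for _ in range(out_h)]
    for a, (y, x) in tile:
        for w, (r, s) in weights:
            oy, ox = y - r, x - s
            if 0 <= oy < out_h and 0 <= ox < out_w:
                partial[oy][ox] += a * w
    return partial

def run_on_pes(activations, weights, height, width, filt_h, filt_w, num_pes):
    out_h, out_w = height - filt_h + 1, width - filt_w + 1
    tiles = split_into_row_bands(activations, height, num_pes)
    with ThreadPoolExecutor(max_workers=num_pes) as pool:
        partials = list(pool.map(
            lambda tile: pe_partial_sums(tile, weights, out_h, out_w), tiles))
    # Element-wise addition of the per-PE partial sums.
    return [[sum(p[i][j] for p in partials) for j in range(out_w)]
            for i in range(out_h)]

# Non-zero activations of a 4x4 plane and non-zero weights of a 2x2 filter.
activations = [(2, (0, 1)), (3, (1, 3)), (1, (2, 0)), (4, (3, 2))]
weights = [(1, (0, 0)), (-1, (1, 1))]
result_two_pes = run_on_pes(activations, weights, 4, 4, 2, 2, num_pes=2)
```

Because every product is simply added into the output, the result does not depend on how many PEs the activations are divided among.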

Regarding claim 6, Dally discloses, the inference apparatus according to Claim 3, further comprising a result addition section that adds up the results of multiplications between each of the plurality of pieces of input data and a plurality of weights included in a weight group. (See Dally ¶125, “At step 125, the third vector is transmitted to an accumulator array, where each one of the products in the third vector is transmitted to an adder in the accumulator array that is configured to generate an output activation value at the position associated with the product.”)

Regarding claim 7, Dally discloses, the inference apparatus according to Claim 1, wherein each of the plurality of PEs comprises: an input data processing section that identifies a non-zero element having a non-zero value and the position of the non-zero element in received input data; (See Dally ¶74, “Similarly, the input activations buffer 310 and position buffer 320 deliver a vector of I non-zero input activations and the associated positions (e.g., coordinates) within the W×H_t region, respectively.”)
a weight readout section that reads a value at the position corresponding to the position of the non-zero element out of weight elements to be multiplied by the received input data from a storage medium that stores the weight groups; (See Dally ¶74, “At each access, the weight buffer 305 and the position buffer 315 deliver a vector of F non-zero filter weights along with the associated positions (e.g. coordinates) within the K_c×R×S region, respectively.”)
and a multiplication/addition section that multiplies the non-zero element by the value read by the weight readout section;  (See Dally ¶74, “Similar to the PTIS-dense dataflow, the F×I multiplier array 325 computes the full cross-product of F×I partial sum outputs, with no extraneous computations.”)
wherein the multiplication/addition section adds up the multiplication results when multiplications with respect to non-zero elements included in the received input data are completed. (See Dally ¶76, “The F×I arbitrated crossbar 335 routes F×I products to an array of A accumulator units based on the output positions associated with each product. …The address space is distributed across the A accumulator units and each accumulator unit includes a bank of addressable storage and an adder to accumulate a partial sum.”)

Regarding claim 8, Dally discloses, the inference apparatus according to Claim 7, wherein each of the plurality of PEs further comprises a computation result storage section that stores the addition results computed by the multiplication/addition section. (See Dally ¶78,  “Finally, when the computation for the output-channel group has been completed, the accumulator array 340 is drained and the compressed output activations are stored into the output activations buffer 350 and the output coordinates are stored into the indices buffer 355.”)
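For illustration only, the accumulation and storage steps quoted from Dally ¶76 and ¶78 can be sketched as follows. The address-interleaved bank selection, the use of a flat integer output address, and all names are assumptions made for this sketch. Each product is routed to one accumulator bank according to its output address, summed there, and the non-zero totals are then drained into a value list and a matching index list.

```python
def accumulate_and_drain(products, num_banks):
    """products: iterable of (value, output_address) pairs.

    Returns (values, indices): the non-zero accumulated outputs and their
    addresses, in ascending address order.
    """
    banks = [dict() for _ in range(num_banks)]
    for value, address in products:
        bank = banks[address % num_banks]      # assumed bank-selection rule
        bank[address] = bank.get(address, 0) + value
    drained = sorted((address, total)
                     for bank in banks
                     for address, total in bank.items()
                     if total != 0)            # keep only non-zero outputs
    values = [total for _, total in drained]
    indices = [address for address, _ in drained]
    return values, indices

# Two products target address 2 and cancel, so address 2 is not stored.
products = [(2, 1), (-3, 2), (1, 6), (-4, 7), (3, 2)]
values, indices = accumulate_and_drain(products, num_banks=4)
```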

Regarding claim 13, Dally discloses, a convolution operation execution method for an inference apparatus that includes a plurality of PEs (Processing Elements) each executing a computation including multiplication of a single piece of input data by a single weight and that operates a convolution operation in a convolutional neural network using each of a plurality of pieces of input data and a weight group including a plurality of weights corresponding to each of the plurality of pieces of input data, the convolution operation execution method including: a step of causing each of the plurality of PEs to identify an element with a non-zero value included in each of the plurality of pieces of input data; and a step of causing each of the plurality of PEs to execute multiplication included in the convolution operation using the identified elements. (See the rejection of claim 1 as it is equally applicable for claim 13 as well.)

Regarding claim 14, Dally discloses, a non-transitory computer-readable storage medium storing a program causing a computer to execute: (See Dally claim 20, “A non-transitory, computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform steps.”)
identifying an element with a non-zero value included in each of the plurality of pieces of input data; and   executing multiplication included in the convolution operation using the identified elements, the computer being provided in an inference apparatus that includes a plurality of PEs (Processing Elements) each executing a computation including multiplication of a single piece of input data by a single weight and that operates a convolution operation in a convolutional neural network using each of a plurality of pieces of input data and a weight group including a plurality of weights corresponding to each of the plurality of pieces of input data. (See the rejection of claim 1 as it is equally applicable for claim 14 as well.)

Regarding claim 15, Dally discloses, the convolution operation execution method according to Claim 13, including: controlling the plurality of PEs so as to perform in parallel some of multiplications between each of the plurality of pieces of input data and each weight corresponding to each of the plurality of pieces of input data.  (See the rejection of claim 2 as it is equally applicable for claim 15 as well.)

Regarding claim 16, Dally discloses, the convolution operation execution method according to Claim 15, including: controlling the plurality of PEs so as to execute in parallel some of multiplications between each of the plurality of pieces of input data and each of the plurality of weights included in a plurality of the weight groups, where the multiplications use weights belonging to different weight groups, but corresponding to the same input data. (See the rejection of claim 3 as it is equally applicable for claim 16 as well.)

Regarding claim 17, Dally discloses, the convolution operation execution method according to Claim 16, including: dividing the plurality of pieces of input data, and supplying the divided pieces of input data to the PEs that perform the some of multiplications in parallel. (See the rejection of claim 4 as it is equally applicable for claim 17 as well.)

Regarding claim 18, Dally discloses, the non-transitory computer-readable storage medium storing a program according to Claim 14, causing the computer to execute: controlling the plurality of PEs so as to perform in parallel some of multiplications between each of the plurality of pieces of input data and each weight corresponding to each of the plurality of pieces of input data. (See the rejection of claim 2 as it is equally applicable for claim 18 as well.)

Regarding claim 19, Dally discloses, the non-transitory computer-readable storage medium storing a program according to Claim 18, causing the computer to execute: controlling the plurality of PEs so as to execute in parallel some of multiplications between each of the plurality of pieces of input data and each of the plurality of weights included in a plurality of the weight groups, where the multiplications use weights belonging to different weight groups, but corresponding to the same input data. (See the rejection of claim 3 as it is equally applicable for claim 19 as well.)

Regarding claim 20, Dally discloses, the non-transitory computer-readable storage medium storing a program according to Claim 18, causing the computer to execute: dividing the plurality of pieces of input data, and supplying the divided pieces of input data to the PEs that perform the some of multiplications in parallel. (See the rejection of claim 4 as it is equally applicable for claim 20 as well.)


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Dally et al. (US Pub. No. 2018/0046900 A1) in view of Goulding et al. (US Pub. No. 2019/0087708 A1).
Regarding claim 5, Dally discloses, the inference apparatus according to Claim 3, but fails to disclose the following limitations.
However Goulding discloses, further comprising a standby section in which the computation stands by until the plurality of PEs that perform the some of multiplications in parallel complete.  (See Goulding ¶20, “The PEs 165 within the input layer 180 may then work in parallel to perform a computation on the weighted inputs and output a result. … All the PEs 165 of the second layer 185 may wait for all the PEs 165 of the input layer 180 to complete their computations prior to beginning computation.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the waiting for parallel PE computations to complete, as suggested by Goulding, into Dally’s PE multiplications using known engineering techniques, with a reasonable expectation of success. The motivation for doing so is to ensure that the next-level input activation data is complete and available before additional computations begin.
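For illustration only, the standby behavior quoted from Goulding ¶20 can be sketched as follows. The thread pool standing in for the PEs, the dot-product computation, and all names are assumptions made for this sketch. The second layer is not started until every PE of the first layer has returned.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def pe_compute(inputs, weights):
    """Work done by one PE: a weighted sum of its inputs."""
    return sum(x * w for x, w in zip(inputs, weights))

def run_layer(pool, inputs, weights_per_pe):
    futures = [pool.submit(pe_compute, inputs, w) for w in weights_per_pe]
    wait(futures)  # standby: block until every PE in this layer has completed
    return [f.result() for f in futures]

layer1_weights = [[1, 0], [0, 1], [1, 1]]   # three PEs in the first layer
layer2_weights = [[1, 1, 1], [1, -1, 0]]    # two PEs in the second layer

with ThreadPoolExecutor(max_workers=3) as pool:
    layer1_out = run_layer(pool, [2, 3], layer1_weights)
    # The second layer begins only after run_layer has returned for layer one.
    layer2_out = run_layer(pool, layer1_out, layer2_weights)
```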

	Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Dally et al. (US Pub. No. 2018/0046900 A1) in view of Ito et al. (US Pub. No. 2010/0214936 A1).
Regarding claim 9, Dally discloses, the inference apparatus according to Claim 3, but fails to disclose the following limitations.
However Ito discloses, wherein the control part causes each of the plurality of PEs to execute a plurality of processes of multiplying a piece of input data by weights included in different weight groups in time division.  (See Ito ¶173,  “By repeating the processes in steps S2701 to S2710, the calculations of feature planes based on the predetermined CNN network are executed while executing time-divisional processing for respective lines. Upon completion of all the sequences, the CNN processing unit 63 generates an interrupt to the CPU 68 in step S2711.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the time-division convolution computations suggested by Ito into Dally’s PE convolutions using known engineering techniques, with a reasonable expectation of success. The motivation for doing so is to perform the PE computations efficiently.
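For illustration only, the time-division limitation of claim 9 can be sketched as follows. The slot numbering, the dictionary layout, and all names are assumptions made for this sketch and are not drawn from Ito or Dally. A single hypothetical PE multiplies one piece of input data by a weight from a different weight group in each successive time slot.

```python
def pe_time_division(input_value, weight_by_group):
    """Return {group: (time_slot, product)} for one PE with one multiplier."""
    results = {}
    for slot, (group, weight) in enumerate(sorted(weight_by_group.items())):
        # One multiplication per time slot, each using a different weight group.
        results[group] = (slot, input_value * weight)
    return results

schedule = pe_time_division(3, {"k0": 2, "k1": -1, "k2": 0})
```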

	Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Dally et al. (US Pub. No. 2018/0046900 A1) in view of Brothers et al. (US Pub. No. 2018/0082181 A1).
Regarding claim 10, Dally discloses, the inference apparatus according to Claim 1, but fails to disclose the following limitations.
However Brothers discloses, wherein the control part comprises: an evaluation index calculation section that calculates an evaluation index for evaluating the weight groups; (See Brothers ¶22, “In one embodiment, the neural network reordering may be selected to introduce an ordering to the weights to increase the ability to compress the weights (i.e., reduce the amount of data that is used to represent the NN). By reordering network layers, an ordering can be introduced to the weights that are selected to provide better weight compression. One option is to perform the reordering to improve compression by introducing a structure to the weights that aids in compressing them. For example, weights may be grouped or ordered by value. Still another option is to perform the reordering based on characteristics of a coding technique used for compression, such as Huffman coding or Golomb-Rice coding. As an example, feature maps can be reordered so that frequency distributions are sharper in a particular localized area. Additionally, the reordering may be selected to improve prediction accuracy in the encoding. As another example, network feature maps can be reordered so that weight values tend to increase or the number of zero value weights increase.”)
and a weight group sorting section that sorts the order of the weight groups to be multiplied by input data on the basis of the evaluation index. (See Brothers ¶19, “An optional optimization 404 of the trained network may be performed. The feature maps and/or weights are reordered to generate 405, a reordered version of the trained neural network.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the sorting of weights suggested by Brothers to Dally’s neural network weights using known engineering techniques, with a reasonable expectation of success. The motivation for doing so, as disclosed by Brothers, is to optimize the processing performed by the neural network.
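For illustration only, calculating an evaluation index and sorting the weight groups by it can be sketched as follows. The choice of index (the count of non-zero weights in each group) is an assumption made for this sketch. Neither Brothers nor the claim is limited to that index, and all names are hypothetical.

```python
def evaluation_index(weight_group):
    """Assumed index: the number of non-zero weights in the group."""
    return sum(1 for w in weight_group if w != 0)

def sort_weight_groups(weight_groups):
    """Return (order, sorted_groups), ascending by evaluation index."""
    order = sorted(range(len(weight_groups)),
                   key=lambda k: evaluation_index(weight_groups[k]))
    return order, [weight_groups[k] for k in order]

weight_groups = [[1, 0, 2, 3],   # index 3
                 [0, 0, 0, 5],   # index 1
                 [4, 0, 0, 6]]   # index 2
order, sorted_groups = sort_weight_groups(weight_groups)
```

The returned `order` list records which original weight group occupies each sorted position.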


	Allowable Subject Matter
Claims 11 and 12 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Regarding claim 11, the inference apparatus according to Claim 10, wherein the control part further comprises an order information notification section that notifies other layers of order information regarding the sorting of the weight groups.  (The disclosed prior art of record fails to disclose the limitations of this claim.)

Regarding claim 12, the inference apparatus according to Claim 11, wherein the control part further comprises: an order information acquisition section that acquires the notified order information; and a weight sorting section that changes associations between input data and weights included in the weight groups on the basis of the order information.  (The disclosed prior art of record fails to disclose the limitations of this claim.)

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID PERLMAN, whose telephone number is (571) 270-1417. The examiner can normally be reached Monday through Friday, 10:00 am to 6:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz, can be reached at (571) 272-3638.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DAVID PERLMAN/
Primary Examiner, Art Unit 2662