DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Applicant’s amendment filed on February 17, 2021 has been considered and entered.
Accordingly, claims 1-10 are pending in this application. Claims 1-3 and 6-8 are currently amended; claims 4-5 and 9-10 are previously presented.
Specification
The following document is incorporated by reference: Song Han et al., “EIE: Efficient Inference Engine on Compressed Deep Neural Network,” ISCA 2016:243-254. Pursuant to MPEP 608.01(p), the examiner requires the applicant to provide a copy of the incorporated reference. If the applicant wants this reference to be printed on the front page of a potential patent publication, an information disclosure statement must be submitted.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitations are: 
Convolution and pooling unit for performing a convolution and pooling operation […] in claim 1.
Full connection unit for performing a full connection calculation […] in claim 1.
Control unit for determining and sending […] in claim 1. 
Convolution unit for performing a multiplication operation […] in claim 2.
Adder tree unit for accumulating output results […] in claim 2.
Nonlinear unit for performing a nonlinear processing […] in claim 2.
Pooling unit for performing a pooling operation […] in claim 2. 
Input vector buffer unit for buffering […] in claim 3. 
Pointer information buffer unit for buffering […] in claim 3. 
Weight information buffer unit for buffering […] in claim 3. 
Output buffer unit for buffering […] in claim 3. 
Activation function unit for performing an activation function operation […] in claim 3. 

The following is the interpretation of the 112(f) limitations:
a.	Convolution and pooling unit (claim 1): See Fig. 4 and paragraphs [0046]-[0052].
b.	Full connection unit (claim 1): See Fig. 5 and paragraphs [0054]-[0061].
c.	Control unit (claim 1): See Fig. 3 reference symbol Controller and paragraph [0043] lines 16-20.
d.	Convolution unit (claim 2): See Fig. 4 reference symbol convolver and paragraph [0049].
e.	Adder tree unit (claim 2): See Fig. 4 reference symbol adder tree and paragraph [0050].
f.	Nonlinear unit (claim 2): See Fig. 4 reference symbol Nonlinear and paragraph [0051].
g.	Pooling unit (claim 2): See Fig. 4 reference symbol pooling and paragraph [0052].
h.	Input vector buffer unit (claim 3): See Fig. 5 reference symbol ActQueue and paragraph [0056].
i.	Pointer information buffer unit (claim 3): See Fig. 5 reference symbol PtrRead and paragraph [0057].
j.	Weight information buffer unit (claim 3): See Fig. 5 reference symbol SpmatRead and paragraph [0058].
k.	Output buffer unit (claim 3): See Fig. 5 reference symbol ActBuffer and paragraph [0060].
l.	Activation function unit (claim 3): See Fig. 5 reference symbol Function and paragraph [0061].
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitations to avoid them being 

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-5 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites “in the units above and their state machines” in the last line. It is unclear what “units above” and “their” refer to. For example, it is unclear whether the “units above” include each convolution and pooling unit in the plurality of convolution and pooling units together with the full connection unit, or only the plurality of convolution and pooling units. Claims 2-5 inherit the same deficiency as claim 1 by reason of dependence.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:


Claims 1-10 are rejected under 35 U.S.C. 103 as being unpatentable over Guo et al.[1] (NPL - "From Model to FPGA: Software-Hardware Co-Design for Efficient Neural Network Acceleration"), hereinafter Guo, in view of Dally et al. (US 20180046906 A1), hereinafter Dally. The rejection is based on a combination of features from different embodiments in the Guo publication.
Regarding claim 1, Guo teaches a convolutional neural network (CNN) accelerator comprising:
a control unit (Guo page 15 controller); a convolution and pooling unit (Guo page 15 PE and page 16 shows the structure of one PE in page 15);
	Although the Aristotle architecture for CNN acceleration of Guo does not teach a full connection unit, the Descartes architecture of Guo teaches a different processing element for a full connection operation (Guo page 19 figure on the right).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to modify the Aristotle architecture of Guo in view of the Descartes architecture, to configure the processing elements by cascading the Aristotle processing element and the Descartes processing element. The output buffer of the Aristotle processing element shown in page 16 will serve as the input buffer to the Descartes processing element. By doing so, the control unit will also be configured to control the operation and functionality of the different processing elements. Further, the accelerator will also be configured to perform compression on the weight matrix 
The motivation to do so is to make a more versatile accelerator. The Aristotle processing element is more suitable for image or object recognition while the Descartes processing element is more suitable for speech recognition. The combination of the two processing elements will provide a more adaptable accelerator depending on specific applications such as image and speech recognition.
Therefore, the combination of the two neural network accelerators of Guo teaches a full connection unit.
Accordingly, Guo as modified teaches a neural network accelerator comprising:
a […] convolution and pooling unit, […] for performing a convolution and pooling operation, for a first iteration number of times, on a first plurality of sub-blocks of input data in parallel in accordance with convolution parameter information to obtain an input vector of a sparse neural network;
a full connection unit for performing a full connection calculation, for a second iteration number of times, on the input vector in accordance with weight matrix position information of a full connection layer to finally obtain a calculation result of the sparse convolutional neural network, wherein each input vector comprises a second plurality of sub-blocks, and wherein the full connection unit performs a full connection operation on the second plurality of sub-blocks in parallel; and
a control unit for determining and sending the convolution parameter information and the weight matrix position information of the full connection layer to the convolution and pooling unit and the full connection unit respectively, and controlling reading of the input vectors on respective iterative levels in the units above and their state machines.
Guo as modified does not explicitly teach a plurality of convolution and pooling units, each for performing a convolution and pooling operation.

Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to modify the accelerator of Guo using Dally and configure multiple processing elements of the Aristotle architecture shown in Guo page 15 to have the same structure as the processing element shown in page 16 of Guo, which comprises a convolver unit, an adder tree unit, a non-linear processing (ReLU) unit, and a pooling unit, consistent with the teaching of Dally where multiple processing elements are configured to include the same structure for performing the same functionality.
The motivation to do so is to increase parallelism beyond a single processing element (PE) by using multiple PEs that are operated in parallel working on different tiles/sub-blocks of input activations/data (Dally paragraph [0104]).
Therefore, the combination of Guo as modified in view of Dally teaches a plurality of convolution and pooling units, each for performing a convolution and pooling operation.
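The parallelism rationale above (identically structured processing elements, each working on a different tile or sub-block of the input) can be illustrated with a minimal Python sketch. It is not taken from Guo or Dally. The names `split_into_sub_blocks`, `processing_element`, and `run_parallel` are hypothetical, and the sum of squares merely stands in for whatever computation a PE actually performs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_sub_blocks(data, num_blocks):
    """Split a flat list into contiguous tiles (the last tile may be shorter)."""
    size = max(1, -(-len(data) // num_blocks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def processing_element(sub_block):
    """Stand-in for one PE: every PE runs this same function on its own tile."""
    return sum(x * x for x in sub_block)


def run_parallel(data, num_pes):
    """Dispatch one tile per PE and collect the results in tile order."""
    tiles = split_into_sub_blocks(data, num_pes)
    with ThreadPoolExecutor(max_workers=num_pes) as pool:
        return list(pool.map(processing_element, tiles))


if __name__ == "__main__":
    print(run_parallel(list(range(8)), num_pes=4))
```

Threads are used here only to make the "same structure, different tile" idea concrete in software; the hardware arrangement in the cited references is not modeled.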

Regarding claim 2, Guo as modified in view of Dally teaches all the limitations of claim 1 as stated above. Further, Guo as modified in view of Dally teaches 
wherein each of the convolution and pooling units comprises:
a convolution unit for performing a multiplication operation of the first plurality of sub-blocks of input data and a convolution parameter (Guo page 16 convolution unit – Convolvers, input data – […]);
an adder tree unit for accumulating output results of the convolution to complete a convolution operation (Guo page 16 adder tree – Adder Tree);
a nonlinear unit for performing a nonlinear processing on the output results of the convolution unit (Guo page 16 nonlinear unit – ReLu); and
a pooling unit for performing a pooling operation on the output results of the convolution unit after the nonlinear processing to obtain the input data on the next iterative level or finally obtain the input vector of the sparse neural network (Guo page 16 pooling unit – Pool; obtain the input data on the next iterative level – intermediate data being fed back to the adder tree for next iterative level, finally obtain the input vector of the sparse neural network – output of the Output Buffer). The motivation to combine is the same as for claim 1.
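The four stages mapped for claim 2 (multiplication, accumulation, nonlinear processing, pooling) can be sketched in Python as follows. This is an illustrative sketch only, not the design shown in Guo. It assumes a square kernel, stride 1, no padding, ReLU as the nonlinearity, and non-overlapping max pooling; all function names are hypothetical.

```python
def convolve_products(window, kernel):
    """Convolver stage: element-wise products of an input window and the kernel."""
    return [x * w
            for row_x, row_w in zip(window, kernel)
            for x, w in zip(row_x, row_w)]


def adder_tree(values):
    """Adder tree stage: pairwise reduction of the products to a single sum."""
    values = list(values)
    if not values:
        return 0.0
    while len(values) > 1:
        if len(values) % 2:
            values.append(0.0)  # pad odd levels so every value has a partner
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    return values[0]


def relu(x):
    """Nonlinear stage (ReLU assumed)."""
    return max(0.0, x)


def max_pool(fmap, size):
    """Pooling stage: non-overlapping size x size max pooling."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


def conv_pool_unit(image, kernel, pool_size=2):
    """Chain the four stages for one sub-block of input data."""
    k = len(kernel)
    fmap = [[relu(adder_tree(convolve_products(
                [row[c:c + k] for row in image[r:r + k]], kernel)))
             for c in range(len(image[0]) - k + 1)]
            for r in range(len(image) - k + 1)]
    return max_pool(fmap, pool_size)
```

In this sketch the pooled output of one call could be passed back in as `image` for the next iterative level, which loosely mirrors the feedback path described in the mapping above.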

Regarding claim 3, Guo as modified by Dally teaches all the limitations of claim 1 as stated above. Further, Guo teaches a sparse convolutional neural network accelerator wherein the full connection unit further comprises:
an input vector buffer unit for buffering the input vector of the sparse neural network (Guo page 19 input vector buffer unit - Act_0);
a plurality of process elements, each process element comprising (Guo page 19 shows a plurality of process elements. The figure on the right in page 19 shows M PEs comprising the same structure):
a pointer information buffer unit for buffering compressed pointer information of the sparse neural network in accordance with the weight matrix position information of the full connection layer (Guo page 19 pointer information buffer unit – PtrRead_0);
a weight information buffer unit for buffering compressed weight information of the sparse neural network in accordance with the compressed pointer information of the sparse neural network (Guo page 19 weight information buffer – SpmatRead_0);
an arithmetic logic unit (ALU) for performing a multiplication-accumulation calculation in accordance with the compressed weight information and the input vector of the sparse neural network (Guo page 19 arithmetic logic unit – Mac unit is used for multiplication-addition calculation in accordance with compressed weight information from SpmatRead_0 buffer and the input vector from the Act_0 buffer); and
an output buffer unit for buffering an intermediate calculation result and a final calculation result of the ALU (Guo page 19 output buffer unit – ActBuf_0); and
an activation function unit for performing an activation function operation on the final calculation result in the output buffer unit to obtain the calculation result of the sparse convolutional neural network (Guo page 19 activation function unit – sigmoid).
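The relationship among the buffers mapped for claim 3 can be sketched in Python as follows. This is a hedged illustration for a single process element, not Guo's implementation. It assumes a column-compressed layout in which pointers delimit, per input element, a run of (position index, weight value) entries; the Office action does not specify the exact encoding. The names `compress_columns` and `full_connection` are hypothetical.

```python
import math


def compress_columns(weights):
    """Encode a dense weight matrix (list of rows) column by column.

    Returns (pointers, entries): entries holds (row_index, value) for every
    non-zero weight, and entries[pointers[j]:pointers[j + 1]] is column j.
    """
    rows, cols = len(weights), len(weights[0])
    pointers, entries = [0], []
    for j in range(cols):
        for i in range(rows):
            if weights[i][j] != 0:
                entries.append((i, weights[i][j]))
        pointers.append(len(entries))
    return pointers, entries


def sigmoid(x):
    """Activation function (sigmoid, as named in the mapping above)."""
    return 1.0 / (1.0 + math.exp(-x))


def full_connection(pointers, entries, input_vector, num_outputs,
                    activation=sigmoid):
    """Multiply-accumulate over compressed weights, then apply the activation."""
    output_buffer = [0.0] * num_outputs           # intermediate and final results
    for j, act in enumerate(input_vector):        # input vector buffer
        start, end = pointers[j], pointers[j + 1]  # pointer information buffer
        for row, weight in entries[start:end]:     # weight information buffer
            output_buffer[row] += weight * act     # ALU multiply-accumulate
    return [activation(v) for v in output_buffer]
```

Passing `activation=lambda v: v` returns the raw accumulated sums, which is convenient for checking the result against a dense matrix-vector product.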

Regarding claim 4, Guo as modified by Dally teaches all the limitations of claim 2 as stated above. Further, Guo teaches the sparse convolutional neural network accelerator wherein the adder tree unit further adds a bias in accordance with the convolution parameter information, in addition to accumulating output results of the convolution unit (Guo page 16 adder tree takes in a bias as input).

Regarding claim 5, Guo as modified by Dally teaches all the limitations of claim 3 as stated above. Further, Guo teaches a sparse convolutional neural network accelerator wherein the compressed weight information of the sparse neural network comprises a position index value and a weight value (compressed weight information comprises weight value and index value), and
the ALU is further configured to:
perform a multiplication operation of the weight value and a corresponding element of the input vector (Mac unit is used for multiplication calculation in accordance with compressed weight information from SpmatRead_0 buffer, which comprises a weight value, and a corresponding element of the input vector from the Act_0 buffer),
read data in a corresponding position in the output buffer unit in accordance with the position index value, and add the data to the calculation result of the multiplication operation above (Guo page 19 data from the ActBuf_0 is read by the Mac unit as shown by the data line coming from the ActBuf_0 going to the Mac unit specified by the index), and
write the calculation result of the addition into the corresponding position in the output buffer unit in accordance with the position index value (Guo page 19 result of the Mac unit calculation is written to the ActBuf_0 specified by the index). 
Further, Dally also teaches compressed weight information comprising a position index and a weight value (Fig. 3A 305 and 315 and paragraphs [0066] and [0068]); an ALU configured to perform a multiplication operation of the weight value and a corresponding element of the input vector (Fig. 2A ALU – FxI multiplier array 325 and accumulator array 340 which performs a MAC operation similar to MAC unit of Guo); and reading data in a corresponding position in the output buffer unit in accordance with the position index value, adding the data to the calculation result of the multiplication operation above, and writing the calculation result of the addition into the corresponding position in the output buffer unit in accordance with the position index value (Dally Fig. 3D and paragraph [0093]). The motivation to combine is to exploit weight and/or activation sparsity to reduce energy consumption and improve processing throughput (Dally paragraph [0030]).
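The three ALU operations recited in claim 5 (multiply, read and add at the indexed position, write back to the same position) can be sketched as a single Python function. This is an illustrative sketch only; the name `mac_step` and the use of a plain list as the output buffer are assumptions, not details from Guo or Dally.

```python
def mac_step(output_buffer, position_index, weight_value, input_element):
    """One multiply-accumulate step addressed by the position index."""
    product = weight_value * input_element          # multiplication operation
    stored = output_buffer[position_index]          # read the indexed position
    output_buffer[position_index] = stored + product  # add and write back
    return output_buffer[position_index]


if __name__ == "__main__":
    buf = [0.0, 0.0, 0.0]
    mac_step(buf, 1, 2.0, 3.0)
    mac_step(buf, 1, 0.5, 4.0)
    print(buf)
```

Because the position index selects which accumulator is updated, repeated calls with the same index accumulate into one output element while other positions are left untouched.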


Response to Arguments
In response to Applicant’s comments regarding claim construction under 35 U.S.C. 112(f), Applicant requests that Examiner apply the broadest reasonable interpretation whether or not 35 U.S.C. 112(f) is invoked. Examiner maintains the interpretation of certain claim elements under 35 U.S.C. 112(f), as discussed above. Because 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, has been invoked, the broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification.
In view of the claim amendments, the rejections under 35 U.S.C. 112(a) and 35 U.S.C. 112(b) made in the Office action dated 11/18/20 are withdrawn. However, see the new rejection under 35 U.S.C. 112(b) made in this Office action.
Applicant’s arguments, see remarks pages 9-10, filed 02/17/2021, with respect to the 35 U.S.C. 112(d) rejection of claims 2-3 have been fully considered and are persuasive. The 35 U.S.C. 112(d) rejection of claims 2-3 has been withdrawn.
Applicant’s arguments, see remarks pages 10-11, filed 02/17/2021, with respect to the rejections of claims 1-10 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made in view of the claim amendments and a newly found prior art reference.
In response to applicant’s arguments with respect to the 35 U.S.C. 103 rejection of claims 1-10 over Guo et al. (NPL – “From Model to FPGA: Software-Hardware Co-Design for Efficient Neural Network Acceleration”), applicant amended claim 1 to recite “a plurality of convolution and pooling units, each for performing a convolution and pooling operation, for a first iteration number of times, on a first plurality of sub-blocks of input data in parallel in accordance with convolution parameter information to obtain an input vector of a sparse neural network”. Applicant argued that Guo shows the internal structure of only one processing element (convolution and pooling unit), in slide 16, and therefore does not teach a plurality of convolution and pooling units, because the internal structure of the other processing elements in slide 15 of Guo was not shown. Examiner agrees. However, Guo teaches a plurality of processing elements as shown in slide 15, and Dally discloses a neural network accelerator comprising multiple processing elements having the same structure and function to increase parallelism. It would have been obvious to a person of ordinary skill in the art to modify the other processing elements of Guo using Dally by configuring multiple processing elements of Guo to have the same architecture as the one processing element shown in slide 16, with the motivation of increasing parallelism by having multiple processing elements operating in parallel.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CARLO C WAJE whose telephone number is (571)272-5767.  The examiner can normally be reached on 7:30-4:30 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached at (571) 272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/C.W./
Carlo Waje
Examiner, Art Unit 2182
(571)272-5767




/EMILY E LAROCQUE/
Examiner, Art Unit 2182


[1] The publication relied upon as a ground of rejection is a PowerPoint presentation that consists mostly of figures and does not describe every structure shown in them in sufficient detail. A different publication by the same author with a similar title, published after the filing date of the application, provides a more detailed explanation of the structure and operation of the neural network accelerators presented in the reference being relied upon.