DETAILED ACTION
This action is in response to the application filed on 2/5/2020 for application 16/782,972. Claims 1 – 20 are pending and have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objection
Claims 1 and 11 are objected to because of the following informalities: Claim 1 recites the term “a corresponding one of the data sets” in lines 12 – 13. Based on the context of the paragraph, it is clear that the term refers to “a corresponding one of the data subsets”. Claim 11 recites the same phrase. Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 
“a receive-end component configured to” in Claim 1
“a weight storage unit configured to” in Claim 1
“sense amplifiers respectively configured to” in Claims 10 and 20

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1 – 10 and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim limitations “weight storage unit” in Claim 1 and “sense amplifier” in Claims 10 and 20 invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

Dependent Claims 2 – 10 are also rejected for inheriting the deficiencies of the claims upon which they depend.

Claims 1 and 11 recite the limitation “the partial weight storage unit”. There is insufficient antecedent basis for this limitation in the claims.
Claims 1 and 11 recite “a weight storage unit”. It is not clear whether “the partial weight storage unit” refers to the mentioned weight storage unit that stores a part of the weight, or to a part of the mentioned weight storage unit. Thus, the scope of the claims is unclear. For examination purposes, this term is interpreted to refer to the weight storage unit.

Dependent Claims 2 – 10 and 12 – 20 are also rejected for inheriting the deficiencies of the claims upon which they depend.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1 – 6, 10 – 16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ghodrati et al. (hereinafter Ghodrati), Mixed Signal Charge Domain Acceleration of Deep Neural Network through Interleaved Bit-Partitioned Arithmetic, arXiv, 2019, in view of Chen et al. (hereinafter Chen), US_20200050918.

Regarding Claim 1, Ghodrati discloses: An artificial intelligence accelerator, configured to receive a binary input data set and a selected layer of a plurality of layers of an overall weight pattern to perform a convolution operation, wherein the input data set is divided into a plurality of data subsets (Ghodrati, fig. 1b, where [inline image media_image1.png] includes binary input data X and weight W for the convolution operation (convolution operations include vector multiplications and accumulation), and input data X is divided into subsets),
the artificial intelligence accelerator comprises: a plurality of processing tiles (Ghodrati, fig. 1c & fig. 2a, showing multiple bit-parallel multiplication accumulation units (BPMACC)), wherein each of the processing tiles comprises: a receive-end component, configured to receive one of the data subsets; a weight storage unit, configured to store a part of the overall weight pattern, wherein the partial weight storage unit comprises a plurality of weight blocks, and each of the weight blocks stores a block part of the partial weight pattern in order of bits, wherein a cell array structure of the weight storage unit, with respect to a corresponding one of the data sets, configured to perform a convolution operation on the data subset with each block part respectively to obtain a plurality of sequential weight operation values (Examiner’s BRI: the processing unit has input logic to receive input data and storage logic to store a part of the overall weights, the memory logic is an array of cells that store bits, and the processing unit performs convolution of the input data subset and the partial weights; Ghodrati, fig. 1c & 2a, where each BPMACC unit receives input data and has circuits to store weight data; the memory is an array organized by bits w1 - wn; each BPMACC performs vector multiplication (convolution) to generate a part of the convolution output; the output is a sequence of bit values; weight vector W is a row (layer) within the overall 2D weight matrix $\vec{W}$ as described in fig. 2b & sec. 3.2);
a summation output circuit, comprising a plurality of shifters and a plurality of adders, and configured to sum up the plurality of weight output values through a multistage shifting and adding operation, so as to obtain a sum value expected from a direct convolution operation performed on the input data set with the overall weight pattern (Ghodrati, fig. 1c & sec. 3.2, para. 1, where “each WAGG also perform necessary shift operations to combine the lower-bitwidth results from its 16 MS-BPMACCs. By aggregating the partial results of each MS-BPMACC, the MS-WAGG unit generates a scalar output which is stored on its output register”; i.e., multiple shifters (multistage) and adders between the BPMACC units shift and sum up the outputs; fig. 2b & sec. 3.3, para. 3, “As the outputs of MS-WAGGs flow down the columns they get accumulated to generate the output”; i.e., the summed value of the input data set with the overall weight pattern).
Ghodrati does not explicitly disclose: 
a block-wise output circuit, comprising a plurality of shifters and a plurality of adders, and configured to sum up the plurality of weight operation values through a multistage shifting and adding operation, so as to obtain a weight output value expected from a direct convolution operation performed on the data subset with the partial weight pattern;
Chen explicitly discloses:
a block-wise output circuit, comprising a plurality of shifters and a plurality of adders, and configured to sum up the plurality of weight operation values through a multistage shifting and adding operation, so as to obtain a weight output value expected from a direct convolution operation performed on the data subset with the partial weight pattern (Chen, fig. 7 & para. 0082, where the multiplier has N bits, n bits of the multiplier are multiplied with the multiplicand in each cycle, and the result of each multiplication (multistage) is shifted and added to reach the final result);
Ghodrati and Chen both teach accelerators for convolution operations and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Ghodrati’s disclosure of using multiple processors to perform bit partitioning with Chen’s disclosure of bit-serial operation on the multiplier, to achieve a system that partitions the input data bits among multiple processors as shown in the following figure and multiplies the weight W using the basic multipliers S0, S1, S2 … of Chen, to achieve the claimed invention.
 
[Figure (media_image2.png): examiner’s figure showing input data bits partitioned among multiple processors]
 
One of ordinary skill in the art would have been motivated to make this modification to take advantage of strong flexibility and high configurability (Chen, abstract).
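The arithmetic behind the limitations discussed above can be illustrated with a minimal Python sketch. It is not taken from Ghodrati, Chen, or the application; the function names and the restriction to a single unsigned multiply (the building block of a dot product) are assumptions made only for illustration. It shows that splitting an i-bit input into p subsets and a j-bit weight into q blocks, then recombining the partial products by shifting and adding, yields the same value as the direct product.

```python
def split_bits(value: int, total_bits: int, parts: int) -> list[int]:
    """Split an unsigned integer into equal-width chunks, least significant first.

    Hypothetical helper, used only for this illustration.
    """
    if total_bits % parts != 0:
        raise ValueError("total_bits must be divisible by parts")
    width = total_bits // parts
    mask = (1 << width) - 1
    return [(value >> (k * width)) & mask for k in range(parts)]


def partitioned_product(x: int, w: int, i: int, p: int, j: int, q: int) -> int:
    """Multiply x (i bits) by w (j bits) using p input subsets and q weight blocks."""
    x_width = i // p  # bits per data subset (i/p)
    w_width = j // q  # bits per weight block (j/q)
    total = 0
    for a, x_sub in enumerate(split_bits(x, i, p)):
        # One "tile" per data subset: block-wise shift-and-add over the weight blocks.
        tile_value = 0
        for b, w_blk in enumerate(split_bits(w, j, q)):
            tile_value += (x_sub * w_blk) << (b * w_width)
        # Summation across tiles, shifting each tile result to its bit position.
        total += tile_value << (a * x_width)
    return total
```

For example, `partitioned_product(0xB7, 0x5A, 8, 4, 8, 2)` splits the input into four 2-bit subsets and the weight into two 4-bit blocks and returns the same value as `0xB7 * 0x5A`.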

Regarding Claim 2, Ghodrati in view of Chen further discloses:
the input data set comprises i bits, and is divided into p data subsets, i and p are integers, and each of the data subsets comprises i/p bits (Examiner’s BRI: input data is evenly divided into groups of bits; Ghodrati, fig. 1b & 1c, where input data X is evenly divided into groups of n bits (i/p)).

Regarding Claim 3, Ghodrati in view of Chen further discloses:
the input data set comprises i bits, and the quantity of the plurality of processing tiles is p, the input data set is divided into p data subsets, i and p are integers greater than or equal to 2, i is greater than p, and each of the data subsets comprises i/p bits (see the examiner’s figure in the Claim 1 rejection, where input X is evenly divided among the processing units, each of which processes n bits (i/p)).

Regarding Claim 4, Ghodrati in view of Chen further discloses:
the quantity of the plurality of weight blocks comprised in the partial weight storage unit is q, q is an integer greater than or equal to 2, the partial weight storage unit comprises j bits, j and q are integers greater than or equal to 2, j is greater than q, and each of the weight blocks comprises j/q memory cells (Examiner’s BRI: the convolution process on the weights is split into multiple processes of an equal number of bits; Chen, fig. 7 & para. 0082, where the multiplier has N bits (q) and is processed n bits (j/q) at a time).

Regarding Claim 5, Ghodrati in view of Chen further discloses:
the block-wise output circuit comprises at least one shifter and at least one adder in each stage of the shifting and adding operation (Examiner’s BRI: in each stage of shifting and adding, at least a shifter and an adder are used. Chen, fig. 7, where, for each multiplication of n bits of the weight, the second shifter shifts by n bits and the output is pushed to the adder tree);
two adjacent input values of a plurality of input values in each stage are one processing unit, after passing through the shifter, one input value in a higher bit location is added by the adder to the other input value in a lower bit location, and is output to a next stage, and in a last stage, a single value is output and used as the weight output value corresponding to the processing tile (Chen, fig. 7 & para. 0082, where the multiplication result of the current n bits of the multiplier is accumulated with the accumulated result of the previously processed n bits, which are adjacent to the current n bits. The current multiplication result is shifted in the second shifter, which results in a higher bit location, and is accumulated (processed) with the prior results).

Regarding Claim 6, Ghodrati in view of Chen further discloses:
a shift amount of the shifter in a first stage is j/q memory cells, and a shift amount of the shifter in a next stage is twice that of the shifter in a previous stage (Chen, fig. 7, where each multiplication of n bits adds an additional n bits of shift to the second shifter; i.e., after the cycle that shifts by n bits (j/q), the following cycle shifts by 2n bits, which is twice that of the prior cycle).
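The bit-serial multiplication that this Office action attributes to Chen, fig. 7 (n bits of the multiplier per cycle, each partial product shifted and accumulated) can be sketched as follows. This is a minimal Python illustration of the general technique under stated assumptions, not Chen’s circuit; the function name and the unsigned-integer model are hypothetical.

```python
def bit_serial_multiply(multiplicand: int, multiplier: int, total_bits: int, n: int) -> int:
    """Multiply by consuming n bits of an unsigned multiplier per cycle.

    Hypothetical model: in cycle k the current n-bit chunk of the multiplier
    is multiplied with the multiplicand, shifted left by k * n bits, and
    added to the running accumulator.
    """
    if total_bits % n != 0:
        raise ValueError("total_bits must be divisible by n")
    mask = (1 << n) - 1
    accumulator = 0
    for cycle in range(total_bits // n):
        chunk = (multiplier >> (cycle * n)) & mask   # current n bits of the multiplier
        partial = multiplicand * chunk               # narrow partial product
        accumulator += partial << (cycle * n)        # shift to bit position and accumulate
    return accumulator
```

With an 8-bit multiplier and n = 2, the loop runs for four cycles and the accumulator ends at the same value as the direct product.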

Regarding Claim 10, Ghodrati in view of Chen further discloses:
the processing circuit comprises a plurality of sense amplifiers, respectively configured to sense each block part to perform a convolution operation to obtain a plurality of sensed values as the plurality of weight operation values (Examiner’s BRI: the processing circuit has logic to collect/receive input data for convolution operation and logic to create the result of the computation; Ghodrati, fig. 2a, where BPMACC circuitry has logic to receive input data and logic to compute output results).

Regarding Claims 11 – 16 and 20, Claims 11 – 16 and 20 are the method claims corresponding to Claims 1 – 6 and 10. Claims 11 – 16 and 20 are rejected for the same reasons as Claims 1 – 6 and 10.

Claims 7 – 8 and 17 – 18 are rejected under 35 U.S.C. 103 as being unpatentable over Ghodrati et al. (hereinafter Ghodrati), Mixed Signal Charge Domain Acceleration of Deep Neural Network through Interleaved Bit-Partitioned Arithmetic, arXiv, 2019, in view of Chen et al. (hereinafter Chen), US_20200050918, and further in view of Llamocca et al. (hereinafter Llamocca), Partial Reconfigurable FIR Filtering System Using Distributed Arithmetic, International Journal of Reconfigurable Computing, 2010.

Regarding Claim 7, Ghodrati in view of Chen does not explicitly disclose:
the summation output circuit comprises at least one shifter and at least one adder in each stage of the shifting and adding operation;
two adjacent input values of a plurality of input values in each stage are one processing unit, after passing through the shifter, one input value in a higher bit location is added by the adder to the other input value in a lower bit location, and is output to a next stage, and in a last stage, a single value is output and used as the sum value.
Llamocca explicitly discloses:
the summation output circuit comprises at least one shifter and at least one adder in each stage of the shifting and adding operation; two adjacent input values of a plurality of input values in each stage are one processing unit, after passing through the shifter, one input value in a higher bit location is added by the adder to the other input value in a lower bit location, and is output to a next stage, and in a last stage, a single value is output and used as the sum value (Llamocca, fig. 3, where the inputs of the adder tree, at different bit locations, are added pairwise. The inputs at adjacent bit locations are processed with a shifter and an adder. The left input of each pair is shifted, which results in a higher bit location, and added to the right input of the pair).
Ghodrati (in view of Chen) and Llamocca both teach parallel, distributed arithmetic and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to replace the shifter and adder of Ghodrati (in view of Chen) for accumulating the results of the multipliers with Llamocca’s bit-partitioned adder tree to achieve the claimed invention. Since the input data on each processor is n bits long, the inputs of each pair in the first stage are n bits apart, so the shift amount of the higher input is n; the inputs of each pair in the second stage are 2n bits apart, so the shift amount of the left input is 2n; the same logic applies to the remaining stages.
[Figure (media_image3.png): examiner’s figure of the shift-and-add tree, with stage shift amounts of n and 2n]

One of ordinary skill in the art would have been motivated to make this modification to maintain high throughput of the system (Llamocca, abstract).

Regarding Claim 8, Ghodrati in view of Chen and Llamocca further discloses:
a shift amount of the shifter in a first stage is i/p bits, and a shift amount of the shifter in a next stage is twice that of the shifter in a previous stage (see the figure in the prior art rejection of Claim 7; in the first stage the shift amount is n bits (i/p), and in the second stage the shift is 2n bits, which is twice that of the first stage).
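The stage-wise shifting described for Claims 7 and 8 can be illustrated with a short Python sketch of a pairwise shift-and-add tree. It is a generic illustration, not Llamocca’s fig. 3 or the examiner’s figure; the function name and the assumption that the number of inputs is a power of two are hypothetical. Adjacent values are paired, the higher-position value is shifted and added to the lower one, and the shift amount doubles from one stage to the next.

```python
def shift_add_tree(values: list[int], width: int) -> int:
    """Combine partial values pairwise, least significant first.

    Hypothetical model: values[k] belongs at bit offset k * width. The first
    stage shifts by `width`, and each later stage shifts by twice the amount
    of the stage before it, until a single value remains.
    """
    count = len(values)
    if count == 0 or count & (count - 1):
        raise ValueError("number of inputs must be a power of two")
    shift = width
    stage = list(values)
    while len(stage) > 1:
        # Pair neighbours: lower value + (higher value shifted to its bit position).
        stage = [stage[k] + (stage[k + 1] << shift) for k in range(0, len(stage), 2)]
        shift *= 2  # pairs in the next stage are twice as far apart
    return stage[0]
```

For instance, the four 4-bit chunks `[0x3, 0xA, 0x7, 0xB]` combine in two stages (shifts of 4, then 8) into `0xB7A3`. Because the tree is linear, it gives the same result when each chunk has first been multiplied by a common weight.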

Regarding Claims 17 – 18, Claims 17 – 18 are the method claims corresponding to Claims 7 – 8. Claims 17 – 18 are rejected for the same reasons as Claims 7 – 8.

Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Ghodrati et al. (hereinafter Ghodrati), Mixed Signal Charge Domain Acceleration of Deep Neural Network through Interleaved Bit-Partitioned Arithmetic, arXiv, 2019, in view of Chen et al. (hereinafter Chen), US_20200050918, and further in view of Kaul et al. (hereinafter Kaul), US20180315399.

Regarding Claim 9, Ghodrati in view of Chen does not explicitly disclose:
further comprising: a normalization processing circuit, configured to normalize the sum value to obtain a normalization sum value; and a quantization processing circuit, configured to quantize the normalization sum value into an integer value by using a base number.
Kaul explicitly discloses: 
further comprising: a normalization processing circuit, configured to normalize the sum value to obtain a normalization sum value; and a quantization processing circuit, configured to quantize the normalization sum value into an integer value by using a base number (Kaul, fig. 15A and para. 0202, where step 1515 performs a normalization shift on the multiplication accumulation results and sends them to round logic 1516 (quantization into an integer value). The normalization is based on the location prediction (base number) by the leading zero anticipator 22a LXA module).
Ghodrati (in view of Chen) and Kaul both teach tensor convolution calculation and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the bit-parallel convolution processor of Ghodrati (in view of Chen) with Kaul’s disclosure of normalization and quantization steps to achieve the claimed invention. One of ordinary skill in the art would have been motivated to make this modification to support floating-point calculation without increasing the memory footprint (Kaul, para. 0190).
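A normalization shift followed by rounding, of the general kind referred to above, can be sketched in Python as follows. This is a generic illustration and is not taken from Kaul or from the application; the function name, the unsigned accumulator, the round-half-up rule, and the choice to leave small values unshifted are all assumptions.

```python
def normalize_and_round(acc: int, mantissa_bits: int) -> tuple[int, int]:
    """Return (mantissa, exponent) such that acc is approximately mantissa << exponent.

    Hypothetical model: the position of the leading one decides the
    normalization shift, and the bits shifted out decide the rounding.
    """
    if acc < 0 or mantissa_bits < 1:
        raise ValueError("expects a non-negative accumulator and a positive width")
    # Shift needed so that the leading one lands in the top mantissa bit.
    exponent = max(acc.bit_length() - mantissa_bits, 0)
    if exponent == 0:
        return acc, 0  # already fits; nothing to shift out or round
    mantissa = acc >> exponent
    remainder = acc & ((1 << exponent) - 1)
    if remainder >= 1 << (exponent - 1):  # round half up
        mantissa += 1
        if mantissa.bit_length() > mantissa_bits:  # rounding overflowed the mantissa
            mantissa >>= 1
            exponent += 1
    return mantissa, exponent
```

With a 4-bit mantissa, an accumulator of 91 normalizes to mantissa 11 with exponent 3 (representing 88), and 127 rounds up to mantissa 8 with exponent 4 (representing 128).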

Regarding Claim 19, Claim 19 is the method claim corresponding to Claim 9. Claim 19 is rejected for the same reasons as Claim 9.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIEN MING CHOU whose telephone number is (571) 272-9354. The examiner can normally be reached Monday - Friday, 9 am - 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, CHAKI KAKALI can be reached on (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




Shien Ming Chou
Examiner, Art Unit 2122


/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122