Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

	Specification
1.	This action is responsive to communications filed April 03, 2020.  Claims 1-20 are presented for examination.  
2. 	Applicant is reminded of the duty to fully disclose information under 37 CFR 1.56.

3. 	The Information Disclosure Statements filed 04/03/2020, 05/06/2020, 07/31/2020, 04/12/2021, 05/20/2022 and 07/13/2022 comply with the provisions of 37 CFR 1.97.  Accordingly, they have been reviewed and considered by the Examiner.

Claim Interpretation
4. 	The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

5. 	The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations (means for processing, claim 20) in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
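For illustration only, the three-prong test quoted above may be sketched as a simple decision function. This is a simplified sketch and not part of MPEP § 2181; the class name, field names, and function name are hypothetical, and the example inputs merely mirror the interpretation stated in this action for the “means for processing” limitation of claim 20 rather than any independent finding.

```python
from dataclasses import dataclass


@dataclass
class Limitation:
    """Hypothetical record of how one claim limitation is worded."""
    uses_means_or_step: bool            # literal "means" or "step"
    uses_generic_placeholder: bool      # nonce term substituting for "means"
    has_functional_language: bool       # e.g. "for ...", "configured to ..."
    recites_sufficient_structure: bool  # structure, material, or acts


def interpreted_under_112f(lim: Limitation) -> bool:
    """Return True only when prongs (A), (B), and (C) are all met."""
    prong_a = lim.uses_means_or_step or lim.uses_generic_placeholder
    prong_b = lim.has_functional_language
    prong_c = not lim.recites_sufficient_structure
    return prong_a and prong_b and prong_c


# Inputs assumed for illustration: "means" plus functional language,
# treated in this action as lacking sufficient structure.
means_for_processing = Limitation(
    uses_means_or_step=True,
    uses_generic_placeholder=False,
    has_functional_language=True,
    recites_sufficient_structure=False,
)
result = interpreted_under_112f(means_for_processing)
```

Setting `recites_sufficient_structure=True` in the same record makes prong (C) fail, which corresponds to rebutting the presumption raised by the word “means.”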

6. 	Because this claim limitation is being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it is being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this limitation interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation to avoid it being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation recites sufficient structure to perform the claimed function so as to avoid it being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Double Patenting
7.	The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

8.	Claims 1, 14, and 20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 11, and 20 of copending Application No. 16/446,610.  Claims 1, 14, and 20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 11, and 20 of copending Application No. 16/552,619.  Claims 1, 14, and 20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 11, and 20 of copending Application No. 16/552,850.  Claims 1, 14, and 20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 12, and 20 of copending Application No. 16/552,945.  Although the claims at issue are not identical, they are not patentably distinct from each other because the instant application claims are broader in every aspect than the copending application claims and are therefore an obvious variant thereof; see the tables below.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Current application (first claim in each pair) vs. Application 16/446,610 (second claim in each pair)
1. A processor, comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the first tile being configured: to receive a tensor of activations representing an image comprising a plurality of pixels each having a plurality of color components, the tensor comprising a plurality of two-dimensional arrays, each representing one color component of the image; and to perform a convolution of a kernel with one of the two-dimensional arrays.

1. A processor, comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the first queue comprising a first register and a second register adjacent to the first register, the first register being an output register of the first queue, the first tile being configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue, wherein, in the first state, a first adder is configured to be connected to an output of the first multiplier, and wherein, in the second state, a second adder is configured to be connected to the output of the first multiplier.
14. A method for calculating with a processing circuit, the processing circuit comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the method comprising: receiving a tensor of activations representing an image comprising a plurality of pixels each having a plurality of color components, the tensor comprising a plurality of two-dimensional arrays, each representing one color component of the image; and performing a convolution of a kernel with one of the two-dimensional arrays.

11. A method for calculating with a processing circuit, the processing circuit comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the first queue comprising a first register and a second register adjacent to the first register, the first register being an output register of the first queue, the method comprising: in a first state: multiplying, by the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: multiplying, by the first multiplier, the first weight by an activation from the second register of the first queue, wherein, in the first state, a first adder is configured to be connected to an output of the first multiplier, and wherein, in the second state, a second adder is configured to be connected to the output of the first multiplier.
20. A method for calculating with a means for processing, the means for processing comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the method comprising: receiving a tensor of activations representing an image comprising a plurality of pixels each having a plurality of color components, the tensor comprising a plurality of two-dimensional arrays, each representing one color component of the image; and performing a convolution of a kernel with one of the two-dimensional arrays.

20. A method for calculating with a means for processing, the means for processing comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the first queue comprising a first register and a second register adjacent to the first register, the first register being an output register of the first queue, the method comprising: in a first state: multiplying, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: multiplying, in the first multiplier, the first weight by an activation from the second register of the first queue; wherein, in the first state, a first adder is configured to be connected to an output of the first multiplier, and wherein, in the second state, a second adder is configured to be connected to the output of the first multiplier.
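For illustration only, the operation recited in claims 1, 14, and 20 of the current application (receiving a tensor made up of two-dimensional arrays, one per color component, and convolving a kernel with one of those arrays) may be sketched as below. The sketch assumes a sliding-window sum of products with no kernel flipping and no padding, as is common in neural-network processing; the claims do not specify these details, and all function and variable names are hypothetical.

```python
def window_sum_of_products(kernel, channel, row, col):
    """Sum of elementwise products of the kernel with the window at (row, col)."""
    return sum(
        kernel[i][j] * channel[row + i][col + j]
        for i in range(len(kernel))
        for j in range(len(kernel[0]))
    )


def convolve_channel(channel, kernel):
    """Slide the kernel over one two-dimensional array (no padding, stride 1)."""
    out_rows = len(channel) - len(kernel) + 1
    out_cols = len(channel[0]) - len(kernel[0]) + 1
    return [
        [window_sum_of_products(kernel, channel, r, c) for c in range(out_cols)]
        for r in range(out_rows)
    ]


# Assumed toy input: a 3x3 image stored as three arrays, one per color component.
red = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
green = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
blue = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
tensor = [red, green, blue]

kernel = [[1, 0], [0, 1]]

# Convolve the kernel with one of the two-dimensional arrays (here, the first).
output = convolve_channel(tensor[0], kernel)
```

With these inputs the 2x2 kernel produces a 2x2 output, one value per window position in the selected color component.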


Current application (first claim in each pair) vs. Application 16/552,619 (second claim in each pair)
1. A processor, comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the first tile being configured: to receive a tensor of activations representing an image comprising a plurality of pixels each having a plurality of color components, the tensor comprising a plurality of two-dimensional arrays, each representing one color component of the image; and to perform a convolution of a kernel with one of the two-dimensional arrays.

1. A processor, comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier, the first tile being configured to perform a convolution of an array of activations with a kernel of weights, the performing of the convolution comprising, in order: forming a tensor product of the kernel with a first subarray of the array of activations; forming a tensor product of the kernel with a second subarray of the array of activations, the second subarray being offset from the first subarray by n array elements in a first direction, n being a positive integer; and forming a tensor product of the kernel with a third subarray of the array of activations, the third subarray being offset from the second subarray by one array element in a second direction, perpendicular to the first direction, wherein the second subarray and the third subarray are spaced apart from an end of a row of the array of activations.
14. A method for calculating with a processing circuit, the processing circuit comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the method comprising: receiving a tensor of activations representing an image comprising a plurality of pixels each having a plurality of color components, the tensor comprising a plurality of two-dimensional arrays, each representing one color component of the image; and performing a convolution of a kernel with one of the two-dimensional arrays.

11. A method for calculating with a processing circuit, the processing circuit comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier, the method comprising performing a convolution of an array of activations with a kernel of weights, the performing of the convolution comprising, in order: forming a tensor product of the kernel with a first subarray of the array of activations; forming a tensor product of the kernel with a second subarray of the array of activations, the second subarray being offset from the first subarray by n array elements in a first direction, n being a positive integer; and forming a tensor product of the kernel with a third subarray of the array of activations, the third subarray being offset from the second subarray by one array element in a second direction, perpendicular to the first direction, wherein the second subarray and the third subarray are spaced apart from an end of a row of the array of activations.
20. A method for calculating with a means for processing, the means for processing comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the method comprising: receiving a tensor of activations representing an image comprising a plurality of pixels each having a plurality of color components, the tensor comprising a plurality of two-dimensional arrays, each representing one color component of the image; and performing a convolution of a kernel with one of the two-dimensional arrays.

20. A method for calculating with a means for processing, the means for processing comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier, the method comprising performing a convolution of an array of activations with a kernel of weights, the performing of the convolution comprising, in order: forming a tensor product of the kernel with a first subarray of the array of activations; forming a tensor product of the kernel with a second subarray of the array of activations, the second subarray being offset from the first subarray by n array elements in a first direction, n being a positive integer; and forming a tensor product of the kernel with a third subarray of the array of activations, the third subarray being offset from the second subarray by one array element in a second direction, perpendicular to the first direction, wherein the second subarray and the third subarray are spaced apart from an end of a row of the array of activations.


Current application (first claim in each pair) vs. Application 16/552,850 (second claim in each pair)
1. A processor, comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the first tile being configured: to receive a tensor of activations representing an image comprising a plurality of pixels each having a plurality of color components, the tensor comprising a plurality of two-dimensional arrays, each representing one color component of the image; and to perform a convolution of a kernel with one of the two-dimensional arrays.

1. A processor, comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier, the processor being configured to perform a first convolution of an array of activations with a first kernel of weights, the performing of the first convolution comprising: broadcasting a first subarray of the array of activations to: the first tile, and the second tile; forming a first tensor product, the first tensor product being a tensor product of a first subarray of the first kernel of weights with the first subarray of the array of activations; storing the first tensor product in the memory; broadcasting a second subarray of the array of activations to: the first tile, and the second tile; forming a second tensor product, the second tensor product being a tensor product of a second subarray of the first kernel of weights with the second subarray of the array of activations; and adding the first tensor product and the second tensor product.
14. A method for calculating with a processing circuit, the processing circuit comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the method comprising: receiving a tensor of activations representing an image comprising a plurality of pixels each having a plurality of color components, the tensor comprising a plurality of two-dimensional arrays, each representing one color component of the image; and performing a convolution of a kernel with one of the two-dimensional arrays.

11. A method for calculating with a processing circuit, the processing circuit comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier, the method comprising performing a first convolution of an array of activations with a first kernel of weights, the performing of the first convolution comprising: broadcasting a first subarray of the array of activations to: the first tile, and the second tile; forming a first tensor product, the first tensor product being a tensor product of a first subarray of the first kernel of weights with the first subarray of the array of activations; storing the first tensor product in the memory; broadcasting a second subarray of the array of activations to: the first tile, and the second tile; forming a second tensor product, the second tensor product being a tensor product of a second subarray of the first kernel of weights with the second subarray of the array of activations; and adding the first tensor product and the second tensor product.
20. A method for calculating with a means for processing, the means for processing comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the method comprising: receiving a tensor of activations representing an image comprising a plurality of pixels each having a plurality of color components, the tensor comprising a plurality of two-dimensional arrays, each representing one color component of the image; and performing a convolution of a kernel with one of the two-dimensional arrays.

20. A method for calculating with a means for processing, the means for processing comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier, the method comprising performing a first convolution of an array of activations with a first kernel of weights, the performing of the first convolution comprising: broadcasting a first subarray of the array of activations to: the first tile, and the second tile; forming a first tensor product, the first tensor product being a tensor product of a first subarray of the first kernel of weights with the first subarray of the array of activations; storing the first tensor product in the memory; broadcasting a second subarray of the array of activations to: the first tile, and the second tile; forming a second tensor product, the second tensor product being a tensor product of a second subarray of the first kernel of weights with the second subarray of the array of activations; and adding the first tensor product and the second tensor product.
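For illustration only, the accumulation recited in claims 1, 11, and 20 of Application 16/552,850 (forming a first and a second tensor product from subarrays of the kernel and of the activations, storing the first, and adding the two) may be sketched as below. The sketch assumes that “tensor product” denotes a sum of elementwise products and that the kernel is split by rows; neither assumption is taken from the claims, the dictionary standing in for the memory is a placeholder, and all names are hypothetical.

```python
def sum_of_products(weights, activations):
    """Sum of elementwise products of two equally shaped 2-D lists."""
    return sum(
        w * a
        for w_row, a_row in zip(weights, activations)
        for w, a in zip(w_row, a_row)
    )


def accumulate_in_two_passes(kernel, window, split_row):
    """Form two partial products from row-wise subarrays, then add them."""
    memory = {}  # placeholder for the claimed memory
    # First pass: first kernel subarray with first activation subarray.
    memory["first"] = sum_of_products(kernel[:split_row], window[:split_row])
    # Second pass: second kernel subarray with second activation subarray.
    second = sum_of_products(kernel[split_row:], window[split_row:])
    # Add the stored first product to the second product.
    return memory["first"] + second


kernel = [[1, 2], [3, 4]]
window = [[5, 6], [7, 8]]

total = accumulate_in_two_passes(kernel, window, split_row=1)
single_pass = sum_of_products(kernel, window)
```

Under these assumptions the two-pass total equals the single-pass result, since the sum of products over the whole kernel is the sum of the sums over its row-wise parts.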


Current application (first claim in each pair) vs. Application 16/552,945 (second claim in each pair)
1. A processor, comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the first tile being configured: to receive a tensor of activations representing an image comprising a plurality of pixels each having a plurality of color components, the tensor comprising a plurality of two-dimensional arrays, each representing one color component of the image; and to perform a convolution of a kernel with one of the two-dimensional arrays.

1. A processor, comprising: a first tile, a second tile, a memory, an input bus, and an output bus, the input bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier, the first tile being configured to perform a first convolution of an array of activations with a kernel of weights; the memory comprising: a first memory bank set, and a second memory bank set; the input bus comprising: a first segmented bus for data propagating in a first direction, and a second segmented bus for data propagating in a second direction, opposite the first direction; the first segmented bus comprising: a first switch block, and a second switch block; the first switch block being connected to: the first tile, and the first memory bank set; the second switch block being connected to: the second tile, and the second memory bank set; the second segmented bus comprising: a third switch block, and a fourth switch block; the third switch block being connected to: the first tile, and the first memory bank set; the fourth switch block being connected to: the second tile, and the second memory bank set; an input of the first switch block being connected to an output of the second switch block; and an output of the third switch block being connected to an input of the fourth switch block.
14. A method for calculating with a processing circuit, the processing circuit comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the method comprising: receiving a tensor of activations representing an image comprising a plurality of pixels each having a plurality of color components, the tensor comprising a plurality of two-dimensional arrays, each representing one color component of the image; and performing a convolution of a kernel with one of the two-dimensional arrays.

12. A method for calculating with a processing circuit, the processing circuit comprising: a first tile, a second tile, a memory, an input bus, and an output bus, the input bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier, the first tile being configured to perform a first convolution of an array of activations with a kernel of weights; the memory comprising: a first memory bank set, and a second memory bank set; the input bus comprising: a first segmented bus for data propagating in a first direction, and a second segmented bus for data propagating in a second direction, opposite the first direction; the first segmented bus comprising: a first switch block, and a second switch block; the first switch block being connected to: the first tile, and the first memory bank set; the second switch block being connected to: the second tile, and the second memory bank set; the second segmented bus comprising: a third switch block, and a fourth switch block; the third switch block being connected to: the first tile, and the first memory bank set; the fourth switch block being connected to: the second tile, and the second memory bank set; an input of the first switch block being connected to an output of the second switch block; and an output of the third switch block being connected to an input of the fourth switch block, the method comprising: in a first bus state, connecting, by the first switch block, the first memory bank set to the first tile, and connecting, by the second switch block, the second memory bank set to the second tile.
20. A method for calculating with a means for processing, the means for processing comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the method comprising: receiving a tensor of activations representing an image comprising a plurality of pixels each having a plurality of color components, the tensor comprising a plurality of two-dimensional arrays, each representing one color component of the image; and performing a convolution of a kernel with one of the two-dimensional arrays.

20. A method for calculating with a means for processing, the means for processing comprising: a first tile, a second tile, a memory, an input bus, and an output bus, the input bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier, the first tile being configured to perform a first convolution of an array of activations with a kernel of weights; the memory comprising: a first memory bank set, and a second memory bank set; the input bus comprising: a first segmented bus for data propagating in a first direction, and a second segmented bus for data propagating in a second direction, opposite the first direction; the first segmented bus comprising: a first switch block, and a second switch block; the first switch block being connected to the first tile, and the first memory bank set; the second switch block being connected to the second tile, and the second memory bank set; the second segmented bus comprising: a third switch block, and a fourth switch block; the third switch block being connected to the first tile, and the first memory bank set; the fourth switch block being connected to the second tile, and the second memory bank set; an input of the first switch block being connected to an output of the second switch block; and an output of the third switch block being connected to an input of the fourth switch block, the method comprising: in a first bus state, connecting, by the first switch block, the first memory bank set to the first tile, and connecting, by the second switch block, the second memory bank set to the second tile.



 	  			Rejections - 35 USC § 103
9. 	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

10.	Claims 1-2, 6-8, 14-15 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Ross (USPN: 10,521,488) in view of Pohlack et al. (USPN: 10,706,147).
	As per Claim 1, Ross teaches the invention as claimed including a processor (col. 1, lines 57 et seq.) comprising a first tile and a second tile; for example, Ross teaches a first number of activation registers in a first group of cells and a second, lesser number of activation registers in a second group of cells (e.g. see column 10, lines 54-58).  The further limitation of a bus wherein the bus is connected to the memory, the first tile and the second tile is taught by Ross to the extent that it is being claimed; for example, Ross teaches a memory (210) (e.g. see figure 2), and the memory sends the weights and activations to the first tile and the second tile (dynamic memory 210 can send the sets of weight inputs and the sets of activation inputs to the matrix computation unit 212, the matrix computation unit 212 may be a two-dimensional systolic array of cells, col. 6, line 63-col. 7, line 2; col. 10, lines 54-58). This concept clearly shows that a bus must be present in Ross’s system and connected to the memory, the first tile, and the second tile as claimed in order for the memory to send the weights and activations to the first tile and the second tile (col. 6, line 63-col. 7, line 2; col. 10, lines 54-58).  Ross further discloses the first tile comprises a first weight register, a second weight register; for example Ross teaches each cell of the first plurality of cells includes a weight register configured to store a weight input (e.g. see col. 2, lines 17-19); an activations buffer as being equivalent to activation registers within cell (e.g. see col. 9, lines 48-51); Ross further discloses an activations buffer, a first multiplier, and a second multiplier; for example, Ross teaches each cell of the first plurality of cells including multiple activation registers/buffers, each activation register of the multiple activation registers/buffers configured to store a corresponding activation input, multiplexer circuitry communicatively coupled to the multiple activation registers/buffers and configured to select, from the multiple activation registers, one of the activation inputs as a selected activation input, and multiplication circuitry communicatively coupled to the weight register and to the multiplexer, in which the multiplication circuitry is configured to output a product of the weight input and the selected activation input (e.g. see col. 2, lines 17-28); cell may include multiple activation registers (506a, 506b, 506c) that store activation inputs (e.g. see col. 10, lines 40-41, Fig. 5).  Ross further discloses the first tile being configured to receive a tensor of activations representing an image comprising a plurality of pixels each having a plurality of color components, the tensor comprising a plurality of two-dimensional arrays, each representing one color component of the image; for example Ross teaches kernels identify particular image contours, shapes, or colors. Kernels can be represented as a matrix structure of weight inputs. Each convolutional layer can also process a set of activation inputs. The set of activation inputs can also be represented as a matrix structure (e.g. see col. 1, lines 29 et seq.; also see col. 9, lines 28-35, 47-56; col. 11, lines 33-35; col. 5, lines 5-6; Fig. 5; col. 1, lines 31-32), and to perform a convolution of a kernel with one of the two-dimensional arrays (e.g. see col. 6, lines 30-31; col. 9, lines 54-56; col. 1, lines 31-32).
Ross discloses the invention as claimed; however, Ross does not particularly teach a shuffler and an activations cache, wherein the activations cache includes a plurality of independent lanes, each of the independent lanes being randomly accessible.
Pohlack, in his teaching of mitigating side-channel attacks via shared cache, discloses the missing elements that are known to be required in the system of Ross in order to arrive at Applicant’s current invention, wherein Pohlack discloses (a) a shuffler 126 configured to implement a shuffle operation that changes the host memory page that is mapped to a VM memory page in the memory map 122, the shuffle operation being triggered by a condition detected by the memory usage monitor 124 or simply performed periodically as part of the routine operation of the VMM 120 (e.g. see col. 9, lines 24-30; figure 1); and (b) an activation cache as being equivalent to cache 134, wherein the shuffler can select a new host memory page that has the least number of cache collisions with the host memory pages that are currently in use by the VMs 112 (e.g. see column 9, lines 48 et seq.), noting that Pohlack clearly teaches cache 134 includes a plurality of independent lanes/lines, each of the independent lanes being randomly accessible (e.g. see col. 3, lines 6-22).  Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the current invention to look into the invention of Pohlack and to implement (a) a shuffler, and (b) an activations cache, wherein the activations cache includes a plurality of independent lanes, each of the independent lanes being randomly accessible, as taught by Pohlack, in Ross’s invention.  Doing so would shuffle the memory mapping between the main memory and the activation cache in a way that ensures the old and the new host memory pages map to different locations in the activation cache, such as a different cache set or lanes/lines (Pohlack, col. 4, lines 45-50), which results in enhanced data coherency and overall system performance, therefore being advantageous.
 	As per claim 2, see arguments with respect to claim 1. In addition, Pohlack further discloses the shuffler 126 is connected to an input/output of the activations cache 146 (e.g. see figure 1, arrow-line cross dotted-line). By this rationale, it would be obvious for the same reasons as set forth in the rejection for claim 1.
 	As per claims 6 and 7, see arguments with respect to claim 1. It should be noted that Pohlack clearly discloses the shuffler 126 is connected to an input/output of the activations cache 146 (e.g. see figure 1, arrow-line cross dotted-line), wherein the activation cache can be implemented with variable size of S sets of W lines/lanes (e.g. see col. 3, lines 6 et seq.), such that the shuffler 126 (e.g. see figure 1) must be a system dependent feature and can be implemented with variable size which has a granularity of four lanes (claim 6) or a granularity of one lane (claim 7).  Accordingly, it would have been further obvious to one having ordinary skill in the art before the effective filing date of the current invention to implement the shuffler having variable size of either one lane or four lanes for the same reasons set forth above for claim 1; in addition, having a variable size shuffler would further increase system adaptability, therefore being advantageous.
 	As per claim 8, see arguments with respect to claim 1. In addition, Pohlack further discloses the shuffler 126 is connected to an input/output of the activations cache 146 (e.g. see figure 1, arrow-line cross dotted-line). By this rationale, it would be obvious for the same reasons as set forth in the rejection for claim 1.
	As per claim 14, Ross teaches the invention as claimed including a method for calculating with a processing circuit (e.g. see col. 1, lines 57 et seq.), wherein the processing circuit comprises a first tile and a second tile; for example, Ross teaches a first number of activation registers in a first group of cells and a second, lesser number of activation registers in a second group of cells (e.g. see column 10, lines 54-58).  The further limitation of a bus wherein the bus is connected to the memory, the first tile and the second tile is taught by Ross to the extent that it is being claimed; for example, Ross teaches a memory (210) (e.g. see figure 2), and the memory sends the weights and activations to the first tile and the second tile (dynamic memory 210 can send the sets of weight inputs and the sets of activation inputs to the matrix computation unit 212, the matrix computation unit 212 may be a two-dimensional systolic array of cells, col. 6, line 63-col. 7, line 2; col. 10, lines 54-58). This concept clearly shows that a bus must be present in Ross’s system and connected to the memory, the first tile, and the second tile as claimed in order for the memory to send the weights and activations to the first tile and the second tile (col. 6, line 63-col. 7, line 2; col. 10, lines 54-58).  Ross further discloses the first tile comprises a first weight register, a second weight register; for example Ross teaches each cell of the first plurality of cells includes a weight register configured to store a weight input (e.g. see col. 2, lines 17-19); an activations buffer as being equivalent to activation registers within cell (e.g. see col. 9, lines 48-51); Ross further discloses an activations buffer, a first multiplier, and a second multiplier; for example, Ross teaches each cell of the first plurality of cells including multiple activation registers/buffers, each activation register of the multiple activation registers/buffers configured to store a corresponding activation input, multiplexer circuitry communicatively coupled to the multiple activation registers/buffers and configured to select, from the multiple activation registers, one of the activation inputs as a selected activation input, and multiplication circuitry communicatively coupled to the weight register and to the multiplexer, in which the multiplication circuitry is configured to output a product of the weight input and the selected activation input (e.g. see col. 2, lines 17-28); cell may include multiple activation registers (506a, 506b, 506c) that store activation inputs (e.g. see col. 10, lines 40-41, Fig. 5).  Ross further discloses the first tile being configured to receive a tensor of activations representing an image comprising a plurality of pixels each having a plurality of color components, the tensor comprising a plurality of two-dimensional arrays, each representing one color component of the image; for example Ross teaches kernels identify particular image contours, shapes, or colors. Kernels can be represented as a matrix structure of weight inputs. Each convolutional layer can also process a set of activation inputs. The set of activation inputs can also be represented as a matrix structure (e.g. see col. 1, lines 29 et seq.; also see col. 9, lines 28-35, 47-56; col. 11, lines 33-35; col. 5, lines 5-6; Fig. 5; col. 1, lines 31-32), and to perform a convolution of a kernel with one of the two-dimensional arrays (e.g. see col. 6, lines 30-31; col. 9, lines 54-56; col. 1, lines 31-32).
Ross discloses the invention as claimed; however, Ross does not particularly teach a shuffler and an activations cache, wherein the activations cache includes a plurality of independent lanes, each of the independent lanes being randomly accessible.
Pohlack, in his teaching of mitigating side-channel attacks via shared cache, discloses the missing elements that are known to be required in the system of Ross in order to arrive at Applicant’s current invention, wherein Pohlack discloses (a) a shuffler 126 configured to implement a shuffle operation that changes the host memory page that is mapped to a VM memory page in the memory map 122, the shuffle operation being triggered by a condition detected by the memory usage monitor 124 or simply performed periodically as part of the routine operation of the VMM 120 (e.g. see col. 9, lines 24-30; figure 1); and (b) an activation cache as being equivalent to cache 134, wherein the shuffler can select a new host memory page that has the least number of cache collisions with the host memory pages that are currently in use by the VMs 112 (e.g. see column 9, lines 48 et seq.), noting that Pohlack clearly teaches cache 134 includes a plurality of independent lanes/lines, each of the independent lanes being randomly accessible (e.g. see col. 3, lines 6-22).  Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the current invention to look into the invention of Pohlack and to implement (a) a shuffler, and (b) an activations cache, wherein the activations cache includes a plurality of independent lanes, each of the independent lanes being randomly accessible, as taught by Pohlack, in Ross’s invention.  Doing so would shuffle the memory mapping between the main memory and the activation cache in a way that ensures the old and the new host memory pages map to different locations in the activation cache, such as a different cache set or lanes/lines (Pohlack, col. 4, lines 45-50), which results in enhanced data coherency and overall system performance, therefore being advantageous.
 	As per claim 15, see arguments with respect to claim 1. In addition, Pohlack further discloses the shuffler 126 is connected to an input/output of the activations cache 146 (e.g. see figure 1, arrow-line cross dotted-line). By this rationale, it would be obvious for the same reasons as set forth in the rejection for claim 1.
 	As per claim 19, see arguments with respect to claim 1. It should be noted that Pohlack clearly discloses the shuffler 126 is connected to an input/output of the activations cache 146 (e.g. see figure 1, arrow-line cross dotted-line), wherein the activation cache can be implemented with variable size of S sets of W lines/lanes (e.g. see col. 3, lines 6 et seq.), such that the shuffler 126 (e.g. see figure 1) must be a system dependent feature and can be implemented with variable size which has a granularity of four lanes.  Accordingly, it would have been further obvious to one having ordinary skill in the art before the effective filing date of the current invention to implement the shuffler having variable size of either one lane or four lanes for the same reasons set forth above for claim 1; in addition, having a variable size shuffler would further increase system adaptability, therefore being advantageous.
	As per claim 20, Ross teaches the invention as claimed including a method for calculating with a means for processing (e.g. see col. 1, lines 57 et seq.), wherein the means for processing comprises a first tile and a second tile; for example, Ross teaches a first number of activation registers in a first group of cells and a second, lesser number of activation registers in a second group of cells (e.g. see column 10, lines 54-58).  The further limitation of a bus wherein the bus is connected to the memory, the first tile and the second tile is taught by Ross to the extent that it is being claimed; for example, Ross teaches a memory (210) (e.g. see figure 2), and the memory sends the weights and activations to the first tile and the second tile (dynamic memory 210 can send the sets of weight inputs and the sets of activation inputs to the matrix computation unit 212, the matrix computation unit 212 may be a two-dimensional systolic array of cells, col. 6, line 63-col. 7, line 2; col. 10, lines 54-58). This concept clearly shows that a bus must be present in Ross’s system and connected to the memory, the first tile, and the second tile as claimed in order for the memory to send the weights and activations to the first tile and the second tile (col. 6, line 63-col. 7, line 2; col. 10, lines 54-58).  Ross further discloses the first tile comprises a first weight register, a second weight register; for example Ross teaches each cell of the first plurality of cells includes a weight register configured to store a weight input (e.g. see col. 2, lines 17-19); an activations buffer as being equivalent to activation registers within cell (e.g. see col. 9, lines 48-51); Ross further discloses an activations buffer, a first multiplier, and a second multiplier; for example, Ross teaches each cell of the first plurality of cells including multiple activation registers/buffers, each activation register of the multiple activation registers/buffers configured to store a corresponding activation input, multiplexer circuitry communicatively coupled to the multiple activation registers/buffers and configured to select, from the multiple activation registers, one of the activation inputs as a selected activation input, and multiplication circuitry communicatively coupled to the weight register and to the multiplexer, in which the multiplication circuitry is configured to output a product of the weight input and the selected activation input (e.g. see col. 2, lines 17-28); cell may include multiple activation registers (506a, 506b, 506c) that store activation inputs (e.g. see col. 10, lines 40-41, Fig. 5).  Ross further discloses the first tile being configured to receive a tensor of activations representing an image comprising a plurality of pixels each having a plurality of color components, the tensor comprising a plurality of two-dimensional arrays, each representing one color component of the image; for example Ross teaches kernels identify particular image contours, shapes, or colors. Kernels can be represented as a matrix structure of weight inputs. Each convolutional layer can also process a set of activation inputs. The set of activation inputs can also be represented as a matrix structure (e.g. see col. 1, lines 29 et seq.; also see col. 9, lines 28-35, 47-56; col. 11, lines 33-35; col. 5, lines 5-6; Fig. 5; col. 1, lines 31-32), and to perform a convolution of a kernel with one of the two-dimensional arrays (e.g. see col. 6, lines 30-31; col. 9, lines 54-56; col. 1, lines 31-32).
Ross discloses the invention as claimed; however, Ross does not particularly teach a shuffler and an activations cache, wherein the activations cache includes a plurality of independent lanes, each of the independent lanes being randomly accessible.
Pohlack, in his teaching of mitigating side-channel attacks via shared cache, discloses the missing elements that are known to be required in the system of Ross in order to arrive at Applicant’s current invention, wherein Pohlack discloses (a) a shuffler 126 configured to implement a shuffle operation that changes the host memory page that is mapped to a VM memory page in the memory map 122, the shuffle operation being triggered by a condition detected by the memory usage monitor 124 or simply performed periodically as part of the routine operation of the VMM 120 (e.g. see col. 9, lines 24-30; figure 1); and (b) an activation cache as being equivalent to cache 134, wherein the shuffler can select a new host memory page that has the least number of cache collisions with the host memory pages that are currently in use by the VMs 112 (e.g. see column 9, lines 48 et seq.), noting that Pohlack clearly teaches cache 134 includes a plurality of independent lanes/lines, each of the independent lanes being randomly accessible (e.g. see col. 3, lines 6-22).  Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the current invention to look into the invention of Pohlack and to implement (a) a shuffler, and (b) an activations cache, wherein the activations cache includes a plurality of independent lanes, each of the independent lanes being randomly accessible, as taught by Pohlack, in Ross’s invention.  Doing so would shuffle the memory mapping between the main memory and the activation cache in a way that ensures the old and the new host memory pages map to different locations in the activation cache, such as a different cache set or lanes/lines (Pohlack, col. 4, lines 45-50), which results in enhanced data coherency and overall system performance, therefore being advantageous.

Allowable subject matter 
11. 	Claims 3, 9 and 16 are objected to as being dependent upon rejected base claims 1 and 14, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims 2, 8 and 15.  The prior art of record does not teach or disclose the first tile comprising a plurality of multipliers including the first multiplier and the second multiplier, arranged in a plurality of columns and a plurality of lanes, the lanes being arranged in groups of four, each group of lanes including an adder tree for summing outputs of the multipliers (claim 3), nor does the prior art of record teach or suggest wherein the first tile comprises a plurality of multipliers including the first multiplier and the second multiplier, arranged in a plurality of columns and a plurality of lanes, the lanes being arranged in groups of four, each group of lanes including an adder tree for summing outputs of the multipliers (claim 9), and the first tile comprises a plurality of multipliers including the first multiplier and the second multiplier, arranged in a plurality of columns and a plurality of lanes, the lanes being arranged in groups of four, each group of lanes including an adder tree for summing outputs of the multipliers (claim 16).
 	Claims 4-5, 10-13 and 17-18 are also allowable since they depend upon indicated allowable claims 3, 9 and 16, respectively.


Conclusion
12.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

13.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to TUAN V THAI whose telephone number is (571)-272-4187.  The examiner can normally be reached Monday-Friday, 8am-4pm.
 	Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
 	If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sanjiv Shah can be reached on 571-272-4098.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-9300.  
 	Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

October 04, 2022
/TUAN V THAI/Primary Examiner, Art Unit 2135