DETAILED ACTION
This Office Action is in response to the amendment and arguments filed February 1st, 2022 for Application No. 16/233,876 filed on December 27, 2018. Claims 1-20 are presented for examination and are currently pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 2/1/2022 has been received. The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Response to Arguments
Applicant’s arguments with respect to claim 19 have been fully considered and are persuasive.  The objection for claim 19 of November 3rd, 2021 has been withdrawn. 

Applicant's arguments filed February 1st, 2022 with regard to the patentability of the claims over the prior art have been fully considered but they are not persuasive.

The examiner respectfully disagrees. With regard to the plurality of processing units, Vantrease teaches the plurality of processing units as seen here: [ (abstract) “A processing element (PE) of a systolic array can perform neural networks computations in parallel” ]. Examiner notes that a tensor array is simply a multi-dimensional array with a uniform data type. The applicant’s specification defines the tensor array as holding processing units capable of performing multiplication and addition operations [ (¶0073), (¶0074) and (¶0138) (Abstract) ]. Vantrease’s systolic arrays also carry processing elements that are capable of said operations, as seen here: [ (¶0002) “As part of the processing, each processing element can perform a set of arithmetic operations such as, for example, floating-point multiplications and additions,” ]. This shows that the cited systolic arrays and processing elements from Vantrease are equivalent to the tensor arrays and processing units of applicant’s claim language. Vantrease also shows that the systolic array varies with the input data: [ (¶0025) “the systolic array may vary based on a size (e.g., a number of bits) of the input data” ]. This is further strengthened by [ (¶0087) “According to an embodiment, two external sequential input elements may be fed simultaneously to the PE 00 every cycle using a first interface (e.g., the row input data bus 816)” ], where the interface is equivalent to an input channel, showing that multiple embodiments with different input channel sizes are possible. Vantrease thus teaches both the variation in size of the systolic array based on input size and multiple embodiments with different numbers of input data streams. Vantrease therefore teaches the claimed limitation. With regard to the memory cells, applicant’s argument is not represented in the claim language.
Applicant’s claim reads: “wherein individual ones of the plurality of processing units each includes: a number of the tensor arrays determined based on the number of input channels, and a number of memory cells corresponding to the number of output maps”. The claim language does not say that the tensor arrays correspond to the number of memory cells. With regard to the number of memory cells corresponding to the number of output maps, Vantrease is relied on to teach the memory output corresponding to the number of output maps. This is shown by the citation in the previous office action: [ (¶0055) “The memory 614 may also be configured to store outputs of the neural network processor 602 (e.g., one or more image recognition decisions on the input images in the form of output data sets).” ]. This teaches the memory storing the output data sets/maps, and is further strengthened by [ (¶0039) “The convolution outputs may correspond to an output feature map” ]. Examiner notes that Vantrease is not relied upon for teaching the memory cells, as that and some of the other claimed limitations are taught by Delaye as shown in the previous office action. In this instance specifically, Vantrease is relied upon for the operation data processed by the processing elements being output to memory in the form of data maps, and further for all of the data maps being stored in the memory via an interconnect. Under the broadest reasonable interpretation, the claim limitation does not specify anything more than the memory being able to store the output data, which Vantrease teaches. The argument is non-persuasive.
Applicant’s next argument (Pg. 11 of remarks) is that Vantrease fails to teach “a number of pixel arrays corresponding to the number of output maps”. Examiner respectfully disagrees. This argument is similar to the one above, and applicant’s specification recites: [ (¶0114) “Some implementations are used for processing images, and so the memory arrays can be referred to as pixel arrays.” ], showing that “pixel array” is simply another term for memory, specifically memory storing image data. Vantrease does teach this, as shown: [ (¶0039) “(e.g., the convolution output 410 b, etc.) may correspond to the output of a PE of the layer 304. The convolution outputs may correspond to an output feature map indicating the result of processing an input feature map comprising the pixel data” ]. The argument is non-persuasive.
Respectfully, the examiner disagrees. Applicant’s claimed structure is a processor that is further divided into the tensor arrays, memory cells and the interconnects that connect the two and contains individual data processing units. Vantrease’s processing structure, which is what is referenced for the individual processing elements, contains a processor which is broken up into systolic arrays which are interconnected to the memory via interconnects and contain the individual processing elements [ Vantrease (¶0054) ]. Therefore both applicant and Vantrease contain the same structure.
Further, applicant argues that Delaye does not teach the claimed limitation of transforming the chip into multiple tensor arrays containing a plurality of processing units. The claim limitation referenced by the applicant is: “A processing chip including: a first arrangement of a plurality of tensor arrays;” Delaye was not relied upon for teaching the plurality of tensor arrays, but rather for the aspect that the tensor arrays were included on the processing chip directly. This is evidenced by Vantrease teaching the plurality of tensor arrays [ Vantrease (¶0026) “Embodiments of the disclosed technologies can provide systems and methods for efficient utilization of the systolic arrays.” ]. Applicant’s arguments are not persuasive.
Further, the applicant argues (pg. 12 of remarks) that dependent claims 3, 13, and 19 are not taught by the references cited. Applicant asserts that the claimed tensor arrays within each processing unit send output data to the memory cells or pixel arrays via interconnects. Applicant further asserts that the citation from Vantrease showing the neural network processor connected to the memory via an interconnect does not teach the claimed limitation due to a difference in structure. As applicant has acknowledged in the remarks (pg. 12, lines 19-21 of remarks), Vantrease does teach the processing elements passing to a buffer. One of ordinary skill in the art would consider a buffer to be a form of memory. As stated above, the specific memory cells are taught by Delaye and the function of 
Applicant is reminded that cited prior art references must be considered in their entirety and not only the cited sections [ MPEP 2141.02(VI) ]. 




Claim Objections
Claim 11 is objected to because of the following informalities:  The claim reads: “tensor arrays and a plurality memory cells, wherein…”. The correct grammar would read “tensor arrays and a plurality of memory cells, wherein…”. Appropriate correction is required.
Claim 18 is objected to because of the following informalities: The claim is presented as amended, but there are no visible changes to the claim. Further, the claim appears to be identical to the originally presented claim. Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4, 6-14, and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Vantrease (US 20190236049 A1) in view of Delaye (US 20190114499 A1).
Regarding claim 1, Vantrease teaches the following: 
A device for performing computations of a convolutional neural network, the device comprising: 
determine, for a particular layer of the convolutional neural network, a number of input channels and a number of output maps generated for particular ones of a plurality of pixels;
[ (¶0037) “the pixel data in the input image 404 may be referred to as input feature map elements of an input feature map, and may indicate that the pixels are processed by the same filter (or same sets of filters) corresponding to certain feature(s). An output feature map may represent convolution outputs between the filter 402 and the input feature map.”
This citation from Vantrease shows the input channels (as input feature map) and output maps (as convolution outputs) for the pixels. ]
wherein a particular processing unit performs computations associated with a particular one of the plurality of pixels;
[ (abstract) “A processing element (PE) of a systolic array can perform neural networks computations in parallel” ]
[ (¶0021) “Input data (e.g., pixels for an image) and the weights may be received from a host server. Each PE may be capable of performing concurrent arithmetic operations including additions and multiplications on the input data and the weights.”
This citation shows that each PE (processing element) is able to perform computations for individual pixels in the plurality of pixels. ]
and wherein individual ones of the plurality of processing units each includes: a number of the tensor arrays determined based on the number of input channels,
[ (¶0025) “the systolic array may vary based on a size (e.g., a number of bits) of the input data”
This citation shows that the systolic array (which houses the plurality of tensor arrays and processing elements) may vary in size based on the size of the input data. Under broadest reasonable interpretation, the input data could include more channels as there is more data (three channels for color input as an example for image convolution) ]
and a number of memory cells corresponding to the number of output maps;
[ (¶0055) “The memory 614 may also be configured to store outputs of the neural network processor 602 (e.g., one or more image recognition decisions on the input images in the form of output data sets).”
This citation from Vantrease shows the memory storing the output data sets and the possibility of there being a plurality of data sets to store, which would correspond to the amount of memory needed to store them. ]
and assign the plurality of processing units to perform computations of the particular layer.

In this reference, computation engine encloses the array of processing elements. ]
	What is not explicitly disclosed by Vantrease is: 
	A processing chip including: a first arrangement of a plurality of tensor arrays;
	A second arrangement of a plurality of memory cells;
	A plurality of interconnects connecting a particular ones of the tensor arrays to particular ones of the memory cells;
	a computer-readable memory storing instructions for configuring the processing chip to perform computations of the convolutional neural network;
	and a controller configured by the instructions to:
	configure a portion of the processing chip into a plurality of processing units,
These are however taught by Delaye as seen below:
	A processing chip including: a first arrangement of a plurality of tensor arrays;
[ (¶0060) “In an example, the processor 606 includes a systolic array of data processing units (DPUs)” ]
[ (¶0019) “FIG. 8B illustrates convolution in terms of a two-dimensional matrix multiplication operation.”
The first citation above shows the systolic array of data processing units which are portioned from the processor, and the second citation shows that the DPUs process multi-dimensional matrices and vectors which are synonymous with tensor arrays. ]
A second arrangement of a plurality of memory cells;
[ (¶0039) “The microprocessor 212 can include one or more cores and associated circuitry (e.g., cache memories, memory management units (MMUs),” 
This citation from Delaye teaches that the memory can be divided into particular units, as seen with the MMUs. ]
A plurality of interconnects connecting a particular ones of the tensor arrays to particular ones of the memory cells;
[ (¶0051) “The logic cells and the support circuits 31 can be interconnected using the programmable interconnect” 
This citation connects the logic cells, which are the processing elements, to the support circuits, which is the memory, via an interconnect. ]
a computer-readable memory storing instructions for configuring the processing chip to perform computations of the convolutional neural network;
[ (¶0040) “The system memory 216 is a device allowing information, such as executable instructions and data, to be stored and retrieved.”
This citation teaches the executable instructions and data while (¶0059) in general talks about the processor executing those instructions. ]
and a controller configured by the instructions to:
[ (¶0007) “coupled to the memory controller,” ]
configure a portion of the processing chip into a plurality of processing units,
[ (¶0060) “In an example, the processor 606 includes a systolic array of data processing units (DPUs)”
This citation teaches the processor including the data processing units. ]
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the methods and limitations of concurrent operations in a processing element as taught by Vantrease with the hardware elements as taught by Delaye. It would have been obvious because one of ordinary skill in the art would have recognized, prior to the effective filing date, that an improved architecture for computing convolutions in a neural network is favorable [ Delaye (¶0005) ]. This would facilitate generalized improvements to speed, resource management, efficiency and accuracy depending on the respective embodiments chosen.


With regard to claim 2, the device of claim 1 is taught by Vantrease/Delaye as in the rejection of claim 1 above. Vantrease teaches the following for the claim:
wherein the device is configured to provide input into the processing chip,
[ (¶0018) “Each PE of the input layer may receive an element of an input data set”
PE in this citation stands for processing element which is derived from the processor. ]
	wherein a corresponding one of the processing units are configured to perform the computations of the particular layer for individual ones of the plurality of pixels using the number of tensor arrays
[ (¶0062) “In some implementations, the computing engine 604 can be operated to perform computations for a particular neural network layer,”
This citation shows that computations can be focused onto a particular layer. ]
[ (¶0021) “Each PE may be capable of performing concurrent arithmetic operations including additions and multiplications on the input data and the weights.”
This citation shows that each individual PE can be assigned to do the computations. ]
[ (¶0017) “A systolic array may include a plurality of processing elements (PEs), typically arranged in a 2-dimensional grid.”
Lastly, this citation demonstrates that the processing elements can work on multi-dimensional data sets, which are equivalent to tensor arrays. ]
and to store output using the number of memory cells,
[ (¶0055) “The memory 614 may also be configured to store outputs of the neural network processor” ]
	What is not explicitly taught by Vantrease is: the device further configured to provide an output of the processing chip as an output of the convolutional neural network.
	This is however taught by Delaye as seen below:

	Please refer to claim 1 for the motivation.


With regard to claim 3, the device of claim 1 is taught by Vantrease/Delaye as in the rejection of claim 1 above. Vantrease teaches the rest of the claim as seen below:
wherein within each particular processing unit of the plurality of processing units, the tensor arrays of the number of tensor arrays are configured to send output data to corresponding ones of the number of memory cells over a subset of the plurality of interconnects.
[ (¶0054) “The apparatus 600 may include a neural network processor 602 coupled to memory 614, a host interface 616, and a direct memory access (DMA) controller 618 via an interconnect” 
This citation shows the connection between the processing unit, which includes tensor arrays configured into it, memory, and connection via an interconnect. ]


With regard to claim 4, the device of claim 1 is taught by Vantrease/Delaye as in the rejection of claim 1 above. Vantrease teaches the following for the claim:
	wherein a default processing unit includes one tensor array and one memory cell connected to one another by one interconnect,
[ (¶0054) “The apparatus 600 may include a neural network processor 602 coupled to memory 614, a host interface 616, and a direct memory access (DMA) controller 618 via an interconnect 620.” 
This citation shows the memory and the processing unit (as depicted by the neural network processor) connected via an interconnect, with the processing unit containing the tensor array. ]

	and wherein the default processing unit has capacity to process one channel and generate one output.
	This is however taught by Delaye as seen below:
[ (¶0068) “Each three-dimensional filter 804 1 . . . 804 OD is convolved with the input image data 802 to generate a respective channel of the output image data” ]
	Please refer to claim 1 for the motivation.

With regard to claim 6, the device of claim 4 is taught by Vantrease/Delaye as in the rejection of claim 4 above. Vantrease teaches the rest of the claim as seen below:
wherein the controller is further configured by the instructions to: determine that the number of output maps generated for a particular one of the plurality of pixels in the particular layer exceeds the capacity of the default processing unit,
[ (¶0055) “The memory 614 may also be configured to store outputs of the neural network processor 602 (e.g., one or more image recognition decisions on the input images in the form of output data sets).”
This citation shows the amount of output maps/data sets that can be stored is variable. ]
[ (¶0039) “The convolution outputs may correspond to an output feature map indicating the result of processing an input feature map comprising the pixel data in the input image 404 with the filter 402. Each of the convolution output 410 a and the convolution output 410 b may be in the form of an output data set comprising respective output data elements.”
This citation further shows the output data revolving around pixel data. ]
and configure a corresponding one of the plurality of processing units to combine the memory cells of multiple default processing units while using less than all of the tensor arrays of the plurality of processing units.

This citation teaches that the device is capable of switching instructions based on the input size. This citation, along with paragraphs (¶0061) and (¶0062) in general, covers the ability of the device to combine multiple memory devices when the input size is too large. ]



With regard to claim 7, the device of claim 1 is taught by Vantrease/Delaye as in the rejection of claim 1 above. Vantrease teaches the rest of the claim as seen below:
wherein at least one tensor array of the plurality of tensor arrays includes circuitry to perform a single multiplication operation.
[ (¶0002) “As part of the processing, each processing element can perform a set of arithmetic operations such as, for example, floating-point multiplications and additions, etc.”
This citation shows that any of the processing elements is capable of performing multiplication operations. ]



With regard to claim 8, the device of claim 1 is taught by Vantrease/Delaye as in the rejection of claim 1 above. Vantrease teaches the rest of the claim as seen below:
wherein at least one tensor array of the plurality of tensor arrays includes circuitry to perform a plurality of multiplication operations.

This citation shows the processing elements can perform a plurality of multiplication operations. ]



With regard to claim 9, the device of claim 1 is taught by Vantrease/Delaye as in the rejection of claim 1 above. Delaye teaches the rest of the claim as seen below:
wherein the controller is further configured by the instructions to configure the processing chip into a plurality of processing units that collectively perform the computations of multiple layers of the convolutional neural network.
[ (¶0060) “In an example, the processor 606 includes a systolic array of data processing units (DPUs)”
This citation shows that the processor includes a plurality of data processing units. ]
[ (¶0059)
	This paragraph goes into detail about the processor performing the computations of multiple layers of the convolutional network. It discusses the order and different types of computations being performed by the processor. ]
	Please refer to claim 1 for the motivation.



With regard to claim 10, the device of claim 9 is taught by Vantrease/Delaye as in the rejection of claim 9 above. Vantrease teaches the rest of the claim as seen below:
wherein the plurality of processing units form an array,

	wherein the array comprises a plurality of systolic transfer structures
[ (¶0017) “Embodiments of the disclosed technologies can provide systems and methods for efficient utilization of the systolic arrays for neural network computations” ]
	to systolically transfer output maps generated by a first subset of the processing units
[ (¶0021) “The PEs may then pass the input data and the weights to other elements in the systolic array for further processing,” ]
[ (¶0039) “The convolution outputs may correspond to an output feature map indicating the result of processing”
Here, the first citation shows the processing elements sending various forms of data through the systolic array; the second citation shows that one of those pieces of data may be the output maps generated by the processing elements. ]
	for one layer of the convolutional neural network to a second subset of processing units assigned to a next layer of the convolutional neural network.
[ (¶0018) “The PEs in the intermediate layers may combine the scaled elements received from each PE of the input layer to compute a set of intermediate outputs.”
This citation shows that the processing elements do pass information from one layer to processing elements within another layer. ]



Regarding claim 11, Vantrease teaches the following: 
A method for performing computations of a neural network, the method comprising:
determining, for a particular layer of the neural network, a number of input channels and a number of output maps generated for particular ones of a plurality of pixels;
[ (¶0037) “the pixel data in the input image 404 may be referred to as input feature map elements of an input feature map, and may indicate that the pixels are processed by the same filter (or same sets of filters) corresponding to certain feature(s). An output feature map may represent convolution outputs between the filter 402 and the input feature map.”
This citation from Vantrease teaches the input feature map and the output map both being generated from the pixels or from the convolution operations on said pixels. ]
	wherein a particular processing unit performs computations associated with a particular one of the plurality of pixels;
[ (¶0021) “Input data (e.g., pixels for an image) and the weights may be received from a host server. Each PE may be capable of performing concurrent arithmetic operations including additions and multiplications on the input data and the weights.”
This citation shows that the processing elements (PEs) are able to perform computations on input data (and therefore pixels). ]
	and wherein individual ones of the plurality of processing units each includes: a number of the tensor arrays determined based on the number of input channels,
[ (¶0025) “the systolic array may vary based on a size (e.g., a number of bits) of the input data”
This citation shows that the systolic array (which houses the plurality of tensor arrays and processing elements) may vary in size based on the size of the input data. Under broadest reasonable interpretation, the input data could include more channels as there is more data (three channels for color input as an example for image convolution) ]
and a number of memory cells corresponding to the number of output maps;

This citation from Vantrease shows the memory storing the output data sets and the possibility of there being a plurality of data sets to store, which would correspond to the amount of memory needed to store them. ]
and assigning the plurality of processing units to perform computations of the particular layer.
[ (¶0062) “In some implementations, the computing engine 604 can be operated to perform computations for a particular neural network layer,”  ]
What is not explicitly disclosed by Vantrease is:
configuring a portion of a processing chip into a plurality of processing units,
wherein the processing chip includes a plurality of tensor arrays and a plurality memory cells,
These are however taught by Delaye as seen below:
configuring a portion of a processing chip into a plurality of processing units,
[ (¶0060) “In an example, the processor 606 includes a systolic array of data processing units (DPUs)”
This citation shows the processor including the plurality of processing units. ]
	wherein the processing chip includes a plurality of tensor arrays and a plurality memory cells,
[ (¶0033) “The architecture allows for implementation of image processing, such as convolution, using a large systolic array,” 
This citation teaches the tensor arrays as they are housed within the plurality of the systolic array. ]

This citation teaches the memory units or memory cells. ]
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the methods and limitations of concurrent operations in a processing element as taught by Vantrease with the hardware elements as taught by Delaye to generate a method for performing computations of a neural network. It would have been obvious because one of ordinary skill in the art would have recognized, prior to the effective filing date, that an improved architecture for computing convolutions in a neural network is favorable [ Delaye (¶0005) ]. This would facilitate generalized improvements to speed, resource management, efficiency and accuracy depending on the respective embodiments chosen and improve the method for computations of a neural network.



With regard to claim 12, the method of claim 11 is taught by Vantrease/Delaye as in the rejection of claim 11 above. Vantrease teaches the following for the claim:
further comprising: providing input into the processing chip to perform the computations of the particular layer for individual ones of the plurality of pixels using the number of tensor arrays
[ (¶0021) “Input data (e.g., pixels for an image) and the weights may be received from a host server.”
	Providing input into the processing chip and processing elements. ]
[ (¶0062) “In some implementations, the computing engine 604 can be operated to perform computations for a particular neural network layer,”
This citation shows that computations can be focused onto a particular layer. ]

This citation shows that each individual PE can be assigned to do the computations. ]
[ (¶0017) “A systolic array may include a plurality of processing elements (PEs), typically arranged in a 2-dimensional grid.”
Lastly, this citation demonstrates that the processing elements can work on multi-dimensional data sets, which are equivalent to tensor arrays. ]
and to store output using the number of memory cells,
[ (¶0055) “The memory 614 may also be configured to store outputs of the neural network processor” ]
	What Vantrease does not explicitly teach is the following:
	and providing an output of the processing chip as an output of the neural network.
	This is however taught by Delaye, as seen below:
[ (¶0059) “The processor 606 generates output image data as a result of the processing.” ]
	Please refer to claim 11 for the motivation.



With regard to claim 13, the method of claim 11 is taught by Vantrease/Delaye as in the rejection of claim 11 above. Vantrease teaches the rest of the claim as seen below:
wherein within each processing unit of the plurality of processing units, the method further comprises sending, by the tensor arrays of the number of tensor arrays, output data to corresponding ones of the number of memory cells over a subset of a plurality of interconnects connecting particular ones of the tensor arrays to particular ones of the memory cells.

This citation shows that the various components for the processing unit are all joined together via an interconnect. ]
[ (¶0055) “The memory 614 may also be configured to store outputs of the neural network processor” 
	This citation teaches that the memory can be the component configured to store the output data of the neural network. ]


With regard to claim 14, the method of claim 11 is taught by Vantrease/Delaye as in the rejection of claim 11 above. Vantrease (US 20190236049 A1) teaches the following for the claim:
wherein a default processing unit includes one tensor array and one memory cell connected to one another by one interconnect connecting the one tensor array to the one memory cell,
[ (¶0054) “The apparatus 600 may include a neural network processor 602 coupled to memory 614, a host interface 616, and a direct memory access (DMA) controller 618 via an interconnect 620.” 
This citation shows that the various components for the processing unit are all joined together via an interconnect. ]
	What is not explicitly taught by Vantrease is the following: and wherein the method further comprises processing one channel and generating one output using the default processing unit.
This is however taught by Delaye, as seen below:

This citation teaches the input data being processed and generating an output via one channel. ]
Please refer to claim 11 for the motivation.



With regard to claim 16, the method of claim 14 is taught by Vantrease/Delaye as in the rejection of claim 14 above. Vantrease teaches the following parts of the claim as seen below:
wherein the method further comprises: determining that the number of output maps generated for a particular one of the plurality of pixels in the particular layer exceeds a capacity of the default processing unit,
[ (¶0039) “The convolution outputs may correspond to an output feature map indicating the result of processing an input feature map comprising the pixel data in the input image 404 with the filter 402. Each of the convolution output 410 a and the convolution output 410 b may be in the form of an output data set comprising respective output data elements.”
This citation from Vantrease teaches the output maps being generated for pixel data from the convolution of a particular layer. ]
[ (¶0026)
This paragraph in the reference talks about the possibility of capacity being exceeded for the processing unit and what different operations can be done to minimize that or what operations can be done once it has happened. ]
What is not explicitly taught by Vantrease is the following:
and configuring a corresponding one of the plurality of processing units to combine the memory cells of multiple default processing units while using less than all of the tensor arrays of the plurality of processing units.
This is however taught by Delaye, as seen below:
[ (¶0081) and (¶0082)
	The two paragraphs above in the reference teach memory units being dynamic and expanding resources as needed to be a fitting storage space for the data without affecting the data processing units. ]
	With respect to Claim 16, it is substantially similar to Claim 11 and is rejected in the same manner, the same art and reasoning applying. Please refer to claim 11 for the motivation.



Regarding claim 17, Vantrease teaches the following: 
A controller comprising one or more processors configured to: determine, for a particular layer of a neural network, a number of input channels and a number of output maps generated for particular ones of a plurality of pixels;
[ (¶0037) “the pixel data in the input image 404 may be referred to as input feature map elements of an input feature map, and may indicate that the pixels are processed by the same filter (or same sets of filters) corresponding to certain feature(s). An output feature map may represent convolution outputs between the filter 402 and the input feature map.”
This citation from Vantrease shows the input channels (as input feature map) and output maps (as convolution outputs) for the pixels. ]
wherein the processing chip includes one or more tensor arrays and one or more pixel arrays,

This citation shows the processing chip with processing elements that include multi-dimensional arrays, which are synonymous with tensor arrays. ]
[ (¶0046) “Each set of input data corresponds to the entries of a pixel array.”
And this second citation shows the pixel array.]
wherein a particular processing unit performs computations associated with a particular one of the plurality of pixels;
[ (¶0030) “The artificial neural network may include a plurality of processing elements, with each processing element configured to process a portion of the input pixel data,” ]
	and wherein individual ones of the plurality of processing units each includes: a number of the tensor arrays determined based on the number of input channels,
[ (¶0025) “the systolic array may vary based on a size (e.g., a number of bits) of the input data”
This citation shows that the systolic array (which houses the plurality of tensor arrays and processing elements) may vary in size based on the size of the input data. Under the broadest reasonable interpretation, the input data could include more channels as there is more data (for example, three channels for color input in image convolution). ]
and a number of pixel arrays corresponding to the number of output maps;
[ (¶0039) “(e.g., the convolution output 410 b, etc.) may correspond to the output of a PE of the layer 304. The convolution outputs may correspond to an output feature map indicating the result of processing an input feature map comprising the pixel data”
This citation from Vantrease teaches how the convolution outputs correspond to the output map and can comprise the pixel data, which could be in the form of an array. ]
and assign the plurality of processing units to perform computations of the particular layer.

	What is not explicitly taught by Vantrease is the following: configure a portion of a processing chip into a plurality of processing units,
	This is however taught by Delaye, as seen below: 
[ (¶0060) “In an example, the processor 606 includes a systolic array of data processing units (DPUs)” ]
	Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the methods and limitations of concurrent operations in a processing element as taught by Vantrease with the hardware elements as taught by Delaye to generate a controller for performing computations of a neural network. The reason it would be obvious is one of ordinary skill in the art would recognize, prior to the effective filing date, that an improved architecture for computing convolutions in a neural network is favorable [ Delaye (¶0005) ]. This would facilitate generalized improvements to speed, resource management, efficiency and accuracy depending on the respective embodiments chosen and improve the computations of a neural network.



In regard to claim 18, the controller of claim 17 is taught by Vantrease/Delaye as in the rejection for claim 17 above. Vantrease teaches the following parts of the claim as seen below:
wherein the one or more processors are further configured to: provide input into the processing chip to perform the computations of the particular layer for individual ones of the plurality of pixels using the number of tensor arrays and to store output using the number of pixel arrays,

This citation teaches performing computations on a particular layer. ]
[ (¶0017) “A systolic array may include a plurality of processing elements (PEs), typically arranged in a 2-dimensional grid.”
The 2-dimensional grid shows that the systolic array and processing elements work with tensor arrays. ]
[ (¶0018) “Each PE of the input layer may receive an element of an input data set”
This citation shows that the processing element may receive an individual piece of the input data set. ]
[ (¶0039) “(e.g., the convolution output 410 b, etc.) may correspond to the output of a PE of the layer 304. The convolution outputs may correspond to an output feature map indicating the result of processing an input feature map comprising the pixel data”
This citation from Vantrease teaches how the convolution outputs correspond to the output map and can comprise the pixel data, which could be in the form of an array. ]
What is not taught by Vantrease, is the following: and provide an output of the processing chip as an output of the neural network.
This is however taught by Delaye, as seen below:
[ (¶0059) “The processor 606 generates output image data as a result of the processing.” ]
	Please refer to claim 17 for the motivation.



In regard to claim 19, the controller of claim 17 is taught by Vantrease/Delaye as in the rejection for claim 17 above. Vantrease teaches the following parts of the claim as seen below:
output data to corresponding ones of the number of pixel arrays over a subset of a plurality of interconnects connecting particular ones of the tensor arrays to particular ones of the pixel arrays.
[ (¶0046) “Each convolution output array may correspond to convolving one set (of the M sets) of filters with the input pixel arrays.”
This citation teaches output data corresponding with the pixel arrays. ]
[ (¶0054) “The apparatus 600 may include a neural network processor 602 coupled to memory 614, a host interface 616, and a direct memory access (DMA) controller 618 via an interconnect 620.” 
This citation teaches that all of the pieces will be connected via an interconnect. ]
What is not explicitly taught by Vantrease is the following: wherein within each processing unit of the plurality of processing units, the one or more processors are further configured to cause send, by the tensor arrays of the number of tensor arrays,
	This is however taught by Delaye, as seen below:
[ (¶0060) “In an example, the processor 606 includes a systolic array of data processing units (DPUs) 607. As described further below, convolution can be performed using matrix multiplication. The DPUs 607 execute multiply-accumulate operations based on the sample streams and the filter data to generate the output image data.”
This citation teaches the multiple processing units being able to compute. ]
[ (¶0068) “Each three-dimensional filter 804 1 . . . 804 OD is convolved with the input image data 802 to generate a respective channel of the output image data”
This citation shows that there are multiple dimensions for the data which would make them synonymous with the tensor arrays.]
	Please refer to claim 17 for the motivation.



Claims 5, 15 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Vantrease/Delaye, in further view of Huang (US 20190179795 A1).
Regarding claim 5, Vantrease/Delaye teaches the following: The device of claim 1, as applied to claim 1 above, with Vantrease teaching the following:
wherein the controller is further configured by the instructions to: determine that the number of input channels for a particular one of the plurality of pixels in the particular layer exceeds the capacity of the default processing unit,
[ (¶0060) “In some embodiments, the computation controller 606 may determine an operating mode of the computing engine 604 based on the data type and the size of the input data set. For example, if the input data set is much larger (e.g., 2000 data elements) than the size of the systolic array (e.g., 16×16), the computation controller 606 may switch the operating mode of the computing engine 604 to an optimization mode. The optimization mode may enable the computing engine 604 to perform multiple computations in parallel for each input data set.”
This citation shows that if the size of the input is larger than the processor can handle and exceeds the capacity, then it can make necessary changes. ]
 [ (¶0021) “Input data (e.g., pixels for an image) and the weights may be received from a host server.” 
This citation shows that the input data can be pixels for/from an image.  ]
What Vantrease/Delaye fail to explicitly teach is the following: while using less than all of the memory cells of the plurality of processing units.
However, this is taught by Huang, as seen below:
[ (¶0124) “For example, the neural network processing engine 902 b may need less than all of the space in the memory subsystem 904 b to store the weights for a neural network”
This citation from Huang teaches the neural network processing engine needing less than the full amount of memory cells available. ]

	Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the methods and systems of concurrent operations in a neural network as taught by Vantrease/Delaye with the ability to use less than the total amount of memory available in an operation as taught by Huang. The reason it would be obvious is that one of ordinary skill in the art would recognize, prior to the effective filing date, that reducing the use of memory when it is not needed would make the system more efficient and allow the processing engine to perform other computations with said resources [ Huang (¶0127) and (¶0126) ]. This would facilitate generalized improvements to speed, resource management, and efficiency depending on the respective embodiments chosen and improve the method for computations of a neural network.



Regarding claim 15, Vantrease/Delaye teaches the following: The method of claim 11, as applied to claim 11 above, with Vantrease teaching the following:
wherein the method further comprises: determining that the number of input channels for a particular one of the plurality of pixels in the particular layer exceeds a capacity of the default processing unit, and configuring a corresponding one of the plurality of processing units to combine tensor arrays of multiple default processing units
[ (¶0060) “In some embodiments, the computation controller 606 may determine an operating mode of the computing engine 604 based on the data type and the size of the input data set. For example, if the input data set is much larger (e.g., 2000 data elements) than the size of the systolic array (e.g., 16×16), the computation controller 606 may switch the operating mode of the computing engine 604 to an optimization mode. The optimization mode may enable the computing engine 604 to perform multiple computations in parallel for each input data set.”
This citation shows that if the size of the input is larger than the processor can handle and exceeds the capacity, then it can make necessary changes. These changes include combining multiple default processing units and their tensor arrays. ]
 [ (¶0021) “Input data (e.g., pixels for an image) and the weights may be received from a host server.” 
This citation shows that the input data can be pixels for/from an image.  ]
What Vantrease/Delaye fail to explicitly teach is the following: while using less than all of the memory cells of the plurality of processing units.
However, this is taught by Huang, as seen below:
[ (¶0124) “For example, the neural network processing engine 902 b may need less than all of the space in the memory subsystem 904 b to store the weights for a neural network”
This citation from Huang teaches the neural network processing engine needing less than the full amount of memory cells available. ]
Please refer to claim 5 for the motivation.



Regarding claim 20, Vantrease/Delaye teaches the following: The controller of claim 17, as applied to claim 17 above, with Vantrease teaching the following:
and wherein the one or more processors are further configured to determine that the number of input channels for a particular one of the plurality of pixels in the particular layer exceeds a capacity of the default processing unit, and configure a corresponding one of the plurality of processing units to combine tensor arrays of multiple default processing units

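[ (¶0060) “In some embodiments, the computation controller 606 may determine an operating mode of the computing engine 604 based on the data type and the size of the input data set. For example, if the input data set is much larger (e.g., 2000 data elements) than the size of the systolic array (e.g., 16×16), the computation controller 606 may switch the operating mode of the computing engine 604 to an optimization mode. The optimization mode may enable the computing engine 604 to perform multiple computations in parallel for each input data set.”
This citation shows that if the size of the input is larger than the processor can handle and exceeds the capacity, then it can make necessary changes. These changes include combining multiple default processing units and their tensor arrays. ]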
 [ (¶0021) “Input data (e.g., pixels for an image) and the weights may be received from a host server.” 
This citation shows that the input data can be pixels for/from an image.  ]
while using less than all of the pixel arrays of the plurality of processing units,
[ (¶0046) “Each convolution output array may correspond to convolving one set (of the M sets) of filters with the input pixel arrays.”
This citation shows using one set of a larger plurality of available sets (M Sets) of pixel arrays. ]
wherein the controller performs the operations by computer executable instructions stored in a non-transitory computer-readable medium.
[ (¶0107) “The instructions executed by the processing logic 1102 may be stored on a computer-readable storage medium, for example, in the form of a computer program. The computer-readable storage medium may be non-transitory.” ]
	What is not explicitly taught by Vantrease/Delaye is the following: wherein a default processing unit includes one tensor array and one pixel array connected to one another by one interconnect connecting the one tensor array to the one memory cell for processing one channel and generating one output,
However, this is taught by Huang, as seen below:

[ (¶0086) “Input data 550 can arrive over the chip interconnect 520. The chip interconnect 520 can connect the neural processing engine 502 to other components of a neural network processor, such as a Direct Memory Access (DMA) engine that can obtain input data”
This citation shows that the interconnect brings together various components like the input data, processing channel, and memory. ]
[ (¶0086) “The input data 550 can be, for example one-dimensional data, such as a character string or numerical sequence, or two-dimensional data, such as an array of pixel values”
This citation shows that the input data can be the pixel array as seen in the claim and also shows that there is multi-dimensional data which is synonymous with tensor arrays. ]
Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the methods and systems of concurrent operations in a neural network as taught by Vantrease/Delaye with the interconnected system as taught by Huang. The reason it would be obvious is one of ordinary skill in the art would recognize, prior to the effective filing date, that the combination would enable the neural network processing engine to increase throughput and produce results faster [ Huang (¶0116) ]. This would facilitate generalized improvements to speed, resource management, and efficiency depending on the respective embodiments chosen and improve the controller for computations of a neural network.



Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US 20200327367 A1 – Look-up convolutional layer in convolutional neural network which teaches an array of processing units, vector/tensor usage, interconnects that connect various components of hardware, different memory tiers, data channels and output feature maps.
US 20200387798 A1 – Time invariant classification which teaches tensor arrays, memory cells, using parallel processing with said arrays, the use of pixel data, and data channels. 


THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL MERABI whose telephone number is (571)272-9685. The examiner can normally be reached Mon-Fri 7:30am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on (571) 270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/M.A.M./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123