DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to the amendment filed 8/10/2022. Claims 1-24 are pending. 

Response to Amendment
The amendment filed 8/10/2022 has been entered. In the amendment, claims 1, 12, 23 and 24 were amended by applicant, and no claims were cancelled or added. As such, claims 1-24 are pending and have been examined.

Response to Arguments
Applicant's arguments with respect to the rejections of claims 1-24 under 35 U.S.C. 103, have been fully considered, but they are not persuasive. 
With reference to amended independent claims 1, 12, 23 and 24, applicant states “Applicant hereby amends Claim 1 to recite ‘based on the determination of the one of the one or more processor precisions, transform the input data tensor from the input precision to the determined one of the one or more processor precisions.’ Applicant’s Independent Claims 12, 23, and 24 include similar amendments.” (applicant’s remarks, page 8).
With continued reference to the above-noted amended claim language and to the primary Baum reference, applicant asserts “Baum does not teach or suggest ‘determining one of the one or more processor bit precisions,’ and ‘based on the determination of the one of the one or more processor precisions, transforming the input data tensor from the input bit precision to the determined one of the one or more processor bit precisions,’ as claimed.” (applicant’s remarks, page 12). With continued reference to Baum, applicant alleges, which the examiner does not concede, that “Baum describes determining different bit widths for a model to be executed, such a determination is not disclosed as including any determination of a processor precision. In addition, in Baum, any modification of the model is not based on any determination of the processor precision. Baum’s disclosure of modifying an ANN model to provide ‘adaptations to different bit widths’ by the optimizer component of Baum is performed prior to the assignment of the model to be processed by appropriate hardware or the execution of the modified model on any particular hardware” before concluding “Baum does not teach or suggest a ‘determining one of the one or more processor bit precisions,’ and ‘based on the determination of the one of the one or more processor precisions, transforming the input data tensor from the input bit precision to the determined one of the one or more processor bit precisions,’ as required by Applicant's claims.” Id.
With continued reference to the independent claims and Baum, applicant further asserts “Baum does not teach or suggest ‘dividing the input data tensor into a plurality of blocks, each block conforming to one of the feature dimensions of the processor,’ as claimed.” Id. (paraphrasing language of claim 1).
Regarding the secondary Esser reference, applicant generally asserts “Esser is cited as showing other aspects of Applicant's claims but does not make up for the deficiencies of Baum with regard to the rejection.” (applicant’s remarks, page 13).
Referencing amended independent claims 12, 23 and 24, applicant asserts “that Claims 12, 23, and 24 should be deemed allowable in light of the remarks above regarding Independent Claim 1.” Id. Therefore, applicant argues that independent claims 12, 23 and 24 have been amended along the lines of claim 1 and are believed (by applicant) to be allowable in view of applicant’s amendments and above-noted arguments regarding claim 1. 
Regarding the dependent claims, applicant generally asserts “that Dependent Claims 2-11 and 13-22 should be deemed allowable over the prior art of record at least by virtue of depending from an allowable base claim.” Id.
Accordingly, applicant apparently argues that the claim limitations recited in amended claim 1, “based on the determination of the one of the one or more processor precisions, transform the input data tensor from the input precision to the determined one of the one or more processor precisions”, are not disclosed or taught in the portions of the Baum and Esser references applied to claim 1 in the previous Office Action.
With regard to applicant’s characterizations of Baum and applicant’s assertions regarding Baum’s and Esser’s purported shortcomings vis-à-vis amended independent claims 1, 12, 23 and 24, the examiner disagrees and points applicant to the below discussion of Baum and Esser.
Regarding the limitation “based on the determination of the one of the one or more processor precisions, transforming the input data tensor from the input bit precision to the determined one of the one or more processor bit precisions” recited in amended claims 1, 12, 23 and 24, as a preliminary matter, aside from merely repeating the “transforming” claim language in paragraphs 2-3 and stating “tensor formatted input is transformed into a specific vector, matrix, or tensor format, compatible with the underlying neural network hardware” and “the input data tensor is transformed from the input bit precision to one of the processor bit precisions” in paragraphs 55 and 57, applicant’s specification does not mention, much less define, what is meant by the newly-added “based on the determination of the one of the one or more processor precisions” limitation in each of independent claims 1, 12, 23 and 24. Therefore, “based on the determination of the one of the one or more processor precisions, transforming the input data tensor from the input bit precision to the determined one of the one or more processor bit precisions”, under the BRI, in light of the specification, is transforming or adapting input data to a processor’s or processing element’s bit precision or bit width.
With continued reference to the above-noted “based on the determination of the one of the one or more processor precisions, transforming the input data” limitation recited in amended claims 1, 12, 23 and 24, using respective similar language, the examiner points to paragraphs 164-165, 167, 170, 237, 242 and 278 of Baum, which explicitly disclose that “The neurons of the ANN are implemented in the PE, … The processing element, generally referenced 450, comprises an input data representation circuit 452 … transformation/rounding circuit 456”, “This circuit is operative to transform the representation of the input data … from integer to floating point (FP) format”, “circuit 456 functions to perform rounding of the product before input to the accumulator.” [i.e., transform the input data from an input precision - integer to the processor’s floating point precision], “The input … is a double precision input X made up of two low precision (e.g., 8-bit) values” [i.e., input precision], “perform model level optimizations … model adjustments for performance, and numerical adaptations to different bit widths [i.e., bit precisions] … allocates and assigns physical resources (e.g., compute and memory elements, etc.) [i.e., including the NN processor] … perform bit exact numerical emulation of the NN processor”, “the NN processor is able to operate at any desired granularity of any subset of the input” [i.e., based on the NN processor’s bit width/precision, adapting/transforming input data granularity], “the flexible processing granularity of the NN processor and related memory … is shown in FIG. 23. … leveraging the data pipeline to … operate at low input domain granularity. Consider the example input tensor 932 including input data 938 … One of the network layers then applies an NN operation 934 to the input data”, “input tensor 932 including input data” [i.e., the input data is in input tensor 932] and “The SCALE factor is used to represent the size in bytes (i.e. the granularity) of each element” [i.e., based on the NN processor’s bit widths/precisions, transforming granularity/byte size/precision of input data tensor 932 to one of the NN processor’s bit widths/precisions].
Regarding applicant’s above-noted assertions that “Baum does not teach or suggest ‘determining one of the one or more processor bit precisions’” and “Baum does not teach or suggest ‘dividing the input data tensor into a plurality of blocks, each block conforming to one of the feature dimensions of the processor’ as claimed” (applicant’s remarks, page 12, paraphrasing language of claim 1), the examiner respectfully disagrees and points applicant to the below discussion of Baum.
With reference to “determining one of the one or more processor bit precisions” recited, using respective similar language, in claims 1, 12, 23 and 24, the examiner points to FIGs. 6 and 7A of Baum, which depict multipliers 142, 454 that are components of processing elements 140, 450 [i.e., processor]. The examiner further points to paragraphs 118, 170 and 236 of Baum, which explicitly disclose “computation units that are organized into various aggregation levels or hierarchical levels, such as PEs, … NN cores … features of the compute fabric include: … flexibility of number representation, including integer and floating point as well as different bit widths”, “the quad multiplier of the PE … The quad multiplier, generally referenced 870, comprises four lower precision (e.g., 8-bit) multipliers” [i.e., processing element/PE/processor and NN cores/processors have determined bit widths/precisions] and “translator 772 functions to receive the user model and generate an intermediate format of the model. The optimizer 774 functions to perform model level optimizations, post-translation model adjustments for performance, and numerical adaptations to different bit widths [i.e., determining different bit widths/precisions for a model to be executed]. The resource allocator 778 allocates and assigns physical resources (e.g., compute and memory elements, etc.) in accordance with the intermediate model” [i.e., allocator determines/assigns physical resources such as compute elements/processors based on a processor bit width/precision].
Regarding the limitation “dividing the input data tensor into a plurality of blocks, each block conforming to one of the feature dimensions of the neural network processor” recited, using respective similar language, in claims 1, 12, 23 and 24, as a preliminary matter, the examiner notes that, contrary to applicant’s above assertion, claim 1 recites “dividing the input data tensor into a plurality of blocks, each block conforming to one of the feature dimensions of the neural network processor”.
With continued reference to the above-noted dividing limitation, the examiner points to FIG. 3 of Baum, which depicts input data 347 that is divided into blocks 346 that conform to/are allocated to subclusters 384. The examiner further points to paragraphs 109, 207, 237 and 265 of Baum, which explicitly disclose that “in an example NN processor embodiment [i.e., a neural network processor], a PE [processing element/processor] comprises P=16 neurons, a subcluster comprises N=64 PEs [processing elements] … and the NN core comprises L=8 clusters” [i.e., P/number of neurons and N/subcluster PEs are processor dimensions, and L/number of clusters is a feature dimension of the NN processor], “ANN input data 347 enters shared L3 memory, is read from allocated memory blocks, processed by the PEs [processing elements] in one or more subclusters, output to neighboring memory blocks”, “adjustments for performance, and numerical adaptations to different bit widths” [i.e., adjustments/adaptations conforming to one of the feature dimensions of the NN processor – number of clusters, bit widths] and “the neural network data stored in the memory represents a tensor, i.e. an Z-dimensional matrix” [i.e., input data/tensor is allocated/divided into blocks conforming to subclusters/number of clusters/one of the feature dimensions of the NN processor].
Additionally, as discussed below, regarding the “input data … having feature dimensions at an input bit precision” limitation recited, using respective similar language, in claims 1, 12, 23 and 24, the examiner points to FIG. 1 of Esser, which depicts “Input” into “layers of a convolutional network” with “partitioning the feature dimension) at the same location (partitioning the spatial dimensions).” The examiner also points to pages 11442-11444 of Esser, which explicitly disclose that “Neurons within a layer are arranged in two spatial dimensions … and one feature dimension, … Network structure is mapped by partitioning each layer into 1 or more equally sized groups along the feature dimension” [i.e., feature dimensions] and “Network inputs are typically represented with multibit channels [for example, eight-bit red, green, and blue (RGB) channels]. … each bit represents a different value” [i.e., input data having input bit precision].
As further discussed below, regarding the “feature dimensions at one or more processor bit precisions” limitation recited in claims 1, 12, 23 and 24, using respective similar language, the examiner points to pages 11441 and 11443 of Esser, which explicitly disclose “we demonstrate that neuromorphic computing … can implement deep convolution networks … This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors [i.e., neural network processors] … convolutional networks typically use high precision (≥32-bit) neurons and synapses … neuromorphic designs can use one-bit spikes to provide event-based computation … and can use low-precision synapses” [i.e., one or more processor bit precision] and “Neurons within a layer are arranged in two spatial dimensions … and one feature dimension, … Network structure is mapped by partitioning each layer into 1 or more equally sized groups along the feature dimension” [i.e., feature dimensions].
Lastly, regarding the limitation “dividing the input data tensor into a plurality of blocks, each block conforming to one of the feature dimensions of the neural network processor” recited, using respective similar language, in claims 1, 12, 23 and 24, the examiner points to FIG. 1 of Esser, which shows “layers of a convolutional network” that “designate neurons (individual boxes) belonging to the same group (partitioning the feature dimension)” [i.e., dividing/partitioning input data matrix/tensor into blocks conforming to feature dimensions]. The examiner additionally points to pages 11441 and 11443 of Esser, which explicitly disclose that “neuromorphic processors [i.e., neural network processors] … neuromorphic systems can take advantage of blockwise connectivity”, “Connectivity between neurons follows a blockwise scheme: each neuron can connect to one input line of any core in the system” and “x ={xi,j,f} are the neuron’s input pixels or neurons, w = {wi,j,f} are the filter weights, i, j are over the topographic dimensions, and f is over the feature dimension or input channels. … Network structure is mapped by partitioning each layer into 1 or more equally sized groups along the feature dimension” [i.e., each block of the input data matrix x/tensor conforms to one of the feature dimensions of the neuromorphic processor/core].
As discussed in detail below, the combination of Baum and Esser (i.e., Baum in view of Esser) teaches the limitations of amended independent claims 1, 12, 23 and 24 and dependent claims 2, 4, 13 and 15. 
Regarding the dependent claims, applicant generally asserts “that Dependent Claims 2-11 and 13-22 should be deemed allowable over the prior art of record at least by virtue of depending from an allowable base claim.” (applicant’s remarks, page 13).
As discussed below, the combination of Baum, Esser and Chung (i.e., Baum in view of Esser and further in view of Chung) teaches the limitations of dependent claims 3 and 14, and the combination of Baum, Esser and Na (i.e., Baum in view of Esser and further in view of Na) teaches the limitations of dependent claims 5 and 16. As also detailed below, the combination of Baum, Esser and Kovvuri (i.e., Baum in view of Esser and further in view of Kovvuri) teaches the limitations of dependent claims 6, 7, 17 and 18. As further discussed below, the combination of Baum, Esser and Langhammer (i.e., Baum in view of Esser and further in view of Langhammer) teaches the limitations of dependent claims 8-11 and 19-22.
Applicant’s amendments have necessitated the claim rejections under 35 U.S.C. 103 discussed below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 2, 4, 12, 13, 15, 23 and 24 are rejected under 35 U.S.C. 103 as being obvious over Baum et al. (U.S. Patent Application Pub. No. 2018/0285254 A1, hereinafter “Baum”) in view of non-patent literature Esser et al. (“Convolutional networks for fast, energy-efficient neuromorphic computing.” Proceedings of the National Academy of Sciences 113.41 (2016): 11441-11446, hereinafter “Esser”). Based upon the earlier publication date of the Esser reference, October 11, 2016, which is before the effective filing date of this application, i.e., October 11, 2018, it constitutes prior art under 35 U.S.C. 102(a)(1). Also, while the applied Esser reference is co-authored by some of the inventors of the instant application, e.g., Steven K. Esser, John V. Arthur, Andrew S. Cassidy, Rathinakumar Appuswamy, Pallab Datta, Arnon Amir, Brian Taba, Myron D. Flickner and Dharmendra S. Modha, the 35 U.S.C. 102(b)(1)(A) exception does not apply because the reference was authored in part by Paul A. Merolla, Alexander Andreopoulos, David J. Berg, Jeffrey L. McKinstry, Timothy Melano, Davis R. Barch and Carmelo di Nolfo, who are not named as inventors of the instant application. See MPEP § 2153.01(a): “If ... the application names fewer joint inventors than a publication (e.g., the application names as joint inventors A and B, and the publication names as authors A, B and C), it would not be readily apparent from the publication that it is by the inventor (i.e., the inventive entity) or a joint inventor and the publication would be treated as prior art under AIA 35 U.S.C. 102(a)(1).”
Regarding claim 1, Baum discloses the invention as claimed including a method comprising:
receiving an input data tensor at a neural network processor comprising a plurality of neural cores (see, e.g., FIG. 8 – depicting step 820 to “RECEIVE SIZE INFORMATION FOR EACH DIMENSION OF THE DATA TENSOR” and paragraphs 18, 21, 98, 106 and 242, “method of accessing multi-dimensional data in memory … applicable to neural network (NN) processing engines [i.e., a plurality of neural cores] adapted to implement artificial neural networks (ANNs).”, “a method of scanning tensor data stored in a memory” [i.e., a method for receiving/scanning an input data tensor], “Neural Network (NN) Processing Core … the neural network (NN) processor comprises a plurality of basic computation units”, “NN processing system comprising one or more NN processing cores” [i.e., at a neural network processor including neural cores], “input tensor 932 including input data 938” [i.e., input data tensor 932]),
the input data tensor having … dimensions at an input bit precision (see, e.g., FIG. 8 – depicting step 824 to “RECEIVE A NEXT COMMAND CONTAINING DIMENSION INFORMATION” and paragraphs 170, 261, 265 and 279, “The input to the quad multiplier is a double precision input X made up of two low precision (e.g., 8-bit) values” [i.e., an input bit precision], “Multi-Dimensional Data Stored in Memory … the data that is stored in memory is multi-dimensional”, “the neural network data stored in the memory represents a tensor, i.e. an Z-dimensional matrix” [i.e., the input data tensor has feature dimensions], “the circuit receives the size of each dimension SJ of data … An external NEXT trigger (command or count) signal is received containing dimension information (step 824).” [i.e., the received tensor has dimensions of certain sizes and precisions]),
the neural network processor being configured for one or more feature dimensions at one or more processor bit precisions (aside from merely repeating the claim language in paragraphs 2-3 and 57, applicant’s specification does not mention, much less define what is meant by a “feature dimension” or “one or more feature dimensions”. The plain meaning of feature is a prominent or conspicuous part or characteristic. See https://www.dictionary.com/browse/feature. Further, the plain meaning of dimension and dimensions is a property of space; extension in a given direction; and measurement in length, width, and thickness. See https://www.dictionary.com/browse/dimension. Therefore, “one or more feature dimensions”, under the broadest reasonable interpretation (BRI), in light of the specification, is one or more properties, extensions or measurements of a part or characteristic, such as a processor bit precision or bit width) (see, e.g., paragraphs 23, 117-118 and 261, “a circuit for accessing multidimensional data”, “The NN processor … combines several features … to handle many types of neural networks” [i.e., the neural network processor includes one or more features], “computation units that are organized into various aggregation levels or hierarchical levels, such as … NN cores … compute elements that are configured to address the special nature of the computational needs of ANNs … flexibility of number representation, including integer and floating point as well as different bit widths” [i.e., the processor is organized/configured for feature dimensions at processor bit widths/precisions], “in convolutional neural networks, data arrays of two, three or more dimensions are stored in memory.” [i.e., one or more dimensions]),
determining one of the one or more processor bit precisions (see, e.g., FIGs. 6 and 7A – depicting multipliers 142, 454 that are components of processing elements 140, 450 [i.e., processor] and paragraphs 118, 170 and 236, “computation units that are organized into various aggregation levels or hierarchical levels, such as PEs, … NN cores … features of the compute fabric include: … flexibility of number representation, including integer and floating point as well as different bit widths”, “the quad multiplier of the PE … The quad multiplier, generally referenced 870, comprises four lower precision (e.g., 8-bit) multipliers” [i.e., processing element/PE/processor and NN cores/processors have determined bit widths/precisions], “translator 772 functions to receive the user model and generate an intermediate format of the model. The optimizer 774 functions to perform model level optimizations, post-translation model adjustments for performance, and numerical adaptations to different bit widths [i.e., determining different bit widths/precisions for a model to be executed]. The resource allocator 778 allocates and assigns physical resources (e.g., compute and memory elements, etc.) in accordance with the intermediate model” [i.e., allocator determines/assigns physical resources such as compute elements/processors based on a processor bit width/precision]);
based on the determination of the one of the one or more processor precisions, transforming the input data tensor from the input bit precision to the determined one of the one or more processor bit precisions (aside from repeating the “transforming” claim language in paragraphs 2-3 and stating “tensor formatted input is transformed into a specific vector, matrix, or tensor format, compatible with the underlying neural network hardware” and “the input data tensor is transformed from the input bit precision to one of the processor bit precisions” in paragraphs 55 and 57, applicant’s specification does not mention, much less define what is meant by the newly-added “based on the determination of the one of the one or more processor precisions” claim language. Therefore, “based on the determination of the one of the one or more processor precisions, transforming the input data tensor from the input bit precision to the determined one of the one or more processor bit precisions”, under the BRI, in light of the specification, is transforming or adapting input data to a processor’s or processing element’s bit precision or bit width) (see, e.g., paragraphs 164-165, 167, 170, 237, 242 and 278, “The neurons of the ANN are implemented in the PE, … The processing element, generally referenced 450, comprises an input data representation circuit 452 … transformation/rounding circuit 456”, “This circuit is operative to transform the representation of the input data … from integer to floating point (FP) format”, “circuit 456 functions to perform rounding of the product before input to the accumulator.” [i.e., transform the input data from an input precision - integer to the processor’s floating point precision], “The input … is a double precision input X made up of two low precision (e.g., 8-bit) values” [i.e., input precision], “perform model level optimizations … model adjustments for performance, and numerical adaptations to different bit widths [i.e., bit precisions] … allocates and assigns physical resources (e.g., compute and memory elements, etc.) [i.e., including the NN processor] … perform bit exact numerical emulation of the NN processor”, “the NN processor is able to operate at any desired granularity of any subset of the input” [i.e., based on the NN processor’s bit width/precision, adapting/transforming input data granularity], “the flexible processing granularity of the NN processor and related memory … is shown in FIG. 23. … leveraging the data pipeline to … operate at low input domain granularity. Consider the example input tensor 932 including input data 938 … One of the network layers then applies an NN operation 934 to the input data”, “input tensor 932 including input data” [i.e., the input data is in input tensor 932], “The SCALE factor is used to represent the size in bytes (i.e. the granularity) of each element” [i.e., based on the NN processor’s bit widths/precisions, transforming granularity/byte size/precision of input data tensor 932 to one of the NN processor’s bit widths/precisions]);
dividing the input data tensor into a plurality of blocks, each block conforming to one of the feature dimensions of the neural network processor (see, e.g., FIG. 3 – depicting input data 347 that is divided into blocks 346 that conform to/are allocated to subclusters 384 and paragraphs 109, 207, 237 and 265, “in an example NN processor embodiment [i.e., a neural network processor], a PE [processing element/processor] comprises P=16 neurons, a subcluster comprises N=64 PEs [processing elements] … and the NN core comprises L=8 clusters” [i.e., P/number of neurons and N/subcluster PEs are processor dimensions, and L/number of clusters is a feature dimension of the NN processor], “ANN input data 347 enters shared L3 memory, is read from allocated memory blocks, processed by the PEs [processing elements] in one or more subclusters, output to neighboring memory blocks”, “adjustments for performance, and numerical adaptations to different bit widths” [i.e., adjustments/adaptations conforming to one of the feature dimensions of the NN processor – number of clusters, bit widths], “the neural network data stored in the memory represents a tensor, i.e. an Z-dimensional matrix” [i.e., input data/tensor is allocated/divided into blocks conforming to subclusters/number of clusters/one of the feature dimensions of the NN processor]);
providing each of the plurality of blocks to one of the plurality of neural cores (see, e.g., FIG. 3 – showing providing each of the blocks 346 of data 347 as input 341 to subclusters 384 that include PEs/processing elements [i.e., neural cores] and paragraphs 106, 108 and 207, “NN processing system comprising one or more NN processing cores”, “The NN processing engine or core 60 comprises several hierarchical computation units. The lowest hierarchical level is the processing element (PE)”, “Input data 341 to a subcluster is received from an allocated memory block 346 from a shared portion of L3 memory. The PEs within the subcluster process the … input data … ANN input data 347 … is read from allocated memory blocks, processed by the PEs in one or more subclusters” [i.e., providing the blocks 346 to one of the neural cores]);
computing, by the plurality of neural cores, output of one or more neural network layers (see, e.g., FIG. 3 – showing subclusters 384 including processing elements/PEs [i.e., neural cores] that compute output 343 and paragraphs 108, 121 and 207, “The NN processing engine or core 60 comprises several … processing element[s] (PE)”, “the computation units (i.e. PEs, subclusters, clusters, etc.) allows the NN core to handle numerous types of ANNs”, “Input data 341 to a subcluster is received from an allocated memory block 346 from a shared portion of L3 memory. The PEs within the subcluster process the … input data and generate outputs 343 … ANN input data 347 … is … processed by the PEs in one or more subclusters, output to neighboring memory blocks, and after traversing through the various layers in the ANN [artificial neural network] is ultimately output as ANN output data 349” [i.e., computing output of layers of the ANN/neural network]).
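For illustration only, and not as a characterization of applicant’s specification or of Baum, the sequence of steps recited in claim 1, as interpreted above under the BRI, may be sketched as follows. All function names, the precision selection rule, the shift-based requantization and the dot-product “core” are hypothetical assumptions introduced solely for this sketch; neither the claims nor the cited references specify them.

```python
from typing import List


def determine_precision(input_bits: int, supported_bits: List[int]) -> int:
    """Pick one of the processor's supported bit precisions.

    Assumed rule (hypothetical): the widest supported precision that does
    not exceed the input precision, else the narrowest supported one.
    """
    candidates = [b for b in supported_bits if b <= input_bits]
    return max(candidates) if candidates else min(supported_bits)


def transform_precision(values: List[int], in_bits: int, out_bits: int) -> List[int]:
    """Requantize unsigned integers from in_bits to out_bits by bit shifting."""
    if out_bits >= in_bits:
        return [v << (out_bits - in_bits) for v in values]
    return [v >> (in_bits - out_bits) for v in values]  # truncating down-conversion


def divide_into_blocks(values: List[int], block_size: int) -> List[List[int]]:
    """Split a flattened tensor into blocks of the size one core accepts."""
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


def core_compute(block: List[int], weights: List[int]) -> int:
    """Stand-in for one neural core: a single dot product."""
    return sum(x * w for x, w in zip(block, weights))


def run_layer(values: List[int], in_bits: int, supported_bits: List[int],
              block_size: int, weights: List[int]) -> List[int]:
    """Determine precision, transform, divide, and hand each block to a core."""
    bits = determine_precision(in_bits, supported_bits)
    transformed = transform_precision(values, in_bits, bits)
    blocks = divide_into_blocks(transformed, block_size)
    return [core_compute(block, weights) for block in blocks]
```

In this sketch the transformation step takes the determined precision as its argument, which is one possible reading of a transformation performed “based on the determination”.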
Although Baum substantially discloses the claimed invention, Baum is not relied on to explicitly disclose input data … having feature dimensions at an input bit precision.
In the same field, analogous art Esser teaches input data … having feature dimensions at an input bit precision (see, e.g., FIG. 1 – depicting “Input” into “layers of a convolutional network” with “partitioning the feature dimension) at the same location (partitioning the spatial dimensions).” and pages 11442-11444, “Neurons within a layer are arranged in two spatial dimensions … and one feature dimension, … Network structure is mapped by partitioning each layer into 1 or more equally sized groups along the feature dimension” [i.e., feature dimensions], “Network inputs are typically represented with multibit channels [for example, eight-bit red, green, and blue (RGB) channels]. … each bit represents a different value” [i.e., input data having input bit precision]).
Alternatively, Esser also teaches feature dimensions at one or more processor bit precisions (see, e.g., pages 11441 and 11443, “we demonstrate that neuromorphic computing … can implement deep convolution networks … This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors [i.e., neural network processors] … convolutional networks typically use high precision (≥32-bit) neurons and synapses … neuromorphic designs can use one-bit spikes to provide event-based computation … and can use low-precision synapses” [i.e., one or more processor bit precisions], “Neurons within a layer are arranged in two spatial dimensions … and one feature dimension, … Network structure is mapped by partitioning each layer into 1 or more equally sized groups along the feature dimension” [i.e., feature dimensions]) and
dividing the input data tensor into a plurality of blocks, each block conforming to one of the feature dimensions of the neural network processor (see, e.g., FIG. 1 – showing “layers of a convolutional network” that “designate neurons (individual boxes) belonging to the same group (partitioning the feature dimension)” [i.e., dividing/partitioning input data matrix/tensor into blocks conforming to feature dimensions] and pages 11441 and 11443, “neuromorphic processors [i.e., neural network processors] … neuromorphic systems can take advantage of blockwise connectivity”, “Connectivity between neurons follows a blockwise scheme: each neuron can connect to one input line of any core in the system”, “x ={xi,j,f} are the neuron’s input pixels or neurons, w = {wi,j,f} are the filter weights, i, j are over the topographic dimensions, and f is over the feature dimension or input channels. … Network structure is mapped by partitioning each layer into 1 or more equally sized groups along the feature dimension” [i.e., each block of the input data matrix x/tensor conforms to one of the feature dimensions of the neuromorphic processor/core]).
Baum and Esser are analogous art because they are both directed to neural network systems including multiple “NN processing cores”, “dedicated neural cores” and “a network of neurosynaptic cores” (see, e.g., Baum, FIG. 4 and paragraph 106, and Esser, FIG. 1 and pages 11441-11443). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Esser with Baum to provide a “neuromorphic computing … chip architecture based on spiking neurons, low precision synapses, and a scalable communication network” to “implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech” (See, e.g., Esser, page 11441). Doing so would have allowed Baum to use Esser’s approach to allow “the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors” (i.e., neural network processors) and to achieve “unprecedented energy-efficiency” in “neuromorphic systems [that] can take advantage of blockwise connectivity that limits filter sizes, thereby saving energy because weights can now be stored in local on-chip memory within dedicated neural cores” (i.e., a plurality of neural cores), as suggested by Esser (See, e.g., Esser, page 11441). This is an example of “use of known technique to improve similar devices (methods, or products) in the same way.” See MPEP 2143.

With respect to independent claim 12, Baum discloses the invention as claimed including a system comprising:
a neural network processor comprising a plurality of neural cores (see, e.g., paragraphs 18, 98, 106 and 108, “a system and method of accessing multi-dimensional data in memory … applicable to neural network (NN) processing engines [i.e., a plurality of neural cores] adapted to implement artificial neural networks (ANNs).”, “Neural Network (NN) Processing Core … the neural network (NN) processor comprises a plurality of basic computation units”, “NN processing system comprising one or more NN processing cores” [i.e., a neural network processor including neural cores], “The NN processing engine or core 60 comprises several hierarchical computation units”), 
the neural network processor having one or more processor precisions per activation (see, e.g., paragraphs 108, 118, 160 and 170, “The NN processing engine or core 60 comprises … a plurality of activation function circuits”, “any number of activation functions 80 and layer controllers 82 may be implemented in the cluster level or in any other level depending on the design goals and particular implementation of the NN processor.” [i.e., particular processor goals/implementations/precisions per activation function], “computation units that are organized into various aggregation levels or hierarchical levels, such as … NN cores … compute elements that are configured to address the special nature of the computational needs of ANNs … flexibility of number representation, including integer and floating point as well as different bit widths”, “activation functions are aggregated … and … implemented … at the cluster level”, “a higher precision multiplication (e.g., 16-bit) is performed by combining four low precision (e.g., 8-bit) multipliers to generate a high (or double) precision (e.g., 16-bit) product.” [i.e., the processor is organized/configured for processor bit widths/precisions]),
the neural network processor configured to accept data having a feature dimension (as indicated above, a “feature dimension”, under the BRI, in light of the specification, is a property, extension or measurement of a part or characteristic, such as a processor bit precision or bit width) (see, e.g., FIG. 8 – depicting step 820 to “RECEIVE SIZE INFORMATION FOR EACH DIMENSION OF THE DATA TENSOR” and paragraphs 118, 261 and 265, “computation units that are organized into … NN cores … configured to address the … flexibility of number representation, including integer and floating point as well as different bit widths” [i.e., the processor is organized/configured to accept data having a feature dimension/bit width/precision], “in convolutional neural networks, data arrays of two, three or more dimensions are stored in memory.” [i.e., a dimension], “the neural network data stored in the memory represents a tensor, i.e. an Z-dimensional matrix” [i.e., the input data tensor has a feature dimension]);
a transformation circuit coupled to the neural network processor (see, e.g., FIGs. 4 and 7A – depicting NN processor 104 coupled to an NN core 102 and processing element 450 including transformation/rounding circuit 456, and paragraphs 106 and 164, “The SoC NN processing system, generally referenced 100, comprises at least one NN processor integrated circuit (or core) 102 optionally coupled to one or more additional internal or external NN processors 104” [i.e., a circuit coupled to the NN processor], “the PE is the most basic compute element of the NN processor. … The processing element, generally referenced 450, comprises … transformation/rounding circuit 456” [i.e., a transformation circuit coupled to NN processor]), and adapted to:
receive an input data tensor having an input precision (see, e.g., FIG. 8 – depicting step 820 to “RECEIVE SIZE INFORMATION FOR EACH DIMENSION OF THE DATA TENSOR” and paragraphs 21, 170 and 242, “scanning tensor data stored in a memory” [i.e., receiving/scanning an input data tensor], “The input … is a double precision input X made up of two low precision (e.g., 8-bit) values” [i.e., input precision], “input tensor 932 including input data 938” [i.e., input data tensor 932]);
determine one of the one or more processor bit precisions (see, e.g., FIGs. 6 and 7A – depicting multipliers 142, 454 that are components of processing elements 140, 450 [i.e., processor] and paragraphs 118, 170 and 236, “computation units that are organized into various aggregation levels or hierarchical levels, such as PEs, … NN cores … features of the compute fabric include: … flexibility of number representation, including integer and floating point as well as different bit widths”, “the quad multiplier of the PE … The quad multiplier, generally referenced 870, comprises four lower precision (e.g., 8-bit) multipliers” [i.e., processing element/PE/processor and NN cores/processors have determined bit widths/precisions], “translator 772 functions to receive the user model and generate an intermediate format of the model. The optimizer 774 functions to perform model level optimizations, post-translation model adjustments for performance, and numerical adaptations to different bit widths [i.e., determine different bit widths/precisions for a model to be executed]. The resource allocator 778 allocates and assigns physical resources (e.g., compute and memory elements, etc.) in accordance with the intermediate model” [i.e., allocator determines/assigns physical resources such as compute elements/processors based on a processor bit width/precision]);
based on the determination of the one of the one or more processor precisions, transform the input data tensor from the input precision to the determined one of the one or more processor precisions (as indicated above regarding similar language in claim 1, “based on the determination of the one of the one or more processor precisions, transform the input data tensor from the input precision to the determined one of the one or more processor precisions”, under the BRI, in light of the specification, is transforming or adapting input data to a processor’s or processing element’s precision or bit width) (see, e.g., paragraphs 164-165, 167, 170, 237, 242 and 278, “The neurons of the ANN are implemented in the PE, … The processing element, generally referenced 450, comprises an input data representation circuit 452 … transformation/rounding circuit 456”, “This circuit is operative to transform the representation of the input data … from integer to floating point (FP) format”, “circuit 456 functions to perform rounding of the product before input to the accumulator.” [i.e., transform the input data from an input precision - integer to the processor’s floating point precision], “The input … is a double precision input X made up of two low precision (e.g., 8-bit) values” [i.e., input precision], “perform model level optimizations … model adjustments for performance, and numerical adaptations to different bit widths [i.e., adapt/transform to different precisions] … allocates and assigns physical resources (e.g., compute and memory elements, etc.) [i.e., including the NN processor] … perform bit exact numerical emulation of the NN processor”, “the NN processor is able to operate at any desired granularity of any subset of the input” [i.e., based on the NN processor’s precision, adapting/transforming input data granularity], “the flexible processing granularity of the NN processor and related memory … is shown in FIG. 23. … leveraging the data pipeline to … operate at low input domain granularity. Consider the example input tensor 932 including input data 938 … One of the network layers then applies an NN operation 934 to the input data”, “input tensor 932 including input data” [i.e., the input data is in input tensor 932], “The SCALE factor is used to represent the size in bytes (i.e. the granularity) of each element” [i.e., based on the NN processor’s bit widths/precisions, transforming granularity/byte size/precision of input data tensor 932 to one of the NN processor’s bit widths/precisions]);
divide the input data tensor into a plurality of blocks, each block conforming to one of the feature dimensions of the neural network processor (see, e.g., FIG. 3 – depicting input data 347 that is divided into blocks 346 that conform to/are allocated to subclusters 384 and paragraphs 109, 207, 237 and 265, “in an example NN processor embodiment [i.e., a neural network processor], a PE [processing element/processor] comprises P=16 neurons, a subcluster comprises N=64 PEs [processing elements] … and the NN core comprises L=8 clusters” [i.e., P/number of neurons and N/subcluster PEs are processor dimensions, and L/number of clusters is a feature dimension of the NN processor], “ANN input data 347 enters shared L3 memory, is read from allocated memory blocks, processed by the PEs [processing elements] in one or more subclusters, output to neighboring memory blocks”, “adjustments for performance, and numerical adaptations to different bit widths” [i.e., adjustments/adaptions conforming to one of the feature dimensions of the NN processor – number of clusters, bit widths], “the neural network data stored in the memory represents a tensor, i.e. an Z-dimensional matrix” [i.e., input data/tensor is allocated/divided into blocks conforming to subclusters/number of clusters/one of the feature dimensions of the NN processor]);
provide each of the plurality of blocks to one of the plurality of neural cores (see, e.g., FIG. 3 – showing providing each of the blocks 346 of data 347 as input 341 to subclusters 384 that include PEs/processing elements [i.e., neural cores] and paragraphs 106, 108 and 207, “NN processing system comprising one or more NN processing cores”, “The NN processing engine or core 60 comprises several hierarchical computation units. The lowest hierarchical level is the processing element (PE)”, “Input data 341 to a subcluster is received from an allocated memory block 346 from a shared portion of L3 memory. The PEs within the subcluster process the … input data … ANN input data 347 … is read from allocated memory blocks, processed by the PEs in one or more subclusters” [i.e., providing the blocks 346 to one of the neural cores]);
and wherein the neural network processor is adapted to compute, by the plurality of neural cores, output of one or more neural network layers (see, e.g., FIG. 3 – showing subclusters 384 including processing elements/PEs [i.e., neural cores] that compute output 343 and paragraphs 108, 121 and 207, “The NN processing engine or core 60 comprises several … processing element[s] (PE)”, “the computation units (i.e. PEs, subclusters, clusters, etc.) allows the NN core to handle numerous types of ANNs”, “Input data 341 to a subcluster is received from an allocated memory block 346 from a shared portion of L3 memory. The PEs within the subcluster process the … input data and generate outputs 343 … ANN input data 347 … is … processed by the PEs in one or more subclusters, output to neighboring memory blocks, and after traversing through the various layers in the ANN [artificial neural network] is ultimately output as ANN output data 349” [i.e., computing output of layers of the ANN/neural network]).
Although Baum substantially discloses the claimed invention, Baum is not relied on to explicitly disclose receive an input data tensor having an input precision per channel at one or more features.
In the same field, analogous art Esser teaches receive an input data tensor having an input precision per channel at one or more features (see, e.g., pages 11441 and 11443-11444, “For input data, we use a first layer to transform multivalued, multichannel input into binary channels”, “x ={xi,j,f} are the neuron’s input pixels … and f is over the feature dimension or input channels.” [i.e., receive input data matrix x/tensor having input per channel at one or more features], “Network inputs are typically represented with multibit channels [for example, eight-bit red, green, and blue (RGB) channels].” [i.e., receiving input data having an input 8-bit precision per channel at RGB color features]).
Alternatively, Esser also teaches divide the input data tensor into a plurality of blocks, each block conforming to one of the feature dimensions of the neural network processor (see, e.g., FIG. 1 – showing “layers of a convolutional network” that “designate neurons (individual boxes) belonging to the same group (partitioning the feature dimension)” [i.e., dividing/partitioning input data matrix/tensor into blocks conforming to feature dimensions] and pages 11441 and 11443, “neuromorphic processors [i.e., neural network processors] … neuromorphic systems can take advantage of blockwise connectivity”, “Connectivity between neurons follows a blockwise scheme: each neuron can connect to one input line of any core in the system”, “x ={xi,j,f} are the neuron’s input pixels or neurons, w = {wi,j,f} are the filter weights, i, j are over the topographic dimensions, and f is over the feature dimension or input channels. … Network structure is mapped by partitioning each layer into 1 or more equally sized groups along the feature dimension” [i.e., each block of the input data matrix x/tensor conforms to one of the feature dimensions of the neuromorphic processor/core]).
Baum and Esser are analogous art because they are both directed to neural network systems including multiple “NN processing cores”, “dedicated neural cores” and “a network of neurosynaptic cores” (see, e.g., Baum, FIG. 4 and paragraph 106, and Esser, FIG. 1 and pages 11441-11443). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Esser with Baum to provide a “neuromorphic computing … chip architecture based on spiking neurons, low precision synapses, and a scalable communication network” to “implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech” (See, e.g., Esser, page 11441). Doing so would have allowed Baum to use Esser’s approach to allow “the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors” (i.e., neural network processors) and to achieve “unprecedented energy-efficiency” in “neuromorphic systems [that] can take advantage of blockwise connectivity that limits filter sizes, thereby saving energy because weights can now be stored in local on-chip memory within dedicated neural cores” (i.e., a plurality of neural cores), as suggested by Esser (See, e.g., Esser, page 11441). This is an example of “use of known technique to improve similar devices (methods, or products) in the same way.” See MPEP 2143.
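For illustration only, the arithmetic behind the quad-multiplier passage quoted from Baum for claim 12 (a 16-bit product formed by combining four 8-bit multipliers) can be sketched as follows. This is a minimal sketch under stated assumptions: operands are taken to be unsigned, the function name mul16_from_four_mul8 is hypothetical, and nothing here reproduces Baum's actual circuit.

```python
def mul16_from_four_mul8(a: int, b: int) -> int:
    """Multiply two unsigned 16-bit integers using four 8-bit x 8-bit products.

    Hypothetical helper for illustration; unsigned operands are an assumption.
    """
    if not (0 <= a < (1 << 16) and 0 <= b < (1 << 16)):
        raise ValueError("operands must be unsigned 16-bit integers")

    # Each 16-bit operand is treated as two 8-bit halves.
    a_hi, a_lo = a >> 8, a & 0xFF
    b_hi, b_lo = b >> 8, b & 0xFF

    # Four low-precision partial products, one per 8-bit multiplier.
    hh = a_hi * b_hi
    hl = a_hi * b_lo
    lh = a_lo * b_hi
    ll = a_lo * b_lo

    # Shift each partial product to its weight and sum them.
    return (hh << 16) + ((hl + lh) << 8) + ll
```

The decomposition follows from writing a = 256*a_hi + a_lo and b = 256*b_hi + b_lo and expanding the product.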

Regarding claims 2 and 13, as discussed above, Baum in view of Esser teaches the method of claim 1 and the system of claim 12.
Baum further discloses the input data tensor (see, e.g., paragraph 242, “input tensor 932 including input data 938 that can be located at the beginning of or at any arbitrary point in the network. One of the network layers then applies an NN operation 934 to the input data” [i.e., the input data tensor]) and an image (see, e.g., paragraph 4, “Artificial neural networks (ANNs) are computing systems inspired by the biological neural networks” and “in image recognition, they might learn to identify images” [i.e., image recognition to identify input images]).
Although Baum substantially discloses the claimed invention, Baum is not relied on to explicitly disclose wherein the input data … comprises an image.
In the same field analogous art Esser teaches wherein the input data … comprises an image (see, e.g., Table 2 – listing input datasets that include images of “Natural and manufactured objects in their environment”, “Single digits of house addresses from Google’s Street View”, “traffic signs in multiple environments”, “corporate logos in their environment” and “Flickr-Logos32 [that] are cropped and/or downsampled from larger images.” and FIG. 3 – showing that each row of input is “an example image from CIFAR10” and pages 11442-11444, “A deep convolutional network is a multilayer feedforward neural network, whose input is typically image-like”, “we use a binary representation scheme for data throughout the network … By configuring TrueNorth neurons … V(t) is 0 at the beginning of each image presentation, allowing for one classification per tick using pipelining. Network input. Network inputs are typically represented with multibit channels [for example, eight-bit red, green, and blue (RGB) channels].” [i.e., input data includes an image]).
Baum and Esser are analogous art because they are both directed to neural network systems including multiple “NN processing cores”, “dedicated neural cores” and “a network of neurosynaptic cores” (see, e.g., Baum, FIG. 4 and paragraph 106, and Esser, FIG. 1 and pages 11441-11443). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Esser with Baum to provide a “neuromorphic computing … chip architecture based on spiking neurons, low precision synapses, and a scalable communication network” to “implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech” (See, e.g., Esser, page 11441). Doing so would have allowed Baum to use Esser’s approach to allow “the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors” (i.e., neural network processors) and to achieve “unprecedented energy-efficiency” in “neuromorphic systems [that] can take advantage of blockwise connectivity that limits filter sizes, thereby saving energy because weights can now be stored in local on-chip memory within dedicated neural cores” (i.e., a plurality of neural cores), as suggested by Esser (See, e.g., Esser, page 11441). This is an example of “use of known technique to improve similar devices (methods, or products) in the same way.” See MPEP 2143.

Regarding claim 4, as discussed above, Baum in view of Esser teaches the method of claim 1.
Although Baum substantially discloses the claimed invention, Baum is not relied on to explicitly disclose the neural network processor being configured for a predetermined number of features, wherein transforming the input data tensor comprises dividing input features into a plurality of feature sets, each having less than or equal to the predetermined number of features.
In the same field, analogous art Esser teaches the neural network processor being configured for a predetermined number of features (see, e.g., page 11443, “Neurons within a layer are arranged in two spatial dimensions, corresponding to shifts in the convolution filter, and one feature dimension, corresponding to different filters. … x ={xi,j,f} are the neuron’s input pixels … and f is over the feature dimension or input channels.” [i.e., neurons arranged/configured for f feature dimensions, a predetermined number]), wherein transforming the input data tensor comprises dividing input features into a plurality of feature sets (see, e.g., page 11443, “Network structure is mapped by partitioning each layer into 1 or more equally sized groups along the feature dimension, where each group applies its filters to a different, nonoverlapping, equally sized subset of layer input features.” [i.e., partitioning/dividing input features into a plurality of feature sets/subsets]), each having less than or equal to the predetermined number of features (see, e.g., page 11443, “Layers are designed such that the total filter size (rows × columns × features) of each group is less than or equal to the number of input lines available per core, and the number of output features is less than or equal to the number of neurons per core. This arrangement allows one group’s features, filters, and filter support region to be implemented using one core’s neurons, synapses and input lines” [i.e., each group/subset has less than or equal to the predetermined number of input features]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Esser with Baum to provide a “neuromorphic computing … chip architecture based on spiking neurons, low precision synapses, and a scalable communication network” to “implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech” (See, e.g., Esser, page 11441). Doing so would have allowed Baum to use Esser’s approach to allow “the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors” (i.e., neural network processors) and to achieve “unprecedented energy-efficiency” in “neuromorphic systems [that] can take advantage of blockwise connectivity that limits filter sizes, thereby saving energy because weights can now be stored in local on-chip memory within dedicated neural cores” (i.e., a plurality of neural cores), as suggested by Esser (See, e.g., Esser, page 11441).
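For illustration only, the partitioning quoted from Esser for claim 4 (equally sized, nonoverlapping groups along the feature dimension, each no larger than what one core supports) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name partition_features and the parameter max_per_core are hypothetical, and choosing the fewest groups is an assumption rather than anything stated in Esser.

```python
def partition_features(num_features: int, max_per_core: int) -> list[list[int]]:
    """Split feature indices 0..num_features-1 into equally sized,
    non-overlapping groups of at most max_per_core features each.

    Hypothetical helper; picks the smallest group count that satisfies the limit.
    """
    if num_features <= 0 or max_per_core <= 0:
        raise ValueError("num_features and max_per_core must be positive")

    # Try group counts in increasing order; the count must divide the features
    # evenly so that every group has the same size.
    for groups in range(1, num_features + 1):
        if num_features % groups == 0:
            size = num_features // groups
            if size <= max_per_core:
                return [list(range(g * size, (g + 1) * size)) for g in range(groups)]

    # Unreachable: groups == num_features always yields size 1 <= max_per_core.
    raise AssertionError("no valid partition found")
```

With 12 input features and a limit of 5 features per core, the sketch yields three groups of four features each, since two groups of six would exceed the limit.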

Regarding claim 15, as discussed above, Baum in view of Esser teaches the system of claim 12.
Although Baum substantially discloses the claimed invention, Baum is not relied on to explicitly disclose wherein transforming the input data tensor comprises dividing each channel into a plurality of values having precisions less than or equal to the one of the processor precisions.
In the same field, analogous art Esser teaches wherein transforming the input data tensor comprises dividing each channel into a plurality of values (see, e.g., pages 11441 and 11443-11444, “For input data, we use a first layer to transform multivalued, multichannel input into binary channels”, “x ={xi,j,f} are the neuron’s input pixels or neurons, w = {wi,j,f} are the filter weights, i, j are over the topographic dimensions, and f is over the feature dimension or input channels. … Network structure is mapped by partitioning each layer into 1 or more equally sized groups along the feature dimension”, “Network inputs are typically represented with multibit channels [for example, eight-bit red, green, and blue (RGB) channels].” [i.e., transforming input data tensor x by partitioning/dividing each channel into a plurality of values]) having precisions less than or equal to the one of the processor precisions (see, e.g., page 11443, “Layers are designed such that the total filter size (rows × columns × features) of each group is less than or equal to the number of input lines available per core, and the number of output features is less than or equal to the number of neurons per core. This arrangement allows one group’s features, filters, and filter support region to be implemented using one core’s neurons, synapses and input lines” [i.e., each channel has precisions less than or equal to the core’s/processor’s precision]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Esser with Baum to provide a “neuromorphic computing … chip architecture based on spiking neurons, low precision synapses, and a scalable communication network” to “implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech” (See, e.g., Esser, page 11441). Doing so would have allowed Baum to use Esser’s approach to allow “the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors” (i.e., neural network processors) and to achieve “unprecedented energy-efficiency” in “neuromorphic systems [that] can take advantage of blockwise connectivity that limits filter sizes, thereby saving energy because weights can now be stored in local on-chip memory within dedicated neural cores” (i.e., a plurality of neural cores), as suggested by Esser (See, e.g., Esser, page 11441).
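For illustration only, one way to divide a multibit channel value into a plurality of values at or below a lower processor precision is a fixed bit-slice decomposition, sketched below. This is an assumption made for the example: Esser describes using a first network layer to transform multivalued input into binary channels and does not state that it uses this decomposition, and the names split_channel_value and join_channel_values are hypothetical.

```python
def split_channel_value(value: int, input_bits: int, proc_bits: int) -> list[int]:
    """Split an unsigned input_bits-wide value into chunks of proc_bits bits,
    least significant chunk first. Hypothetical helper for illustration.
    """
    if input_bits <= 0 or proc_bits <= 0:
        raise ValueError("bit widths must be positive")
    if not 0 <= value < (1 << input_bits):
        raise ValueError("value does not fit in input_bits")

    mask = (1 << proc_bits) - 1
    # Ceiling division: the last chunk may carry fewer meaningful bits.
    num_chunks = -(-input_bits // proc_bits)
    return [(value >> (i * proc_bits)) & mask for i in range(num_chunks)]


def join_channel_values(chunks: list[int], proc_bits: int) -> int:
    """Inverse of split_channel_value: reassemble the original value."""
    return sum(chunk << (i * proc_bits) for i, chunk in enumerate(chunks))
```

Under these assumptions, an eight-bit channel value split at one-bit precision becomes eight binary values, and the same value split at two-bit precision becomes four values, each of which fits the lower precision.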

With respect to independent claim 23, Baum discloses the invention as claimed including a system comprising:
a neural network processor comprising a plurality of neural cores (see, e.g., paragraphs 18, 98, 106 and 108, “a system and method of accessing multi-dimensional data in memory … applicable to neural network (NN) processing engines [i.e., a plurality of neural cores] adapted to implement artificial neural networks (ANNs).”, “Neural Network (NN) Processing Core … the neural network (NN) processor comprises a plurality of basic computation units”, “NN processing system comprising one or more NN processing cores” [i.e., a neural network processor including neural cores], “The NN processing engine or core 60 comprises several hierarchical computation units”), the neural network processor being configured for one or more feature dimensions at one or more processor bit precisions (as indicated above, “one or more feature dimensions”, under the BRI, is one or more properties, extensions or measurements of a part or characteristic, such as a processor bit precision or bit width) (see, e.g., paragraphs 23, 117-118 and 261, “a circuit for accessing multidimensional data”, “The NN processor … combines several features … to handle many types of neural networks” [i.e., the neural network processor includes one or more features], “computation units that are organized into various aggregation levels or hierarchical levels, such as … NN cores … compute elements that are configured to address the special nature of the computational needs of ANNs … flexibility of number representation, including integer and floating point as well as different bit widths” [i.e., the processor is organized/configured for feature dimensions at processor bit widths/precisions], “in convolutional neural networks, data arrays of two, three or more dimensions are stored in memory.” [i.e., one or more dimensions]); and
a transformation circuit coupled to the neural network processor (see, e.g., FIGs. 4 and 7A – depicting NN processor 104 coupled to an NN core 102 and processing element 450 including transformation/rounding circuit 456, and paragraphs 106 and 164, “The SoC NN processing system, generally referenced 100, comprises at least one NN processor integrated circuit (or core) 102 optionally coupled to one or more additional internal or external NN processors 104” [i.e., a circuit coupled to the NN processor], “the PE is the most basic compute element of the NN processor. … The processing element, generally referenced 450, comprises … transformation/rounding circuit 456” [i.e., a transformation circuit coupled to NN processor]), and adapted to:
receive an input data tensor having … dimensions at an input bit precision (see, e.g., FIG. 8 – depicting step 820 to “RECEIVE SIZE INFORMATION FOR EACH DIMENSION OF THE DATA TENSOR” and paragraphs 21, 170 and 242, “scanning tensor data stored in a memory” [i.e., receiving/scanning an input data tensor], “The input … is a double precision input X made up of two low precision (e.g., 8-bit) values” [i.e., an input bit precision], “input tensor 932 including input data 938” [i.e., receive input data tensor 932]);
determine one of the one or more processor bit precisions (see, e.g., FIGs. 6 and 7A – depicting multipliers 142, 454 that are components of processing elements 140, 450 [i.e., processor] and paragraphs 118, 170 and 236, “computation units that are organized into various aggregation levels or hierarchical levels, such as PEs, … NN cores … features of the compute fabric include: … flexibility of number representation, including integer and floating point as well as different bit widths”, “the quad multiplier of the PE … The quad multiplier, generally referenced 870, comprises four lower precision (e.g., 8-bit) multipliers” [i.e., processing element/PE/processor and NN cores/processors have determined bit widths/precisions], “translator 772 functions to receive the user model and generate an intermediate format of the model. The optimizer 774 functions to perform model level optimizations, post-translation model adjustments for performance, and numerical adaptations to different bit widths [i.e., determine different bit widths/precisions for a model to be executed]. The resource allocator 778 allocates and assigns physical resources (e.g., compute and memory elements, etc.) in accordance with the intermediate model” [i.e., allocator determines/assigns physical resources such as compute elements/processors based on a processor bit width/precision]);
based on the determination of the one of the one or more processor precisions, transform the input data tensor from the input bit precision to the determined one of the one or more of the processor bit precisions (as indicated above, “based on the determination of the one of the one or more processor precisions, transforming the input data tensor from the input bit precision to the determined one of the one or more processor bit precisions”, under the BRI, in light of the specification, is transforming or adapting input data to a processor’s or processing element’s bit precision or bit width) (see, e.g., paragraphs 164-165, 167, 170, 237, 242 and 278, “The neurons of the ANN are implemented in the PE, … The processing element, generally referenced 450, comprises an input data representation circuit 452 … transformation/rounding circuit 456”, “This circuit is operative to transform the representation of the input data … from integer to floating point (FP) format”, “circuit 456 functions to perform rounding of the product before input to the accumulator.” [i.e., transform the input data from an input precision - integer to the processor’s floating point precision], “The input … is a double precision input X made up of two low precision (e.g., 8-bit) values” [i.e., input precision], “perform model level optimizations … model adjustments for performance, and numerical adaptations to different bit widths [i.e., bit precisions] … allocates and assigns physical resources (e.g., compute and memory elements, etc.) [i.e., including the NN processor] … perform bit exact numerical emulation of the NN processor”, “the NN processor is able to operate at any desired granularity of any subset of the input” [i.e., based on the NN processor’s bit width/precision, adapting/transforming input data granularity], “the flexible processing granularity of the NN processor and related memory … is shown in FIG. 23. … leveraging the data pipeline to … operate at low input domain granularity. Consider the example input tensor 932 including input data 938 … One of the network layers then applies an NN operation 934 to the input data”, “input tensor 932 including input data” [i.e., the input data is in input tensor 932], “The SCALE factor is used to represent the size in bytes (i.e. the granularity) of each element” [i.e., based on the NN processor’s bit widths/precisions, transforming granularity/byte size/precision of input data tensor 932 to one of the NN processor’s bit widths/precisions]);
divide the input data tensor into a plurality of blocks, each block conforming to one of the feature dimensions of the neural network processor (see, e.g., FIG. 3 – depicting input data 347 that is divided into blocks 346 that conform to/are allocated to subclusters 384 and paragraphs 109, 207, 237 and 265, “in an example NN processor embodiment [i.e., a neural network processor], a PE [processing element/processor] comprises P=16 neurons, a subcluster comprises N=64 PEs [processing elements] … and the NN core comprises L=8 clusters” [i.e., P/number of neurons and N/subcluster PEs are processor dimensions, and L/number of clusters is a feature dimension of the NN processor], “ANN input data 347 enters shared L3 memory, is read from allocated memory blocks, processed by the PEs [processing elements] in one or more subclusters, output to neighboring memory blocks”, “adjustments for performance, and numerical adaptations to different bit widths” [i.e., adjustments/adaptions conforming to one of the feature dimensions of the NN processor – number of clusters, bit widths], “the neural network data stored in the memory represents a tensor, i.e. an Z-dimensional matrix” [i.e., input data/tensor is allocated/divided into blocks conforming to subclusters/number of clusters/one of the feature dimensions of the NN processor]); and
provide each of the plurality of blocks to one of the plurality of neural cores (see, e.g., FIG. 3 – showing providing each of the blocks 346 of data 347 as input 341 to subclusters 384 that include PEs/processing elements [i.e., neural cores] and paragraphs 106, 108 and 207, “NN processing system comprising one or more NN processing cores”, “The NN processing engine or core 60 comprises several hierarchical computation units. The lowest hierarchical level is the processing element (PE)”, “Input data 341 to a subcluster is received from an allocated memory block 346 from a shared portion of L3 memory. The PEs within the subcluster process the … input data … ANN input data 347 … is read from allocated memory blocks, processed by the PEs in one or more subclusters” [i.e., provide the blocks 346 to one of the neural cores]);
and wherein
the plurality of neural cores is adapted to compute the output of one or more neural network layers (see, e.g., FIG. 3 – showing subclusters 384 including processing elements/PEs [i.e., neural cores] that compute output 343 and paragraphs 108, 121 and 207, “The NN processing engine or core 60 comprises several … processing element[s] (PE)”, “the computation units (i.e. PEs, subclusters, clusters, etc.) allows the NN core to handle numerous types of ANNs”, “Input data 341 to a subcluster is received from an allocated memory block 346 from a shared portion of L3 memory. The PEs within the subcluster process the … input data and generate outputs 343 … ANN input data 347 … is … processed by the PEs in one or more subclusters, output to neighboring memory blocks, and after traversing through the various layers in the ANN [artificial neural network] is ultimately output as ANN output data 349” [i.e., PEs/cores adapted to compute output of layers of the ANN/neural network]).
Although Baum substantially discloses the claimed invention, Baum is not relied on to explicitly disclose input data … having … dimensions at an input bit precision.
In the same field, analogous art Esser teaches input data … having … dimensions at an input bit precision (see, e.g., FIG. 1 – depicting “Input” into “layers of a convolutional network” with “partitioning the feature dimension) at the same location (partitioning the spatial dimensions).” and pages 11442-11444, “Neurons within a layer are arranged in two spatial dimensions … and one feature dimension, … Network structure is mapped by partitioning each layer into 1 or more equally sized groups along the feature dimension” [i.e., feature dimensions], “Network inputs are typically represented with multibit channels [for example, eight-bit red, green, and blue (RGB) channels]. … each bit represents a different value” [i.e., input data having input bit precision]).
Alternatively, Esser also teaches divide the input data tensor into a plurality of blocks, each block conforming to one of the feature dimensions of the neural network processor (see, e.g., FIG. 1 – showing “layers of a convolutional network” that “designate neurons (individual boxes) belonging to the same group (partitioning the feature dimension)” [i.e., dividing/partitioning input data matrix/tensor into blocks conforming to feature dimensions] and pages 11441 and 11443, “neuromorphic processors [i.e., neural network processors] … neuromorphic systems can take advantage of blockwise connectivity”, “Connectivity between neurons follows a blockwise scheme: each neuron can connect to one input line of any core in the system”, “x ={xi,j,f} are the neuron’s input pixels or neurons, w = {xi,j,f} are the filter weights, i, j are over the topographic dimensions, and f is over the feature dimension or input channels. … Network structure is mapped by partitioning each layer into 1 or more equally sized groups along the feature dimension” [i.e., each block of the input data matrix x/tensor conforms to one of the feature dimensions of the neuromorphic processor/core]).
Baum and Esser are analogous art because they are both directed to neural network systems including multiple “NN processing cores”, “dedicated neural cores” and “a network of neurosynaptic cores” (see, e.g., Baum, FIG. 4 and paragraph 106, and Esser, FIG. 1 and pages 11441-11443). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Esser with Baum to provide a “neuromorphic computing … chip architecture based on spiking neurons, low precision synapses, and a scalable communication network” to “implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech” (See, e.g., Esser, page 11441). Doing so would have allowed Baum to use Esser’s approach to allow “the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors” (i.e., neural network processors) and to achieve “unprecedented energy-efficiency” in “neuromorphic systems [that] can take advantage of blockwise connectivity that limits filter sizes, thereby saving energy because weights can now be stored in local on-chip memory within dedicated neural cores” (i.e., a plurality of neural cores), as suggested by Esser (See, e.g., Esser, page 11441). This is an example of “use of known technique to improve similar devices (methods, or products) in the same way.” See MPEP 2143.

With respect to independent claim 24, Baum discloses the invention as claimed including a method comprising:
receiving an input data tensor (see, e.g., FIG. 8 – depicting step 820 to “RECEIVE SIZE INFORMATION FOR EACH DIMENSION OF THE DATA TENSOR” and paragraphs 18, 21 and 242, “method of accessing multi-dimensional data in memory … applicable to neural network (NN) processing engines adapted to implement artificial neural networks (ANNs).”, “a method of scanning tensor data stored in a memory” [i.e., a method for receiving/scanning an input data tensor], “input tensor 932 including input data 938” [i.e., input data tensor 932]), the input data tensor having an input precision (see, e.g., FIG. 8 – depicting step 820 to “RECEIVE SIZE INFORMATION FOR EACH DIMENSION OF THE DATA TENSOR” and paragraphs 21, 170 and 242, “scanning tensor data stored in a memory” [i.e., receiving/scanning an input data tensor], “The input … is a double precision input X made up of two low precision (e.g., 8-bit) values” [i.e., input precision], “input tensor 932 including input data 938” [i.e., input data tensor 932]); 
determining a processor bit precision of a neural network processor (see, e.g., FIGs. 6 and 7A – depicting multipliers 142, 454 that are components of processing elements 140, 450 [i.e., processor] and paragraphs 118, 170 and 236, “computation units that are organized into various aggregation levels or hierarchical levels, such as PEs, … NN cores [i.e., neural network processors] … features of the compute fabric include: … flexibility of number representation, including integer and floating point as well as different bit widths”, “the quad multiplier of the PE … The quad multiplier, generally referenced 870, comprises four lower precision (e.g., 8-bit) multipliers” [i.e., processing element/PE/processor and NN cores/processors have determined bit widths/precisions], “translator 772 functions to receive the user model and generate an intermediate format of the model. The optimizer 774 functions to perform model level optimizations, post-translation model adjustments for performance, and numerical adaptations to different bit widths [i.e., determining different bit widths/precisions for a model to be executed]. The resource allocator 778 allocates and assigns physical resources (e.g., compute and memory elements, etc.) in accordance with the intermediate model” [i.e., allocator determines/assigns physical resources such as compute elements/processors based on a NN processor bit width/precision]);
based on the determination of the one of the one or more processor precisions, transforming the input data tensor from the input precision to the processor precision of the neural network processor (as indicated above regarding similar language in claim 1, “based on the determination of the one of the one or more processor precisions, transforming the input data tensor from the input precision to the processor precision”, under the BRI, in light of the specification, is transforming or adapting input data to a processor’s or processing element’s precision or bit width) (see, e.g., paragraphs 164-165, 167, 170, 237, 242 and 278, “The neurons of the ANN are implemented in the PE, … The processing element, generally referenced 450, comprises an input data representation circuit 452 … transformation/rounding circuit 456”, “This circuit is operative to transform the representation of the input data … from integer to floating point (FP) format”, “circuit 456 functions to perform rounding of the product before input to the accumulator.” [i.e., transform the input data from an input precision - integer to the processor’s floating point precision], “The input … is a double precision input X made up of two low precision (e.g., 8-bit) values” [i.e., input precision], “perform model level optimizations … model adjustments for performance, and numerical adaptations to different bit widths [i.e., adapt/transform to different precisions] … allocates and assigns physical resources (e.g., compute and memory elements, etc.) [i.e., including the NN processor] … perform bit exact numerical emulation of the NN processor”, “the NN processor is able to operate at any desired granularity of any subset of the input” [i.e., based on the NN processor’s precision, adapting/transforming input data granularity], “the flexible processing granularity of the NN processor and related memory … is shown in FIG. 23. … leveraging the data pipeline to … operate at low input domain granularity. Consider the example input tensor 932 including input data 938 … One of the network layers then applies an NN operation 934 to the input data”, “input tensor 932 including input data” [i.e., the input data is in input tensor 932], “The SCALE factor is used to represent the size in bytes (i.e. the granularity) of each element” [i.e., based on the NN processor’s bit widths/precisions, transforming granularity/byte size/precision of input data tensor 932 to one of the NN processor’s bit widths/precisions]),
the neural network processor having a plurality of neural cores (see, e.g., paragraphs 18, 98, 106 and 108, “a system and method of accessing multi-dimensional data in memory … applicable to neural network (NN) processing engines [i.e., a plurality of neural cores] adapted to implement artificial neural networks (ANNs).”, “Neural Network (NN) Processing Core … the neural network (NN) processor comprises a plurality of basic computation units”, “NN processing system comprising one or more NN processing cores” [i.e., a neural network processor having/including neural cores], “The NN processing engine or core 60 comprises several hierarchical computation units”), 
the neural network processor being configured to accept data having a feature dimension (as indicated above, a “feature dimension”, under the BRI, in light of the specification, is a property, extension or measurement of a part or characteristic, such as a processor bit precision or bit width)(see, e.g., FIG. 8 – depicting step 820 to “RECEIVE SIZE INFORMATION FOR EACH DIMENSION OF THE DATA TENSOR” and paragraphs 118, 261 and 265, “computation units that are organized into … NN cores … configured to address the … flexibility of number representation, including integer and floating point as well as different bit widths” [i.e., the processor is organized/configured to accept data having a feature dimension/bit width/precision], “in convolutional neural networks, data arrays of two, three or more dimensions are stored in memory.” [i.e., a dimension], “the neural network data stored in the memory represents a tensor, i.e. an Z-dimensional matrix” [i.e., the input data tensor has a feature dimension]);
dividing the input data tensor into a plurality of blocks, each block conforming to one of the feature dimensions of the neural network processor (see, e.g., FIG. 3 – depicting input data 347 that is divided into blocks 346 that conform to/are allocated to subclusters 384 and paragraphs 109, 207, 237 and 265, “in an example NN processor embodiment [i.e., a neural network processor], a PE [processing element/processor] comprises P=16 neurons, a subcluster comprises N=64 PEs [processing elements] … and the NN core comprises L=8 clusters” [i.e., P/number of neurons and N/subcluster PEs are processor dimensions, and L/number of clusters is a feature dimension of the NN processor], “ANN input data 347 enters shared L3 memory, is read from allocated memory blocks, processed by the PEs [processing elements] in one or more subclusters, output to neighboring memory blocks”, “adjustments for performance, and numerical adaptations to different bit widths” [i.e., adjustments/adaptions conforming to one of the feature dimensions of the NN processor – number of clusters, bit widths], “the neural network data stored in the memory represents a tensor, i.e. an Z-dimensional matrix” [i.e., input data/tensor is allocated/divided into blocks conforming to subclusters/number of clusters/one of the feature dimensions of the NN processor]);
providing each of the plurality of blocks to one of the plurality of neural cores (as indicated above, “one of the plurality of neural cores” has been interpreted as “one of a plurality of neural cores”) (see, e.g., FIG. 3 – showing providing each of the blocks 346 of data 347 as input 341 to subclusters 384 that include PEs/processing elements [i.e., neural cores] and paragraphs 106, 108 and 207, “NN processing system comprising one or more NN processing cores”, “The NN processing engine or core 60 comprises several hierarchical computation units. The lowest hierarchical level is the processing element (PE)”, “Input data 341 to a subcluster is received from an allocated memory block 346 from a shared portion of L3 memory. The PEs within the subcluster process the … input data … ANN input data 347 … is read from allocated memory blocks, processed by the PEs in one or more subclusters” [i.e., providing the blocks 346 to one of the neural cores]),
and wherein the neural network processor is adapted to compute, by the plurality of neural cores, output of one or more neural network layers (see, e.g., FIG. 3 – showing subclusters 384 including processing elements/PEs [i.e., neural cores] that compute output 343 and paragraphs 108, 121 and 207, “The NN processing engine or core 60 comprises several … processing element[s] (PE)”, “the computation units (i.e. PEs, subclusters, clusters, etc.) allows the NN core to handle numerous types of ANNs”, “Input data 341 to a subcluster is received from an allocated memory block 346 from a shared portion of L3 memory. The PEs within the subcluster process the … input data and generate outputs 343 … ANN input data 347 … is … processed by the PEs in one or more subclusters, output to neighboring memory blocks, and after traversing through the various layers in the ANN [artificial neural network] is ultimately output as ANN output data 349” [i.e., computing output of layers of the ANN/neural network]).
Although Baum substantially discloses the claimed invention, Baum is not relied on to explicitly disclose the input data tensor having an input precision per channel at one or more features.
In the same field, analogous art Esser teaches the input data tensor having an input precision per channel at one or more features (see, e.g., pages 11441 and 11443-11444, “For input data, we use a first layer to transform multivalued, multichannel input into binary channels”, “x ={xi,j,f} are the neuron’s input pixels … and f is over the feature dimension or input channels.” [i.e., an input data matrix x/tensor having input per channel at one or more features], “Network inputs are typically represented with multibit channels [for example, eight-bit red, green, and blue (RGB) channels].” [i.e., receiving input data having an input 8-bit precision per channel at RGB color features]).
Alternatively, Esser also teaches dividing the input data tensor into a plurality of blocks, each block conforming to one of the feature dimensions of the neural network processor (see, e.g., FIG. 1 – showing “layers of a convolutional network” that “designate neurons (individual boxes) belonging to the same group (partitioning the feature dimension)” [i.e., dividing/partitioning input data matrix/tensor into blocks conforming to feature dimensions] and pages 11441 and 11443, “neuromorphic processors [i.e., neural network processors] … neuromorphic systems can take advantage of blockwise connectivity”, “Connectivity between neurons follows a blockwise scheme: each neuron can connect to one input line of any core in the system”, “x ={xi,j,f} are the neuron’s input pixels or neurons, w = {xi,j,f} are the filter weights, i, j are over the topographic dimensions, and f is over the feature dimension or input channels. … Network structure is mapped by partitioning each layer into 1 or more equally sized groups along the feature dimension” [i.e., each block of the input data matrix x/tensor conforms to one of the feature dimensions of the neuromorphic processor/core]).
Baum and Esser are analogous art because they are both directed to neural network systems including multiple “NN processing cores”, “dedicated neural cores” and “a network of neurosynaptic cores” (see, e.g., Baum, FIG. 4 and paragraph 106, and Esser, FIG. 1 and pages 11441-11443). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Esser with Baum to provide a “neuromorphic computing … chip architecture based on spiking neurons, low precision synapses, and a scalable communication network” to “implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech” (See, e.g., Esser, page 11441). Doing so would have allowed Baum to use Esser’s approach to allow “the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors” (i.e., neural network processors) and to achieve “unprecedented energy-efficiency” in “neuromorphic systems [that] can take advantage of blockwise connectivity that limits filter sizes, thereby saving energy because weights can now be stored in local on-chip memory within dedicated neural cores” (i.e., a plurality of neural cores), as suggested by Esser (See, e.g., Esser, page 11441). This is an example of “use of known technique to improve similar devices (methods, or products) in the same way.” See MPEP 2143.

Claims 3 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Baum in view of Esser as applied to claims 1 and 12 above, and further in view of non-patent literature Chung, Jaeyong, et al. ("Insight: A neuromorphic computing system for evaluation of large neural networks." arXiv preprint arXiv:1508.01008 (2015): 1-10, hereinafter “Chung”).
Regarding claims 3 and 14, as discussed above, Baum in view of Esser teaches the method of claim 1 and the system of claim 12.
Baum further discloses transforming the input data tensor (see, e.g., paragraphs 164-165, 167 and 242, “transformation/rounding circuit 456”, “This circuit is operative to transform the representation of the input data and/or weights from integer to floating point (FP) format and vice versa”, “circuit 456 functions to perform rounding of the product before input to the accumulator.” [i.e., transform the input data from the input precision - integer to a floating point precision], “the flexible processing granularity of the NN processor and related memory … is shown in FIG. 23. … leveraging the data pipeline to … operate at low input domain granularity. Consider the example input tensor 932 including input data 938 that can be located at the beginning of or at any arbitrary point in the network. One of the network layers then applies an NN operation 934 to the input data” [i.e., transforming input data tensor 932 from a low granularity by applying an NN operation]).
Although Baum in view of Esser substantially teaches the claimed invention, Baum in view of Esser is not relied on to teach wherein transforming the input data … comprises removing least significant bits.
In the same field, analogous art Chung teaches wherein transforming the input data … comprises removing least significant bits (see, e.g., pages 5-6, “The k-input adder has the zero latency, and the 2n-bit result is aligned by the pipeline registers, which truncates the least significant m bits.”, “In our implementation, a number is represented as an n-bit stream where the least significant bit (LSB) comes first. To represent signed real numbers in the n-bit, we use the fixed-point format and let m denote the fractional bit-width … we truncate the least significant m-bits” [i.e., transform the input by truncating/removing the least significant bits]).
Baum, Esser and Chung are analogous art because they are each directed to neural network systems including multiple “NN processing cores”, “dedicated neural cores”, “a network of neurosynaptic cores” and a “neuromorphic computing system” including “custom designed cores” (see, e.g., Baum, FIG. 4 and paragraph 106, Esser, FIG. 1 and pages 11441-11443 and Chung, page 2).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Baum in view of Esser to incorporate the teachings of Chung to provide “a space-efficient microarchitecture” and a “neuromorphic computing system that stores all the weight parameters on-chip by compressing neural networks.” (See, e.g., Chung, Abstract and pages 1-2). Doing so would have allowed Baum in view of Esser to use Chung’s neuromorphic computing system and “microarchitecture that leverages existing integrated circuit design methodologies” to perform “energy-efficient evaluation of large-scale neural networks” and to “perform[] the MNIST [Modified National Institute of Standards and Technology] hand-written digit classification with 97.64% accuracy”, as suggested by Chung (See, e.g., Chung, Abstract and page 1). This is an example of “use of known technique to improve similar devices (methods, or products) in the same way.” See MPEP 2143.

Claims 5 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Baum in view of Esser as applied to claims 1 and 12 above, and further in view of non-patent literature Na, Taesik, et al. ("Speeding up Convolutional Neural Network Training with Dynamic Precision Scaling and Flexible Multiplier-Accumulator.", Proceedings of the 2016 International Symposium on Low Power Electronics and Design. 2016, cited in applicant’s IDS submitted on 10/11/2018, hereinafter “Na”).
Regarding claims 5 and 16, as discussed above, Baum in view of Esser teaches the method of claim 1 and the system of claim 12.
Baum further discloses dividing the input data tensor … the plurality of blocks in one of the feature dimensions of the input data tensor to conform with one of the feature dimensions of the neural network processor (see, e.g., FIG. 3 – depicting input data 347 that is divided into blocks 346 that conform to/are allocated to subclusters 384 and paragraphs 109, 207, 237 and 265, “in an example NN processor embodiment, a PE [processing element] comprises P=16 neurons, a subcluster comprises N=64 PEs [processing elements] … and the NN core comprises L=8 clusters” [i.e., the NN processor dimensions], “ANN input data 347 enters shared L3 memory, is read from allocated memory blocks, processed by the PEs [processing elements] in one or more subclusters, output to neighboring memory blocks”, “adjustments for performance, and numerical adaptations to different bit widths” [i.e., adaptions conforming to one of the feature dimensions/bit widths of the NN processor], “the neural network data stored in the memory represents a tensor, i.e. an Z-dimensional matrix” [i.e., feature dimensions of the input data tensor are allocated/divided into blocks conforming to subclusters/one of the feature dimensions of the NN processor]).
Although Baum in view of Esser substantially teaches the claimed invention, Baum in view of Esser is not relied on to teach zero-padding the plurality of blocks in one of the feature dimensions.
In the same field, analogous art Na teaches zero-padding the plurality of blocks in one of the feature dimensions (see, e.g., page 4, section 4.1, “we pad zero or msb of x in front of x if sign _x is 0 or 1 respectively for partial product generation. … We normally pad zero at the end of y and use this for the first encoding”).
Baum, Esser and Na are analogous art because they are directed to neural network systems including multiple “NN processing cores”, “dedicated neural cores”, “a network of neurosynaptic cores” and a “design for deep neural network training” (see, e.g., Baum, FIG. 4 and paragraph 106, Esser, FIG. 1 and pages 11441-11443 and Na, page 5, section 5.1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Baum in view of Esser to incorporate the teachings of Na to provide “a dynamic precision scaling (DPS) algorithm and flexible multiplier-accumulator (MAC) to speed up convolutional neural network training”, a “dynamic precision scaling (DPS) algorithm for training [a] CNN” [convolutional neural network] and “a configurable MAC unit that can be configured for various precision (bit width) computation modes” (See, e.g., Na, Abstract and page 1, section 1). Doing so would have allowed Baum in view of Esser to use Na’s DPS algorithm and MAC unit to “perform fixed point computation with variable precision mode providing differentiated computation time which enables speeding up training for lower precision computation”, to “achieve 5.7x speed-up while consuming 31% energy compared to baseline for modified Alexnet on Flickr image style recognition” and “enable faster computation by lowering latency for lower precision”, as suggested by Na (See, e.g., Na, Abstract and page 1, section 1). 

Claims 6, 7, 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Baum in view of Esser as applied to claims 1 and 12 above, and further in view of Kovvuri et al. (U.S. Patent Application Pub. No. 2019/0286973 A1, hereinafter “Kovvuri”). Kovvuri was filed on May 4, 2018, and this date is before the effective filing date of this application, i.e., October 11, 2018. Therefore, Kovvuri constitutes prior art under 35 U.S.C. 102(a)(2).
Regarding claims 6 and 17, as discussed above, Baum in view of Esser teaches the method of claim 1 and the system of claim 12.
Baum further discloses dividing the input data tensor (see, e.g., FIG. 3 – depicting input data 347 that is divided into blocks 346 and paragraphs 207 and 265, “ANN input data 347 enters shared L3 memory, is read from allocated memory blocks, processed by the PEs [processing elements] in one or more subclusters, output to neighboring memory blocks”, “the neural network data stored in the memory represents a tensor, i.e. an Z-dimensional matrix” [i.e., input data/tensor is allocated/divided]).
Although Baum in view of Esser substantially teaches the claimed invention, Baum in view of Esser is not relied on to teach wherein dividing the input data tensor comprises packing the input data tensor.
In the same field, analogous art Kovvuri teaches wherein dividing the input data tensor comprises packing the input data tensor (see, e.g., paragraph 95, “a mapping of input values … metadata can include a format for communicating input values … the values can be transferred between the subgraph 320 and the general purpose CPU using a tensor data structure. A tensor is a data structure organized as an array of numbers. The tensor array is characterized by a degree or order of the tensor [i.e., an input data tensor]. … Each dimension of the tensor can have a different respective number of elements or values. The values of a given tensor can be packed linearly” [i.e., packing the given, input data tensor]). 
Baum, Esser and Kovvuri are analogous art because they are each directed to neural network systems including multiple “NN processing cores”, “dedicated neural cores”, “a network of neurosynaptic cores” and a “multiprocessor 100 [that] includes a plurality 110 of one or more neural processing cores, including individual NN processor core 115.” (see, e.g., Baum, FIG. 4 and paragraph 106, Esser, FIG. 1 and pages 11441-11443 and Kovvuri, paragraph 46).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Baum in view of Esser to incorporate the teachings of Kovvuri to provide “a machine learning model [that] can include a graph of computational nodes” where “The machine learning model can be partitioned into different subgraphs, where each of the subgraphs comprises a subset of the computational nodes of the machine learning model” and where “Each of the subgraphs can be executed by either a CPU, a GPU, or programmable hardware.” (See, e.g., Kovvuri, paragraph 20). Doing so would have allowed Baum in view of Esser to use Kovvuri’s machine learning model to “enabl[e] the most appropriate hardware to execute a given subgraph” so that “a system can potentially have higher performance than systems where the individual subgraphs are not individually assignable to different types of hardware”, as suggested by Kovvuri (See, e.g., Kovvuri, paragraph 20). 

Regarding claims 7 and 18, as discussed above, Baum in view of Esser and Kovvuri teaches the method of claim 6 and the system of claim 17.
Although Baum in view of Esser substantially teaches the claimed invention, Baum in view of Esser is not relied on to teach wherein packing the input data tensor comprises:
reorganizing input features to load unused feature dimensions of the processor with data from non-feature dimensions of the input features.
In the same field, analogous art Kovvuri teaches wherein packing the input data tensor comprises:
reorganizing input features to load unused feature dimensions of the processor with data from non-feature dimensions of the input features (see, e.g., paragraphs 89-90 and 95, “accelerator 450 can include conversion of values between a generic model implemented on the server and a specific instance of a model implemented for the subgraph on the accelerator. … neural network models may model node and other network value[s] using 32-bit values. The neural network accelerator 450 may model subgraphs using a fewer number of bits”, “neural network model 310 and the subgraph 320 can be executed on a general-purpose CPU that performs computations with relatively high precision … with 24 or 53 bits … subgraph 320 can be accelerated on a neural network accelerator using lower precision computations (e.g., using a format with eight or fewer bits … subgraph 320 can be truncated or rounded to match a precision of the accelerator.” [i.e., truncate or round unused feature dimensions/precisions/bit widths of the CPU/processor], “a mapping of input values to internal resources of the accelerator [i.e., input data includes accelerator/processor features] … the values can be transferred between the subgraph 320 and the general purpose CPU using a tensor data structure [i.e., input data tensor includes values representing CPU/processor features, transfer/load unused feature dimensions of the CPU/processor] … a second-order tensor is a two-dimensional array … Each dimension of the tensor can have a different respective number of elements … a two-dimensional tensor with three elements in the first dimension and two elements in the second dimension [i.e., a two-dimensional tensor has a feature dimension and a non-feature dimension] can have a length of six and be packed in six linear fields of the data structure.” [i.e., packing the tensor so as to reorganize the tensor by unloading/truncating CPU/processor feature dimensions from the input’s non-feature dimension]).
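By way of illustration only, the linear packing and precision matching quoted above from Kovvuri (a two-dimensional tensor with three elements in the first dimension and two in the second, packed into six linear fields, with values rounded to match a lower-precision accelerator) can be sketched as follows. The function names, the row-major ordering, the sample values and the scale factor are assumptions made for this sketch; they are not taken from Kovvuri or from the claims.

```python
# Hypothetical sketch: names, row-major order and scale are assumptions,
# not taken from Kovvuri or from the claims.

def pack_linear(tensor_2d):
    """Pack a two-dimensional tensor (a list of rows) into linear fields."""
    return [value for row in tensor_2d for value in row]


def round_to_int8(value, scale):
    """Round a higher-precision value to a signed 8-bit integer, saturating."""
    q = round(value / scale)
    return max(-128, min(127, q))


# Three elements in the first dimension, two in the second.
tensor = [[0.50, -1.25], [2.00, 0.75], [-0.25, 1.50]]

packed = pack_linear(tensor)  # six linear fields
quantized = [round_to_int8(v, scale=0.25) for v in packed]
```

Under these assumptions the packed structure has a length of six, consistent with the quoted passage, and each field is reduced to a format of eight bits.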

Claims 8-11 and 19-22 are rejected under 35 U.S.C. 103 as being unpatentable over Baum in view of Esser as applied to claims 1 and 12 above, and further in view of Langhammer (U.S. Patent Application Pub. No. 2019/0042191 A1, hereinafter “Langhammer”). Langhammer was filed on September 26, 2018, and this date is before the effective filing date of this application, i.e., October 11, 2018. Therefore, Langhammer constitutes prior art under 35 U.S.C. 102(a)(2).
Regarding claims 8 and 19, as discussed above, Baum in view of Esser teaches the method of claim 1 and the system of claim 12.
Although Baum in view of Esser substantially teaches the claimed invention, Baum in view of Esser is not relied on to teach wherein the neural network processor is configured to:
compute a plurality of fixed precision partial sums; and
combine the plurality of fixed precision partial sums into complete sums.
In the same field, analogous art Langhammer teaches wherein the neural network processor is configured to (see, e.g., paragraph 26, “convolutional neural networks (CNN) may work very well with a mixture of half-precision floating-point arithmetic (i.e., FP16) and single-precision floating-point arithmetic circuitry … a specialized processing block supports both, single-precision floating-point arithmetic and half-precision floating-point arithmetic” [i.e., a neural network processor/processing block configured to]):
compute a plurality of fixed precision partial sums (see, e.g., FIG. 1 – depicting MAIN_ADDER 200 that computes FIXED_SUM 130, paragraphs 8, 36 and 64 and claim 17, “arithmetic operator circuit may generate a first partial product of first and second half-precision floating-point numbers”, “input to main adder 200 to provide the resultant product of the multiplication operation, which can be a fixed-point output 130”, “The computation of a sum-plus-one signal … the circuitry is performing … a fixed-point operation.”, “a fixed-point operation on the plurality of input signals with the plurality of input signals representing a plurality of fixed-point numbers” [i.e., adder 200 computes fixed-point output 130/fixed-precision partial sums of fixed-point numbers]); and
combine the plurality of fixed precision partial sums into complete sums (see, e.g., paragraphs 8 and 42, “The compressor circuit may generate … a sum vector signal based on the first and second partial products”, “compressors 210 and 212 may receive the partial product (e.g., signals 202, 203, 204, and 205) of a first half-precision floating-point operation” [i.e., combine the fixed-point/fixed-precision partial products into a complete sum]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Baum in view of Esser to incorporate the teachings of Langhammer to provide “integrated circuits and, more particularly, to performing reduced-precision floating-point arithmetic operations using specialized processing blocks with higher-precision floating-point arithmetic circuitry.” (See, e.g., Langhammer, paragraph 24). Doing so would have allowed Baum in view of Esser to use Langhammer’s circuitry to support “both, single-precision floating-point arithmetic and half-precision floating-point arithmetic, efficiently and effectively” in “convolutional neural networks (CNN) [that] may work very well with a mixture of half-precision floating-point arithmetic (i.e., FP16) and single-precision floating-point arithmetic circuitry (i.e., FP32)”, as suggested by Langhammer (See, e.g., Langhammer, paragraph 26). 
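For illustration only, the claimed operations of computing a plurality of fixed precision partial sums and combining them into complete sums can be sketched in software as follows. This is a minimal sketch of the claim language as read above, not a model of Langhammer's circuitry; the function names, the chunk size, the 16-bit limit and the sample values are all assumptions.

```python
# Hypothetical sketch of the claim language; it does not model Langhammer's
# circuitry. Names, chunk size and the 16-bit limit are assumptions.

INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def partial_sums(inputs, weights, chunk):
    """Compute one weighted sum per subset of `chunk` inputs."""
    sums = []
    for start in range(0, len(inputs), chunk):
        acc = 0
        for x, w in zip(inputs[start:start + chunk],
                        weights[start:start + chunk]):
            acc += x * w
        # Each partial sum must fit the assumed fixed 16-bit precision.
        if not INT16_MIN <= acc <= INT16_MAX:
            raise OverflowError("partial sum exceeds 16-bit fixed precision")
        sums.append(acc)
    return sums


def combine(sums):
    """Combine the fixed precision partial sums into a complete sum."""
    return sum(sums)


inputs = [1, 2, 3, 4, 5, 6]
weights = [2, -1, 0, 3, 1, -2]

partials = partial_sums(inputs, weights, chunk=2)
complete = combine(partials)
```

In this sketch each partial sum is an intermediate result computed over a subset of the inputs, and the complete sum equals the weighted sum over all inputs.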

Regarding claims 9 and 20, as discussed above, Baum in view of Esser and Langhammer teaches the method of claim 8 and the system of claim 19.
Although Baum in view of Esser substantially teaches the claimed invention, Baum in view of Esser is not relied on to teach wherein the plurality of fixed precision partial sums are intermediate results.
In the same field, analogous art Langhammer teaches wherein the plurality of fixed precision partial sums are intermediate results (see, e.g., FIG. 1 – depicting MAIN_ADDER 200 that computes FIXED_SUM 130 and paragraphs 36-37, “Output vectors 119 and 129 may each be up to 74 bits wide and are input to main adder 200 to provide the resultant product … which can be a fixed-point output 130”, “To accommodate normalization and rounding, it may be necessary to add either zero, one or two to the least significant bit(s) of the result (which may be referred to as the sum).” [i.e., fixed-point/fixed-precision sum is an intermediate result that is then normalized and rounded]).

Regarding claims 10 and 21, as discussed above, Baum in view of Esser and Langhammer teaches the method of claim 9 and the system of claim 20.
Baum further discloses wherein the intermediate results are weighted sums of a subset of inputs (see, e.g., FIG. 7A – showing that weights 470 and input 468 are used by “input data representation circuit 452” to produce intermediate results 504, 506 and paragraphs 165, 207 and 221, “input data (X) 468 and weights (W) 470 are input from L3 memory to the input data representation circuit 452. This circuit is operative to transform the representation of the input data and/or weights … The resulting X 504 and W 506 are input to the multiplier 454”, “The PEs within the subcluster process the weights and input data and generate outputs 343.”, “PEs 664 multiply the input data with weights” [i.e., intermediate results X 504, W 506 and 343 are weighted sums of a subset of inputs/input data]). 
Although Baum in view of Esser substantially teaches the claimed invention, Baum in view of Esser is not relied on to teach wherein the intermediate results are … sums of a subset of inputs.
In the same field, analogous art Langhammer teaches wherein the intermediate results are … sums of a subset of inputs (see, e.g., claims 4 and 6, “an adder that receives two of the first subset of the plurality of input numbers, adds the two of the first subset of the plurality of input numbers together, and outputs a sum of the two of the first subset of the plurality of input numbers to the first multiplier.”, “an adder that receives two of the second subset of the plurality of input numbers, adds the two of the second subset of the plurality of input numbers together, and outputs a sum of the two of the second subset of the plurality of input numbers to the second multiplier.” [i.e., multipliers create weighted sums of first and second subsets of input numbers]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Baum in view of Esser to incorporate the teachings of Langhammer to provide “integrated circuits and, more particularly, to performing reduced-precision floating-point arithmetic operations using specialized processing blocks with higher-precision floating-point arithmetic circuitry.” (See, e.g., Langhammer, paragraph 24). Doing so would have allowed Baum in view of Esser to use Langhammer’s circuitry to support “both, single-precision floating-point arithmetic and half-precision floating-point arithmetic, efficiently and effectively” in “convolutional neural networks (CNN) [that] may work very well with a mixture of half-precision floating-point arithmetic (i.e., FP16) and single-precision floating-point arithmetic circuitry (i.e., FP32)”, as suggested by Langhammer (See, e.g., Langhammer, paragraph 26). 

Regarding claims 11 and 22, as discussed above, Baum in view of Esser and Langhammer teaches the method of claim 9 and the system of claim 19.
Although Baum in view of Esser substantially teaches the claimed invention, Baum in view of Esser is not relied on to teach wherein the neural network processor is configured to iteratively compute a partial sum from the plurality of fixed precision partial sums.
In the same field, analogous art Langhammer teaches wherein the neural network processor is configured to iteratively compute a partial sum from the plurality of fixed precision partial sums (see, e.g., paragraphs 8, 36 and 43 and claims 4 and 6, “The compressor circuit may generate … a sum vector signal based on the first and second partial products, and the third arithmetic operator circuit may generate … first and second results … and at least third and fourth results … based on the … sum vector signals”, “vectors 119 and 129 may each be up to 74 bits wide and are input to main adder 200 to provide … a fixed-point output 130” [i.e., fixed-point/fixed-precision partial sums 130 from adder 200], “compressors 210, 212, 214, and 216 may each generate two signals, which may be referred to as sum vector signals 211, 215, 221, and 225, or simply sum signals … perform a bitwise logical XOR operation of the respective input signals (i.e., signals 202 and 203) to generate the respective sum signal” [i.e., iteratively generating/computing partial sums/respective sum signals from fixed-point/fixed-precision partial sums], “an adder that receives two of the first subset of the plurality of input numbers, adds the two of the first subset of the plurality of input numbers together, and outputs a sum of the two of the first subset of the plurality of input numbers to the first multiplier.”, “an adder that receives two of the second subset of the plurality of input numbers, adds the two of the second subset of the plurality of input numbers together, and outputs a sum of the two of the second subset of the plurality of input numbers to the second multiplier.” [i.e., multipliers create weighted sums of first and second subsets of input numbers]).
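For illustration only, the quoted generation of a sum signal by a bitwise logical XOR of two input signals can be sketched as an iterative addition. The quoted passage states only that each compressor generates two signals and that the sum signal is the XOR of its inputs; forming the second signal as a carry vector (bitwise AND shifted left by one) is standard carry-save arithmetic assumed for this sketch, and the function names and the 16-bit width are likewise assumptions.

```python
# Hypothetical sketch. Only the XOR-based sum signal comes from the quoted
# passage; the carry vector, names and 16-bit width are assumptions.

def compress(a, b):
    """Return (sum_vector, carry_vector) for two bit vectors."""
    sum_vector = a ^ b             # bitwise XOR of the input signals
    carry_vector = (a & b) << 1    # assumed carry, one position higher
    return sum_vector, carry_vector


def add_iteratively(a, b, width=16):
    """Repeat the compression step until no carry remains."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    while b:
        a, b = compress(a, b)
        a &= mask
        b &= mask
    return a


first_stage = compress(0b1100, 0b1010)
total = add_iteratively(13, 7)
```

Each pass through the loop yields a new partial sum from the previous one, and the loop ends once the carry vector is zero.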

Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
The prior art made of record, listed on the accompanying PTO-892 Notice of References Cited form, and not relied upon is considered pertinent to applicant's disclosure.
The examiner requests, in response to this office action, support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line no(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.
When responding to this office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections. See 37 CFR 1.111(c).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RANDY K BALDWIN whose telephone number is (571)270-5222. The examiner can normally be reached on Mon - Fri 9:00-6:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on 571-272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/R.K.B./Examiner, Art Unit 2125

/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125