DETAILED ACTION
This is in response to the request for continued examination filed on 11/1/2022.

Status of Claims
Claims 1 – 28 are pending, of which claims 1 and 11 are in independent form.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 11/1/2022 has been entered.

Claim Rejections - 35 USC § 101
In light of applicant’s amendments to the claims, the examiner withdraws the previous rejections to the claims under 35 USC 101.

Information Disclosure Statement
The information disclosure statement filed 10/26/2022 fails to comply with 37 CFR 1.98(a)(2), which requires a legible copy of each cited foreign patent document; each non-patent literature publication or that portion which caused it to be listed; and all other information or that portion which caused it to be listed.  It has been placed in the application file, but the information referred to therein has not been considered.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 5, 7 – 10, and 20 – 24 are rejected under 35 U.S.C. 103 as being unpatentable over Mellempudi et al., U.S. Patent Application 2019/0354846 (hereinafter referred to as Mellempudi) in view of ‘Format Abstraction for Sparse Tensor Algebra Compilers’ by Stephen Chou et al. (hereinafter referred to as Chou), further in view of ‘Hardware-Oriented Approximation of Convolutional Neural Networks’ by Gysel et al. (hereinafter referred to as Gysel).

Referring to claim 1, Mellempudi discloses “a hybrid computational system, comprising: a configurable and programmable specialty hardware programmed by a central processing unit” and “an extensible multi-precision data pipeline” (Fig. 1 processor 102, parallel processor 112 and [0047] programmable processor, FPGA, ASIC. [0004] graphics processors and pipelines); “a memory interface controller; a memory; and a central interconnect” (Fig. 2A memory interface 218, memory 222, memory crossbar 216); “wherein the extensible multi-precision data pipeline” (Figs. 5 and 24) “comprises: a local buffer that” “stores the input” tensor ([0231] unified return buffer and [0273] writing intermediate data during processing to a buffer. Fig. 6B input tensors); “an input tensor shaper coupled to the local buffer that reads the input” tensor (Fig. 6B scale unit 654, [0231] unified return buffer and [0273] writing intermediate data during processing to a buffer) “and converts the input” tensor “into an input tensor data set” (Fig. 6B scale unit 654, Fig. 7A and [0140], [0145] scaling input tensors before computation); “a cascaded pipeline coupled to the input tensor shaper that routes the input tensor data set through at least one function stage resulting in an output tensor data set” (Fig. 7A and compute operations 708, Fig. 6B output tensor); “an output tensor shaper coupled to the cascaded pipeline that converts the output tensor data set into an output local data set having the local storage format” (Fig. 6B re-scale unit 656, Fig. 7A re-scale at 710 and [0145] - [0146] re-scaling tensors after computation); and “wherein the output tensor shaper writes the output local data set to the local buffer to be stored on the memory” ([0231] unified return buffer and [0273] writing intermediate data during processing to a buffer, [0132] output to be stored in memory).
Mellempudi appears to teach receiving input tensors before scaling and rescaling to create an output tensor (Fig. 6B).  Thus, it follows that Mellempudi does not appear to explicitly disclose “a local buffer that loads input local data from the memory and stores the input local data set in a local storage format; an input tensor shaper coupled to the local buffer that reads the input local data set in the local storage format in the local buffer and converts the input local data set into an input tensor data set.”
However, Chou discloses another tensor handling system with “a local buffer that loads input local data from the memory” and in which a “local data set in a local storage format” is stored (Figure 2 and section 2.1 tensor storage formats, such as COO, CSR, etc.), wherein the system “reads the input local data set in the local storage format in the local buffer and converts the input local data set into an input tensor data set” (page 123:24 paragraph beginning with ‘Our technique’s support’ describes disparate format handling, converting to a CSR matrix for more efficient processing. Section 6.1 advantages to different ‘formats for storing sparse matrices.’ The COO format is a natural format for importing and exporting tensors).
	Mellempudi does not appear to explicitly disclose converting the local data set to “an input tensor data set having a 2-dimensional tensor format of vector width N by tensor length L, wherein N and L are integers.”
However, Chou discloses “an input tensor data set having a 2-dimensional tensor format of vector width N by tensor length L, wherein N and L are integers” (Introduction multi-dimensional data.  page 123:24 paragraph beginning with ‘Our technique’s support’ describes disparate format handling, converting to a CSR matrix for more efficient processing).
Mellempudi and Chou are analogous art because they are from the same field of endeavor, which is tensor data processing.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Mellempudi and Chou before him or her, to modify the teachings of Mellempudi to include the teachings of Chou so that input data is converted to a tensor format of vector width N by tensor length L.
The motivation for doing so would have been to take advantage of the processing advantages of certain tensor formats (as described by Chou at page 123:24 paragraph beginning with ‘Our technique’s support’ – “computing matrix-vector products directly on COO matrices can take up to twice as much time as with CSR matrices due to higher memory traffic”).
Neither Mellempudi nor Chou appears to explicitly disclose “a configurable and programmable specialty hardware programmed by a central processing unit with model data determined by a graphical processor unit to implement neural networks associated with an extensible multi-precision data pipeline.”  Further, neither Mellempudi nor Chou appears to explicitly disclose “storing the input local data set in a local storage format of either 32-bit fixed-point, 16-bit fixed-point, or 8-bit fixed-point.”
However, Gysel discloses “model data determined by a graphical processor unit to implement neural networks associated with an extensible multi-precision data pipeline” (section 4 Ristretto takes a trained model as an input, implementation on the GPU). Also, Gysel discloses “storing the input local data set in a local storage format of either 32-bit fixed-point, 16-bit fixed-point, or 8-bit fixed-point” (Abstract, Introduction, section 2 ‘Mixed Fixed Point Precision,’ section 3 ‘Dynamic Fixed Point’).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Gysel with Mellempudi/Chou so that the system includes “a configurable and programmable specialty hardware programmed by a central processing unit (as in Mellempudi) with model data determined by a graphical processor unit to implement neural networks associated with an extensible multi-precision data pipeline (as in Gysel).”
Mellempudi, Chou, and Gysel are analogous art because they are from the same field of endeavor, which is processing tensor data.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Mellempudi, Chou, and Gysel before him or her, to modify the teachings of Mellempudi and Chou to include the teachings of Gysel so that a fixed-point notation is utilized.
The motivation for doing so would have been to use a less resource-hungry version of arithmetic (as stated by Gysel in the 2nd paragraph of the Introduction).
Therefore, it would have been obvious to combine Gysel with Mellempudi and Chou to obtain the invention as specified in the instant claim.

	As per claim 3, Mellempudi discloses “the cascaded pipeline allows multiple operations to be computed in inline fashion and save memory bandwidth” (Fig. 24B and [0273] – [0275] pipeline operations).

	As per claim 5, Mellempudi discloses “a tensor-wise stage within the at least one function stage that processes the input tensor data” ([0141] per-tensor scale factor).
	
	As per claim 7, Mellempudi discloses “the at least one function stage retains a” “notation of an intermediate step value and defines locally optimized data representations to fulfill a dynamic range” ([0140]-[0146] dynamic range, range scaling unit).
	Neither Mellempudi nor Chou appears to explicitly disclose “a fixed-point notation.”
	However, Gysel discloses utilizing “a fixed-point notation” and dynamic fixed point (section 2 ‘Mixed Fixed Point Precision’).
Mellempudi, Chou, and Gysel are analogous art because they are from the same field of endeavor, which is processing tensor data.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Mellempudi, Chou, and Gysel before him or her, to modify the teachings of Mellempudi and Chou to include the teachings of Gysel so that a fixed-point notation is utilized.
The motivation for doing so would have been to use a less resource-hungry version of arithmetic (as stated by Gysel in the 2nd paragraph of the Introduction).
Therefore, it would have been obvious to combine Gysel with Mellempudi and Chou to obtain the invention as specified in the instant claim.

	As per claim 8, Mellempudi discloses “the intermediate step value includes at least one of an input value, a resultant value and an output value” ([0273] writing intermediate data during processing to a buffer).

	As per claim 9, Mellempudi discloses “a normalizer within the at least one function stage that normalizes the input tensor data set into a smaller range” ([0140] – [0146] range scaling unit).

	As per claim 10, Mellempudi discloses “a look up stage within the at least one function stage that maps the normalized input tensor data set to an index memory store that outputs a difference between the normalized input tensor data set and a referenced value used to determine a lookup location in a memory store” ([0248] instruction with index values and compaction table, [0253] indirect addressing mode).

	As per claim 20, Mellempudi discloses “the configurable and programmable specialty hardware, the graphical processor unit, and the central processing unit are all interconnected with the central interconnect and further connected to the memory interface controller” (Fig. 2A parallel processor 112, FPGA, ASIC, memory interface 218, memory 222, memory crossbar 216 and [0047] programmable processor).

	As per claim 21, Mellempudi discloses “the configurable and programmable specialty hardware is specifically connected to the memory interface controller via a programmable logic circuit to memory interconnect configured to minimize circuity utilized” (Fig. 2A parallel processor 112, FPGA, ASIC, memory interface 218, memory 222, memory crossbar 216 and [0047] programmable processor).

	As per claim 22, Mellempudi discloses “the memory interface controller is further connected to the memory, wherein the memory comprises at least one of the following: persistent memory disk, a system memory, and a read only memory” ([0056] In various embodiments, the memory units 224A-224N can include various types of memory devices, including dynamic random access memory (DRAM) or graphics random access memory, such as synchronous graphics random access memory (SGRAM), including graphics double data rate (GDDR) memory).

	As per claim 23, Mellempudi discloses “the memory interface controller is further connected to the central interconnect” (Fig. 2A memory interface 218 and memory crossbar 216).

	As per claim 24, Mellempudi discloses “the central interconnect is further connected to an input and output interface and a network interface” (Fig. 2A memory crossbar 216 connects to I/O unit 204 and host interface 206).

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Mellempudi in view of Chou, further in view of Gysel, as applied to the claims above, and further in view of Ghosh, U.S. Patent Application 2020/0311569 (hereinafter referred to as Ghosh).

	As per claim 2, neither Mellempudi nor Chou nor Gysel appears to explicitly disclose “an encapsulator coupled to the cascaded pipeline to fuse multiple function stages into a fused operation.”
	However, fusing pipeline stages is known in the art.  For example, Ghosh discloses “an encapsulator coupled to the cascaded pipeline to fuse multiple function stages into a fused operation” ([0060] fusing together to form a pipeline stage).
Mellempudi, Chou, Gysel, and Ghosh are analogous art because they are from the same field of endeavor, which is tensor data processing.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Mellempudi, Chou, Gysel, and Ghosh before him or her, to modify the teachings of Mellempudi, Chou, and Gysel to include the teachings of Ghosh so that multiple function stages are fused in the pipeline.
The motivation for doing so would have been to provide a more efficient program that only has to call the first function, which would then be fused to another function.
Therefore, it would have been obvious to combine Ghosh with Mellempudi, Chou, and Gysel to obtain the invention as specified in the instant claim.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Mellempudi in view of Chou, further in view of Gysel, as applied to the claims above, and further in view of Lu et al., U.S. Patent 10,073,816 (hereinafter referred to as Lu).

	As per claim 4, neither Mellempudi nor Chou nor Gysel appears to explicitly disclose “an element-wise stage within the at least one function stage that processes the input tensor data set on an element by element basis along the tensor length L.”
	However, Lu discloses “an element-wise stage within the at least one function stage that processes the input tensor data set on an element by element basis along the tensor length L” (column 2 lines 46 - 55 element-wise processing engine that performs element-by-element operations between tensors).
Mellempudi, Chou, Gysel, and Lu are analogous art because they are from the same field of endeavor, which is tensor data processing.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Mellempudi, Chou, Gysel, and Lu before him or her, to modify the teachings of Mellempudi, Chou, and Gysel to include the teachings of Lu so that a function stage includes an element-wise stage.
The motivation for doing so would have been to provide a means for accomplishing the common operation of the linear combination of two tensors (as stated by Lu at column 4 lines 4 – 13).
Therefore, it would have been obvious to combine Lu with Mellempudi, Chou, and Gysel to obtain the invention as specified in the instant claim.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Mellempudi in view of Chou, further in view of Gysel, as applied to the claims above, and further in view of ‘How to Quantize Neural Networks with TensorFlow’ by Warden (hereinafter referred to as Warden).

	As per claim 6, neither Mellempudi nor Chou nor Gysel appears to explicitly disclose “a quantizing adjuster within the at least one function stage that dynamically adjusts quantization through the at least one function stage.”
	However, Warden discloses “a quantizing adjuster within the at least one function stage that dynamically adjusts quantization through the at least one function stage” (‘How Does the Quantization Process Work?’ section starting on page 3 with min and max input to quantizer along with the input data. “The min and max operations actually look at the values in the input float tensor”).
Mellempudi, Chou, Gysel, and Warden are analogous art because they are from the same field of endeavor, which is tensor data processing.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Mellempudi, Chou, Gysel, and Warden before him or her, to modify the teachings of Mellempudi, Chou, and Gysel to include the teachings of Warden so that a quantizing adjuster dynamically adjusts quantization.
The motivation for doing so would have been to remove unnecessary operations (as described by Warden in the first paragraph on page 5).
Therefore, it would have been obvious to combine Warden with Mellempudi, Chou, and Gysel to obtain the invention as specified in the instant claim.

Claims 11, 14, 16 – 19, and 25 – 28 are rejected under 35 U.S.C. 103 as being unpatentable over Mellempudi in view of Chou, further in view of Yao et al., U.S. Patent Application 2018/0046903 (hereinafter referred to as Yao).

Referring to claim 11, Mellempudi discloses “a hybrid computational system, comprising: a configurable and programmable specialty hardware,” “a central processing unit,” and “an extensible multi-precision data pipeline” (Fig. 1 processor 102, parallel processor 112 and [0047] programmable processor, FPGA, ASIC. Fig. 5 and [0004] graphics processors and pipelines); “comprising the steps of: reading an input” tensor “that is stored in a local storage format” (Fig. 6B input tensors and scale unit 654, [0231] unified return buffer and [0273] writing intermediate data during processing to a buffer); “converting the input” tensor “into an input tensor data set” (Fig. 6B scale unit 654, Fig. 7A and [0140], [0145] scaling input tensors before computation); “routing the input tensor data set through at least one function stage resulting in an output tensor data set” (Fig. 7A and compute operations 708, Fig. 6B output tensor); “converting the output tensor data set into an output local data set having the local storage format” (Fig. 6B re-scale unit 656, Fig. 7A re-scale at 710 and [0145] - [0146] re-scaling tensors after computation); and “writing the output local data set to the output buffer” ([0231] unified return buffer and [0273] writing intermediate data during processing to a buffer, [0132] output to be stored in memory).
Mellempudi appears to teach receiving input tensors before scaling and rescaling to create an output tensor (Fig. 6B).  Thus, it follows that Mellempudi does not appear to explicitly disclose “reading an input local data set” “that is stored in a local storage format; converting the input local data set in the local storage format” “into an input tensor data set.”
However, Chou discloses another tensor handling system in which a “local data set” is stored in “a local storage format” (Figure 2 and section 2.1 tensor storage formats, such as COO, CSR, etc.), and discloses “reading an input local data set” “that is stored in a local storage format” and “converting the input local data set in the local storage format” “into an input tensor data set” (page 123:24 paragraph beginning with ‘Our technique’s support’ describes disparate format handling, converting to a CSR matrix for more efficient processing. Section 6.1 advantages to different ‘formats for storing sparse matrices.’ The COO format is a natural format for importing and exporting tensors).
	Mellempudi does not appear to explicitly disclose converting the local data set to “an input tensor data set having a 2-dimensional tensor format of vector width N by tensor length L, wherein N and L are integers.”
However, Chou discloses “an input tensor data set having a 2-dimensional tensor format of vector width N by tensor length L, wherein N and L are integers” (Introduction multi-dimensional data.  page 123:24 paragraph beginning with ‘Our technique’s support’ describes disparate format handling, converting to a CSR matrix for more efficient processing).
Mellempudi and Chou are analogous art because they are from the same field of endeavor, which is tensor data processing.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Mellempudi and Chou before him or her, to modify the teachings of Mellempudi to include the teachings of Chou so that input data is converted to a tensor format of vector width N by tensor length L.
The motivation for doing so would have been to take advantage of the processing advantages of certain tensor formats (as described by Chou at page 123:24 paragraph beginning with ‘Our technique’s support’ – “computing matrix-vector products directly on COO matrices can take up to twice as much time as with CSR matrices due to higher memory traffic”).
Neither Mellempudi nor Chou appears to explicitly disclose “a configurable and programmable specialty hardware connected to a configurable and programmable specialty hardware controller, which interfaces with a direct memory access connected to an input buffer and an output buffer, which are both connected to the configurable and programmable specialty hardware; and a central processing unit connected to a main switch configured to shuttle data and commands to the direct memory access; wherein the hybrid computational system is configured to implement neural networks associated with an extensible multi-precision data pipeline method.”  Further, neither Mellempudi nor Chou appears to explicitly disclose “storing the input local data set in a local storage format of either 32-bit fixed-point, 16-bit fixed-point, or 8-bit fixed-point.”
However, Yao discloses “a configurable and programmable specialty hardware connected to a configurable and programmable specialty hardware controller, which interfaces with a direct memory access connected to an input buffer and an output buffer, which are both connected to the configurable and programmable specialty hardware” (Figure 8A programmable logic 8200 with controller 8210, DMA 8230, input buffer 8240, and output buffer 8250); “and a central processing unit connected to a main switch configured to shuttle data and commands to the direct memory access” (Fig. 8A CPU 8110 data and instruction bus); “wherein the hybrid computational system is configured to implement neural networks” (Abstract a CPU+FPGA heterogeneous architecture for implementing and optimizing a convolutional neural network based on an embedded FPGA).  Yao also discloses “storing the input local data set in a local storage format of either 32-bit fixed-point, 16-bit fixed-point, or 8-bit fixed-point” ([0097] – [0101]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Yao with Mellempudi/Chou so that the system includes “a configurable and programmable specialty hardware connected to a configurable and programmable specialty hardware controller, which interfaces with a direct memory access connected to an input buffer and an output buffer.”
Mellempudi, Chou, and Yao are analogous art because they are from the same field of endeavor, which is processing tensor data.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Mellempudi, Chou, and Yao before him or her, to modify the teachings of Mellempudi and Chou to include the teachings of Yao so that a fixed-point notation is utilized and the structure includes configurable hardware, a configurable hardware controller, and a DMA with buffers.
The motivation for doing so would have been to reduce memory footprint and bandwidth requirements (as described by Yao at [0101]).
Therefore, it would have been obvious to combine Yao with Mellempudi and Chou to obtain the invention as specified in the instant claim.

As per claim 14, Mellempudi discloses “the step of processing the input tensor data set in the at least one function stage” ([0141] per-tensor scale factor).

As per claim 16, Mellempudi discloses “the step of retaining a” “notation of an intermediate step value and defines locally optimized data representations to fulfill a dynamic range” ([0140]-[0146] dynamic range, range scaling unit).
	Neither Mellempudi nor Chou appears to explicitly disclose “a fixed-point notation.”
	However, Yao discloses utilizing “a fixed-point notation” ([0097] – [0101]).
Mellempudi, Chou, and Yao are analogous art because they are from the same field of endeavor, which is processing tensor data.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Mellempudi, Chou, and Yao before him or her, to modify the teachings of Mellempudi and Chou to include the teachings of Yao so that a fixed-point notation is utilized.
The motivation for doing so would have been to reduce memory footprint and bandwidth requirements (as described by Yao at [0101]).
Therefore, it would have been obvious to combine Yao with Mellempudi and Chou to obtain the invention as specified in the instant claim.

	As per claim 17, Mellempudi discloses “the intermediate step value includes at least one of an input value, a resultant value and an output value” ([0273] writing intermediate data during processing to a buffer).

	As per claim 18, Mellempudi discloses “the step of normalizing the input tensor data set into a smaller range” ([0140] – [0146] range scaling unit).

	As per claim 19, Mellempudi discloses “the step of mapping the normalized input tensor data set to an index memory store that outputs a difference between the normalized input tensor data set and a referenced value used to determine a lookup location in a memory store” ([0248] instruction with index values and compaction table, [0253] indirect addressing mode).

As per claim 25, Yao discloses “the direct memory access if further connected to a SDRAM controller configured to allow data to be shuttled to and from the configurable and programmable specialty hardware device to the central processing unit” (Fig. 8A programmable logic 8200, CPU 8110, external memory 8120).


As per claim 26, Yao discloses “the SDRAM controller is further connected to an external SDRAM and the central processing unit” (Fig. 8A external memory 8120 and CPU 8110).

As per claim 27, Mellempudi discloses “wherein the main switch is further connected to a peripheral interface” (Fig. 2A memory crossbar connects memory interface 218 to processing array 212, I/O unit 204, and host interface 206).

As per claim 28, Mellempudi discloses “the central processing unit is further connected to a flash controller configured to control persistent memory” (Fig. 16 connections to memory controller hub 1616 and memory device 1620, which may be flash memory [0211]).

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Mellempudi in view of Chou, further in view of Yao, as applied to the claims above, and further in view of Ghosh.

	As per claim 12, neither Mellempudi nor Chou nor Yao appears to explicitly disclose “the step of encapsulating multiple function stages into a fused operation.”
	However, fusing pipeline stages is known in the art. For example, Ghosh discloses “the step of encapsulating multiple function stages into a fused operation” ([0060] fusing together to form a pipeline stage).
Mellempudi, Chou, Yao, and Ghosh are analogous art because they are from the same field of endeavor, which is tensor data processing.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Mellempudi, Chou, Yao, and Ghosh before him or her, to modify the teachings of Mellempudi, Chou, and Yao to include the teachings of Ghosh so that multiple function stages are fused in the pipeline.
The motivation for doing so would have been to provide a more efficient program that only has to call the first function, which would then be fused to another function.
Therefore, it would have been obvious to combine Ghosh with Mellempudi, Chou, and Yao to obtain the invention as specified in the instant claim.

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Mellempudi in view of Chou, further in view of Yao, as applied to the claims above, and further in view of Lu.

	As per claim 13, neither Mellempudi nor Chou nor Yao appears to explicitly disclose “the step of processing the input tensor data set on an element by element basis along the tensor length L in the at least one function stage.”
	However, Lu discloses “the step of processing the input tensor data set on an element by element basis along the tensor length L in the at least one function stage” (column 2 lines 46 - 55 element-wise processing engine that performs element-by-element operations between tensors).
Mellempudi, Chou, Yao, and Lu are analogous art because they are from the same field of endeavor, which is tensor data processing.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Mellempudi, Chou, Yao, and Lu before him or her, to modify the teachings of Mellempudi, Chou, and Yao to include the teachings of Lu so that a function stage includes an element-wise stage.
The motivation for doing so would have been to provide a means for accomplishing the common operation of the linear combination of two tensors (as stated by Lu at column 4 lines 4 – 13).
Therefore, it would have been obvious to combine Lu with Mellempudi, Chou, and Yao to obtain the invention as specified in the instant claim.

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Mellempudi in view of Chou, further in view of Yao, as applied to the claims above, and further in view of Warden.

	As per claim 15, neither Mellempudi nor Chou nor Yao appears to explicitly disclose “the step of dynamically adjusting quantization through the at least one function stage.”
	However, Warden discloses “the step of dynamically adjusting quantization through the at least one function stage” (‘How Does the Quantization Process Work?’ section starting on page 3 with min and max input to quantizer along with the input data. “The min and max operations actually look at the values in the input float tensor”).
Mellempudi, Chou, Yao, and Warden are analogous art because they are from the same field of endeavor, which is tensor data processing.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Mellempudi, Chou, Yao, and Warden before him or her, to modify the teachings of Mellempudi, Chou, and Yao to include the teachings of Warden so that a quantizing adjuster dynamically adjusts quantization.
The motivation for doing so would have been to remove unnecessary operations (as described by Warden in the first paragraph on page 5).
Therefore, it would have been obvious to combine Warden with Mellempudi, Chou, and Yao to obtain the invention as specified in the instant claim.

Response to Arguments
Applicant’s arguments with respect to claims 1 – 10 and 20 – 24 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. Patent Application 20200183837 teaches an accelerator for tensor computation with input and output buffers.
U.S. Patent Application 20200380341 and Patent 11232347 teach deep learning acceleration with vectors and queues.
U.S. Patent Application 20180046913 and Patent 10802992 are also to Yao and Deephi Technology, with similar teachings to Yao above.

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to STEVEN G SNYDER whose telephone number is (571)270-1971.  The examiner can normally be reached on M-F 8:00am-4:30pm (flexible).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Henry Tsai can be reached on 571-272-4176.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/STEVEN G SNYDER/Primary Examiner, Art Unit 2184