DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 06/24/2022 has been entered.

Response to Amendment
In response to applicant’s amendment filed on 06/24/2022, claims 1-38 are pending. Applicant’s amendments have overcome the objections to the drawings and claims and the rejection under 35 U.S.C. 112(a) previously set forth in the Final Rejection dated 04/04/2022.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


Claim 20 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 20 recites “an input data transform circuit configured to transform and output the at least first portion of the input data” and “a filter data transform circuit configured to transform and output the at least second portion of the filter data”. It is unclear whether outputting “the at least first portion of the input data” and “the at least second portion of the filter data” refers to the transformed data or to the original data. For examination purposes, the examiner interpreted the limitation as “an input data transform circuit configured to transform and output transformed input data” and “a filter data transform circuit configured to transform and output transformed filter data”.

Claim 20 recites “the function processor comprises two or more of an input data transform circuit … a filter data transform circuit… a multiplier … an output data transform circuit…”. It is unclear whether “two or more of” refers to two or more of each circuit (for example, two or more input data transform circuits, two or more filter data transform circuits, etc.) or to two or more of the listed circuits (for example, the function processor comprises an input data transform circuit and a multiplier [i.e. two or more of]). For examination purposes, the examiner interpreted the limitation as the function processor comprising two or more of the listed circuits.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20, 29-36, and 38 are rejected under 35 U.S.C. 103 as being unpatentable over Lin (NPL - CNNWire: Boosting CNN with Winograd on ReRAM based accelerator) (listed on IDS 04/06/2021) in view of Sze – (NPL – Efficient processing of deep neural network a tutorial and survey) (listed on IDS 04/06/2021).

Regarding claim 1, Lin teaches a memory device comprising (Lin, figure 5): a memory configured to store input data and filter data for a convolution operation (Lin, figure 5, section 3.1: the transformed kernels are stored in the ReRAM array; the input bank buffers are composed of SRAM); and a function processor (Lin, figure 5, the overview of the whole architecture) configured to transform at least a portion of the input data using a first transforming matrix (Lin, figure 6, section 3.2: the WPE processes tiled IFMs and comprises Winograd transform modules per equation 4 (note that e.4 of Lin (Winograd algorithm) corresponds to e.6 and e.8 of the instant application). Figures 1 and 9, sections 2.2 and 3.4: 2x2 input IFMs are transformed into 4x4 using the WTM_B array; according to e.4, input d is transformed using matrix B) and transform at least a second portion of the filter data using a second transforming matrix (Lin, section 3.1: the transformed convolutional kernels are stored in the ReRAM array. Section 2.2 and figure 1 illustrate that tiled filters are transformed; e.3 and section 3.4 describe that weight w is transformed using matrix G), the first transforming matrix and the second transforming matrix being respectively based on a parameter of the convolution operation during a clock cycle and output a corresponding transformation result as transformed data (Lin, figures 1, 5, and 6, section 3.2: the WPE consists of Winograd transform modules per equation 4. Section 2.2: the tiles and kernels are transformed into the same dimension. Section 1: when data are stored in ReRAM, we can get the results within a single cycle. The size of the IFM tile/kernel is the parameter of the convolution operation).
Lin does not teach a function processor to transform the data in response to a read command of at least a portion of data from among the input data and the filter data. However, Sze teaches a read operation among the input data and filter data (Sze, figure 21).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Lin to perform a read/write operation command as disclosed in Sze. This modification would have been obvious because Sze discloses methods for performing convolution efficiently in neural networks, and one of ordinary skill in the art could have combined the elements as claimed by known methods, and the combination would have yielded the predictable result of performing read/write operations to transform data and perform the convolution operation. See MPEP 2141.III.(a), combining prior art elements according to known methods to yield predictable results.
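
For illustration only, the transform structure cited above for claim 1 (input transform with matrix B, filter transform with matrix G, element-wise multiplication, and output transform with matrix A) can be sketched for a single tile. This is a minimal sketch, assuming the commonly published F(2x2, 3x3) Winograd matrices applied to a 4x4 input tile and a 3x3 filter; the matrix values, the tile sizes, and the names `winograd_tile` and `direct_tile` are assumptions for illustration and are not taken from Lin, Sze, or the instant application.

```python
# Minimal pure-Python sketch of a Winograd F(2x2, 3x3) tile computation.
# Assumption: the standard published matrices B^T, G, A^T are used here;
# they are not copied from Lin's equations.

B_T = [[1, 0, -1, 0],
       [0, 1, 1, 0],
       [0, -1, 1, 0],
       [0, 1, 0, -1]]

G = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0, 0.0, 1.0]]

A_T = [[1, 1, 1, 0],
       [0, 1, -1, -1]]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def winograd_tile(d, g):
    """d: 4x4 input tile, g: 3x3 filter. Returns the 2x2 output tile."""
    u = matmul(matmul(G, g), transpose(G))        # filter transform (4x4)
    v = matmul(matmul(B_T, d), transpose(B_T))    # input transform (4x4)
    m = [[u[i][j] * v[i][j] for j in range(4)]    # element-wise product
         for i in range(4)]
    return matmul(matmul(A_T, m), transpose(A_T)) # output transform (2x2)


def direct_tile(d, g):
    """Reference: direct sliding-window computation of the same 2x2 tile."""
    return [[sum(d[i + r][j + c] * g[r][c]
                 for r in range(3) for c in range(3))
             for j in range(2)] for i in range(2)]
```

In this sketch the element-wise product uses 16 multiplications per tile, whereas the direct computation uses 36, which is the kind of reduction in computational quantity that the claims refer to.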

Regarding claim 2, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including the function processor comprises an input data transform circuit configured to transform the at least first portion of the input data and output a corresponding transformation result as transformed input data, and the input data transform circuit is structured based on the parameter of the convolution operation and a type of an algorithm applied to transform the at least first portion of the input data to reduce a computational quantity of the convolution operation (Lin, figure 6, section 3.2: the WPE processes tiled IFMs and comprises Winograd transform modules per e.4 (note that e.4 of Lin (Winograd algorithm) corresponds to e.6 and e.8 of the instant application). Figures 1 and 9, section 3.4: the 2x2 matrix input is transformed into 4x4 and output as the transformed input IFM; the size of the IFM tile/kernel is the parameter of the convolution operation. Thus, the input IFM is transformed based on the size and the Winograd algorithm).

Regarding claim 3, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including an input transform matrix is determined based on both the parameter of the convolution operation and the type of the algorithm, and the input data transform circuit is structured to correspond to the input transform matrix (Lin, figure 6, section 3.2: the WPE comprises Winograd transform modules per e.4; the inputs are transformed with the WTM_B array; section 2.2: matrix $B^{T}$ [i.e. input transform matrix] corresponds to e.6 of the instant claim).

Regarding claim 4, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including to transform the at least second portion of the filter data and output a corresponding transformation result as transformed filter data (Lin, section 3.1: the transformed convolutional kernels are stored in the ReRAM array. Section 3.4: the filters are pre-transformed and stored in the crossbars. Section 2.2: the filters are transformed based on the filter matrix and the Winograd algorithm), but the combined system of Lin in view of Sze does not teach a filter data transform circuit.
However, another embodiment of Lin teaches an input data transform circuit configured to transform at least a portion of the input data, and the input data transform circuit is structured based on the parameter of the convolution operation and a type of an algorithm applied to transform the at least a portion of the input data to reduce a computational quantity of the convolution operation (Lin, figure 6, section 3.2: the WPE processes tiled IFMs and comprises Winograd transform modules per e.4 (note that e.4 of Lin (Winograd algorithm) corresponds to e.6 and e.8 of the instant application). Figure 9, section 3.4: the 2x2 matrix input is transformed into 4x4).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined system of Lin in view of Sze to have a filter data transform circuit similar to the input data transform circuit disclosed in Lin, figure 6. This modification would have been obvious because the combined system of Lin in view of Sze teaches the transformation of filter data and input data as shown in figures 1 and 9 and section 3.4, but only explicitly shows the input data transform circuit, and one of ordinary skill in the art could have combined the elements as claimed by known methods, and the combination would have yielded the predictable result of transforming the filter data using the filter data transform circuit. See MPEP 2141.III.(a), combining prior art elements according to known methods to yield predictable results.
	As modified, the combined system of Lin in view of Sze teaches a filter data transform circuit configured to transform the at least second portion of the filter data and output a corresponding transformation result as transformed filter data, and the filter data transform circuit is structured based on the parameter of the convolution operation and a type of an algorithm applied to transform the at least a portion of the filter data to reduce a computational quantity of the convolution operation (Lin, section 3.1: the transformed convolutional kernels are stored in the ReRAM array. Section 3.4: the filters are pre-transformed and stored in the crossbars. Section 2.2: the filters are transformed based on the filter matrix and the Winograd algorithm).

	Regarding claim 5, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including a filter transform matrix is determined based on both the parameter of the convolution operation and the type of the algorithm, and the filter data transform circuit is structured to correspond to the filter transform matrix (Lin, section 2.2: matrix G shown in e.3 [i.e. a filter transform matrix]; the tiled input and kernels are transformed into the same dimension using e.3 of the Winograd algorithm).

	Regarding claim 6, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including the filter data stored in the memory comprises filter data transformed based on the parameter of the convolution operation, and when the filter data is the filter data transformed based on the parameter of the convolution operation, the function processor is configured to output the transformed filter data without further transforming the transformed filter data (Lin, section 3.1: the transformed convolution kernels are stored in the ReRAM array. Section 3.4: the filter data are pre-transformed and stored in the crossbars. Section 2.2: the filter data w is transformed based on the size of weight w; figure 1 illustrates that the transformed filter data are output).

	Regarding claim 7, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including the function processor comprises a multiplier accumulator (MAC) configured to perform an operation between the transformed input data and the transformed filter data (Lin, figure 9, section 3.4: the crossbars perform element-wise matrix multiplication and accumulate across channels).
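
For illustration only, the operation cited for claim 7 (element-wise multiplication of transformed input and transformed filter data, accumulated across channels) can be sketched in software. This is a minimal sketch under stated assumptions: the function name `mac_across_channels` and the list-of-channels data layout are hypothetical and do not describe how Lin's ReRAM crossbars are implemented.

```python
def mac_across_channels(u, v):
    """Element-wise multiply-accumulate across channels.

    u, v: lists with one entry per channel, each entry an HxW list of lists
    (hypothetical layout chosen for illustration).
    Returns an HxW list of lists holding sum over channels of u[c] * v[c].
    """
    height, width = len(u[0]), len(u[0][0])
    acc = [[0] * width for _ in range(height)]
    for u_c, v_c in zip(u, v):              # iterate over channels
        for i in range(height):
            for j in range(width):
                acc[i][j] += u_c[i][j] * v_c[i][j]   # multiply, then accumulate
    return acc
```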

	Regarding claim 8, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including the function processor is configured to, in response to a write command (Sze, figure 21) of at least a portion of intermediate output data that is output through an operation between the transformed input data and the transformed filter data, transform the at least portion of the intermediate output data based on the parameter of the convolution operation during a clock cycle corresponding to the write command, and output a corresponding transformation result as transformed intermediate output data (Lin, figure 6, section 3.2: the WPE gets the convolution result. Figures 1 and 9, section 2.2: the Winograd convolution performs multiplication of the transformed tiled input and kernels and transforms the product back to the convolution results. Section 1: when the data are stored in the ReRAM crossbar array, we can get the results within a single cycle).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Lin to perform a write operation command as disclosed in Sze. This modification would have been obvious because Sze discloses methods for performing convolution efficiently in neural networks, and one of ordinary skill in the art could have combined the elements as claimed by known methods, and the combination would have yielded the predictable result of performing a write operation to transform output data as disclosed in figure 9 of Lin. See MPEP 2141.III.(a), combining prior art elements according to known methods to yield predictable results.

Regarding claim 9, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including the intermediate output data comprises output data of an elementwise multiplication between the transformed input data and the transformed filter data (Lin, figures 1 and 9, sections 2.2 and 3.4: after both the tiled input and the kernels are transformed, an element-wise multiplication operation is performed).

Regarding claim 10, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including the function processor comprises an output data transform circuit configured to transform the at least a portion of the intermediate output data and output a corresponding transformation result as transformed output data, and the output data transform circuit is structured based on the parameter of the convolution operation and a type of an algorithm that transforms data to reduce a computational quantity of the convolution operation (Lin, figure 6: the WPE comprises Winograd transform modules. Section 3.2: the output is transformed with WTM_A. E.4 provides matrix A [i.e. output transform matrix]).

Regarding claim 11, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including the output transform matrix is determined based on the parameter of the convolution operation and the type of the algorithm, and the output data transform circuit is structured to correspond to the output transform matrix (Lin, section 2.2 provides the Winograd convolution; e.4 shows matrix A [i.e. the output transform matrix]. Section 3.2: the WTM_A deals with the output transform).

Regarding claim 12, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including the parameter of the convolution operation is determined based on any one or any combination of any two or more of a size of output data, a size of filter data, a size of input data, a stride interval, and a padding size (Lin, figure 1, section 2.2, and figure 9, section 3.4: the convolution operation is performed based on the sizes of the tiled input, the kernels, and the output data).

Regarding claim 13, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including the function processor comprises either one or both of an adder and a shifter (Lin, figure 6, section 3.2: the MV multipliers are realized by ReRAM crossbars along with shifting and adding trees).

Regarding claim 14, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including an algorithm that transforms the input data to reduce a computational quantity of the convolution operation is a Winograd algorithm (Lin, section 3.1: the WPE aims to process using the Winograd algorithm).

Regarding claim 15, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including an algorithm that transforms the input data to reduce a computational quantity of the convolution operation is a Strassen algorithm (Sze, page 2309: Strassen’s algorithm has also been explored for reducing the number of multiplications in DNNs).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lin’s controller to decide to compute using the Strassen algorithm as disclosed in Sze. This modification would have been obvious because Lin discloses an architecture that computes in different modes (Winograd-based convolution, GEMM-based convolution, FC, etc.), and Sze discloses a method to perform matrix multiplication in DNNs using the Strassen algorithm. As also recognized by Sze, using the Strassen algorithm would reduce the number of multiplications from $O(N^{3})$ to $O(N^{2.807})$.
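
For illustration only, the source of the exponent 2.807 can be seen in the 2x2 base case of Strassen's algorithm, which uses 7 multiplications instead of the 8 of the standard method; applied recursively to blocks, this gives a multiplication count on the order of N raised to log2(7), approximately 2.807. The following is a minimal sketch of that base case; the function name `strassen_2x2` is hypothetical and is not taken from Sze or Lin.

```python
def strassen_2x2(a, b):
    """Multiply two 2x2 matrices using Strassen's 7 products (8 in the standard method)."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b

    # The seven Strassen products.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # Recombine the products using additions and subtractions only.
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]
```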

Regarding claim 16, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including the function processor is configured to output a predetermined value in response to a size of the transformed input data being less than or equal to a threshold (Sze, figure 41, page 2320: the activations can be made even more sparse by pruning the low-valued activations).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lin’s architecture to perform pruning as disclosed in Sze. This modification would have been obvious because performing pruning allows for an additional 11% speedup or 2X power reduction with little impact on accuracy.
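
For illustration only, outputting a predetermined value when a value is at or below a threshold can be sketched as follows. This is a minimal sketch under stated assumptions: "size" is read here as the magnitude (absolute value) of each element, the predetermined value defaults to zero, and the name `prune_small` is hypothetical; none of these details is taken from Sze or from the instant application.

```python
def prune_small(values, threshold, fill=0.0):
    """Replace every value whose magnitude is <= threshold with a predetermined value.

    Assumption: magnitude-based comparison and a default fill of 0.0 are
    illustrative choices, not details from the cited references.
    """
    return [fill if abs(v) <= threshold else v for v in values]
```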

Regarding claim 17, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including the function processor is configured to compute a value of a nonlinear function to determine whether to activate the transformed filter data (Lin, section 3.2: the functional unit is for activation and pooling operations; we only enable ReLU in the module).

Regarding claim 18, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including the function processor comprises an operation processor configured to perform an operation using the transformed input data and the transformed filter data (Lin, figure 6, sections 2.2 and 3.4: after both the tiled input and the kernels are transformed, an element-wise multiplication operation is performed using the multiplier. Section 3.2: the multiplier is realized by ReRAM crossbars).

Regarding claim 19, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including the operation processor comprises a multiplier configured to perform an elementwise multiplication using the transformed input data and the transformed filter data (Lin, sections 2.2 and 3.4: after both the tiled input and the kernels are transformed, an element-wise multiplication operation is performed using the multiplier. Section 3.2: the multiplier is realized by ReRAM crossbars).

Regarding claim 20, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including the function processor comprises two or more of: an input data transform circuit configured to transform and output the at least first portion of the input data; a filter data transform circuit configured to transform and output the at least second portion of the filter data; a multiplier configured to perform a multiplication using the transformed input data and the transformed filter data; and an output data transform circuit configured to transform and output at least a portion of intermediate output data output by performing the multiplication (Lin, figures 1, 5, 6, and 9, sections 2.2 and 3.2 illustrate the WPE [i.e. an input data transform circuit] transforming the tiled input IFMs into transformed tiles and outputting them for the subsequent operation [i.e. an input data transform circuit configured to transform and output the at least first portion of the input data]; sections 2.2 and 3.4: after both the tiled input and the kernels are transformed, an element-wise multiplication operation is performed using the multiplier. Section 3.2: the multiplier is realized by ReRAM crossbars [i.e. multiplier configured to perform multiplication...]).

Regarding claim 29, the claim recites a method that includes all the steps that would be practiced by the apparatus of claim 1. Thus, it is rejected for the same reasons as claim 1.

Regarding claims 30 and 31, the claims recite methods that include all the steps that would be practiced by the apparatus of claims 2-3 and 4-5, respectively. Thus, they are rejected for the same reasons as claims 2-3 and 4-5.

Regarding claim 32, the claim recites a method that includes all the steps that would be practiced by the apparatus of claim 8. Thus, it is rejected for the same reasons as claim 8.

Regarding claim 33, the claim recites a method that includes all the steps that would be practiced by the apparatus of claims 10-11. Thus, it is rejected for the same reasons as claims 10-11.

Regarding claim 34, the claim recites a product that includes all the steps of claim 29 that would be practiced by the apparatus of claim 1. Thus, it is rejected for the same reasons as claim 1.

Regarding claim 35, Lin teaches a computing apparatus comprising: at least a portion of data among input data and filter data stored in a memory comprises a function in memory (FIM) (Lin, figure 5, section 3.1: the transformed kernels are stored in the ReRAM array; the input bank buffers are composed of SRAM. Abstract: the ReRAM demonstrates the great potential of in-memory processing for neural networks), one or more processors (Lin, figure 5, overview of the whole architecture) configured to transform at least a first portion of the input data using a first transforming matrix (Lin, figure 6, section 3.2: the WPE processes tiled IFMs and comprises Winograd transform modules per equation 4 (note that e.4 of Lin (Winograd algorithm) corresponds to e.6 and e.8 of the instant application). Figures 1 and 9, sections 2.2 and 3.4: 2x2 input IFMs are transformed into 4x4 using the WTM_B array; according to e.4, input d is transformed using matrix B) and transform at least a second portion of the filter data using a second transforming matrix (Lin, section 3.1: the transformed convolutional kernels are stored in the ReRAM array. Section 2.2 and figure 1 illustrate that tiled filters are transformed; e.3 and section 3.4 describe that weight w is transformed using matrix G), the first transforming matrix and the second transforming matrix being respectively based on a parameter of the convolution operation during a clock cycle and output a corresponding transformation result as transformed data (Lin, figures 1, 5, and 6, section 3.2: the WPE consists of Winograd transform modules per equation 4. Section 2.2: the tiles and kernels are transformed into the same dimension. Section 1: when data are stored in ReRAM, we can get the results within a single cycle. The size of the IFM tile/kernel is the parameter of the convolution operation).
Lin does not teach one or more processors to transform the data in response to a read command of at least a portion of data from among the input data and the filter data. However, Sze teaches a read operation among the input data and filter data (Sze, figure 21).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Lin to perform a read/write operation command as disclosed in Sze. This modification would have been obvious because Sze discloses methods for performing convolution efficiently in neural networks, and one of ordinary skill in the art could have combined the elements as claimed by known methods, and the combination would have yielded the predictable result of performing read/write operations to transform data and perform the convolution operation. See MPEP 2141.III.(a), combining prior art elements according to known methods to yield predictable results.

Regarding claim 36, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including the memory configured to store the input data and the filter data for a convolution operation (Lin, figure 5, section 3.1: the transformed kernels are stored in the ReRAM array; the input bank buffers are composed of SRAM).

Regarding claim 38, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, including the one or more processors comprises a multiplier accumulator (MAC) configured to perform an operation between the transformed input data and the transformed filter data (Lin, figure 9, section 3.4: the crossbars perform element-wise matrix multiplication and accumulate across channels).

Claims 21-28 and 37 are rejected under 35 U.S.C. 103 as being unpatentable over Lin (NPL - CNNWire: Boosting CNN with Winograd on ReRAM based accelerator) in view of Sze – (NPL – Efficient processing of deep neural network a tutorial and survey) and Luo (US – 20200349217).

Regarding claim 21, Lin teaches a computing apparatus comprising (Lin, figure 5): a memory configured to store input data and filter data for a convolution operation (Lin, figure 5, section 3.1: the transformed kernels are stored in the ReRAM array; the input bank buffers are composed of SRAM); and a function processor (Lin, figure 5, the overview of the whole architecture) configured to transform at least a first portion of the input data using a first transforming matrix (Lin, figure 6, section 3.2: the WPE processes tiled IFMs and comprises Winograd transform modules per equation 4 (note that e.4 of Lin (Winograd algorithm) corresponds to e.6 and e.8 of the instant application). Figures 1 and 9, sections 2.2 and 3.4: 2x2 input IFMs are transformed into 4x4 using the WTM_B array; according to e.4, input d is transformed using matrix B) and transform at least a second portion of the filter data using a second transforming matrix (Lin, section 3.1: the transformed convolutional kernels are stored in the ReRAM array. Section 2.2 and figure 1 illustrate that tiled filters are transformed; e.3 and section 3.4 describe that weight w is transformed using matrix G), the first transforming matrix and the second transforming matrix being respectively based on a parameter of the convolution operation during a clock cycle and output a corresponding transformation result as transformed data (Lin, figures 1, 5, and 6, section 3.2: the WPE consists of Winograd transform modules per equation 4. Section 2.2: the tiles and kernels are transformed into the same dimension. Section 1: when data are stored in ReRAM, we can get the results within a single cycle. The size of the IFM tile/kernel is the parameter of the convolution operation).
Lin does not teach a function processor to transform the data in response to a read command of at least a portion of data from among the input data and the filter data. However, Sze teaches a read operation among the input data and filter data (Sze, figure 21).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Lin to perform a read/write operation command as disclosed in Sze. This modification would have been obvious because Sze discloses methods for performing convolution efficiently in neural networks, and one of ordinary skill in the art could have combined the elements as claimed by known methods, and the combination would have yielded the predictable result of performing read/write operations to transform data and perform the convolution operation. See MPEP 2141.III.(a), combining prior art elements according to known methods to yield predictable results.
The combined system of Lin in view of Sze does not teach a direct memory access (DMA) processor configured to align and store the at least a portion of the data in the memory based on a connection relationship between the memory and the function processor.
However, Luo teaches a direct memory access (DMA) processor configured to align and store the at least a portion of the data in the memory based on a connection relationship between the memory and the function processor (Luo, figure 5A, [0118]: operands are identified for dedicated data transfer hardware (e.g., a DMA). As shown in figure 5A, DMA 508 connects between the memory and the matrix fabric and MMU).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined system of Lin in view of Sze to include a DMA to transfer data directly to memory. This modification would have been obvious because using a DMA to transfer data directly to memory bypasses the CPU and thereby speeds up memory operations.

Regarding claim 22-25, recite apparatus claims include steps that would be practiced on the apparatus claims 2-4 and 18, respectively, thus, they are rejected for the same reasons as claims 2-4 and 18.

Regarding claim 26, the combined system of Lin in view of Sze and Luo discloses the claimed invention as in the parent claim above, including that the operation processor comprises any one or any combination of any two or more of a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a neural processing unit (NPU), and a field programmable gate array (FPGA) (Sze, page 2307, section V, hardware for DNN processing).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify the operation processor disclosed in Lin to comprise any one of a CPU, GPU, or FPGA as disclosed in Sze in section V. This modification would have been obvious because, as recognized by Sze, DNNs are popularly used on many hardware platforms such as CPU, GPU and FPGA, and one of ordinary skill in the art could have chosen any of these hardware platforms to yield the predictable result of performing an operation. See MPEP 2141.III.(a), combining prior art elements according to known methods to yield predictable results.

Regarding claims 27 and 28, these apparatus claims recite steps that would be practiced on the apparatus of claims 8 and 10, respectively; thus, they are rejected for the same reasons as claims 8 and 10.

Regarding claim 37, the combined system of Lin in view of Sze discloses the claimed invention as in the parent claim above, but does not teach further comprising a direct memory access (DMA) processor configured to align and store the at least a portion of data in the memory based on a connection relationship between the memory and the one or more processors. However, Luo teaches a direct memory access (DMA) processor configured to align and store the at least a portion of data in the memory based on a connection relationship between the memory and the one or more processors (Luo, figure 5A, [0118] operands are identified for dedicated data transfer hardware (e.g., a DMA). As shown in figure 5A, DMA 508 connects between the memory and the matrix fabric and MMU).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined system of Lin in view of Sze to include a DMA to transfer data directly to memory. This modification would have been obvious because using a DMA to transfer data directly to memory bypasses the CPU and thereby speeds up the memory operation.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUY DUONG whose telephone number is (571)272-2764.  The examiner can normally be reached on Mon-Friday 7:30-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/H.D./Examiner, Art Unit 2182
(571)272-2764

/EMILY E LAROCQUE/Primary Examiner, Art Unit 2182