Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 8, 9, 16, 17, and 21 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 8 (line 6) includes the language “arbitrated memory banks”. The structure and/or operation performed to provide these arbitrated memory banks was not described in the specification in such a way as to reasonably convey to one of ordinary skill that the inventor, at the time the application was filed, had possession of arbitrated memory banks.

	Claims that depend from claims 8, 16, and 21, respectively, include the above problems.

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 8, 9, 16, 17, and 21 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 8 (line 6) includes the language “arbitrated memory banks”; the scope of meaning of this language is not clear (i.e., what particular operation is performed to yield arbitrated memory banks and what type of arbitration is performed). Claim 16 (line 7) and claim 21 (line 7) also include this language.
Claims that depend from claims 8, 16, or 21 include the same ambiguity.



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-6, 10-14, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang, L. et al. (IEEE paper entitled The Impulse Memory Controller) (hereafter referred to as Zhang) in view of Daultani, V. et al. (Springer paper entitled Sparse Direct Convolutional Neural Network, at pp. 293-302 of the Springer publication entitled Advances in Neural Networks) (hereafter referred to as Daultani).


1. (Canceled)

Zhang taught the invention substantially as claimed, including (as to claim 2) a method for accessing multi-dimensional tensors to accelerate computations (e.g., see page 1118, first column, in the introduction section, including fig. 1), the method comprising: receiving inputs for a first tensor comprising a plurality of elements (e.g., see fig. 4) [the data provides the inputs], wherein each element of the first tensor represents a respective dimensional location for a corresponding input of the received inputs (note that the locations of the data in array/tensor A provide the dimensional location(s)); storing, in a first memory (pseudo-virtual memory in fig. 2), each input for the first tensor at a distinct address location of the first memory (e.g., see page 1119, first column, and fig. 2) (the strided mapping of an address soffset in shadow space to pseudo-virtual address pvaddr + stride × soffset, where pvaddr is the starting address of the data structure's virtual image, provides this limitation), wherein each address location corresponds to an element of the plurality of elements, where the element is along a particular dimension of the first tensor (e.g., see fig. 4 and the first and second columns of page 1120) [note that in Zhang's conventional memory system each element of a column of data is used only once, and that in the Impulse implementation the elements are remapped to x’ addresses and to physical memory columns… where all elements have different addresses]; fetching, from the first memory, a first plurality of inputs, wherein each input of the first plurality of inputs corresponds to a respective element of the plurality of elements (e.g., see figs. 1, 2, 3, where elements go from pseudo-virtual memory to physical memory using a page table), wherein the respective element is one of multiple elements in a sequence of elements (e.g., see fig. 4) [the elements are in a sequence along a row or column], and wherein the first plurality of inputs are fetched from non-contiguous address locations of the first memory; and processing, through the first neural network layer, each input of the first plurality of inputs fetched from the non-contiguous address locations of the first memory (e.g., see page 1118, section 2, Impulse Architecture; sections 3.2 and 3.3; and section 3.4 on page 1121, which teaches processing of tiled data).
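For illustration only, and not drawn from Zhang or from the application, the following minimal Python sketch shows why the elements along one dimension of a row-major two-dimensional array sit at non-contiguous address locations, and how a strided address computation can gather them. The names strided_addresses and gather, the flat list standing in for memory, and the 3 x 4 example array are all assumptions introduced here.

```python
from typing import List


def strided_addresses(base: int, stride: int, count: int) -> List[int]:
    """Return `count` addresses starting at `base`, spaced `stride` apart.

    Hypothetical helper: models a generic strided access pattern only.
    """
    return [base + stride * i for i in range(count)]


def gather(memory: List[int], addresses: List[int]) -> List[int]:
    """Read the value stored at each address (index) of a flat memory."""
    return [memory[a] for a in addresses]


# Assumed example: a 3 x 4 array stored row-major in a flat memory.
ROWS, COLS = 3, 4
memory = list(range(100, 100 + ROWS * COLS))  # values 100..111

# Column 1 of the array: one element per row, COLS addresses apart,
# so the addresses are not adjacent to one another.
column_1 = strided_addresses(base=1, stride=COLS, count=ROWS)
values = gather(memory, column_1)
```

In this sketch, walking along a row touches adjacent addresses, while walking along a column touches addresses separated by the row length, which is the sense in which the fetched inputs are non-contiguous.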
	Zhang did not expressly detail the method being for a neural network having a plurality of neural network layers, the fetch being to a first neural network layer, and the processing being to generate a neural network output for the first neural network layer. Daultani, however, taught passing feature maps through an activation layer and inserting a pooling layer, which reduces spatial size, and the connecting of layers, where multiplication provides the output of the fully connected layers (e.g., see fig. 1 and pages 295-296). This provides plural neural network layers and fetching data to a neural network layer, as data in figure 1 is fetched from one layer to the next layer. Daultani also taught the processing (including convolution and matrix multiplication) of data using sparse filters and using a convolutional layer, an input layer, and activation network layers (e.g., see figs. 1, 2, 3, pages 295-296, and sections 4.1 and 4.2 on pages 298-299).
	It would have been obvious to one of ordinary skill in the art to combine the teachings of Zhang and Daultani. Both references were directed toward the problems of processing tensors where data was arranged in a memory in a sparse manner (e.g., see page 294, lines 5-11, of Daultani). One of ordinary skill would have been motivated (see KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007)).
Due to the similarities between claims 2, 10, and 18, claims 10 and 18 are rejected for the same reasons as claim 2 above. As to the further limitations of claims 10 and 18, Zhang taught a processor (CPU) and non-transitory storage medium(s) (DRAM, MMU, L1, L2) in fig. 3 on page 1119.

As to claims 3, 11, and the first portion/clause of claim 19 [beginning with “fetching”], Zhang and Daultani taught the method of claim 2; Daultani taught further comprising: fetching a second plurality of inputs to a second neural network layer, wherein each input of the second plurality of inputs corresponds to a respective element of the plurality of elements, wherein the respective element is one of multiple elements in a sequence of elements along a second dimension (e.g., see page 297, section 2.2, where Daultani teaches matrix multiplication with transformation of both the feature maps for each image and the filter to a 2D image) [note that this would have required fetching both of the feature maps from memory, such as shown in figures 1 and 3]. As to the second plurality of inputs being fetched from non-contiguous address locations of the first memory, one of ordinary skill would have been motivated to implement this at least to provide the sparse filter operations taught by Daultani (e.g., see page 298, section 4.1) (see KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007)).


As to claims 4, 12, and the second portion/clause of claim 19 (beginning with “processing”), Zhang and Daultani taught the method of claim 3; Daultani taught further comprising: processing, through the second neural network layer, each input of the second plurality of inputs fetched from the non-contiguous address locations of the first memory to generate a neural network output for the second neural network layer, wherein the second neural network layer is different than the first neural network layer (see fig. 1 and page 295, where convolution is performed in a convolutional neural network through convolutional layers, and see section 2.2 on page 296, where Daultani teaches performing matrix multiplication). Therefore, one of ordinary skill would have been motivated to process each input of the second array required for matrix multiplication (see KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007)).

As to claims 5 and 13, Zhang and Daultani taught the method of claim 4; Daultani taught wherein the first tensor is: an input tensor comprising inputs for processing through the first neural network layer to generate a set of output activations that correspond to the neural network output for the first neural network layer (e.g., see fig. 1 and the text of pages 295-296) (the input image or tensor is taught as passing through the input layer, convolutional layer, activation layer, and pooling layer to provide a fully connected activation).

As to claims 6 and 14, Zhang and Daultani taught the method of claim 5; Daultani taught wherein: the first neural network layer is a convolutional layer of the neural network; and the second neural network layer is a pooling layer of the neural network (e.g., see fig. 1 and the text of page 295).
Claims 7-9, 15-17, 20, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang and Daultani as applied to claims 2-6, 10-14, 18, and 19 above, and further in view of Zeng, D. et al. (Springer paper entitled Compressing Accelerating Neural Network for Facial Point Localization, pp. 359-367) (hereafter referred to as Zeng).

As to claims 7, 15, and 20, Zhang and Daultani taught the method of claim 2; Zeng taught further comprising: receiving a plurality of weights for a second tensor 
It would have been obvious to one of ordinary skill in the art to combine the teachings of Zhang and Zeng. Both references were directed toward the problems of processing arrays in a data processor. One of ordinary skill would have been motivated to incorporate the Zeng teachings of storing weights in the tensor for processing of tensors at least to provide reduced memory consumption (see Zeng, page 360, second column, lines 10-27). Also, the addition of the Zeng teachings would have been merely choosing from a finite number of identified predictable solutions (see KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007)).


As claims 8, 16, and 21 are understood, Zhang, Daultani, and Zeng taught the method of claim 7; Zhang taught further comprising: fetching, using a tensor traversal unit (Impulse MMC) (e.g., see fig. 3), the first plurality of inputs from the non-contiguous address locations of the first memory (e.g., see page 1118, section 2, Impulse Architecture; and sections 3.2 and 3.3). Zhang did not expressly detail concurrently fetching, using the tensor traversal unit, the plurality of weights from address locations of arbitrated memory banks of the second memory that correspond to elements along a dimension of the second tensor. Zeng, however, taught convolutional kernels and a weight matrix, with merging of kernels and performing product quantization. For performing the product quantization, one would have been motivated to fetch the weights and the data used in the quantization from the arrays concurrently, at least to reduce the time for accessing the data for processing, which would have improved throughput. Also, this would have been merely choosing from a finite number of identified predictable solutions (see KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007)).


As claims 9 and 17 are understood, Zhang, Daultani, and Zeng taught the method of claim 8; Zeng taught wherein the inputs and the plurality of weights represent operands used to perform a neural network computation and the method further comprises: storing one or more operands in the first memory and in the second memory such that a dimensional layout of the first tensor and a dimensional layout of the second tensor enables accelerated performance of a plurality of neural network computations that correspond to an inference workload (e.g., see the section entitled Pruning Neurons and Connections on page 361, the section entitled Network Retraining on page 361, and fig. 2) [note that the plural layers provide the first and second memory, the first storing weights (W) and the second storing operands (input x), and fig. 3 shows corresponding inference of the first and second tensors, corresponding to an inference workload, as the weight matrix is pruned when an index is zero and where the measure of importance of a neuron is based on the sum of connection correlations].
Response to Arguments
Applicant’s arguments, see remarks, filed 04/28/2021, with respect to the rejection(s) of claim(s) 2-21 under 35 U.S.C. 112, first paragraph, have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Zhang and Daultani for claims 2-7, 10-15, 18, 19, and 20, and Zhang, Daultani, and Zeng for claims 7, 15, and 20, and rejections are made under 35 U.S.C. 112, second paragraph, and 35 U.S.C. 112, first paragraph, above.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
	Doshi (patent No. 6,243,734) disclosed computer product and method for sparse matrices (e.g., see abstract and fig. 8). 
	Young (patent No. 9,721,203) disclosed performing kernel striding in hardware (e.g., see abstract and fig.10).

Sanghai (patent application publication No. 20115224) disclosed memory interconnect network architecture for vector processor (e.g., see abstract and fig. 6).
Hartono (patent application publication No. 2014/0189288) disclosed instruction to reduce elements in a vector register with strided access pattern (e.g., see abstract).
Yang (patent application publication No. 2016/0342888) disclosed memory efficiency for convolutional neural networks operating on graphics processing units (e.g., see abstract and fig. 1).
Woo (patent application publication No. 2017/0220345) disclosed accessing multi-dimensional tensors (e.g., see abstract and figs. 1, 3, 4).
Young (patent application publication No. 2018/0165577) disclosed performing average pooling in hardware (e.g., see abstract and fig. 8). 
Nurvitadhi (patent application publication No. 2018/0189675) disclosed hardware accelerator architecture and template for web scale k-means clustering (e.g., see abstract and figs. 1, 21, 23, 27).
El-Khamy (patent application publication No. 2018/0300624) disclosed method and apparatus for reducing computational complexity of convolutional neural networks (e.g., see abstract and figs. 2, 3, 6).
Lu, L. et al., SpWA: An Efficient Sparse Winograd Convolutional Neural Networks Accelerator on FPGAs, 2018, ACM, 6 pages (Year: 2018).
Cabezas, J., Automatic Parallelization of Kernels in Shared-Memory Multi-GPU Nodes, 2015, ACM, 3-13 (Year: 2015).


Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC COLEMAN whose telephone number is (571)272-4163. The examiner can normally be reached M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on 0-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ERIC . COLEMAN
Primary Examiner




EC
/ERIC COLEMAN/Primary Examiner, Art Unit 2183