DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 12/14/2020 is being considered by the examiner.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claim 1 recites the limitation "the second access bit" in lines 11-12.
Claim 9 recites the limitation "the second access bit" in line 12.
Claim 17 recites the limitation "the second access bit" in line 12.
There is insufficient antecedent basis for this limitation in each of these claims.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-2, 4-10 and 12-18 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US 2016/0233850) (hereinafter referred to as Wang ‘850) in view of Wang et al. (CN103902507) (hereinafter referred to as Wang ‘507).
With respect to claim 1, Wang ‘850 teaches a processor, wherein the processor includes a computation array and a cache array (see paragraph 43; device comprises: multi-granularity memory 10 (i.e., cache array), data cache device (i.e., cache array), vector operation device (i.e., computation array)), a bit width of each cache in the cache array is equal to a bit width of a data unit processed by the computation array (see paragraphs 44-45; multi-granularity memory 10 generally includes a multi-granularity to-be-filtered data storage unit 101, a multi-granularity filter coefficient storage unit 102 and a multi-granularity filtering result storage unit 103… multi-granularity to-be-filtered data storage unit 101 and the multi-granularity filter coefficient storage 
Wang ‘850 does not explicitly teach the method comprising reading the data units in the N input caches to the computation array with the second access bit width, wherein the second access bit width is equal to the bit width of each cache.
However, Wang ‘850 teaches wherein the vector operation device 40 is configured to perform a vector operation based on the data to be filtered as read from the data cache device 20 and the output coefficient data 3001 as read from the coefficient buffer broadcast device 30, and write an operation result into the multi-granularity filtering result storage unit 103 (see paragraph 51); the operational size of the vector operation device 40 is identical to the read/write bit width of the multi-granularity to-be-filtered data storage unit 101 and the multi-granularity filter coefficient storage unit 102 (see paragraph 53).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method to include the above mentioned to improve the efficiency of the device (see Wang ‘850, paragraph 27).
Wang ‘850 does not teach reading M*N data units from a memory to N input caches in the cache array with a first access bit width, wherein the first access bit width is N times of the bit width of each cache, data units in each column of the M*N data units are stored together in one corresponding input cache of the N input caches, and M and N are positive integers greater than 1.
However, Wang ‘507 teaches reading M*N data units from a memory to N input caches in the cache array with a first access bit width (see page 4, paragraph 16 and page 6, paragraph 7); 
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method taught by Wang ‘850 to include the above mentioned to improve the efficiency of the device (see Wang ‘507, page 3, paragraph 7).
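For illustration only, the data flow recited in claim 1 may be sketched as follows. The sketch is a reading aid under stated assumptions and is not taken from the application or from either cited reference: the function names, the list-based model of the caches, and the example sizes M = 3 and N = 4 are all hypothetical. Each memory row models one wide access of N data units (the first access bit width), and each column is collected in its own input cache; a later read returns a single data unit (the second access bit width).

```python
# Illustrative model only. All names and sizes are assumptions made for this
# sketch; they do not appear in the claims or in the cited references.

def load_input_caches(memory_rows, n):
    """Model M wide reads, each carrying n data units (first access bit width).

    Data unit j of every row is appended to input cache j, so each input
    cache ends up holding one column of the M*N block.
    """
    caches = [[] for _ in range(n)]
    for row in memory_rows:
        if len(row) != n:
            raise ValueError("each wide access must carry exactly n data units")
        for j, unit in enumerate(row):
            caches[j].append(unit)
    return caches


def read_unit(caches, cache_index, position):
    """Model one narrow read of a single data unit (second access bit width)."""
    return caches[cache_index][position]


# Hypothetical example sizes: M = 3 rows, N = 4 columns.
M, N = 3, 4
memory = [[r * N + c for c in range(N)] for r in range(M)]
caches = load_input_caches(memory, N)
```

In this model the computation array would call read_unit once per data unit, in whatever processing sequence it requires, while the memory is touched only M times.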


With respect to claim 2, Wang ‘850 teaches wherein the reading of the data units in the N input caches to the computation array with the second access bit width includes: reading the data units in the N input caches to the computation array with the second access bit width according to a processing sequence of the computation array (see paragraphs 13-15; the vector operation device 40 is configured to perform a vector operation based on the data to be filtered as read from the data cache device 20 and the output coefficient data 3001 as read from the coefficient buffer broadcast device 30, and write an operation result into the multi-granularity filtering result storage unit 103… step 1): reading a number, BS, of data to be filtered from a data cache device 20 and a number, BS, of output coefficient data).

With respect to claim 4, Wang ‘850 does not explicitly teach storing the data units processed by the computation array to N output caches in the cache array with the second access bit width.
However, Wang ‘850 teaches the vector operation device 40 is configured to perform a vector operation based on the data to be filtered as read from the data cache device 20 and the output coefficient data 3001 as read from the coefficient buffer broadcast device 30, and write an operation result into the multi-granularity filtering result storage unit 103 (see paragraphs 13-15).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method to include the above mentioned to improve the efficiency of the device (see Wang, paragraph 27).
Wang ‘850 does not teach storing the M*N data units from the N output caches to the memory with the first access bit width.

It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method taught by Wang ‘850 to include the above mentioned to improve the efficiency of the device (see Wang ‘507, page 3, paragraph 7).

With respect to claim 5, Wang ‘850 does not teach wherein the cache array is a random access memory (RAM) array, a first in first out (FIFO) array, or a register (REG) array.
However, Wang ‘507 teaches wherein the cache array is a random access memory (RAM) array, a first in first out (FIFO) array, or a register (REG) array (see page 3, paragraph 17; register).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the processor taught by Wang ‘850 to include the above mentioned to improve the efficiency of the device (see Wang ‘507, page 3, paragraph 7).


With respect to claim 6, Wang ‘850 teaches wherein the processor is an on-chip component (see Fig. 2 and paragraph 43; multi-granularity memory 10, data cache device, vector operation device), and the memory is an on-chip memory or an off-chip memory (see Fig. 2 and paragraph 43; multi-granularity memory 10, data cache device, vector operation device).

With respect to claim 7, Wang ‘850 teaches wherein the computation array is a multiply-accumulate (MAC) computation array (see paragraphs 16-17, 68 and 73-74; vector multiplier and accumulator device).

With respect to claim 8, Wang ‘850 teaches wherein the processor further includes the memory (see Fig. 2 and paragraph 43; multi-granularity memory 10, data cache device).

With respect to claim 9, Wang ‘850 teaches a computation array (see paragraph 43; device comprises: vector operation device (i.e., computation array)); and
a cache array (see paragraph 43; device comprises: multi-granularity memory 10 (i.e., cache array), data cache device (i.e., cache array)),
wherein a bit width of each cache in the cache array is equal to a bit width of a data unit processed by the computation array (see paragraphs 44-45; multi-granularity memory 10 generally includes a multi-granularity to-be-filtered data storage unit 101, a multi-granularity filter coefficient storage unit 102 and a multi-granularity filtering result storage unit 103… multi-granularity to-be-filtered data storage unit 101 and the multi-granularity filter coefficient storage unit 102 each have a read/write bit width, denoted as BS, identical to an operational size of the vector operation device).

However, Wang ‘850 teaches wherein the vector operation device 40 is configured to perform a vector operation based on the data to be filtered as read from the data cache device 20 and the output coefficient data 3001 as read from the coefficient buffer broadcast device 30, and write an operation result into the multi-granularity filtering result storage unit 103 (see paragraph 51); the operational size of the vector operation device 40 is identical to the read/write bit width of the multi-granularity to-be-filtered data storage unit 101 and the multi-granularity filter coefficient storage unit 102 (see paragraph 53).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the processor to include the above mentioned to improve the efficiency of the device (see Wang, paragraph 27).
Wang ‘850 does not teach the cache array is configured to read M*N data units from a memory to N input caches in the cache array with a first access bit width, wherein the first access bit width is N times of the bit width of each cache, data units in each column of the M*N data units are stored together in one corresponding input cache of the N input caches, and M and N are positive integers greater than 1.
However, Wang ‘507 teaches cache array is configured to read M*N data units from a memory to N input caches in the cache array with a first access bit width (see page 4, paragraph 16 and page 6, paragraph 7; matrix data (M*N data units) is read from granularities parallel storage to buffer storages 20 and 30 (i.e., N caches)… The data bit width of the many granularities parallel 
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the processor taught by Wang ‘850 to include the above mentioned to improve the efficiency of the device (see Wang ‘507, page 3, paragraph 7).

With respect to claim 10, Wang ‘850 teaches wherein the computation array is configured to read the data units in the N input caches to the computation array with the second access bit width according to a processing sequence of the computation array (see paragraphs 13-15; the 

With respect to claim 12, Wang ‘850 does not explicitly teach wherein the computation array is further configured to store the data units processed by the computation array to N output caches in the cache array with the second access bit width.
However, Wang ‘850 teaches the vector operation device 40 is configured to perform a vector operation based on the data to be filtered as read from the data cache device 20 and the output coefficient data 3001 as read from the coefficient buffer broadcast device 30, and write an operation result into the multi-granularity filtering result storage unit 103 (see paragraphs 13-15).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the processor to include the above mentioned to improve the efficiency of the device (see Wang, paragraph 27).
Wang ‘850 does not teach the cache array is further configured to store the M*N data units in the N output caches to the memory with the first access bit width.
However, Wang ‘507 teaches the cache array is further configured to store the M*N data units in the N output caches to the memory with the first access bit width (see page 4, paragraph 16 and page 6, paragraph 7; matrix data (M*N data units) is read from granularities parallel storage and stored in buffer storages 20 and 30 (i.e., N caches)… The data bit width of the many 
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the processor taught by Wang ‘850 to include the above mentioned to improve the efficiency of the device (see Wang ‘507, page 3, paragraph 7).

With respect to claim 13, Wang ‘850 does not teach wherein the cache array is a random access memory (RAM) array, a first in first out (FIFO) array, or a register (REG) array.
However, Wang ‘507 teaches wherein the cache array is a random access memory (RAM) array, a first in first out (FIFO) array, or a register (REG) array (see page 3, paragraph 17; register).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the processor taught by Wang ‘850 to include the above mentioned to improve the efficiency of the device (see Wang ‘507, page 3, paragraph 7).

With respect to claim 14, Wang ‘850 teaches wherein the processor is an on-chip component (see Fig. 2 and paragraph 43; multi-granularity memory 10, data cache device, vector operation device), and the memory is an on-chip memory or an off-chip memory (see Fig. 2 and paragraph 43; multi-granularity memory 10, data cache device, vector operation device).

With respect to claim 15, Wang ‘850 teaches wherein the computation array is a multiply-accumulate (MAC) computation array (see paragraphs 16-17, 68 and 73-74; vector multiplier and accumulator device).

With respect to claim 16, Wang ‘850 teaches wherein the processor further includes the memory (see Fig. 2 and paragraph 43; multi-granularity memory 10, data cache device).

With respect to claim 17, Wang ‘850 teaches a processor or a computer system (see Fig. 2 and paragraph 43; device/apparatus comprises: multi-granularity memory 10 (i.e., cache array), data cache device (i.e., cache array), vector operation device (i.e., computation array));
the processor includes a computation array and a cache array (see paragraph 43; device comprises: multi-granularity memory 10 (i.e., cache array), data cache device (i.e., cache array), vector operation device (i.e., computation array));
wherein a bit width of each cache in the cache array is equal to a bit width of a data unit processed by the computation array (see paragraphs 44-45; multi-granularity memory 10 generally includes a multi-granularity to-be-filtered data storage unit 101, a multi-granularity filter coefficient storage unit 102 and a multi-granularity filtering result storage unit 103… multi-granularity to-be-filtered data storage unit 101 and the multi-granularity filter coefficient storage unit 102 each have a read/write bit width, denoted as BS, identical to an operational size of the vector operation device).
Wang ‘850 does not explicitly teach the computation array is configured to read the data units in the N input caches to the computation array with the second access bit width, wherein the second access bit width is equal to the bit width of each cache.

It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the device to include the above mentioned to improve the efficiency of the device (see Wang, paragraph 27).
Wang ‘850 does not teach the cache array is configured to read M*N data units from a memory to N input caches in the cache array with a first access bit width, wherein the first access bit width is N times of the bit width of each cache, data units in each column of the M*N data units are stored together in one corresponding input cache of the N input caches, and M and N are positive integers greater than 1.
However, Wang ‘507 teaches cache array is configured to read M*N data units from a memory to N input caches in the cache array with a first access bit width (see page 4, paragraph 16 and page 6, paragraph 7; matrix data (M*N data units) is read from granularities parallel storage to buffer storages 20 and 30 (i.e., N caches)… The data bit width of the many granularities parallel storage using in the present invention is measured take storage unit as unit, and storage unit is defined as the organization unit of storer, is also the read-write minimum data bit wide of storer. In the present invention, all suppose that minimum data bit wide is that storage unit is 8bit), wherein the first access bit width is N times of the bit width of each cache (page 6, paragraphs 7 
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the processor taught by Wang ‘850 to include the above mentioned to improve the efficiency of the device (see Wang ‘507, page 3, paragraph 7).

With respect to claim 18, Wang ‘850 teaches wherein the computer system includes a memory configured to store a computer-executable instruction; and the processor configured to access the memory (see paragraphs 50-51; command queue).

Claims 3 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US 2016/0233850) (hereinafter referred to as Wang ‘850) in view of Wang et al. (CN103902507) (hereinafter referred to as Wang ‘507) as applied to claims 1-2 and 9-10 above, and further in view of Motoya et al. (US 2018/0276527).
With respect to claim 3, Wang ‘850 and Wang ‘507 do not teach wherein the data units are eigenvalues in a feature map, and the processing sequence is a processing sequence in a convolutional neural network.
However, Motoya et al. teaches wherein the data units are eigenvalues (see paragraph 138; eigenvalue), and the processing sequence is a processing sequence in a convolutional neural network (see paragraphs 1 and 7; convolutional neural network).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method taught by Wang ‘850 and Wang ‘507 to include the above mentioned to improve the efficiency of the device (see Motoya, paragraph 6).

With respect to claim 11, Wang ‘850 and Wang ‘507 do not teach wherein the data units are eigenvalues in a feature map, and the processing sequence is a processing sequence in a convolutional neural network.
However, Motoya et al. teaches wherein the data units are eigenvalues (see paragraph 138; eigenvalue), and the processing sequence is a processing sequence in a convolutional neural network (see paragraphs 1 and 7; convolutional neural network).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the processor taught by Wang ‘850 and Wang ‘507 to include the above mentioned to improve the efficiency of the device (see Motoya, paragraph 6).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ARACELIS RUIZ whose telephone number is (571)270-1038. The examiner can normally be reached on Monday-Friday 11:00am-7:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Reginald G. Bragdon, can be reached at (571)272-4204. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ARACELIS RUIZ/
Primary Examiner, Art Unit 2139