DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2 are rejected under 35 U.S.C. 103 as being unpatentable over Nurvitadhi (patent application publication No. 2018/0189234) in view of Vantrease (patent No. 10,678,508).
Nurvitadhi taught the invention substantially as claimed, including (as to claim 1) a graphics processor (3645, 3738) (e.g., see paragraphs 0276, 0281) comprising: one or more hardware tiles (404) including sparse matrix multiply acceleration hardware (e.g., see paragraphs 0208-0212) including a modular processing array with feedback inputs (e.g., see figs. 12, 13a, 13b), the modular processing array including one or more processing array modules having a first number of pipeline paths, the first number of pipeline paths having a second number of pipeline stages (e.g., see fig. 12), wherein a first pipeline stage is configurable to receive feedback output from a final pipeline stage [the stage that provides multiple outputs to stage 1209 in fig. 12 receives input from the sum stage 1211].
Nurvitadhi did not expressly detail that the processing array was a systolic array. Vantrease, however, taught implementing sparse array processing and dot multiply operations using a systolic array (e.g., see col. 15, lines 17-55; col. 11, lines 41-61; and col. 20, lines 21-29).
It would have been obvious to one of ordinary skill in the art to combine the teachings of Nurvitadhi and Vantrease. Both references were directed toward the problems of performing matrix multiply operations in a data processor. One of ordinary skill in the art would have been motivated to incorporate the Vantrease teachings of the matrix processing operation as a systolic array at least to provide efficient repeated multiplication processing of the array, such as used for convolution (e.g., see col. 15, lines 17-27 of Vantrease). This would have provided an additional application of the combined system (for applications that performed convolution). Also, the addition of the Vantrease teachings would have yielded predictable results at least because both references were directed toward processing using programmable processors performing similar processes, namely array multiplication in a processor.
Nurvitadhi also taught that the one or more processing array modules include pipeline paths configured with shared hardware circuitry to read data elements associated with a first source input and separate hardware circuitry to read data elements associated with a second source input (e.g., see figs. 3, 4 and paragraphs 0053-0055).
As to claim 2, Nurvitadhi and Vantrease taught the graphics processor as in claim 1. Nurvitadhi taught wherein the modular systolic processing array includes multiple array modules (2220) (e.g., see fig. 22 and paragraphs 0200-0201).

Claims 3, 4, and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Nurvitadhi and Vantrease as applied to claim 1 above, and further in view of Narayanamoorthy (patent application publication No. 2019/0042542).

As to claim 3, Nurvitadhi and Vantrease taught the graphics processor as in claim 1. Narayanamoorthy taught wherein the one or more processing array modules include pipeline paths configured with separate hardware circuitry to read data elements associated with a first source input and separate hardware circuitry to read data elements associated with a second source input (e.g., see figs. 3, 4 and paragraphs 0053-0055; note the separate inputs from sparse source matrix 312 and dense source matrix 314, and that inputs 318 and 320 are separate hardware circuitry).
	It would have been obvious to one of ordinary skill in the art to combine the teachings of Narayanamoorthy and Nurvitadhi. Both references were directed toward the problems of performing multiplication of a sparse array with a dense array. One of ordinary skill in the art would have been motivated to incorporate the Narayanamoorthy teaching of separate inputs for reading first and second data elements at least to speed access to the data for processing, increasing throughput for matrix processing. Also, the addition of the Narayanamoorthy teachings would have yielded predictable results at least because both references were directed to processing using programmable processors performing similar processes, namely multiplication of sparse and dense arrays.

As to claim 4, Nurvitadhi and Vantrease taught the graphics processor as in claim 1. Narayanamoorthy taught wherein the one or more processing array modules include hardware circuitry configured to detect non-zero data elements in the second source input (e.g., see paragraphs 0050, 0058-0061 of Narayanamoorthy). As to the limitation to selectively perform dot product operations based on the non-zero data elements of the second source input and data elements of the first source input that correspond with the non-zero data elements of the second source input, Narayanamoorthy did not expressly detail that the method was used for a dot product operation. However, Nurvitadhi taught this limitation for use with sparse-dense matrix multiplication (e.g., see paragraphs 0231, 0238).

As to claim 5, Nurvitadhi and Vantrease taught the graphics processor as in claim 1. Narayanamoorthy taught wherein the one or more processing array modules include a pipeline path including separate output hardware for each pipeline stage (e.g., see fig. 4; note the portions providing output via lines 424, 426, 428, and 434 provide separate output hardware).

Claims 7-13 are rejected under 35 U.S.C. 103 as being unpatentable over Narayanamoorthy in view of Nurvitadhi.
As to claim 7, Narayanamoorthy taught a method of performing a dot product operation on a set of input matrices via a hardware matrix multiply accelerator having a multi-stage processing pipeline, the method comprising: reading, via a first source operand, multiple data elements of a first matrix into memory of the hardware matrix multiply accelerator (e.g., see figs. 3, 4, 6 and paragraphs 0046, 0050, 0058-0061); reading, via a second source operand, multiple data elements of a second matrix into the memory of the hardware matrix multiply accelerator (e.g., see figs. 3, 4, 6 and paragraphs 0046, 0050, 0058-0061); detecting non-zero values within the multiple data elements of the second matrix; grouping the non-zero values within the multiple data elements of the second matrix into a group including one or more data elements (e.g., see paragraphs 0050, 0058-0061); providing a data element of the group to a corresponding stage of the processing pipeline (e.g., see paragraph 0061) [the elements of the dense matrix are broadcast to multiple channels]; multiplying a provided data element of the group with multiple data elements of the first matrix to generate a set of products (e.g., see paragraph 0061); summing the set of products and accumulating a sum of the set of products with an accumulator value (e.g., see paragraph 0061) [the accumulator array (418) sums the set of products with an accumulator value]; and writing the accumulator value to a next stage of the processing pipeline (e.g., see paragraphs 0064-0066 and fig. 4) [the accumulated result is sent and written to the rounding stage and sent to and written to the scratch pad page].
Narayanamoorthy did not expressly detail the method was used for dot product operation. However Nurvitadhi taught this limitation for use with sparse-dense matrix multiplication (e.g., see paragraphs 0231,0238).
It would have been obvious to one of ordinary skill in the art to combine the teachings of Narayanamoorthy and Nurvitadhi. Both references were directed toward the problems of performing multiplication of a sparse array with a dense array. One of ordinary skill in the art would have been motivated to incorporate the Nurvitadhi teachings of performing the multiplication on dense and sparse arrays for dot products at least to provide additional application(s) for the system. Also, the addition of the Nurvitadhi teachings would have yielded predictable results at least because both references were directed to processing using programmable processors performing similar processes, namely multiplication of sparse and dense arrays.
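For illustration only, the sequence of steps recited in claim 7 (detecting non-zero values, grouping them, providing each to a pipeline stage, multiplying against multiple elements of the other matrix, and accumulating) can be modeled in software roughly as follows. This is a minimal sketch of zero-skipping sparse-dense matrix multiplication, not the claimed hardware and not the implementation of either reference; the function name `sparse_dense_matmul` and the parameter `num_stages` are hypothetical.

```python
def sparse_dense_matmul(sparse, dense, num_stages=4):
    """Multiply sparse (m x k) by dense (k x n), skipping zero elements.

    Hypothetical software model; num_stages stands in for the number of
    pipeline stages and only controls how non-zeros are grouped.
    """
    n = len(dense[0])
    result = []
    for row in sparse:
        # Detect the non-zero values, remembering their column index.
        nonzeros = [(col, val) for col, val in enumerate(row) if val != 0]
        # Group the non-zeros so each group has at most one element per stage.
        groups = [nonzeros[g:g + num_stages]
                  for g in range(0, len(nonzeros), num_stages)]
        acc = [0] * n  # accumulator values carried from stage to stage
        for group in groups:
            for col, val in group:
                # Broadcast the element across the matching dense row.
                products = [val * d for d in dense[col]]
                # Accumulate the products and hand the result to the next stage.
                acc = [a + p for a, p in zip(acc, products)]
        result.append(acc)
    return result
```

Because zero entries never enter the inner loop, the amount of multiply work scales with the number of non-zero elements rather than with the full matrix size.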

 	As to claim 8, Narayanamoorthy and Nurvitadhi taught the method as in claim 7. Narayanamoorthy taught wherein a number of data elements of the group including one or more data elements corresponds with a number of stages in the multi-stage processing pipeline of the hardware matrix multiply accelerator (e.g., see figs. 2B, 3 and paragraphs 0054-0056).

As to claim 9, Narayanamoorthy and Nurvitadhi taught the method as in claim 7. Nurvitadhi taught wherein writing the accumulator value to the next stage of the processing pipeline includes writing a pipeline feedback value to a first stage of the processing pipeline (e.g., see figs. 12, 13a, 13b) [the sum from stage 1211 is fed back to the previous stage in the pipeline for further accumulation by stage 1209].
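For illustration only, the feedback arrangement discussed for claim 9 (output of a final stage returned to a first stage for further accumulation) can be modeled in software roughly as follows. This is a minimal sketch under the assumption that each stage performs one multiply-accumulate; it is not the circuit of fig. 12 of Nurvitadhi, and the names `dot_with_feedback` and `num_stages` are hypothetical.

```python
def dot_with_feedback(a, b, num_stages=4):
    """Dot product computed in passes over a fixed-depth pipeline model.

    Hypothetical model: when the vectors are longer than the pipeline,
    the final stage's output is fed back into the first stage.
    """
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    feedback = 0
    for start in range(0, len(a), num_stages):
        acc = feedback  # first stage receives the fed-back accumulator value
        for s in range(start, min(start + num_stages, len(a))):
            acc += a[s] * b[s]  # each stage multiplies, accumulates, passes on
        feedback = acc  # final stage output returns to the first stage
    return feedback
```

The feedback path lets a pipeline of fixed depth handle inputs of any length, since each pass resumes from the previous pass's accumulated value.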

As to claim 10, Narayanamoorthy and Nurvitadhi taught the method as in claim 7. Narayanamoorthy taught wherein providing a data element of the group to a corresponding stage of the processing pipeline includes broadcasting the data element to multiple channels of a processing element of the corresponding stage (e.g., see paragraphs 0054-0055 and figs. 2B, 3) [note the parallel multiplication of element (6,1) of sparse source matrix 312 with every element in row 1 of dense source matrix 314 provides this limitation].

As to claim 11, Narayanamoorthy and Nurvitadhi taught the method as in claim 7. Nurvitadhi taught wherein detecting the non-zero values within the multiple data elements of the second matrix includes detecting the non-zero values within the memory of the hardware matrix multiply accelerator (e.g., see paragraphs 0076-0078, 0080) [note changing arrays stored in matrix form in memory to compressed form, eliminating zeros, provides this limitation, as this would require detecting the zeros so they would not be stored together with non-zero data].
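For illustration only, the point that converting a matrix to a compressed form necessarily involves detecting its zeros can be seen in a small software model. The sketch below assumes the widely used compressed sparse row (CSR) layout; the Office action does not state which compressed format Nurvitadhi uses, and the function name `to_csr` is hypothetical.

```python
def to_csr(matrix):
    """Convert a dense row-major matrix to CSR (values, col_idx, row_ptr).

    Assumed format for illustration; each element is tested against zero
    so that only non-zero data is stored.
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:  # zero detection: zeros are never stored
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # marks where the next row begins
    return values, col_idx, row_ptr
```

Every element is compared against zero before it is stored, which is the detection step the bracketed note relies on.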

As to claim 12, Narayanamoorthy and Nurvitadhi taught the method as in claim 7. Narayanamoorthy taught wherein the multi-stage processing pipeline of the hardware matrix multiply accelerator includes multiple pipeline paths (e.g., see fig. 3) [the parallel multiply stages and parallel addition stages provide the multiple stages including multiple paths].

As to claim 13, Nurvitadhi and Narayanamoorthy taught the method as in claim 12. Narayanamoorthy taught further comprising providing data elements of the first matrix and the second matrix [note the matrices stored in paragraph 0053 provide this limitation; also the sparse source matrix and dense source matrix of figs. 2A, 2B, 3 provide this limitation] to the multiple pipeline paths via shared hardware circuitry associated with the first source operand and separate hardware circuitry associated with the second source operand (e.g., see figs. 3, 4 and paragraphs 0053-0055).
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-5 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 12-16 of U.S. Patent No. 11,204,977. Although the claims at issue are not identical, they are not patentably distinct from each other because the side-by-side showing of representative claims of the patent and the instant application shows that both are directed to common subject matter.

Instant application (SN 17/527882)
1. A graphics processor comprising: one or more hardware tiles including sparse matrix multiply acceleration hardware including a modular systolic processing array with feedback inputs, the modular systolic processing array including one or more processing array modules having a first number of pipeline paths, the first number of pipeline paths having a second number of pipeline stages, wherein a first pipeline stage is configurable to receive feedback output from a final pipeline stage and the one or more processing array modules include pipeline paths configured with shared hardware circuitry to read data elements associated with a first source input and separate hardware circuitry to read data elements associated with a second source input.

Patent No. 11,204,977
12. An accelerator device comprising: a host interface; a fabric interconnect coupled with the host interface; and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a modular systolic processing array with feedback inputs, the modular systolic processing array comprising multiple processing array modules, the multiple processing array modules including one or more processing array modules having a first number of pipeline paths, the first number of pipeline paths having a second number of pipeline stages, wherein a first pipeline stage is configurable to receive feedback output from a final pipeline stage and the one or more processing array modules include pipeline paths configured with shared hardware circuitry to read data elements associated with a first source input and separate hardware circuitry to read data elements associated with a second source input.



Allowable Subject Matter
Claim 6 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claims 14-20 are allowed.
The following is a statement of reasons for the indication of allowable subject matter: Claim 6 requires, among other things:
The graphics processor as in claim 5, wherein the one or more processing array modules include a first pipeline path configurable to execute a first dot product instruction having a first set of inputs and a second pipeline path configurable to execute a second dot product instruction having a second set of inputs.
Claim 14 requires among other things: 
 	An accelerator device comprising: a host interface; a fabric interconnect coupled with the host interface; and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a modular systolic processing array with feedback inputs, the modular systolic processing array including one or more processing array modules including a first pipeline path configurable to execute a first dot product instruction having a first set of inputs and a second pipeline path configurable to execute a second dot product instruction having a second set of inputs.
	The closest prior art includes Narayanamoorthy, Vantrease, and Nurvitadhi. The closest prior art taught the limitations of the claims from which claim 6 depends and some of the limitations of claim 14.
	However, the closest prior art did not teach the limitation(s) of claims 6 and 14 as shown above.
	Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
	Sivaraman (patent No. 10,628,622) disclosed stream FIFO insertion in a compilation flow for a heterogeneous multi-core architecture (e.g., see abstract).
	Bajic (patent application publication No. 2019/0379396) disclosed processing core data compression and storage system (e.g., see abstract).
	Olsen (patent No. 10,387,122) disclosed residue number matrix multiplier (e.g., see abstract).
	Jennings (patent application publication No. 2017/0052857) disclosed simultaneous multiprocessor apparatus applicable to achieving exascale performance for algorithms and program system (e.g., see abstract).
	Langhammer (patent No. 9,207,908) disclosed digital signal processing blocks with embedded arithmetic circuits (e.g., see abstract).
	Bleiweiss (patent application publication No. 2019/0205737) disclosed machine learning accelerator mechanism with feedback (e.g., see abstract and fig. 17).
	Zlateski (patent application publication No. 2020/0160181) disclosed system for generation of sparse code for convolution neural networks (e.g., see abstract).
	Frumkin (patent application publication No. 2019/027866) disclosed tiled compressed sparse matrix format (e.g., see abstract).
	Daga (patent application publication No. 2016/0140084) disclosed efficient sparse matrix vector multiplication on parallel processors (e.g., see abstract).
	Pearce (patent application publication No. 2019/0042269) disclosed apparatus for gang invariant operation optimizations (e.g., see abstract).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC COLEMAN whose telephone number is (571)272-4163. The examiner can normally be reached M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on 0-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ERIC . COLEMAN
Primary Examiner
Art Unit 2183



EC
/ERIC COLEMAN/           Primary Examiner, Art Unit 2183