DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The disclosure is objected to because of the following informalities:  The paragraphs of the specification are numbered incorrectly.  From pages 1-38, the paragraphs are numbered from [0001]-[00159].  From pages 39-47, the paragraphs are numbered from [0001]-[0033].  From pages 47-50, the paragraphs are numbered from [00135]-[00155].  Appropriate correction is required.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 10,997,686 in view of Moss, D et al. 2018. A customizable matrix multiplication framework for the Intel HARPv2 Xeon + FPGA platform. In Proceedings of the ACM/SIGDA International Symposium on Field .
Instant Application 17/234,039
U.S. Patent No. 10,997,686
Claim 1
A graphics processor comprising: 
a first tile of graphics processing resources; 

a second tile of graphics processing resources, 

the second tile of graphics processing resources including circuitry to accelerate a matrix operation; 

and an interface between a host system and the graphics processor, the interface to receive a set of commands for a workload having a first partition and a second partition, submit the set of commands to the first tile of graphics processing resources, and submit the set of commands to the second tile of graphics processing resources; 

wherein the first tile of graphics processing resources is to read a first partition identifier from a first hardware context, the first partition identifier associated with the first partition, and conditionally execute commands of the 


and wherein the second tile of graphics processing resources is to read a second partition identifier from a second hardware context, 

the second partition identifier associated with the second partition, and conditionally execute commands of the second partition while bypassing commands of the first partition, 


wherein to conditionally execute the commands of the second partition includes to execute a command associated with the matrix operation.

A graphics processor comprising: a first tile of graphics processing engines; 


a second tile of graphics processing engines; 





and an interface between a host system and the graphics processor, the interface to receive a set of commands for a workload having a first partition and a second partition, submit the set of commands to the first tile of graphics processing engines, and submit the set of commands to the second tile of graphics processing engines; 


wherein the first tile of graphics processing engines is to read a first partition identifier from a first hardware context, the first partition identifier associated with the first partition, and 

and wherein the second tile of graphics processing engines is to read a second partition identifier from a second hardware context, 

the second partition identifier associated with the second partition, and conditionally execute commands of the second partition while bypassing commands of the first partition.

the second tile of graphics processing resources including circuitry to accelerate a matrix operation; (“The hardware template illustrated in Fig. 1 contains the systolic GEMM… As illustrated in Fig. 4a, the hardware template is a systolic array of processing elements (PEs), each containing a dot product module and two memory buffers… The systolic array operates by iteratively processing chunks of the input matrices stored in the feeders… We presented a customizable matrix multiplication framework that includes a simple software API and hardware template for designing custom GEMM accelerators on the HARPv2.”; Moss, p. 110, 1st para under “4 Hardware Template”, p. 114, 1st para under “8 Conclusion”).  The GEMM matrix multiplication framework includes GEMM accelerators.  The GEMM includes a systolic array of processing elements (PEs), each of which includes a dot product module.  Fig. 4a illustrates the systolic array, where each of the PEs is considered to be a tile of graphics processing resources (which would include a second tile of graphics processing resources).  Each of the PEs (tiles) includes a dot product module (circuitry to accelerate a matrix operation).  Moss can be combined with the U.S. Patent to arrive at the claimed invention of the Instant Application.  
     Moss teaches wherein to conditionally execute the commands of the second partition includes to execute a command associated with the matrix operation (“The hardware template also supports heterogenous load balancing.  At runtime the workload is partitioned across both the FPGA and the CPU.  In the case of a GEMM, the A and B matrices are divided into sub blocks and the computation is balanced across the two compute engines.”; Moss, p. 110, 1st para under “3.1.1 Heterogeneous Load Balancing”)
Moss teaches the workload (commands), which includes matrix commands, is divided into sub blocks (partitions, which include a second partition) and balanced across two compute engines for execution (execute a command associated with the matrix operation).  Moss can be combined with the U.S. Patent to arrive at the claimed invention of the Instant Application.  Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing date to modify the U.S. Patent by adding the feature of the second tile of graphics processing resources including circuitry to accelerate a matrix operation;… wherein to conditionally execute the nd para under “Abstract”)
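For illustration only, the limitation compared in the claim 1 chart above (both tiles receive the same set of commands, and each tile executes only the commands of its own partition while bypassing the rest) can be sketched as follows.  This is a minimal sketch and not a characterization of any disclosed implementation: the names Command, HardwareContext, run_tile and the "draw"/"matmul" operation strings are hypothetical and appear in neither the Instant Application nor the U.S. Patent.

```python
from dataclasses import dataclass


@dataclass
class Command:
    partition_id: int  # partition the command belongs to (hypothetical field)
    op: str            # operation name, e.g. "draw" or "matmul" (hypothetical)


@dataclass
class HardwareContext:
    partition_id: int  # identifier a tile reads to decide which commands to run


def run_tile(context, commands):
    """Return the operations one tile executes from the shared command set.

    Commands whose partition does not match the tile's context are bypassed.
    """
    executed = []
    for cmd in commands:
        if cmd.partition_id == context.partition_id:
            executed.append(cmd.op)
    return executed


# The same set of commands is submitted to both tiles.
commands = [
    Command(0, "draw"),
    Command(1, "matmul"),  # a command associated with a matrix operation
    Command(0, "draw"),
    Command(1, "draw"),
]

first_tile = run_tile(HardwareContext(partition_id=0), commands)
second_tile = run_tile(HardwareContext(partition_id=1), commands)
```

In this sketch the second tile is the only one that executes the "matmul" command, which corresponds to the added limitation that distinguishes instant claim 1 from claim 1 of the U.S. Patent.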

The graphics processor as in claim 1, the interface to the host system further to receive a command to associate the first hardware context with the first tile of graphics processing resources.
Claim 2
The graphics processor as in claim 1, the interface to the host system further to receive a command to associate the first hardware context with the first tile of graphics processing engines.
Claim 3
The graphics processor as in claim 2, the interface to the host system further to receive a command to configure the first hardware context based on a first logical render context.
Claim 3
The graphics processor as in claim 2, the interface to the host system further to receive a command to configure the first hardware context based on a first logical render context.
Claim 4
The graphics processor as in claim 3, the interface to the host system further to receive a command to associate the second hardware context with the second tile of graphics processing resources.
Claim 4
The graphics processor as in claim 3, the interface to the host system further to receive a command to associate the second hardware context with the second tile of graphics processing engines.
Claim 5
The graphics processor as in claim 4, the interface to the host system further to receive a command to configure the second hardware context based on a second logical render context.
Claim 5
The graphics processor as in claim 4, the interface to the host system further to receive a command to configure the second hardware context based on a second logical render context.
Claim 6




The graphics processor as in claim 6, wherein the first hardware context includes a first offset within the memory buffer associated with a start of the first partition, the second hardware context includes a second offset within the memory buffer associated with the start of the second partition, and the first hardware context and the second hardware context each include a step value associated with a number of partitions of the workload.
Claim 7
The graphics processor as in claim 6, wherein the first hardware context includes a first offset within the memory buffer associated with a start of the first partition, the second hardware context includes a second offset within the memory buffer associated with the start of the second partition, and the first hardware context and the second hardware context each include a step value associated with a number of partitions of the workload.
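For illustration only, one possible reading of the offset and step value recited in claim 7 is sketched below.  The sketch assumes, without support in the claim language itself, that the commands of the partitions are interleaved in the memory buffer and that the step value equals the number of partitions; the function name commands_for_tile and the buffer contents are hypothetical.

```python
def commands_for_tile(buffer, offset, step):
    """Return the commands visited when starting at `offset` and advancing by `step`."""
    return buffer[offset::step]


# Hypothetical memory buffer holding interleaved commands for two partitions.
buffer = ["p0-a", "p1-a", "p0-b", "p1-b", "p0-c", "p1-c"]
num_partitions = 2

# The first hardware context holds offset 0, the second holds offset 1;
# both hold the same step value.
first = commands_for_tile(buffer, offset=0, step=num_partitions)
second = commands_for_tile(buffer, offset=1, step=num_partitions)
```

Under these assumptions each tile begins at the command stored at its own offset, as recited in claims 8 and 9, and never visits a command of the other partition.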
Claim 8
The graphics processor as in claim 7, wherein the first tile of graphics processing resources is to begin execution of commands for the first partition with a command stored at the first offset within the memory buffer.
Claim 8
The graphics processor as in claim 7, wherein the first tile of graphics processing engines is to begin execution of commands for the first partition with a command stored at the first offset within the memory buffer.
Claim 9
The graphics processor as in claim 7, wherein the second tile of graphics processing resources is to begin execution of commands for the second 

The graphics processor as in claim 7, wherein the second tile of graphics processing engines is to begin execution of commands for the second partition with 

The graphics processor as in claim 7, wherein the first tile of graphics processing resources is to synchronize with the second tile of graphics processing resources when execution of the first partition and the second partition completes.
Claim 10
The graphics processor as in claim 7, wherein the first tile of graphics processing engines is to synchronize with the second tile of graphics processing engines when execution of the first partition and the second partition completes.
Claim 11
A non-transitory machine-readable medium storing instructions which cause one or more processors to perform operations, 

wherein the one or more processors include a graphics processor and the operations comprise: 
receiving a set of commands for a workload having a first partition and a second partition; 

submitting the set of commands to a first tile of graphics processing resources of the graphics processor; 

submitting the set of commands to a second tile of graphics processing resources of the graphics processor, 



at the first tile of graphics processing resources, reading a first partition identifier from a first hardware context, the first partition identifier associated with the first partition, and conditionally executing commands of the first partition while bypassing commands of the second partition; 

and at the second tile of graphics processing resources, reading a second partition identifier from a second hardware context, the second partition identifier associated with the second partition, and conditionally executing commands of the second partition while bypassing commands of the first partition, 

wherein conditionally executing the commands of the second partition includes executing a command associated with the matrix operation.

A non-transitory machine-readable medium storing instructions which cause one or more processors to perform operations, 

wherein the one or more processors include a graphics processor and the operations comprise: 
receiving a set of commands for a workload having a first partition and a second partition; 

submitting the set of commands to a first tile of graphics processing engines of the graphics processor; 

submitting the set of commands to a second tile of graphics processing engines of the graphics processor; 





at the first tile of graphics processing engines, reading a first partition identifier from a first hardware context, the first partition identifier associated with the first partition, and conditionally executing commands of the first partition while bypassing commands of the second partition; 

and at the second tile of graphics processing engines, reading a second partition identifier from a second hardware context, the second partition identifier associated with the second partition, and conditionally executing commands of the second partition while bypassing commands of the first partition.

the second tile of graphics processing resources including circuitry to accelerate a matrix operation; (“The hardware template illustrated in Fig. 1 contains the systolic GEMM… As illustrated in Fig. 4a, the hardware template is a systolic array of processing elements (PEs), each containing a dot product module and two memory buffers… The systolic array operates by iteratively processing chunks of the input matrices stored in the feeders… We presented a customizable matrix multiplication framework that includes a simple software API and hardware template for designing custom GEMM accelerators on the HARPv2.”; Moss, p. 110, 1st para under “4 Hardware Template”, p. 114, 1st para under “8 Conclusion”).  The GEMM matrix multiplication framework includes GEMM accelerators.  The GEMM includes a systolic array of processing elements (PE), each of which includes a dot product module.  Fig. 4a illustrates the systolic array, where each of the PEs is considered to be a tile of graphics processing resources (which would include a second tile of graphics processing resources).  Each of the PEs (tiles) includes a dot product module (circuitry to accelerate a matrix operation).  Moss can be combined with the U.S. Patent to arrive at the claimed invention of the Instant Application.  
     Moss teaches wherein to conditionally execute the commands of the second partition includes to execute a command associated with the matrix operation (“The hardware template also supports heterogenous load balancing.  At runtime the workload is partitioned across both the FPGA and the CPU.  In the case of a GEMM, the A and B matrices are divided into sub blocks and the computation is balanced across the two compute engines.”; Moss, p. 110, 1st para under “3.1.1 Heterogeneous Load Balancing”).  Moss teaches the workload (commands), which includes matrix commands, is divided into sub blocks (partitions, which include a second partition) and balanced across two compute engines for execution (execute a command associated with the matrix operation).  Moss can be combined with the U.S. Patent to nd para under “Abstract”)
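For illustration only, the heterogeneous load balancing quoted from Moss above (a GEMM divided into sub blocks, with the computation balanced across two compute engines) can be sketched as follows.  This is a simplified sketch and not Moss's software API: it splits only the rows of A, both engines are modeled by the same plain Python routine, and the names matmul, balanced_gemm and fpga_share are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def balanced_gemm(a, b, fpga_share=0.5):
    """Split the rows of `a` into two sub blocks and compute each separately.

    `fpga_share` is the assumed fraction of rows given to the first engine.
    """
    split = round(len(a) * fpga_share)
    first_engine_rows = matmul(a[:split], b)   # stand-in for the accelerator
    second_engine_rows = matmul(a[split:], b)  # stand-in for the CPU
    return first_engine_rows + second_engine_rows


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
result = balanced_gemm(a, b)
```

The partitioned result equals the unpartitioned product, which is why the division into sub blocks affects only where the work is performed and not the outcome of the matrix operation.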

The non-transitory machine-readable medium as in claim 11, the operations further comprising: receiving a first command to associate the first hardware context with the first tile of graphics processing resources; 

and receiving a second command to associate the second hardware context with the second tile of graphics processing resources.
Claim 12
The non-transitory machine-readable medium as in claim 11, the operations further comprising: receiving a first command to associate the first hardware context with the first tile of graphics processing engines; 

and receiving a second command to associate the second hardware context with the second tile of graphics processing engines.
Claim 13
 The non-transitory machine-readable medium as in claim 12, the operations further comprising: 
receiving a third command to configure the first hardware context based on a first logical render context; 



The non-transitory machine-readable medium as in claim 12, the operations further comprising: 
receiving a third command to configure the first hardware context based on a first logical render context; 



The non-transitory machine-readable medium as in claim 13, the operations further comprising receiving the set of commands for the workload via a memory buffer including commands to be executed for the workload.
Claim 14
The non-transitory machine-readable medium as in claim 13, the operations further comprising receiving the set of commands for the workload via a memory buffer including commands to be executed for the workload.
Claim 15
The non-transitory machine-readable medium as in claim 14, wherein the first hardware context includes a first offset within the memory buffer associated with a start of the first partition, 

the second hardware context includes a second offset within the memory buffer associated with the start of the second partition, 

and the first hardware context and the second hardware context each include a step value associated with a number of partitions of the workload.
Claim 15
The non-transitory machine-readable medium as in claim 14, wherein the first hardware context includes a first offset within the memory buffer associated with a start of the first partition, 

the second hardware context includes a second offset within the memory buffer associated with the start of the second partition, 

and the first hardware context and the second hardware context each include a step value associated with a number of partitions of the workload.
Claim 16
A data processing system comprising: 
a host interconnect; 



the second tile of graphics processing resources including circuitry to accelerate a matrix operation, 

wherein the graphics processor is configured to be presented to a host processor of the data processing system as a single device, 

and the graphics processor includes hardware circuitry to: 
receive, via the host interconnect, a set of commands for a workload having a first partition and a second partition; 

submit the set of commands to a first tile of graphics processing resources of the graphics processor; 

submit the set of commands to a second tile of graphics processing resources of the graphics processor; 

at the first tile of graphics processing resources, read a first partition identifier 

and at the second tile of graphics processing resources, read a second partition identifier from a second hardware context, the second partition identifier associated with the second partition, and conditionally execute commands of the second partition while bypassing commands of the first partition, 

wherein to conditionally execute the commands of the second partition includes to execute a command associated with the matrix operation.

A data processing system comprising: 
a host interconnect; 







the graphics processor configured to be presented to a host processor of the data processing system as a single device, 


wherein the graphics processor includes hardware circuitry to: 
receive, via the host interconnect, a set of commands for a workload having a first partition and a second partition; 

submit the set of commands to a first tile of graphics processing engines of the graphics processor; 

submit the set of commands to a second tile of graphics processing engines of the graphics processor; 

at the first tile of graphics processing engines, read a first partition identifier 

and at the second tile of graphics processing engines, read a second partition identifier from a second hardware context, the second partition identifier associated with the second partition, and conditionally execute commands of the second partition while bypassing commands of the first partition.

the second tile of graphics processing resources including circuitry to accelerate a matrix operation; (“The hardware template illustrated in Fig. 1 contains the systolic GEMM… As illustrated in Fig. 4a, the hardware template is a systolic array of processing elements (PEs), each containing a dot product module and two memory buffers… The systolic array operates by iteratively processing chunks of the input matrices stored in the feeders… We presented a customizable matrix multiplication framework that includes a simple software API and hardware template for designing custom GEMM accelerators on the HARPv2.”; Moss, p. 110, 1st para under “4 Hardware Template”, p. 114, 1st para under “8 Conclusion”).  The GEMM matrix multiplication framework includes GEMM accelerators.  The GEMM includes a systolic array of processing elements (PEs), each of which includes a dot product module.  Fig. 4a illustrates the systolic array, where each of the PEs is considered to be a tile of graphics processing resources (which would include a second tile of graphics processing resources).  Each of the PEs (tiles) includes a dot product module (circuitry to accelerate a matrix operation).  Moss can be combined with the U.S. Patent to arrive at the claimed invention of the Instant Application.  
     Moss teaches wherein to conditionally execute the commands of the second partition includes to execute a command associated with the matrix operation (“The hardware template also supports heterogenous load balancing.  At runtime the workload is partitioned across both the FPGA and the CPU.  In the case of a GEMM, the A and B matrices are divided into sub blocks and the computation is balanced across the two compute engines.”; Moss, p. 110, 1st para under “3.1.1 Heterogeneous Load Balancing”)
Moss teaches the workload (commands), which includes matrix commands, is divided into sub blocks (partitions, which include a second partition) and balanced across two compute engines for execution (execute a command associated with the matrix operation).  Moss can be combined with the U.S. Patent to arrive at the claimed invention of the Instant Application.  Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing date to modify the U.S. Patent by adding the feature of the second tile of graphics processing resources including circuitry to accelerate a matrix operation;… wherein to conditionally execute the nd para under “Abstract”)

The data processing system as in claim 16, the graphics processor further to: 
receive a first command to associate the first hardware context with the first tile of graphics processing resources; 

and receive a second command to associate the second hardware context with the second tile of graphics processing resources.
Claim 17
The data processing system as in claim 16, the graphics processor further to: 
receive a first command to associate the first hardware context with the first tile of graphics processing engines; 

and receive a second command to associate the second hardware context with the second tile of graphics processing engines.
Claim 18
The data processing system as in claim 17, the graphics processor further to: 
receive a third command to configure the first hardware context based on a first logical render context; 

and receive a fourth command to configure the second hardware context based on a second logical render context.
Claim 18
The data processing system as in claim 17, the graphics processor further to: 
receive a third command to configure the first hardware context based on a first logical render context; 

and receive a fourth command to configure the second hardware context based on a second logical render context.
Claim 19
The data processing system as in claim 18, the graphics processor further to: 


wherein the first hardware context includes a first offset within the memory buffer associated with a start of the first partition and the second hardware context includes a second offset within the memory buffer associated with the start of the second partition; 

wherein the first tile of graphics processing resources is to begin execution of commands for the first partition with a command stored at a first offset within the memory buffer; 

and wherein the second tile of graphics processing resources is to begin execution of commands for the second partition with a command stored at a second offset within the memory buffer.

The data processing system as in claim 18, the graphics processor further to: 


wherein the first hardware context includes a first offset within the memory buffer associated with a start of the first partition and the second hardware context includes a second offset within the memory buffer associated with the start of the second partition; 

wherein the first tile of graphics processing engines is to begin execution of commands for the first partition with a command stored at a first offset within the memory buffer; 

and wherein the second tile of graphics processing engines is to begin execution of commands for the second partition with a command stored at a second offset within the memory buffer.

The data processing system as in claim 19, wherein the first tile of graphics processing resources is to synchronize with the second tile of graphics processing resources when execution of 

The data processing system as in claim 19, wherein the first tile of graphics processing engines is to synchronize with the second tile of graphics processing engines when execution of the first .



Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a): 
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1, 11 and 16 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.  Claims 1, 11 and 16 recite “wherein to conditionally execute the commands of the second partition includes to execute a command associated with the matrix operation.”  Nowhere in the specification is it disclosed that conditional execution of commands includes executing a command associated with the matrix operation.  The specification discloses on p. 43, [0020], “At block 2008, the graphics processor can configure the first tile of the graphics 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  Andonieh et al. U.S. Pub. No. 2012/0001925, Diard U.S. Patent No. 7,629,978, Nalluri et al. U.S. Pub. No. 2015/0123890, Swift U.S. Patent No. 8,310,491, Gruber et al. U.S. Pub. No. 2015/0379663, King et al. U.S. Pub. No. 2020/0175645, Redshaw et al. U.S. Patent No. 10,055,877, Ye et al. U.S. Pub. No. 2014/0334701 and Bourd et al. U.S. Pub. No. 2013/0222399.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DONNA J RICKS whose telephone number is (571)270-7532.  The examiner can normally be reached on M-F 7:30am-5pm EST (alternate Fridays off).

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached on 571-272-2976.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Donna J. Ricks/Examiner, Art Unit 2612 



/JACINTA M CRAWFORD/Primary Examiner, Art Unit 2612