DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on July 26, 2021 and September 15, 2021 were filed on or after the filing date of the application on July 26, 2021. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-8, 10-16, and 18-25 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of co-pending Application No. 17/145,885 (reference application). Although the claims at issue are not identical, they are not patentably distinct from each other as outlined in the tables below.


Present Application #17/385,693    | 1   | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 10
Co-pending Application #17/145,885 | 1-3 | 4 | 4 | 5 | 6 | 7 | 8 | 9 | 1-3, 10


Present Application #17/385,693    | 11        | 12    | 13           | 14    | 15 | 16
Co-pending Application #17/145,885 | 4, 11, 12 | 4, 11 | 5, 6, 13, 14 | 7, 14 | 8  | 9


Present Application #17/385,693    | 18         | 19 | 20 | 21 | 22 | 23 | 24 | 25
Co-pending Application #17/145,885 | 15, 16, 18 | 17 | 18 | 19 | 19 | 19 | 20 | 20


Present Application #17/385,693 Claim 1
Co-pending Application #17/145,885 Claims 1-3
A graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including:
A graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including:
a register file to store a plurality of different types of operands; and
a register file to store operands for a plurality of different types of operands; and
a plurality of processing cores, including:
a plurality of processing cores, including:
a first set of processing cores of a first type to perform multi-dimensional matrix operations on a first set of operands in a first set of registers of the register file, 
a first set of processing cores of a first type to perform multi-dimensional matrix math operations on a first set of operands in a first set of registers of the register file; and…
wherein the first set of processing cores of the first type includes circuitry to execute instructions to perform matrix operations on the first set of operands in the first set of registers of the register file,
…the first set of processing cores of the first type includes a first set of floating point units (FPUs) to execute instructions to perform matrix operations on the first set of operands in the first set of registers of the register file… (claim 2)
the first set of operands including one or more 64-bit operands; and
…matrix operations performed on the first set of operands include 64-bit floating (FP64) operations… (claim 3)
a second set of processing cores of a second type, the second set of processing cores being different from the first set of processing cores, the second set of processing cores to perform general purpose graphics processing unit (GPGPU) operations on a second set of 




Present Application #17/385,693 Claim 2
Co-pending Application #17/145,885 Claim 4
The graphics processing unit as in claim 1, 
The graphics processing unit as in claim 2, 
wherein the second set of processing cores comprises:
wherein the second set of processing cores comprises:
Limitation A below
a set of integer units to execute instructions to perform integer operations; and (limitation B)
a set of floating point units (FPUs) to execute instructions to perform floating point operations, the set of FPUs to perform 32-bit floating point (FP32) operations and 16-bit floating point (FP16) operations; and
a second set of FPUs to execute instructions to perform floating point operations, the set of FPUs comprising a first subset of FPUs to perform 32-bit floating point (FP32) operations and a second subset of FPUs to FP64 operations.
a set of integer units to execute instructions to perform integer operations (limitation A)
Limitation B above


Claim 2 of the present application differs from claim 4 of the co-pending application in that claim 2 of the present application recites that its second set of processing cores includes floating point units (FPUs) to perform “16-bit floating point (FP16) operations,” whereas the co-pending application recites that its second set of processing cores includes FPUs to perform “64-bit floating point (FP64) operations.” However, it is well known that FPUs may operate on floating point formats of various precisions: 16-bit floating point is known as half-precision, 32-bit floating point as single-precision, and 64-bit floating point as double-precision. These formats are variations of one another, and it would have been obvious to substitute one precision for another according to the needs of the system.
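As a minimal illustration of the precision formats named above, and not a description of the hardware recited in either application, the following Python sketch stores the same value in the 16-bit, 32-bit, and 64-bit formats and reads it back. The names FORMATS, round_trip, and storage_bits are hypothetical and chosen only for this example; the sketch assumes Python 3.6 or later, where the standard struct module supports the half-precision format code.

```python
import struct

# struct format codes for the half-, single-, and double-precision formats.
# The dictionary and function names are illustrative assumptions only.
FORMATS = {"FP16": "e", "FP32": "f", "FP64": "d"}


def round_trip(value: float, precision: str) -> float:
    """Store value at the given precision and read it back as a Python float."""
    code = "<" + FORMATS[precision]
    return struct.unpack(code, struct.pack(code, value))[0]


def storage_bits(precision: str) -> int:
    """Return the width in bits of one operand at the given precision."""
    return struct.calcsize("<" + FORMATS[precision]) * 8


if __name__ == "__main__":
    # Print the operand width and the stored approximation of 0.1 per format.
    for name in FORMATS:
        print(name, storage_bits(name), round_trip(0.1, name))
```

The same value is representable in all three formats; only the operand width and the rounding error change, which is the sense in which the formats are described above as variations of one another.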
 
Present Application #17/385,693 Claim 3
Co-pending Application #17/145,885 Claim 4

The graphics processing unit as in claim 2, 
the set of FPUs includes first FPUs to perform 32-bit floating point (FP32) operations and second FPUs to perform 16-bit floating point (FP16) operations.
…a second set of FPUs to execute instructions to perform floating point operations, the set of FPUs comprising a first subset of FPUs to perform 32-bit floating point (FP32) operations and a second subset of FPUs to FP64 operations…


Present Application #17/385,693 Claim 4
Co-pending Application #17/145,885 Claim 5
The graphics processing unit as in claim 1, wherein
The graphics processing unit as in claim 1, wherein
the first set of processing cores of the first type is configured to perform an in-place matrix to vector transformation for a first type of operand stored in the register file.
the first set of processing cores of the first type is configured to perform an in-place matrix to vector transformation for a first type of operand stored in the register file.


Present Application #17/385,693 Claim 5
Co-pending Application #17/145,885 Claim 6
The graphics processing unit as in claim 4, wherein
The graphics processing unit as in claim 5, wherein
the in-place matrix to vector transformation includes a set of operations having a source and destination, the source and destination within the register file.
the in-place matrix to vector transformation includes a set of operations having a source and destination, the source and destination within the register file.


Present Application #17/385,693 Claim 6
Co-pending Application #17/145,885 Claim 7
The graphics processing unit as in claim 5, wherein
The graphics processing unit as in claim 6, wherein
the source includes a register address start limit, stride, number of elements, and element size.
the source includes a register address start limit, stride, number of elements, and element size.


Present Application #17/385,693 Claim 7
Co-pending Application #17/145,885 Claim 8
The graphics processing unit as in claim 1, wherein
The graphics processing unit as in claim 1, wherein
the at least one of the one or more multiprocessors further comprises an instruction cache to store a first instruction associated with the first set of operands and a second instruction associated with the second set of operands.
the at least one of the one or more multiprocessors further comprises an instruction cache to store a first instruction associated with the first set of operands and a second instruction associated with the second set of operands.



Present Application #17/385,693 Claim 8
Co-pending Application #17/145,885 Claim 9
The graphics processing unit as in claim 1, wherein
The graphics processing unit as in claim 1, wherein
the first set of processing cores of the first type are associated with a first memory channel and the second set of processing cores of the second type are associated with a second memory channel.
the first set of processing cores of the first type are associated with a first memory channel and the second set of processing cores of the second type are associated with a second memory channel.


Present Application #17/385,693 Claim 10
Co-pending Application #17/145,885 Claim 10
A method to facilitate processing of data at a graphics processing unit (GPU), the method comprising:
A method comprising:
receiving, at a first set of processing cores of a first type, a first set of operands from first registers of a register file,
receiving a first set of operands from a first set of registers of a register file at a first set of processing cores of a first type at a graphics processing unit;
wherein the first set of processing cores of the first type includes circuitry to execute instructions to perform matrix operations on the first set of operands in the first set of registers of the register file, the first set of operands including one or more 64-bit operands; and

receiving, at a second set of processing cores of a second type, a second set of operands from second registers of the register file, the second set of processing cores being different from the first set of processing cores;
receiving a second set of operands from a second set of registers of a register file at a second set of processing cores of a second type, the second set of processing cores being different from the first set of processing cores, at the graphics processing unit;
performing multi-dimensional matrix math operations on the first set of operands at the first set of processing cores; and
performing multi-dimensional matrix math operations on the first set of operands at the first set of processing cores; and
performing general-purpose graphics processing unit (GPGPU) operations on the second set of operands at the second set of processing cores.
performing general-purpose graphics processing unit (GPGPU) operations on the second set of operands at the second set of processing cores.


Claim 10 of the present application above differs from claim 10 of the co-pending application in that claim 10 of the present application recites, “wherein the first set of processing cores of the first type includes circuitry to execute instructions to perform matrix operations on the first set of 

Present Application #17/385,693 Claim 10
Co-pending Application #17/145,885 Claims 1-3
A method to facilitate processing of data at a graphics processing unit (GPU), the method comprising:
A graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including:

a register file to store operands for a plurality of different types of operands; and

a plurality of processing cores, including:
receiving, at a first set of processing cores of a first type, a first set of operands from first registers of a register file,
a first set of processing cores of a first type to perform multi-dimensional matrix math operations on a first set of operands in a first set of registers of the register file; and…
wherein the first set of processing cores of the first type includes circuitry to execute instructions to perform matrix operations on the first set of operands in the first set of registers of the register file,
…the first set of processing cores of the first type includes a first set of floating point units (FPUs) to execute instructions to perform matrix operations on the first set of operands in the first set of registers of the register file… (claim 2) (Limitation A)
the first set of operands including one or more 64-bit operands; and
…matrix operations performed on the first set of operands include 64-bit floating (FP64) operations… (claim 3)
receiving, at a second set of processing cores of a second type, a second set of operands from second registers of the register file, the second set of processing cores being different from the first set of processing cores;
…a second set of processing cores of a second type, the second set of processing cores being different from the first set of processing cores, the second set of processing cores to perform general purpose graphics processing unit (GPGPU) operations on a second set of operands in a second set of registers of the register file (Limitation B)
performing multi-dimensional matrix math operations on the first set of operands at the first set of processing cores; and
Limitation A above

Limitation B above


Present Application #17/385,693 Claim 11
Co-pending Application #17/145,885 Claims 11 and 12
The method as in claim 10, wherein
The method as in claim 10, wherein
performing the GPGPU operations at the second set of processing cores comprises:
performing the GPGPU operations at the second set of processing cores comprises:
executing instructions at a set of floating point units (FPUs) to perform floating point operations, wherein executing the instructions at the set of floating point units (FPUs) comprises performing 32-bit floating point (FP32) operations and 16-bit floating point (FP16) operations; and
executing instructions at a set of floating point units (FPUs), to perform floating point operations, wherein executing instructions at the set of floating point units (FPUs) comprises performing 32-bit floating point (FP32) operations at first subset of FPU; and performing 64-bit floating point (FP64) operations at a second subset of FPUs (claim 11)
executing instructions at a set of integer units to perform integer operations.
…performing the GPGPU operations at the second set of processing cores further comprises executing instructions at a set of integer units to perform integer operation… (claim 12)


Claim 11 of the present application is also similar to claim 4 of the co-pending application, and thus may be rejected under similar rationale. Additionally, claim 11 of the present application differs from claim 4 of the co-pending application in that claim 11 of the present application recites that its second set of processing cores includes floating point units (FPUs) to perform “16-bit floating point (FP16) operations,” whereas the co-pending application recites that its second set of processing cores includes FPUs to perform “64-bit floating point (FP64) operations.” However, it is well known that FPUs may operate on floating point formats of various precisions, where 16-bit floating point is known as half-precision, 32-bit floating point is known as single-precision, and 64-bit floating point is known as double-precision, thus each is just a variation of the others,

Present Application #17/385,693 Claim 12
Co-pending Application #17/145,885 Claim 11 
The method as in claim 11, further comprising
The method as in claim 10, wherein performing the GPGPU operations at the second set of processing cores comprises:
performing 32-bit floating point (FP32) operations at first FPUs of the set of FPUs and performing 16-bit floating-point operations at second FPUs of the set of FPUs.
executing instructions at a set of floating point units (FPUs), to perform floating point operations, wherein executing instructions at the set of floating point units (FPUs) comprises performing 32-bit floating point (FP32) operations at first subset of FPU; and performing 64-bit floating point (FP64) operations at a second subset of FPUs.


Claim 12 of the present application is also similar to claim 4 of the co-pending application, thus may be rejected under similar rationale.

Present Application #17/385,693 Claim 13
Co-pending Application #17/145,885 Claims 13 and 14
The method as in claim 10, further comprising
The method as in claim 10, wherein
performing an in-place matrix to vector transformation for a first type of operand stored in the register file via the first set of processing cores of the first type, 
the first set of processing cores is configured to perform an in-place matrix to vector transformation for a first type of operand stored in the register file (claim 13)
wherein the in-place matrix to vector transformation includes a set of operations having a source and destination and the source and destination are within the register file.
…the in-place matrix to vector transformation includes a set of operations having a source and destination, the source and destination within the register file…(claim 14)


Claim 13 of the present application is also similar to claims 5 and 6 of the co-pending application, thus may be rejected under similar rationale.

Present Application #17/385,693 Claim 14
Co-pending Application #17/145,885 Claim 14
The method as in claim 13, wherein
The method as in claim 10, wherein
the source includes a register address start limit, stride, number of elements, and element size.
…the source includes a register address start limit, stride, number of elements, and element size.


Claim 14 of the present application is also similar to claim 7 of the co-pending application, thus may be rejected under similar rationale.

Present Application #17/385,693 Claim 15
Co-pending Application #17/145,885 Claim 8
The method as in claim 10, wherein
The graphics processing unit as in claim 1, wherein
the GPU includes one or more multiprocessors comprising the first set of processing cores of the first type, the second set of processing cores of the second type, and an instruction cache to store a first instruction associated with the first set of operands and a second instruction associated with the second set of operands.
the at least one of the one or more multiprocessors further comprises an instruction cache to store a first instruction associated with the first set of operands and a second instruction associated with the second set of operands.


Present Application #17/385,693 Claim 16
Co-pending Application #17/145,885 Claim 9
The method as in claim 15, wherein
The graphics processing unit as in claim 1, wherein
the first set of processing cores of the first type are associated with a first memory channel and the second set of processing cores of the second type are associated with a second memory channel.
the first set of processing cores of the first type are associated with a first memory channel and the second set of processing cores of the second type are associated with a second memory channel.


Present Application #17/385,693 Claim 18
Co-pending Application #17/145,885 Claims 15, 16, and 18
A graphics processing system comprising:
A data processing system comprising:

a memory device; and
a graphics processing unit comprising one or more multiprocessors, 
a graphics processing unit comprising one or more multiprocessors, 

at least one of the one or more multiprocessors including a register file to store operands for a plurality of different types of operands and 
a plurality of processing cores, including:
a plurality of processing cores, including:
a first set of processing cores of a first type to perform multi-dimensional matrix operations on a first set of operands in a first set of registers of the register file,
a first set of processing cores of a first type to perform multi-dimensional matrix math operations on a first set of operands in a first set of registers of the register file; and
wherein the first set of processing cores of the first type includes circuitry to execute instructions to perform matrix operations on the first set of operands in the first set of registers of the register file; and
…wherein the first set of processing core of the first type includes a first set of floating point units (FPUs) to execute instructions to perform matrix operations on the first set of operands in the first set of registers of the register file…(claim 16)
a second set of processing cores of a second type, the second set of processing cores being different from the first set of processing cores, the second set of processing cores to perform general purpose graphics processing unit (GPGPU) operations on a second set of operands in a second set of registers of the register file, 
a second set of processing cores of a second type, the second set of processing cores being different from the first set of processing cores, the second set of processing cores to perform general purpose graphics processing unit (GPGPU) operations on a second set of operands in a second set of registers of the register file.
wherein the second set of processing cores comprises:
…wherein the second set of processing cores comprises…(claim 18)
Limitation A below
a set of integer units to execute instructions to perform integer operations; and…(claim 18) (Limitation B)
a set of floating point units (FPUs) to execute instructions to perform floating point operations, the set of FPUs to perform 32-bit floating point (FP32) operations and 16-bit floating point (FP16) operations; and
a second set of FPUs to execute instructions to perform floating point operations, the set of FPUs comprising a first subset of FPUs to perform 32-bit floating point (FP32) operations and a second subset of FPUs to FP64 operations….(claim 18)
a set of integer units to execute instructions to perform integer operations. (Limitation A)
Limitation B above


Claim 18 of the present application differs from claims 15, 16, and 18 of the co-pending application in that claim 18 of the present application recites that its second set of processing cores includes floating point units (FPUs) to perform “16-bit floating point (FP16) operations,” whereas the co-pending application recites that its second set of processing cores includes FPUs to perform

Present Application #17/385,693 Claim 19
Co-pending Application #17/145,885 Claim 17
The graphics processing system as in claim 18, wherein
The data processing system as in claim 16, wherein
the first set of operands include one or more 64-bit operands
matrix operations performed on the first set of operands include 64-bit floating (FP64) operations.


Present Application #17/385,693 Claim 20
Co-pending Application #17/145,885 Claim 18
The graphics processing system as in claim 18, wherein
The data processing system as in claim 16, wherein the second set of processing cores comprises…
the set of FPUs includes first FPUs to perform 32-bit floating point (FP32) operations and second FPUs to perform 16-bit floating point (FP16) operations.
…a second set of FPUs to execute instructions to perform floating point operations, the set of FPUs comprising a first subset of FPUs to perform 32-bit floating point (FP32) operations and a second subset of FPUs to FP64 operations…


Present Application #17/385,693 Claim 21
Co-pending Application #17/145,885 Claim 19
The graphics processing system as in claim 18, wherein
The data processing system as in claim 15, wherein
the first set of processing cores of the first type is configured to perform an in-place matrix to vector transformation for a first type of operand stored in the register file.
the first set of processing cores of the first type is configured to perform an in-place matrix to vector transformation for a first type of operand stored in the register file…



Present Application #17/385,693 Claim 22
Co-pending Application #17/145,885 Claim 19
The graphics processing system as in claim 21, wherein
The data processing system as in claim 15, wherein
the in-place matrix to vector transformation includes a set of operations having a source and destination, the source and destination within the register file.
…the in-place matrix to vector transformation includes a set of operations having a source and destination, the source and destination within the register file…


Present Application #17/385,693 Claim 23
Co-pending Application #17/145,885 Claim 19
The graphics processing system as in claim 22, wherein
The data processing system as in claim 15, wherein
the source includes a register address start limit, stride, number of elements, and element size.
…the source includes a register address start limit, stride, number of elements, and element size…


Present Application #17/385,693 Claim 24
Co-pending Application #17/145,885 Claim 20
The graphics processing system as in claim 18, wherein
The data processing system in claim 15, wherein
the at least one of the one or more multiprocessors further comprises an instruction cache to store a first instruction associated with the first set of operands and a second instruction associated with the second set of operands.
the at least one of the one or more multiprocessors further comprises an instruction cache to store a first instruction associated with the first set of operands and a second instruction associated with the second set of operands…


Present Application #17/385,693 Claim 25
Co-pending Application #17/145,885 Claim 20
The graphics processing system as in claim 18, wherein 
The data processing system in claim 15, wherein
the first set of processing cores of the first type are associated with a first memory channel and the second set of processing cores of the second type are associated with a second memory channel.
…the first set of processing cores of the first type are associated with a first memory channel, and the second set of processing cores of the second type are associated with a second memory channel…



Claims 1-8, 10-14, 16, and 18-25 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 3, 5, 8, and 10 of U.S. Patent No. 10,902,547 in view of Narvaez et al. (US 2014/0189704). Please see the tables and rejections below.

Present Application #17/385,693 | 1 | 2 | 3 | 7 | 10 | 11 | 12 | 18 | 20 | 24
U.S. Patent #10,902,547         | 1 | 1 | 1 | 3 | 5  | 5  | 5  | 8  | 8  | 10


Present Application #17/385,693 Claim 1
U.S. Patent #10,902,547 Claim 1
A graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including:
A graphics processing unit (GPU) comprising one or more multiprocessors, at least one of the one or more multiprocessors including:
a register file to store operands for a plurality of different types of operands; and
a register file to store operands for a plurality of different types of operands; and
a plurality of processing cores, including:
a plurality of processing cores, including:
a first set of processing cores of a first type to perform multi-dimensional matrix math operations on a first set of operands in a first set of registers of the register file,
a first set of processing cores of a first type to perform multi-dimensional matrix math operations including deep learning matrix operations on a first set of operands in a first set of registers of the register file; and
wherein the first set of processing cores of the first type includes circuitry to execute instructions to perform matrix operations on the first set of operands in the first set of registers of the register file, the first set of operands including one or more 64-bit operands

a second set of processing cores of a second type, the second set of processing cores being different from the first set of processing cores,
a second set of processing cores of a second type, the second set of processing cores being different from the first set of processing cores, 
the second set of processing cores to perform general purpose graphics processing unit (GPGPU) operations on a second set of operands in a second set of registers of the register file.
the second set of processing cores to perform general purpose graphics processing unit (GPGPU) operations on a second set of operands in a second set of registers of the register file,

wherein the second set of processing cores comprises:

a set of floating point units (FPUs) to execute instructions to perform floating point operations, the set of FPUs comprising a first subset of FPUs to perform 32-bit floating point (FP32) operations and a second subset

a set of integer units to execute instructions to perform integer operations.


Claim 1 of the present application differs from claim 1 of the patent in that claim 1 of the present application is broader in scope than claim 1 of the patent, and thus encompasses it. Additionally, claim 1 of the present application recites, “wherein the first set of processing cores of the first type includes circuitry to execute instructions to perform matrix operations on the first set of operands in the first set of registers of the register file, the first set of operands including one or more 64-bit operands,” which is not recited by the patent. However, Narvaez et al. disclose the first set of processing cores of the first type (first set within cores 202A-202N implemented as special purpose cores) includes circuitry (Figure 1 illustrates example of a core 190, e.g. one of cores 202A-N, comprises execution clusters 160 with execution units 162, where [0032] notes execution units 162 may perform various operations, such as shifts, addition, subtraction, multiplication, and on various types of data, such as scalar floating point, packed integer, packed floating point, vector integer, vector floating point) to execute instructions to perform matrix operations ([0035] notes the core 190 may support one or more instruction sets) on the first set of operands in the first set of registers of the register file (Figure 17 illustrates register architecture 1700 comprising various registers, such as vector registers 1710 and scalar floating point stack register file (x87 stack) 1745 aliased with integer flat register file 1750, [0222] notes scalar floating point stack register file 1745 aliased with integer flat register file 1750, the x87 stack is an eight-element stack used to perform scalar floating-point operations on 32/64/80-bit floating point data using the x87 instruction set extension; while the MMX registers are used to perform operations on 64-bit packed integer data, as well as to hold operands for some operations performed between the MMX and XMM registers), the first set of operands including one or more 64-bit operands (Figure 17 illustrates register architecture 1700 comprising various registers, such as vector registers 1710 and scalar floating point stack register file (x87 stack) 1745 aliased with integer flat register file 1750, [0222] notes scalar floating point stack register file 1745 aliased with integer flat register file 1750, the x87 stack is an eight-element stack used to perform scalar floating-point operations on 32/64/80-bit floating point data using the x87 instruction set extension; while the MMX registers are used to perform operations on 64-bit packed integer data, as well as to hold operands for some operations performed between the MMX and XMM registers).

It would have been obvious to one of ordinary skill in the art at the time of the invention to modify the patent application’s multiprocessors comprising a plurality of processing cores to include processing cores of different types as described in Narvaez et al.’s to expand the capabilities and functionality of the multiprocessors to perform additional operations, thus enhancing the performance of the system.

Present Application #17/385,693  Claim 2
U.S. Patent #10,902,547  Claim 1
The graphics processing unit as in claim 1, 
A graphics processing unit (GPU) comprising…
wherein the second set of processing cores comprises:
…wherein the second set of processing cores comprises:
a second set of FPUs to execute instructions to perform floating point operations, the set of FPUs comprising a first subset of FPUs to perform 32-bit floating point (FP32) operations and 16-bit floating point (FP16) operations; and
a set of floating point units (FPUs) to execute instructions to perform floating point operations, the set of FPUs comprising a first subset of FPUs to perform 32-bit floating point (FP32) operations and a second subset of FPUs to perform 64-bit floating (FP64) operations; and

a set of integer units to execute instructions to perform integer operations. 


Claim 2 of the present application differs from claim 1 of the co-pending application in that claim 2 of the present application recites its second set of processing cores to include floating point units (FPUs) to perform “16-bit floating point (FP16) operations” where the co-pending application recites its second set of processing cores to include FPUs to perform “64-bit floating point (FP64) operations.”  However, it is well known that floating point units (FPUs) may be capable of performing operations in floating point formats of various precisions, where 16-bit floating point is known as half-precision floating point, 32-bit floating point is known as single-precision floating point, and 64-bit floating point is known as double-precision floating point. Each format is thus a variation of the others, and it would have been obvious to replace one precision with another according to the needs of the system.

Present Application #17/385,693 Claim 3
U.S. Patent #10,902,547  Claim 1
The graphics processing unit as in claim 2, wherein
A graphics processing unit (GPU) comprising…
the set of FPUs includes first FPUs to perform 32-bit floating point (FP32) operations and second FPUs to perform 16-bit floating point (FP16) operations.
…a set of floating point units (FPUs) to execute instructions to perform floating point operations, the set of FPUs comprising a first subset of FPUs to perform 32-bit floating point (FP32) operations and a second subset of FPUs to perform 64-bit floating (FP64) operations…


As to claim 4, Narvaez et al. disclose the first set of processing cores of the first type (first set within cores 202A-202N implemented as special purpose cores) is configured to perform an in-place matrix to vector transformation for a first type of operand stored in the register file (Figure 17 illustrates register architecture 1700 comprising various registers, such as vector registers 1710 and scalar floating point stack register file (x87 stack) 1745 aliased with integer flat register file 1750, [0222] notes scalar floating point stack register file 1745 aliased with integer flat register file 1750, the x87 stack is an eight-element stack used to perform scalar floating-point operations on 32/64/80-bit floating point data using the x87 instruction set extension; while the MMX registers are used to perform operations on 64-bit packed integer data, as well as to hold operands for some operations performed between the MMX and XMM registers; further modified with Agarwal, column 9, lines 48-50 notes neural cores perform parallel vector matrix multiplication, matrix vector multiplication, and parallel rank-1 outer-product updates).

As to claim 5, Narvaez et al. disclose the in-place matrix to vector transformation includes a set of operations having a source and destination, the source and destination within the register file ([0153] notes register index field 1544 specifies the locations of source and destination operands in registers, e.g. register file).

As to claim 6, Narvaez et al. disclose the source includes a register address start limit, stride, number of elements, and element size ([0148] notes example of vector instruction format supporting different operand length or sizes, or different data element widths or sizes (may be considered stride, number of elements, element size), [0153] notes register index field specifying locations, e.g. addresses, of source and destination operands, (may be considered register address start limit)). 

Present Application #17/385,693  Claim 7
U.S. Patent #10,902,547  Claim 3
The graphics processing unit as in claim 1, wherein the at least one of the one or more multiprocessors further comprises
The GPU of claim 2, wherein the at least one of the one or more multiprocessors further comprises
an instruction cache to store a first instruction associated with the first set of operands and a second instruction associated with the second set of operands.
an instruction cache to store the first instructions and the second instructions.


As to claim 8, Narvaez et al. disclose the first set of processing cores of the first type are associated with a first memory channel, and the second set of processing cores of the second type are associated with a second memory channel (modified with Narvaez, Figure 2 illustrates each of cores 202 comprising one or more cache units 204, further in communication with shared cache units 206).

Present Application #17/385,693  Claim 10
U.S. Patent #10,902,547  Claim 5
A method to facilitate processing of data at a graphics processing unit (GPU), the method comprising:
A method to facilitate processing of data at a graphics processing unit (GPU), the method comprising:
receiving, at a first set of processing cores of a first type, a first set of operands from first registers of a register file,
receiving a first set of operands from a first set of registers of a register file at a first set of processing cores of a first type at the GPU;
wherein the first set of processing cores of the first type includes circuitry to execute instructions to perform matrix operations on the first set of operands in the first set of registers of the register file, the first set of operands including one or more 64-bit operands

receiving, at a second set of processing cores of a second type, a second set of operands from second registers of the register file, the second set of processing cores being different from the first set of processing cores,
receiving a second set of operands from a second set of registers of a register file at a second set of processing cores of a second type, the second set of processing cores being different from the first set of processing cores, at the GPU;
performing multi-dimensional matrix math operations on the first set of operands at the first set of processing cores; and
performing multi-dimensional matrix math operations including deep learning matrix operations on the first set of operands at the first set of processing cores; and
performing general-purpose graphics processing unit (GPGPU) operations on the second set of operands at the second set of processing cores.
performing general-purpose graphics processing unit (GPGPU) operations on the second set of operands at the second set of processing cores, 

wherein performing the GPGPU operations at the second set of processing cores comprises:

executing instructions at a set of floating point units (FPUs), to perform floating point operations, wherein executing instructions at the set of floating point units (FPUs) comprises

performing 32-bit floating point (FP32) operations at a first subset of FPU and performing 64-bit floating point (FP64) operations at a second subset of FPUs, and

executing instructions at a set of integer units to perform integer operations.


Claim 10 of the present application differs from claim 5 of the patent application in that claim 10 of the present application is broader in scope than claim 5 of the patent application, thus encompasses that of the patent application.  Additionally, claim 10 of the present application recites, “wherein the first set of processing cores of the first type includes circuitry to execute instructions to perform matrix operations on the first set of operands in the first set of registers of the register file, the first set of operands including one or more 64-bit operands” which is not recited by the patent application.  However, Narvaez et al. disclose the first set of processing cores of the first type (first set within cores 202A-202N implemented as special purpose cores) includes circuitry (Figure 1 illustrates example of a core 190, e.g. one of cores 202A-N, comprises execution clusters 160 with execution units 162, where [0032] notes execution units 162 may perform various operations, such as shifts, addition, subtraction, multiplication, and on various types of data, such as scalar floating point, packed integer, packed floating point, vector integer, vector floating point) to execute instructions to perform matrix operations ([0035] notes the core 190 may support one or more instruction sets) on the first set of operands in the first set of registers of the register file (Figure 17 illustrates register architecture 1700 comprising various registers, such as vector registers 1710 and scalar floating point stack register file (x87 stack) 1745 aliased with integer flat register file 1750, [0222] notes scalar floating point stack register file 1745 aliased with integer flat register file 1750, the x87 stack is an eight-element stack used to perform scalar floating-point operations on 32/64/80-bit floating point data using the x87 instruction set extension; while the MMX registers are used to perform operations on 64-bit packed integer data, as well as to hold operands for some operations performed between the MMX and XMM registers), the first set of operands including one or more 64-bit operands (Figure 17 illustrates register architecture 1700 comprising various registers, such as vector registers 1710 and scalar floating point stack register file (x87 stack) 1745 aliased with integer flat register file 1750, [0222] notes scalar floating point stack register file 1745 aliased with integer flat register file 1750, the x87 stack is an eight-element stack used to perform scalar floating-point operations on 32/64/80-bit floating point data using the x87 instruction set extension; while the MMX registers are used to perform operations on 64-bit packed integer data, as well as to hold operands for some operations performed between the MMX and XMM registers).

It would have been obvious to one of ordinary skill in the art at the time of the invention to modify the patent application’s multiprocessors comprising a plurality of processing cores to include processing cores of different types as described in Narvaez et al.’s to expand the capabilities and functionality of the multiprocessors to perform additional operations, thus enhancing the performance of the system.

Present Application #17/385,693  Claim 11
U.S. Patent #10,902,547  Claim 5


A method to facilitate processing of data at a graphics processing unit (GPU), the method comprising…
wherein performing the GPGPU operations at the second set of processing cores comprises:
…wherein performing the GPGPU operations at the second set of processing cores comprises:
executing instructions at a set of floating point units (FPUs), to perform floating point operations, wherein executing instructions at the set of floating point units (FPUs) comprises performing 32-bit floating point (FP32) operations and 16-bit floating point (FP16) operations; and
executing instructions at a set of floating point units (FPUs), to perform floating point operations, wherein executing instructions at the set of floating point units (FPUs) comprises performing 32-bit floating point (FP32) operations at a first subset of FPU and performing 64-bit floating point (FP64) operations at a second subset of FPUs; and
executing instructions at a set of integer units to perform integer operations.
executing instructions at a set of integer units to perform integer operations.


Claim 11 of the present application differs from claim 5 of the co-pending application in that claim 11 of the present application recites its second set of processing cores to include floating point units (FPUs) to perform “16-bit floating point (FP16) operations” where the co-pending application recites its second set of processing cores to include FPUs to perform “64-bit floating point (FP64) operations.”  However, it is well known that floating point units (FPUs) may be capable of performing operations in floating point formats of various precisions, where 16-bit floating point is known as half-precision floating point, 32-bit floating point is known as single-precision floating point, and 64-bit floating point is known as double-precision floating point. Each format is thus a variation of the others, and it would have been obvious to replace one precision with another according to the needs of the system.

Present Application #17/385,693 Claim 12
U.S. Patent #10,902,547  Claim 5

A method to facilitate processing of data at a graphics processing unit (GPU), the method comprising…
performing 32-bit floating point (FP32) operations at first FPUs of the set of FPUs and performing 16-bit floating-point operations at second FPUs of the set of FPUs.
…executing instructions at a set of floating point units (FPUs), to perform floating point operations, wherein executing instructions at the set of floating point units (FPUs) comprises performing 32-bit floating point (FP32) operations at a first subset of FPU and performing 64-bit floating point (FP64) operations at a second subset of FPUs…


As to claim 13, Narvaez et al. disclose the first set of processing cores of the first type (first set within cores 202A-202N implemented as special purpose cores) is configured to perform an in-place matrix to vector transformation for a first type of operand stored in the register file (Figure 17 illustrates register architecture 1700 comprising various registers, such as vector registers 1710 and scalar floating point stack register file (x87 stack) 1745 aliased with integer flat register file 1750, [0222] notes scalar floating point stack register file 1745 aliased with integer flat register file 1750, the x87 stack is an eight-element stack used to perform scalar floating-point operations on 32/64/80-bit floating point data using the x87 instruction set extension; while the MMX registers are used to perform operations on 64-bit packed integer data, as well as to hold operands for some operations performed between the MMX and XMM registers; further modified with Agarwal, column 9, lines 48-50 notes neural cores perform parallel vector matrix multiplication, matrix vector multiplication, and parallel rank-1 outer-product updates), the in-place matrix to vector transformation includes a set of operations having a source and destination, the source and destination within the register file ([0153] notes register index field 1544 specifies the locations of source and destination operands in registers, e.g. register file).

Narvaez et al. disclose the source includes a register address start limit, stride, number of elements, and element size ([0148] notes example of vector instruction format supporting different operand length or sizes, or different data element widths or sizes (may be considered stride, number of elements, element size), [0153] notes register index field specifying locations, e.g. addresses, of source and destination operands, (may be considered register address start limit)).

As to claim 16, Narvaez et al. disclose the first set of processing cores of the first type are associated with a first memory channel, and the second set of processing cores of the second type are associated with a second memory channel (modified with Narvaez, Figure 2 illustrates each of cores 202 comprising one or more cache units 204, further in communication with shared cache units 206).

Present Application #17/385,693  Claim 18
U.S. Patent #10,902,547  Claim 8
A graphics processing system comprising:

a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including
A multiprocessor comprising:
a register file to store a plurality of different types of operands and
a register file to store operands for a plurality of different types of operands; and
a plurality of processing cores, including:
a plurality of processing cores, including:
a first set of processing cores of a first type to perform multi-dimensional matrix math operations on a first set of operands in a first set of registers of the register file
a first set of processing cores of a first type to perform multi-dimensional matrix math operations including deep learning matrix operations on a first set of operands in a first set of registers of the register file; and
wherein the first set of processing cores of the first type includes circuitry to execute instructions to perform matrix operations on the first set of operands in the first set of registers of the register file, the first set of operands including one or more 64-bit operands;


a second set of processing cores of a second type, the second set of processing cores being different from the first set of processing cores, 
the second set of processing cores to perform general purpose graphics processing unit (GPGPU) operations on a second set of operands in a second set of registers of the register file,
the second set of processing cores to perform general purpose graphics processing unit (GPGPU) operations on a second set of operands in a second set of registers of the register file,
wherein the second set of processing cores comprises:
wherein the second set of processing cores comprises:
a set of floating point units (FPUs) to execute instructions to perform floating point operations, the set of FPUs to perform 32-bit floating point (FP32) operations and perform 16-bit floating (FP16) operations; and
a set of floating point units (FPUs) to execute instructions to perform floating point operations, the set of FPUs comprising a first subset of FPUs to perform 32-bit floating point (FP32) operations and a second subset of FPUs to perform 64-bit floating (FP64) operations; and
a set of integer units to execute instructions to perform integer operations.
a set of integer units to execute instructions to perform integer operations.


Claim 18 of the present application is directed to a graphics processing system comprising a graphics processing unit further comprising one or more multiprocessors, where the patent application is directed to a single multiprocessor, but it is understood that the multiprocessor may be one of the multiprocessors as recited in the present application, and may be further encompassed within a system, such as the graphics system of the present application, thus would render predictable results, without changing the scope of the invention.  
Additionally, claim 18 of the present application differs from claim 8 of the co-pending application in that claim 18 of the present application recites its second set of processing cores to include floating point units (FPUs) to perform “16-bit floating point (FP16) operations” where the co-pending application recites its second set of processing cores to include FPUs to perform “64-bit floating point (FP64) operations.”  However, it is well known that floating point units (FPUs) may be capable of performing operations in floating point formats of various precisions, where 16-bit floating point is known as half-precision floating point, 32-bit floating point is known as single-precision floating point, and 64-bit floating point is known as double-precision floating point. Each format is thus a variation of the others, and it would have been obvious to replace one precision with another according to the needs of the system.
Lastly, claim 18 of the present application recites, “wherein the first set of processing cores of the first type includes circuitry to execute instructions to perform matrix operations on the first set of operands in the first set of registers of the register file, the first set of operands including one or more 64-bit operands” which is not recited by the patent application.  However, Narvaez et al. disclose the first set of processing cores of the first type (first set within cores 202A-202N implemented as special purpose cores) includes circuitry (Figure 1 illustrates example of a core 190, e.g. one of cores 202A-N, comprises execution clusters 160 with execution units 162, where [0032] notes execution units 162 may perform various operations, such as shifts, addition, subtraction, multiplication, and on various types of data, such as scalar floating point, packed integer, packed floating point, vector integer, vector floating point) to execute instructions to perform matrix operations ([0035] notes the core 190 may support one or more instruction sets) on the first set of operands in the first set of registers of the register file (Figure 17 illustrates register architecture 1700 comprising various registers, such as vector registers 1710 and scalar floating point stack register file (x87 stack) 1745 aliased with integer flat register file 1750, [0222] notes scalar floating point stack register file 1745 aliased with integer flat register file 1750, the x87 stack is an eight-element stack used to perform scalar floating-point operations on 32/64/80-bit floating point data using the x87 instruction set extension; while the MMX registers are used to perform operations on 64-bit packed integer data, as well as to hold operands for some operations performed between the MMX and XMM registers).



As to claim 19, Narvaez et al. disclose the first set of operands including one or more 64-bit operands (Figure 17 illustrates register architecture 1700 comprising various registers, such as vector registers 1710 and scalar floating point stack register file (x87 stack) 1745 aliased with integer flat register file 1750, [0222] notes scalar floating point stack register file 1745 aliased with integer flat register file 1750, the x87 stack is an eight-element stack used to perform scalar floating-point operations on 32/64/80-bit floating point data using the x87 instruction set extension; while the MMX registers are used to perform operations on 64-bit packed integer data, as well as to hold operands for some operations performed between the MMX and XMM registers).

Present Application #17/385,693  Claim 20
U.S. Patent #10,902,547  Claim 8
The graphics system as in claim 18, wherein
A multiprocessor comprising…
the set of FPUs includes first FPUs to perform 32-bit floating point (FP32) operations and second FPUs to perform 16-bit floating point (FP16) operations.
…a set of floating point units (FPUs) to execute instructions to perform floating point operations, the set of FPUs comprising a first subset of FPUs to perform 32-bit floating point (FP32) operations and a second subset of FPUs to perform 64-bit floating (FP64) operations…


As to claim 21, Narvaez et al. disclose the first set of processing cores of the first type (first set within cores 202A-202N implemented as special purpose cores) is configured to perform an in-place matrix to vector transformation for a first type of operand stored in the register file (Figure 17 illustrates register architecture 1700 comprising various registers, such as vector registers 1710 and scalar floating point stack register file (x87 stack) 1745 aliased with integer flat register file 1750, [0222] notes scalar floating point stack register file 1745 aliased with integer flat register file 1750, the x87 stack is an eight-element stack used to perform scalar floating-point operations on 32/64/80-bit floating point data using the x87 instruction set extension; while the MMX registers are used to perform operations on 64-bit packed integer data, as well as to hold operands for some operations performed between the MMX and XMM registers; further modified with Agarwal, column 9, lines 48-50 notes neural cores perform parallel vector matrix multiplication, matrix vector multiplication, and parallel rank-1 outer-product updates).

As to claim 22, Narvaez et al. disclose the in-place matrix to vector transformation includes a set of operations having a source and destination, the source and destination within the register file ([0153] notes register index field 1544 specifies the locations of source and destination operands in registers, e.g. register file).

As to claim 23, Narvaez et al. disclose the source includes a register address start limit, stride, number of elements, and element size ([0148] notes example of vector instruction format supporting different operand length or sizes, or different data element widths or sizes (may be considered stride, number of elements, element size), [0153] notes register index field specifying locations, e.g. addresses, of source and destination operands, (may be considered register address start limit)). 


Present Application #17/385,693  Claim 24
U.S. Patent #10,902,547  Claim 10
The graphics processing system as in claim 18, wherein the at least one of the one or more multiprocessors further comprises
The multiprocessor of claim 9, further comprising
an instruction cache to store a first instruction associated with the first set of operands and a second instruction associated with the second set of operands.
an instruction cache to store the first instructions and the second instructions.


As to claim 25, Narvaez et al. disclose the first set of processing cores of the first type are associated with a first memory channel, and the second set of processing cores of the second type are associated with a second memory channel (modified with Narvaez, Figure 2 illustrates each of cores 202 comprising one or more cache units 204, further in communication with shared cache units 206).

Claim Objections

Claim 19 is objected to because of the following informalities:  
Claim 19 does not end in the proper period (.) punctuation mark.
Appropriate correction is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-10, and 13-17 are rejected under 35 U.S.C. 103 as being unpatentable over Merrill, III (US 2016/0179574) in view of Narvaez et al. (US 2014/0189704) and Agarwal et al. (US 10,776,684).

As to claim 1, Merrill, III discloses a graphics processing unit (parallel processing unit (PPU) 200) comprising one or more multiprocessors (general processing clusters (GPC) 250(X), further comprising texture processing clusters (TPC) 320(V) as illustrated in Figure 3A, further comprising streaming multiprocessors (SM) 340, further illustrated in Figure 4), at least one of the one or more multiprocessors (SM 340) including a register file (Figure 4, register file 420) to store a plurality of different types of operands ([0054] notes register file 420 temporarily stores operands for each functional unit of the SM, thus it would be obvious to store different types of operands) and a plurality of processing cores (Figure 4, processing cores 450(L)), including: a set of processing cores (processing core 450(L)) to perform operations on a set of operands in a set of registers of the register file (register file 420), wherein the set of processing cores (processing core 450(L)) includes circuitry ([0055] notes each L processing cores 450 may include a fully-pipelined, single precision processing unit that includes a floating point arithmetic logic unit and an integer arithmetic logic unit and may also include a floating point arithmetic logic unit) to execute instructions to perform operations ([0078] thru [0116] notes performing sparse matrix vector multiplication) on the set of operands in the set of registers of the register file ([0054] notes register file 420 provides set of registers and temporary storage of operands for the functional units of the SM 340, e.g. processing cores).

Merrill, III differs from the invention defined in claim 1 in that Merrill, III discloses its plurality of processing cores as a single type of processing cores to perform operations such as floating point and integer operations ([0055] notes each L processing cores 450 may include a fully-pipelined, single precision processing unit that includes a floating point arithmetic logic unit and an integer arithmetic logic unit and may also include a floating point arithmetic logic unit) on a set of operands in a set of registers of the register file ([0054] notes register file 420 provides set of registers and temporary storage of operands for the functional units of the SM 340, e.g. processing cores) and further describes the system to perform sparse matrix vector multiplication ([0078] thru [0116]), but does not disclose its “plurality of processing cores including: a first set of processing cores of a first type to perform multi-dimensional matrix math operations on a first set of operands in a first set of registers of the register file, wherein the first set of processing cores of the first type includes circuitry to execute instructions to perform matrix operations on the first set of operands in the first set of registers of the register file, the first set of operands including one or more 64-bit operands; and a second set of processing cores of a second type, the second set of processing cores being different from the first set of processing cores, the second set of processing cores to perform general purpose graphics processing unit (GPGPU) operations on a second set of operands in a second set of registers of the register file.”

Narvaez et al. disclose a graphics processing unit (e.g. Figure 2, processor 200, [0039] notes processor 200 may be a graphics processor) comprising a plurality of processing cores (cores 202A-N), including: a first set of processing cores of a first type to perform math operations (e.g. a first set within cores 202A-202N implemented as special purpose cores) on a first set of operands in a first set of registers of the register file ([0153] notes register index field 1544 specifies the locations of source and destination operands in registers, e.g. register file, Figure 17 illustrates register architecture 1700 comprising various registers, such as vector registers 1710 and scalar floating point stack register file (x87 stack) 1745 aliased with integer flat register file 1750, [0222] notes scalar floating point stack register file 1745 aliased with integer flat register file 1750, the x87 stack is an eight-element stack used to perform scalar floating-point operations on 32/64/80-bit floating point data using the x87 instruction set extension, while the MMX registers are used to perform operations on 64-bit packed integer data, as well as to hold operands for some operations performed between the MMX and XMM registers), wherein the first set of processing cores (first set within cores 202A-202N implemented as special purpose cores) of the first type includes circuitry (Figure 1 illustrates example of a core 190, e.g. one of cores 202A-N, comprises execution clusters 160 with execution units 162, where [0032] notes execution units 162 may perform various operations, such as shifts, addition, subtraction, multiplication, and on various types of data, such as scalar floating point, packed integer, packed floating point, vector integer, vector floating point) to execute instructions to perform matrix operations ([0035] notes the core 190 may support one or more instruction sets) on the first set of operands in the first set of registers of the register file (Figure 17 illustrates register architecture 1700 comprising various registers, such as vector registers 1710 and scalar floating point stack register file (x87 stack) 1745 aliased with integer flat register file 1750, [0222] notes scalar floating point stack register file 1745 aliased with integer flat register file 1750, the x87 stack is an eight-element stack used to perform scalar floating-point operations on 32/64/80-bit floating point data using the x87 instruction set extension; while the MMX registers are used to perform operations on 64-bit packed integer data, as well as to hold operands for some operations performed between the MMX and XMM registers), the first set of operands including one or more 64-bit operands (Figure 17 illustrates register architecture 1700 comprising various registers, such as vector registers 1710 and scalar floating point stack register file (x87 stack) 1745 aliased with integer flat register file 1750, [0222] notes scalar floating point stack register file 1745 aliased with integer flat register file 1750, the x87 stack is an eight-element stack used to perform scalar floating-point operations on 32/64/80-bit floating point data using the x87 instruction set extension; while the MMX registers are used to perform operations on 64-bit packed integer data, as well as to hold operands for some operations performed between the MMX and XMM registers); and a second set of processing cores of a second type (e.g. a second set within cores 202A-202N implemented as general purpose cores), the second set of processing cores being different from the first set of processing cores ([0039] notes different implementations of processor 200, where cores 202A-202N may be implemented as general purpose cores, such as general purpose in-order cores and/or general purpose out-of-order cores, or special purpose cores, such as graphics and/or scientific cores, [0042] notes cores 202A-N may be heterogeneous and include both “small” cores and “big” cores, Figure 8 further illustrates and associated text, e.g. [0076] notes “small” cores as power-efficient cores, and “big” cores as high-performance cores), the second set of processing cores to perform general purpose graphics processing unit (GPGPU) operations (e.g. a second set within cores 202A-202N implemented as general purpose cores, where [0039] notes implementing general purpose graphics processing) on a second set of operands in a second set of registers of the register file ([0153] notes register index field 1544 specifies the locations of source and destination operands in registers, e.g. register file, Figure 17 illustrates register architecture 1700 comprising various registers, such as general purpose registers 1725, [0221] notes general purpose registers 1725 used along with addressing modes to address memory operands).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Merrill, III’s multiprocessors comprising a plurality of processing cores to include processing cores of different types as described in Narvaez et al. to expand the capabilities and functionality of the multiprocessors to perform additional operations, thus enhancing the performance of the system.

As noted above, Merrill, III describes its system, e.g. PPU 200, to perform sparse matrix vector multiplication ([0078] thru [0116]), but Merrill, III modified with Narvaez et al. does not explicitly disclose its first set of processing cores of a first type to perform “multi-dimensional matrix math operations.”

Agarwal et al. disclose a processing unit comprising a plurality of processing cores (Figure 1, processor unit 104, further illustrated in Figure 7, processor unit 700), including: a first set of processing cores of a first type (neural cores 106; 712, 714, 716, and 718) to perform multi-dimensional matrix math operations (column 9, lines 48-50 note neural cores perform parallel vector matrix multiplication, matrix vector multiplication, and parallel rank-1 outer-product updates); and a second set of processing cores of a second type (digital cores 108; 704, 706, 708, and 710), the second set of processing cores being different from the first set of processing cores, the second set of processing cores to perform general purpose processing unit operations (column 3, line 61 thru column 4, line 10 note neural cores and digital cores as different cores, where the digital cores may perform operations such as arithmetic operations on data).
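
For illustration only, the three operations named in the cited passage (vector matrix multiplication, matrix vector multiplication, and a rank-1 outer-product update) may be sketched in plain Python as follows. This is a minimal sketch of the mathematics, not a characterization of the circuitry of any cited reference; the function names and the optional scale parameter are hypothetical and do not appear in the references.

```python
# Illustrative sketch only. Function names are hypothetical and are not
# taken from any cited reference. Matrices are lists of rows.

def matrix_vector(W, x):
    """Return y = W x, where x has one entry per column of W."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def vector_matrix(x, W):
    """Return y = x W, where x has one entry per row of W."""
    n_cols = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(W))) for j in range(n_cols)]

def rank1_update(W, u, v, scale=1.0):
    """Return W + scale * (u outer v) as a new matrix; W is not modified."""
    return [[W[i][j] + scale * u[i] * v[j] for j in range(len(v))]
            for i in range(len(u))]

if __name__ == "__main__":
    W = [[1, 2], [3, 4]]
    print(matrix_vector(W, [1, 1]))
    print(vector_matrix([1, 1], W))
    print(rank1_update(W, [1, 0], [0, 1]))
```

Each result is computed over a two-dimensional operand, which is the sense in which such operations may be regarded as multi-dimensional matrix math operations.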

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify the first set of processing cores of Merrill, III as modified with Narvaez et al. to perform multi-dimensional matrix math operations as described by Agarwal et al., as Merrill, III notes its PPU is capable of performing such operations, thus further enhancing the capabilities and functionalities of the processing cores.

Claim 10 is similar in scope to claim 1 above, and is therefore rejected under similar rationale.

As to claims 4 and 13, Merrill, III modified with Narvaez et al. and Agarwal et al. disclose the first set of processing cores of the first type (Merrill, III, processing core 450(L); modified with Narvaez, first set within cores 202A-202N implemented as special purpose cores; further modified with Agarwal, neural cores 106; 712, 714, 716, and 718) is configured to perform an in-place matrix to vector transformation for a first type of operand stored in the register file (Merrill, III, [0054] notes register file 420 provides set of registers and temporary storage of operands for the functional units of the SM 340, e.g. processing cores, [0078] thru [0116] notes performing sparse matrix vector multiplication; modified with Narvaez, Figure 17 illustrates register architecture 1700 comprising various registers, such as vector registers 1710 and scalar floating point stack register file (x87 stack) 1745 aliased with integer flat register file 1750, [0222] notes scalar floating point stack register file 1745 aliased with integer flat register file 1750, the x87 stack is an eight-element stack used to perform scalar floating-point operations on 32/64/80-bit floating point data using the x87 instruction set extension; while the MMX registers are used to perform operations on 64-bit packed integer data, as well as to hold operands for some operations performed between the MMX and XMM registers; further modified with Agarwal, column 9, lines 48-50 notes neural cores perform parallel vector matrix multiplication, matrix vector multiplication, and parallel rank-1 outer-product updates).

As to claims 5 and 13, Merrill, III modified with Narvaez et al. and Agarwal et al. disclose the in-place matrix to vector transformation includes a set of operations having a source and destination, the source and destination within the register file (modified with Narvaez, [0153] notes register index field 1544 specifies the locations of source and destination operands in registers, e.g. register file).

As to claims 6 and 14, Merrill, III modified with Narvaez et al. and Agarwal et al. disclose the source includes a register address start limit, stride, number of elements, and element size (modified with Narvaez, [0148] notes example of vector instruction format supporting different operand length or sizes, or different data element widths or sizes (may be considered stride, number of elements, element size), [0153] notes register index field specifying locations, e.g. addresses, of source and destination operands, (may be considered register address start limit)). 
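
For illustration only, a source described by a start address, a stride, a number of elements, and an element size may be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the implementation of the claims or of any cited reference: the SourceDescriptor type, the gather function, the use of byte offsets into a flat register file, and the little-endian unsigned element encoding are all hypothetical.

```python
from dataclasses import dataclass

# Illustrative sketch only. All names, units, and the byte-addressed
# register file model are assumptions made for this example.

@dataclass
class SourceDescriptor:
    start: int      # byte offset of the first element (assumed unit)
    stride: int     # bytes between the starts of consecutive elements
    count: int      # number of elements to read
    elem_size: int  # size of each element in bytes

def gather(regfile: bytes, d: SourceDescriptor) -> list:
    """Read d.count elements from regfile as little-endian unsigned ints."""
    out = []
    for k in range(d.count):
        off = d.start + k * d.stride
        if off < 0 or off + d.elem_size > len(regfile):
            raise IndexError("descriptor reads outside the register file")
        out.append(int.from_bytes(regfile[off:off + d.elem_size], "little"))
    return out

if __name__ == "__main__":
    # A 4x4 row-major matrix of 1-byte elements holding the values 0..15.
    regfile = bytes(range(16))
    # Column 1 of the matrix, read out as a vector: stride is one row (4 bytes).
    column = gather(regfile, SourceDescriptor(start=1, stride=4, count=4, elem_size=1))
    print(column)
```

Under these assumptions, selecting a stride equal to the row length reads one column of a row-major matrix as a vector, which is one way a matrix to vector transformation may be expressed with the four listed parameters.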

As to claims 7 and 15, Merrill, III modified with Narvaez et al. and Agarwal et al. disclose the at least one of the one or more multiprocessors (Merrill, III, SM 340) further comprises an instruction cache (Merrill, III, instruction cache 405) to store a first instruction associated with the first set of operands and a second instruction associated with the second set of operands (Merrill, III, Figure 4 illustrates input from instruction cache to scheduler unit 410(K) comprising dispatch units 415, where [0052] notes scheduler unit 410 manages instruction scheduling for one or more groups of threads assigned to the SM 340, scheduling the warps for execution and then dispatching instructions from the plurality of different warps to the various functional units, e.g. cores 450, [0053] notes dispatch units transmit instructions to the one or more functional units, e.g. cores 450, [0054] notes register file 420 provides operands for functional units, e.g. cores 450; modified with Narvaez, Figures 15A and 15B and associated text illustrate a vector instruction set which specifies operands).

As to claims 8 and 16, Merrill, III modified with Narvaez et al. and Agarwal et al. disclose the first set of processing cores of the first type are associated with a first memory channel, and the second set of processing cores of the second type are associated with a second memory channel (modified with Narvaez, Figure 2 illustrates each of cores 202 comprising one or more cache units 204, further in communication with shared cache units 206).

As to claims 9 and 17, Merrill, III modified with Narvaez et al. and Agarwal et al. disclose the one or more multiprocessors have a single instruction multiple threads (SIMT) architecture (Merrill, III, [0046] notes SM 340 implements a SIMT (Single-Instruction, Multiple Thread) architecture).

Allowable Subject Matter

Claims 2, 3, 11, and 12 would be allowable if the Double Patenting rejection is overcome and if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claims 18-25 would be allowable if the Double Patenting rejection is overcome.

The following is a statement of reasons for the indication of allowable subject matter:  As to claims 2, 11, and 18, the prior art of record fails to teach or suggest, singly or combined, the limitations “the second set of processing cores comprises: a second set of FPUs to execute instructions to perform floating point operations, the set of FPUs to perform 32-bit floating point (FP32) operations and 16-bit floating point (FP16) operations; and a set of integer units to execute instructions to perform integer operations.”  Dependent claims 3 and 12 are indicated allowable for depending upon indicated allowable claims 2 and 11, respectively; dependent claims 19-25 are indicated allowable for depending upon indicated allowable claim 18.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.


Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACINTA M CRAWFORD whose telephone number is (571)270-1539.  The examiner can normally be reached on 9:00 a.m. to 5:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached at (571)272-2976.  The fax number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/JACINTA M CRAWFORD/
Primary Examiner, Art Unit 2612