Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
EXAMINER'S AMENDMENT

An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an interview with applicant’s representative, Andrew C. Milhollin, on 06/19/2021.

The claims have been amended as follows:
1.  (Currently Amended) A method comprising:
receiving at a parallel processing unit comprising a plurality of compute units (CUs) the set of commands including a plurality of matrix multiplication operations, each of the plurality of matrix multiplication operations including a corresponding plurality of submatrix multiplications;
in response to receiving a set of commands, scheduling a first matrix multiplication operation of the plurality of matrix multiplication operations at a first subset of CUs and a second matrix multiplication operation of the plurality of matrix multiplication operations at a second subset of the CUs, the second subset of CUs different from the first subset of CUs; and
executing the first and second matrix multiplication operations at the respective first subset and second subset of CUs, wherein the first matrix multiplication operation corresponds to a matrix multiplication of a first whole matrix and a second whole matrix and the second matrix multiplication operation corresponds to a matrix multiplication of a third whole matrix and a fourth whole matrix.
2.  (Original) The method of claim 1, further comprising:
providing results of the first matrix multiplication operation from the first subset of CUs to the second subset of CUs to perform the second matrix multiplication operation.
3.  (Original) The method of claim 2, further comprising:
providing results of the second matrix multiplication operation to a third subset of CUs of the plurality of CUs to perform a third matrix multiplication operation, the third subset of CUs different from the first subset and the second subset of CUs.
4.  (Original) The method of claim 3, further comprising:
providing results of the third matrix multiplication operation from the third subset of CUs to the first set of CUs to perform a fourth matrix multiplication operation.
5.  (Original) The method of claim 2, wherein:
the first matrix multiplication operation comprises a first multiplication and a second multiplication;
the second matrix multiplication operation comprises a third multiplication; and
wherein executing the first and second matrix multiplication operations comprises executing the second multiplication concurrent with the third multiplication.
6.  (Original) The method of claim 5, wherein:
the third multiplication multiplies a result of the first multiplication.
7.  (Original) The method of claim 2, wherein:
the first matrix multiplication operation comprises a first multiplication and a second multiplication;

8.  (Original) The method of claim 7, wherein:
executing the first matrix multiplication operation comprises executing the first multiplication concurrent with the second multiplication.
9.  (Original) The method of claim 1, further comprising:
generating an output of a recurrent neural network (RNN) based on the first and second matrix multiplication operations.
10.  (Currently Amended) A method, comprising:
receiving, at a parallel processing unit , wherein each matrix multiplication operation of the plurality of matrix multiplication operations corresponds to a respective multiplication of two whole matrices;
in response to receiving the plurality of matrix multiplication operations, scheduling different ones of the plurality of matrix multiplication operations at different corresponding subsets of the plurality of CUs; and
pipelining results of the plurality of matrix multiplication operations between the different subsets of the plurality of CUs.
11.  (Original) The method of claim 10, further comprising:
concurrently executing portions of the plurality of matrix multiplication operations at different subsets of the plurality of CUs.
12.  (Currently Amended) A parallel processing unit
a plurality of CUs, including a first subset of CUs and a second subset of CUs, the second subset of CUs different from the first subset of CUs;
a scheduler configured to:

in response to receiving the set of commands, schedule a first matrix multiplication operation of the plurality of matrix multiplication operations at the first subset of CUs and a second matrix multiplication operation of the plurality of matrix multiplication operations at the second subset of the CUs; and
wherein the first subset of CUs and second subset of CUs are configured to execute the first and second matrix multiplication operations, wherein the first matrix multiplication operation corresponds to a matrix multiplication of a first whole matrix and a second whole matrix and the second matrix multiplication operation corresponds to a matrix multiplication of a third whole matrix and a fourth whole matrix.
13.  (Currently Amended) The parallel processing unit of claim 12, wherein:
the first subset of CUs is configured to provide results of the first matrix multiplication operation to the second subset of CUs to perform the second matrix multiplication operation.
14.  (Currently Amended) The parallel processing unit of claim 13, wherein:
the second subset of CUs is configured to provide results of the second matrix multiplication operation to a third subset of CUs of the plurality of CUs to perform a third matrix multiplication operation, the third subset of CUs different from the first subset and the second subset of CUs.
15.  (Currently Amended) The parallel processing unit of claim 14, wherein:
the third subset of CUs is configured to provide results of the third matrix multiplication operation to the first set of CUs to perform a fourth matrix multiplication operation.
16.  (Currently Amended) The parallel processing unit of claim 13, wherein:

the second matrix multiplication operation comprises a third multiplication; and
wherein the first subset of CUs is configured to execute the second multiplication concurrent with the second subset of CUs configured executing the third multiplication.
17.  (Currently Amended) The parallel processing unit of claim 16, wherein:
the third multiplication multiplies a result of the first multiplication.
18.  (Currently Amended) The parallel processing unit of claim 13, wherein:
the first subset of CUs comprises a first cluster of CUs and a second cluster of CUs, the second cluster different from the first cluster;
the first matrix multiplication operation comprises a first multiplication and a second multiplication;
wherein the first subset of CUs is configured to execute the first multiplication at the first cluster of the first subset of CUs and the second multiplication at the second cluster of the first subset of CUs.
19.  (Currently Amended) The parallel processing unit of claim 18, wherein:
the first subset of CUs is configured to execute the first matrix multiplication operation concurrent with the second multiplication.
20.  (Currently Amended) The parallel processing unit of claim 12, wherein the parallel processing unit is configured to:
generate an output of a recurrent neural network (RNN) based on the first and second matrix multiplication operations.
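
The scheduling and result hand-off recited in claims 1-4 and 10 can be illustrated with a minimal sketch. It is not the applicant's implementation: the function names (partition_cus, schedule, run_pipeline), the round-robin assignment policy, and the list-of-rows matrix representation are all assumptions made for illustration. The sketch models only which subset of CUs each whole-matrix multiplication is assigned to and how each result is passed to the next subset; it does not model concurrent execution on hardware.

```python
def matmul(a, b):
    """Multiply two whole matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def partition_cus(num_cus, num_subsets):
    """Split CU ids 0..num_cus-1 into disjoint, equally sized subsets.

    Assumption: any CUs left over when num_cus is not divisible by
    num_subsets are simply left unused.
    """
    size = num_cus // num_subsets
    return [list(range(i * size, (i + 1) * size)) for i in range(num_subsets)]


def schedule(num_ops, subsets):
    """Assign operation i to subset i mod len(subsets) (hypothetical policy)."""
    return [i % len(subsets) for i in range(num_ops)]


def run_pipeline(initial, weights, num_cus=6, num_subsets=3):
    """Chain whole-matrix multiplications across different CU subsets.

    Each operation multiplies the previous result by the next matrix, so
    the result of one subset is handed to the subset that runs the next
    operation. Returns the final matrix and a trace of (operation index,
    CU ids of the subset that ran it).
    """
    subsets = partition_cus(num_cus, num_subsets)
    assignment = schedule(len(weights), subsets)
    trace = []
    result = initial
    for op_index, weight in enumerate(weights):
        subset_id = assignment[op_index]
        result = matmul(result, weight)  # stands in for execution on that subset
        trace.append((op_index, subsets[subset_id]))
    return result, trace


if __name__ == "__main__":
    start = [[1, 2], [3, 4]]
    mats = [
        [[1, 0], [0, 1]],
        [[2, 0], [0, 2]],
        [[0, 1], [1, 0]],
        [[1, 1], [0, 1]],
    ]
    final, trace = run_pipeline(start, mats)
    print(final)
    for op_index, cus in trace:
        print(f"operation {op_index} ran on CUs {cus}")
```

With three subsets and four operations, the fourth operation is assigned back to the first subset, which mirrors the hand-off from the third subset to the first recited in claims 4 and 15.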



The following is an examiner’s statement of reasons for allowance: 
The prior art of record does not teach or fairly suggest a parallel processing unit comprising a plurality of compute units configured to schedule different ones of a plurality of matrix multiplication operations at different corresponding subsets of the plurality of CUs to perform the plurality of matrix multiplication operations in combination with other features recited in the claims, wherein each matrix multiplication operation corresponds to a respective multiplication of two whole matrices.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Chuong D Ngo whose telephone number is (571) 272-3731. The examiner can normally be reached on Monday-Friday (9-5).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee J Li, can be reached on (571) 272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.



/CHUONG D NGO/Primary Examiner, Art Unit 2182