DETAILED ACTION
This office action is in response to the application filed on 3/31/2021.
Claims 1 – 18 are pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 2, 6, 7 and 16 – 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Altevogt et al (US 20190095214, hereinafter Altevogt), in view of Maiyuran et al (US 20190362460, hereinafter Maiyuran).

As per claim 1, Altevogt discloses: Circuitry comprising: 
two or more clusters of execution units, each cluster comprising one or more execution units to execute processing instructions; and scheduler circuitry to maintain one or more queues of processing instructions, the scheduler circuitry comprising picker circuitry to select a queued processing instruction for issue to an execution unit of one of the clusters of execution units for execution; in which: (Altevogt figure 1, execution units 101, instruction decode unit 102 and [0031].)
the scheduler circuitry is configured to maintain dependency data associated with each queued processing instruction, (Altevogt [0028])
to inhibit issue by the scheduler circuitry of the given queued processing instruction to an execution unit in a cluster of execution units other than a cluster of execution units containing an execution unit which generated at least one of those last awaited source operands. (Altevogt [0037]: “In step 205, it may be identified (e.g. by control unit 106) in the dependency cache 103, which execution unit is assigned to a previous instruction of the dependency chain on which depends the received current instruction (e.g. and how many instructions of that dependency chain have been scheduled to each of the execution units). For example, if the dependency chain includes a stream of instructions A-C, where C depends on B in that C requires the result of B as input, and B depends on A in that B requires the result of A as input. If for example, the current instruction is C it may be determined if instruction B is stored in the dependency cache. The identification of the instruction B in the cache results also to the identification of the execution unit on which instruction B is scheduled because the execution unit identifier (or indicator) of that execution unit is stored in association with the instruction B”.)
Altevogt does not explicitly disclose:
the dependency data for a queued processing instruction indicating any source operands which are required to be available for use in execution of that queued processing instruction and to inhibit issue of that queued processing instruction until all of the required source operands for that queued processing instruction are available and is configured to be responsive to an indication of the availability of the given operand as a source operand for use in execution of queued processing instructions;
and the scheduler circuitry is responsive to an indication of availability of one or more last awaited source operands for a given queued processing instruction,
However, Maiyuran teaches:
the dependency data for a queued processing instruction indicating any source operands which are required to be available for use in execution of that queued processing instruction and to inhibit issue of that queued processing instruction until all of the required source operands for that queued processing instruction are available and is configured to be responsive to an indication of the availability of the given operand as a source operand for use in execution of queued processing instructions; (Maiyuran [0186]: “If software scoreboard information is encoded at block 2305, the logic 2300 can decode the scoreboard information hint to determine instruction dependency. The logic 2300 can then block execution of the instruction until completion of identified dependencies, as shown at block 2308… the logic 2300 can configure a scheduler to delay the scheduling of the instruction until the identified dependencies are complete. Dependencies can be identified via register distance information or scoreboard IDs as described herein.”.)
and the scheduler circuitry is responsive to an indication of availability of one or more last awaited source operands for a given queued processing instruction, (Maiyuran [0189]: “In one embodiment, execution of the decoded instruction can include configuring GPGPU logic to interrupt or trigger other internal signals within the GPGPU upon completion of the identified dependencies. When the dependency is satisfied, the GPGPU can then retire the decoded instruction and allow further instruction execution within the thread. While completion of the decoded instruction is blocked, other threads or contexts can continue to execute on the GPGPU. Blocking completion of the decoded instruction can include, in one embodiment, blocking the retirement of the decoded instruction.”) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Maiyuran into that of Altevogt in order to have the dependency data for a queued processing instruction indicating any source operands which are required to be available for use in execution of that queued processing instruction and to inhibit issue of that queued processing instruction until all of the required source operands for that queued processing instruction are available and is configured to be responsive to an indication of the availability of the given operand as a source operand for use in execution of queued processing instructions; and the scheduler circuitry is responsive to an indication of availability of one or more last awaited source operands for a given queued processing instruction. Altevogt [0028] teaches using a dependency check prior to scheduling to improve the efficiency of the scheduling process, and Maiyuran shows that the claimed limitations are commonly known and used methods for dependency-based scheduling. Applicant has therefore merely claimed a combination of known parts in the field to achieve predictable results, and the claim is rejected under 35 U.S.C. 103.
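For illustration only, the combined behavior mapped above (dependency data per queued instruction, wake-up on the last awaited source operand, and restriction of issue to the producing cluster) can be modeled in software as follows. This is a minimal sketch under stated assumptions, not the applicant's circuitry and not the implementation of Altevogt or Maiyuran; the names QueuedInstruction, Scheduler, operand_available and eligible_clusters are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, List


@dataclass
class QueuedInstruction:
    name: str
    # Dependency data: source operands still awaited (hypothetical model).
    awaiting: Set[str] = field(default_factory=set)
    # Cluster whose execution unit generated the last awaited operand.
    last_producer_cluster: Optional[int] = None


class Scheduler:
    def __init__(self, num_clusters: int):
        self.num_clusters = num_clusters
        self.queue: List[QueuedInstruction] = []

    def enqueue(self, instr: QueuedInstruction) -> None:
        self.queue.append(instr)

    def operand_available(self, operand: str, producer_cluster: int) -> None:
        """Indication that an operand is now available as a source operand."""
        for instr in self.queue:
            if operand in instr.awaiting:
                instr.awaiting.discard(operand)
                if not instr.awaiting:
                    # This was the last awaited operand: remember its producer.
                    instr.last_producer_cluster = producer_cluster

    def eligible_clusters(self, instr: QueuedInstruction) -> Set[int]:
        """Clusters to which the instruction may currently be issued."""
        if instr.awaiting:
            return set()  # issue inhibited until all operands are available
        if instr.last_producer_cluster is None:
            return set(range(self.num_clusters))  # never waited on anything
        # Issue to any other cluster is inhibited.
        return {instr.last_producer_cluster}
```

In this model an instruction awaiting two operands becomes eligible only after the second indication arrives, and then only for the cluster that produced that second operand.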

As per claim 2, the combination of Altevogt and Maiyuran further teaches:
The circuitry of claim 1, in which the scheduler circuitry comprises respective instruction picker circuitry to select queued processing instructions for issue to execution units of the two or more clusters of execution units. (Altevogt [0039])

As per claim 6, the combination of Altevogt and Maiyuran further teaches:
The circuitry of claim 1, in which the execution units comprise one or more execution stages configured to operate according to successive execution clock cycles. (Altevogt [0030])

As per claim 7, the combination of Altevogt and Maiyuran further teaches:
The circuitry of claim 6, in which each given cluster of execution units comprises a first data path configured to transfer an operand generated by an execution unit of that cluster during a given clock cycle to an input of all execution units of the given cluster of execution units, the first data path having a first data path latency between generation of the operand and completion of transfer of the operand to the input of all execution units of the given cluster. (Altevogt [0026] – [0027])

As per claim 16, the combination of Altevogt and Maiyuran further teaches:
The circuitry of claim 1, in which at least some of the two or more clusters of execution units comprises two or more execution units to execute processing instructions. (Altevogt figure 1.)

As per claim 17, the combination of Altevogt and Maiyuran further teaches:
The circuitry of claim 1, in which one of (i) the scheduler circuitry and (ii) an execution unit generating the given operand is configured to provide the indication of the availability of the given operand as a source operand for use in execution of queued processing instructions. (Maiyuran [0189].)

As per claim 18, it is the method variant of claim 1 and is therefore rejected under the same rationale.

Claim(s) 3 – 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over Altevogt and Maiyuran, and further in view of Zhao et al (US 20170031728, hereinafter Zhao).

As per claim 3, the combination of Altevogt and Maiyuran does not teach:
The circuitry of claim 2, in which the scheduler circuitry is configured to associate readiness data, indicating readiness for issue, with a queued processing instruction for which all of the source operands are available, the instruction picker circuitry being configured to select a queued processing instruction for which the readiness data indicates readiness for issue.
However, Zhao teaches:
The circuitry of claim 2, in which the scheduler circuitry is configured to associate readiness data, indicating readiness for issue, with a queued processing instruction for which all of the source operands are available, the instruction picker circuitry being configured to select a queued processing instruction for which the readiness data indicates readiness for issue. (Zhao [0051])
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Zhao into that of Altevogt and Maiyuran in order to have the scheduler circuitry configured to associate readiness data, indicating readiness for issue, with a queued processing instruction for which all of the source operands are available, the instruction picker circuitry being configured to select a queued processing instruction for which the readiness data indicates readiness for issue. Altevogt teaches scheduling instructions upon receiving them; however, one of ordinary skill in the art would readily recognize that a queued model may be applied here as well without deviating from the existing prior art. Applicant has therefore merely claimed a combination of known parts in the field to achieve predictable results, and the claim is rejected under 35 U.S.C. 103.

As per claim 4, the combination of Altevogt, Maiyuran and Zhao further teaches:
The circuitry of claim 3, in which the scheduler circuitry is configured to inhibit the instruction picker circuitry associated with a cluster of execution units other than the cluster of execution units containing the execution unit which generated the last awaited one of the source operands from detecting the readiness data for the given queued processing instruction. (Altevogt [0039])

As per claim 5, the combination of Altevogt, Maiyuran and Zhao further teaches:
The circuitry of claim 4, in which the instruction picker circuitry is configured to select a longest-queued or oldest processing instruction for which the readiness data indicates readiness for issue. (Zhao [0051])
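For illustration only, the readiness data and oldest-first selection recited in claims 3 and 5 can be modeled as follows. This is a minimal sketch under stated assumptions and is not taken from Zhao or from the application; the names Entry and pick_oldest_ready, and the encoding of age as an integer, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    name: str
    age: int     # assumed encoding: a larger value means queued for longer
    ready: bool  # readiness data: True once all source operands are available


def pick_oldest_ready(queue: List[Entry]) -> Optional[Entry]:
    """Select the longest-queued entry whose readiness data is set."""
    ready = [e for e in queue if e.ready]
    if not ready:
        return None  # nothing may be issued this cycle
    return max(ready, key=lambda e: e.age)
```

An entry that is older but not yet ready is skipped in favor of the oldest entry that is ready.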


Claim(s) 8 – 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Altevogt and Maiyuran, and further in view of Levi et al (US 20200162397, hereinafter Levi).

As per claim 8, the combination of Altevogt and Maiyuran does not teach:
The circuitry of claim 7, in which the first data path latency is such that an execution unit of the given cluster may execute a processing instruction requiring an operand transferred by the first data path during a next following clock cycle after the given clock cycle.
However, Levi teaches:
The circuitry of claim 7, in which the first data path latency is such that an execution unit of the given cluster may execute a processing instruction requiring an operand transferred by the first data path during a next following clock cycle after the given clock cycle. (Levi [0008])
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Levi into that of Altevogt and Maiyuran in order to have the first data path latency be such that an execution unit of the given cluster may execute a processing instruction requiring an operand transferred by the first data path during a next following clock cycle after the given clock cycle. Levi shows that it is commonly known that using the scheduling path with the smaller latency accelerates the processing of the task. Applicant has therefore merely claimed a combination of known parts in the field to achieve predictable results, and the claim is rejected under 35 U.S.C. 103.

As per claim 9, the combination of Altevogt and Maiyuran does not teach:
The circuitry of claim 7, in which the two or more clusters of execution units comprise a second data path configured to transfer an operand generated by an execution unit during a given clock cycle to an input of all execution units of the two or more clusters of execution units, the second data path having a second data path latency between generation of the operand and completion of transfer of the operand to the input of all execution units of the two or more clusters of execution units.
However, Levi teaches:
The circuitry of claim 7, in which the two or more clusters of execution units comprise a second data path configured to transfer an operand generated by an execution unit during a given clock cycle to an input of all execution units of the two or more clusters of execution units, the second data path having a second data path latency between generation of the operand and completion of transfer of the operand to the input of all execution units of the two or more clusters of execution units. (Levi [0008])
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Levi into that of Altevogt and Maiyuran in order to have a second data path configured to transfer an operand generated by an execution unit during a given clock cycle to an input of all execution units of the two or more clusters of execution units, the second data path having a second data path latency between generation of the operand and completion of transfer of the operand to the input of all execution units of the two or more clusters of execution units. Levi shows that it is commonly known that using the scheduling path with the smaller latency accelerates the processing of the task. Applicant has therefore merely claimed a combination of known parts in the field to achieve predictable results, and the claim is rejected under 35 U.S.C. 103.

As per claim 10, the combination of Altevogt, Maiyuran and Levi further teaches:
The circuitry of claim 9, in which the second data path latency is longer than the first data path latency. (Levi [0008])

As per claim 11, the combination of Altevogt, Maiyuran and Levi further teaches:
The circuitry of claim 10, in which the second data path comprises one or more processor registers configured to store operands generated by an execution unit. (Altevogt [0054])

As per claim 12, the combination of Altevogt, Maiyuran and Levi further teaches:
The circuitry of claim 10, in which the second data path latency is such that an execution unit of the given cluster may execute a processing instruction requiring an operand transferred by the second data path no earlier than at least two clock cycles after the given clock cycle. (Levi [0008], examiner notes that the limitation “no earlier than at least two clock cycles after the given clock cycle” is an obvious design choice and is not given patentable weight here.)
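For illustration only, the relationship between the first and second data path latencies recited in claims 8, 10 and 12 can be expressed as a simple timing model. This is a minimal sketch under stated assumptions; the function name earliest_issue_cycle is hypothetical, and the latency values of one and two cycles are assumptions drawn from the claim language ("a next following clock cycle" and "at least two clock cycles"), not from Levi.

```python
FIRST_PATH_LATENCY = 1   # assumed: intra-cluster transfer, usable next cycle
SECOND_PATH_LATENCY = 2  # assumed: cross-cluster transfer, at least two cycles


def earliest_issue_cycle(gen_cycle: int,
                         producer_cluster: int,
                         consumer_cluster: int) -> int:
    """Earliest cycle at which a consumer may execute using the operand."""
    if producer_cluster == consumer_cluster:
        # Operand arrives over the first (shorter) data path.
        return gen_cycle + FIRST_PATH_LATENCY
    # Operand arrives over the second (longer) data path.
    return gen_cycle + SECOND_PATH_LATENCY
```

Under these assumed values, a dependent instruction issued to the producing cluster can execute one cycle earlier than the same instruction issued to any other cluster.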

As per claim 13, the combination of Altevogt, Maiyuran and Levi further teaches:
The circuitry of claim 10, in which the scheduler circuitry is configured to inhibit issue of the given queued processing instruction to an execution unit in a cluster of execution units other than the cluster of execution units containing the execution unit which generated that last awaited one of the source operands, until at least a predetermined non-zero number of clock cycles after receipt of the indication of availability of that last awaited one of the source operands. (Altevogt [0037])

Claim(s) 14 and 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Altevogt, Maiyuran and Levi, and further in view of Frazier et al (US 20130179886, hereinafter Frazier).

As per claim 14, the combination of Altevogt, Maiyuran and Levi does not teach:
The circuitry of claim 13, in which the scheduler circuitry is configured to generate a data mask to inhibit detection that the given queued processing instruction is ready for issue to an execution unit in a cluster of execution units other than the cluster of execution units containing the execution unit which generated that last awaited one of the source operands.
However, Frazier teaches:
The circuitry of claim 13, in which the scheduler circuitry is configured to generate a data mask to inhibit detection that the given queued processing instruction is ready for issue to an execution unit in a cluster of execution units other than the cluster of execution units containing the execution unit which generated that last awaited one of the source operands. (Frazier [0020])
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Frazier into that of Altevogt, Maiyuran and Levi to generate a data mask to inhibit detection that the given queued processing instruction is ready for issue to an execution unit in a cluster of execution units other than the cluster of execution units containing the execution unit which generated that last awaited one of the source operands. Frazier shows that using a data mask is commonly known and used in resource allocation and scheduling. Applicant has therefore merely claimed a combination of known parts in the field to achieve predictable results, and the claim is rejected under 35 U.S.C. 103.


As per claim 15, the combination of Altevogt, Maiyuran, Levi and Frazier further teaches:
The circuitry of claim 14, in which the scheduler circuitry is configured to remove the data mask a predetermined number of clock cycles after generation of the last awaited one of the source operands. (Frazier [0020])
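For illustration only, the data mask of claims 14 and 15 (generated when the last awaited operand becomes available, then removed a predetermined number of clock cycles later) can be modeled as follows. This is a minimal sketch under stated assumptions and is not taken from Frazier or from the application; the class name ClusterMask and its methods tick and visible_to are hypothetical.

```python
class ClusterMask:
    """Hides an instruction's readiness from pickers of non-producing clusters."""

    def __init__(self, num_clusters: int, producer_cluster: int, hold_cycles: int):
        # Clusters whose picker is inhibited from detecting readiness.
        if hold_cycles <= 0:
            self.masked = set()  # no mask is applied at all
        else:
            self.masked = {c for c in range(num_clusters) if c != producer_cluster}
        self.remaining = max(hold_cycles, 0)

    def tick(self) -> None:
        """Advance one clock cycle; remove the mask when the count expires."""
        if self.remaining > 0:
            self.remaining -= 1
            if self.remaining == 0:
                self.masked.clear()

    def visible_to(self, cluster: int) -> bool:
        """True if the picker for this cluster may detect the readiness data."""
        return cluster not in self.masked
```

The producing cluster always sees the instruction as ready, while the other clusters see it only after the assumed hold period has elapsed.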

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Crum et al (US 20120023314) teaches “A processor comprises a front end pipeline that determines data dependencies between instructions prior to a scheduling pipe stage. For each data dependency, a distance value is determined based on a number of instructions a younger dependent instruction is located from a corresponding older (in program order) instruction. When the younger dependent instruction is allocated an entry in a multi-cycle scheduler, this distance value may be used to locate an entry storing the older instruction in the scheduler. When the older instruction is picked for issue, the younger dependent instruction is marked as pre-picked. In an immediately subsequent clock cycle, the younger dependent instruction may be picked for issue, thereby reducing the latency of the multi-cycle scheduler”.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLES M SWIFT whose telephone number is (571) 270-7756. The examiner can normally be reached Monday through Friday, 9:30 AM to 7 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emerson Puente, can be reached at (571) 272-3652. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/CHARLES M SWIFT/
Primary Examiner, Art Unit 2196