DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending.


Claim Objections

Claims 1-2, 13, and 15 are objected to because of the following informalities:

-- DRAM -- is abbreviated without reciting full form in claims 2 and 15.
-- (a) (b) (x) (y) (z) -- should be -- (a) (b) (c) (d) (e) -- in claim 1 (i.e., lettered consecutively), or a distinct numbering scheme should be used for each level (e.g., a, b, c and i, ii, iii). A similar deficiency exists in claim 13.
-- system from implementing -- should be -- system for implementing -- in claim 15 line 1.
-- SRAM -- is abbreviated without reciting full form in claim 15.
-- DMA -- is abbreviated without reciting full form in claim 15.

Appropriate correction is required.

Specification

The disclosure is objected to because of the following informalities: 
-- whether or to what confidence level objects -- should be rephrased in [0002].
-- SRAM -- is abbreviated in [0020] without full form.
-- stall-til-completed -- should be -- stall-till-completed -- in [0083].
-- localility -- should be -- locality -- in [0083].
Appropriate correction is required.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.




Claims 1-20 are rejected under 35 U.S.C. 112 (b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or joint inventor regards as the invention.

The following claim language is not clearly understood:

Claim 1 recites “generating a computer program”. It is unclear if the generation of the program is based on a given input or the program is generated without any input to the method.

Claim 1 recites “receiving a description of the machine learning network”. It is unclear whether the received description is used in the method (e.g., generating a computer program that implements the machine learning network on the MLA based on the received description).
 
Claim 11 recites “known duration for the MLA instructions”. It is unclear if the known duration is referring to the processing duration or not. Similar deficiency exists in claim 12. 

Claim 14 recites “substantially more cycles”, which is indefinite. It is unclear whether the cycles refer to CPU cycles, accelerator cycles, or execution cycles.

Claims 13 and 15 recite elements of claim 1 and have deficiencies similar to those of claim 1. Therefore, they are rejected for the same rationale. The remaining dependent claims are also rejected due to their dependency on the rejected independent claims.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 13-14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more and without integration into a practical application.
 
Based upon at least the decision by the United States Supreme Court in Alice Corp. v. CLS Bank Int'l, 134 S. Ct. 2347, 2354 (2014), post-Alice precedential court decisions, and the 2019 Revised Patent Subject Matter Eligibility Guidance, claims 13-14 are determined to be directed to an abstract idea. Examples of abstract ideas include at least mathematical concepts, mental processes, and certain methods of organizing human activity.

Step 1: 
Claims 1 and 13 recite a method, which falls within the “process” category of 35 U.S.C. § 101. Claim 15 recites a system … comprising off-chip memory and a machine learning accelerator, which falls within the “machine” category of 35 U.S.C. § 101. Thus, the analysis determines whether the claims recite a judicial exception and fail to integrate the exception into a practical application. See Memorandum, 84 Fed. Reg. 54-55. If both elements are satisfied, the claims are directed to a judicial exception under the first step of the Alice/Mayo test. See id.

Step 2A, Prong One

Independent claim 13 recites the following steps:
[i] 	method for generating a computer program comprising (a) deterministic instructions for execution by a plurality of interconnected data handling units; and (b) memory access instructions for retrieving data used by the deterministic instructions, the data stored in remote memory;
[ii]	scheduling the deterministic instructions according to a static schedule
[iii]	scheduling the memory access instructions so that (x) there is no contention among memory access instructions for access to the remote memory, and (y) the memory access instructions are scheduled early enough that the retrieved data is available before the retrieved data is used by the deterministic instructions executing according to the static schedule.
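
For illustration only, the two constraints recited in step [iii] can be sketched as a small scheduling routine. The sketch below is hypothetical and is not drawn from the application or from any cited reference: it assumes a single shared remote-memory port, a fixed and known duration for each access, and a static schedule that supplies the cycle at which each piece of data is first used. The names Fetch and schedule_fetches are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fetch:
    name: str
    duration: int   # cycles the access occupies the (assumed single) memory port
    needed_at: int  # cycle at which the static schedule first uses the data


def schedule_fetches(fetches):
    """Assign a start cycle to each fetch on one shared memory port.

    Fetches are placed back to front, latest deadline first, so that
    (x) no two accesses occupy the port at the same time, and
    (y) every access completes no later than the cycle its data is used.
    Raises ValueError if the fetches cannot all fit before their deadlines.
    """
    starts = {}
    port_busy_from = None  # earliest cycle already claimed by a later fetch
    for f in sorted(fetches, key=lambda f: f.needed_at, reverse=True):
        end = f.needed_at if port_busy_from is None else min(f.needed_at, port_busy_from)
        start = end - f.duration
        if start < 0:
            raise ValueError(f"fetch {f.name!r} cannot complete by cycle {f.needed_at}")
        starts[f.name] = start
        port_busy_from = start
    return starts


if __name__ == "__main__":
    plan = schedule_fetches([Fetch("A", 4, 10), Fetch("B", 3, 12), Fetch("C", 5, 12)])
    for name, start in sorted(plan.items(), key=lambda kv: kv[1]):
        print(name, "starts at cycle", start)
```

Under these assumptions the start cycles are fixed before execution, so nothing in the sketch depends on run-time arbitration.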

The overall process described by steps [ii]-[iii] describes “concepts performed in the human mind” or “observation, evaluation, judgement, opinion.” Memorandum, 84 Fed. Reg. 52. Thus, steps [ii]-[iii] recite the abstract concept of “[m]ental processes.” Id. For example, in step [ii], “scheduling the deterministic instructions according to a static schedule” is similar to the idea of assigning time/resources (scheduling) to instructions (tasks) and is based on a combination of observation, evaluation, judgement, and opinion, i.e., the scheduling can be performed by the human mind alone or with the help of pen and paper. Similarly, step [iii] recites scheduling of instructions with certain constraints and, at a high level of generality, may be performed by the human mind alone or with the help of pen and paper as a combination of observation, evaluation, judgement, and opinion. Thus, claim 13 recites a judicial exception.

Step 2A, Prong Two
Because claim 13 recites a judicial exception, the analysis determines whether the claim recites additional elements that integrate the judicial exception into a practical application.
In addition to the limitations of claim 13 discussed above that recite the abstract concepts, claim 13 further recites in step [i], “method for generating a computer program comprising (a) deterministic instructions for execution by a plurality of interconnected data handling units; and (b) memory access instructions for retrieving data used by the deterministic instructions, the data stored in remote memory”. The additional elements of claim 13, namely generating a computer program that comprises instructions executed by the data handling units and instructions for retrieving data from the remote memory, are generic computing methods, are not inventive, and may not make the abstract idea patentable either alone or in combination. The Specification doesn’t provide additional details that would distinguish the additional limitations from a generic implementation of the abstract idea. The method for generating a program can be broadly categorized as a generic computing method as recited in the independent claims. Thus, these additional claim elements, under the broadest reasonable interpretation, do not integrate the judicial exception into a practical application.
Thus, claim 13 is directed to a judicial exception because claim 13 does not recite additional elements that integrate the judicial exception into a practical application.
Step 2B
Because claim 13 is directed to a judicial exception, the analysis must determine, according to Alice, whether the claim recites an element, or combination of elements, that is enough to ensure that the claim is directed to significantly more than a judicial exception.
The Memorandum, Section III (B) (footnote 36) states:
In accordance with existing guidance, an Examiner’s conclusion that an additional element (or combination of elements) is well understood, routine, conventional activity must be supported with a factual determination. For more information concerning evaluation of well-understood, routine, conventional activity, see MPEP 2106.05(d), as modified by the USPTO Berkheimer Memorandum.

The Berkheimer Memorandum, Section III(A)(1) states:
A Specification demonstrates the well-understood, routine, conventional nature of additional elements when it describes the additional elements as well-understood or routine or conventional (or an equivalent term), as a commercially available product, or in a manner that indicates that the additional elements are sufficiently well-known that the specification does not need to describe the particulars of such additional elements to satisfy 35 U.S.C. § 112(a). A finding that an element is well-understood, routine, or conventional cannot be based only on the fact that the specification is silent with respect to describing such element.

Regarding the additional claim elements, the Specification doesn’t provide additional details that would distinguish the additional limitations as recited in the claim from a generic implementation of the abstract idea. Further, the additional limitation of “[i] method for generating a computer program comprising (a) deterministic instructions for execution by a plurality of interconnected data handling units; and (b) memory access instructions for retrieving data used by the deterministic instructions, the data stored in remote memory” is similar to the idea of generating a computer program containing instructions to be executed by a processing unit and instructions for retrieving data, according to the broadest reasonable interpretation of the claim, and resembles a generic computing method. There is no indication that the recited claim elements override the conventional use of known features or involve an unconventional arrangement or combination of elements such that the particular combination of generic technology results in anything beyond well-understood, routine, and conventional data gathering and output. Alice, 573 U.S. at 223 (“[T]he mere recitation of a generic computer cannot transform a patent-ineligible abstract idea into a patent-eligible invention.”). See also Customedia Techs. LLC v. Dish Network Corp., 951 F.3d 1359, 1366 (Fed. Cir. 2020) (“[T]he invocation of ‘already-available computers that are not themselves plausibly asserted to be an advance…amounts to a recitation of what is well-understood, routine, and conventional.”) (quoting SAP Am., Inc. v. InvestPic, LLC, 898 F.3d 1161, 1170 (Fed. Cir. 2018)); and buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355 (Fed. Cir. 2014) (“That a computer receives and sends the information over a network -- with no further specification -- is not even arguably inventive.”).

Thus, Claim 13 is not directed to significantly more than a patent ineligible concept. 
Dependent claim 14 further recites the abstract idea of a mental process because it recites that retrieving data from the remote memory requires substantially more cycles than executing deterministic instructions, which is a combination of observation and evaluation.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-12 are rejected under 35 U.S.C. 103 as being unpatentable over Sridharan et al. (US 2019/0205745 A1, hereafter Sridharan) in view of Knowles (US 10,585,716 B2) and further in view of Asghar (US 2019/0213160 A1).

As per claim 1, Sridharan teaches the invention substantially as claimed including a computer-implemented method for implementing a machine learning network on a machine learning accelerator (MLA) ([0167] deploying neural networks for machine learning [0212] deep learning accelerator), the method comprising: 
receiving a description of the machine learning network ([0164] machine learning software stack, machine learning application 1502, train, neural network [0165] machine learning application, enabled, machine learning framework, provide library of machine learning primitives [0166] input data received from the machine learning application); and 
generating a computer program that implements the machine learning network on the MLA ([0167] deploying neural networks for machine learning [0211] fig. 21 neural network topologies CNN, RNN, DNN [0212] fig. 21A implement functionality, machine learning architecture 2100 hardware 2114 includes deep learning accelerator [0379] computer program); 
wherein the MLA comprises a mesh of interconnected data handling units (DHUs) implemented on a semiconductor die ([0143] fig. 11 B implementations of accelerators, multiple units of hardware logics, substrate, include one or more portions of any of the processor cores, graphics processor, within a semiconductor die, interconnect structure 1173 [0147] system on chip, one or more cores [0174] CNN, fully connected layers fig. 18 trained neural network 1808); 
the computer program comprises (a) MLA instructions for execution by the DHUs for implementing the machine learning network ([0070] graphics core, set of graphics execution resources, perform, machine learning and artificial intelligence acceleration logic [0081] scheduling and management tasks, graphics core, graphics microcontroller 538 perform workloads scheduling on the various graphics parallel engines with execution unit arrays 502 504 [0167] parallel processing, training and deploying neural networks for machine learning); and 
(b) memory access instructions for retrieving data from off- chip memory for use by the MLA instructions ([0080] graphics core, on-package DRAM, SoC interface 537, enable communication with fixed function devices within the SoC, enable the use of and/or implements global memory [0310] graphics multiprocessor, access, off-chip global memory), wherein the DHUs share access to the off-chip memory ([0080] global memory, shared, between the graphics core 500 [0310] graphics multiprocessor, access off-chip global memory, local processor memory and/or system memory); and 
generating the computer program comprises ([0379] computer program): statically scheduling the MLA instructions ([0081] perform various scheduling and management tasks for the graphics core 500 workload scheduling [0167] deploying neural networks for machine learning); and 
scheduling the memory access instructions so that (x) there is no contention by DHUs for access to the off-chip memory ([0080] graphics core, on-package DRAM, SoC interface 537, enable communication with fixed function devices within the SoC, enable the use of and/or implements global memory [0081] scheduling and management tasks, graphics core [0310] graphics multiprocessor, access, off-chip global memory), (y) because there is no contention, the memory access instructions have known duration at time of scheduling, and (z) given the known duration, the memory access instructions are scheduled early enough that, at time of scheduling, it is known that the retrieved data will be available in the DHUs before the retrieved data is used by the statically scheduled MLA instructions.

Sridharan doesn’t specifically teach statically scheduling instructions so that (x) there is no contention by DHUs, (y) because there is no contention, the memory access instructions have known duration at time of scheduling, and (z) given the known duration, the memory access instructions are scheduled early enough that, at time of scheduling, it is known that the retrieved data will be available in the DHUs before the retrieved data is used by the statically scheduled MLA instructions.

Knowles, however, teaches statically scheduling instructions so that (x) there is no contention by DHUs, (y) because there is no contention (col 9 lines 23-40 tasklet, causally scheduled, deterministic program behavior free from deadlocks i.e. no contention), the memory access instructions have known duration at time of scheduling (col 9 lines 40-60 different tasklets, differing complexity, take different amounts of time to complete), and (z) given the known duration (col 9 lines 40-60 tasklets, amount of time to complete), the memory access instructions are scheduled early enough that, at time of scheduling, it is known that the retrieved data will be available in the DHUs before the retrieved data is used by the statically scheduled MLA instructions (col 9 lines 23-31 tasklets, vertices, must be causally scheduled i.e. scheduled so that, for every tasklet which has a non-pipelined edge directed to another tasklet, the former is completed in its entirety before the latter begins; col 18 lines 57-67 codelet, output/input values, distributed in memory, retrieved by the codelet).
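
For illustration only, the causal-scheduling condition characterized above (a producing tasklet completes in its entirety before any dependent tasklet begins) can be sketched as follows. This is a hypothetical sketch, not code from Knowles or from the application: the task names, the durations, and the function causal_start_times are invented, and durations are assumed to be known before execution. It uses graphlib.TopologicalSorter from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def causal_start_times(durations, deps):
    """Compute start cycles so each task begins only after its producers finish.

    durations: dict mapping task name -> known duration in cycles (assumed fixed).
    deps:      dict mapping task name -> set of tasks that must complete first.
    """
    graph = {task: deps.get(task, ()) for task in durations}
    start = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(graph).static_order():
        start[task] = max(
            (start[p] + durations[p] for p in deps.get(task, ())),
            default=0,
        )
    return start


if __name__ == "__main__":
    durations = {"load": 3, "mul": 2, "add": 1}
    deps = {"mul": {"load"}, "add": {"load", "mul"}}
    print(causal_start_times(durations, deps))
```

The sketch captures only the ordering constraint. It does not model contention for a shared resource, which is a separate condition in the claim language.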

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sridharan with the teachings of Knowles (causally scheduling a deterministic program, providing guaranteed deterministic behavior without deadlock, the amount of time to complete each tasklet, and completing a tasklet so that its output is available as input before the latter tasklet begins) to improve efficiency and to allow scheduling instructions in the method of Sridharan so that (x) there is no contention by DHUs, (y) because there is no contention, the memory access instructions have known duration at time of scheduling, and (z) given the known duration, the memory access instructions are scheduled early enough that, at time of scheduling, it is known that the retrieved data will be available in the DHUs before the retrieved data is used by the scheduled MLA instructions, as in the instant invention.

Sridharan and Knowles, in combination, do not specifically teach statically scheduled instruction.

Asghar, however, teaches statically scheduled instructions ([0035] scheduler, static, control scheduling).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sridharan and Knowles with the teachings of Asghar (a static scheduler that controls scheduling) to improve efficiency and to allow static scheduling of instructions in the method of Sridharan and Knowles, as in the instant invention.

As per claim 2, Sridharan teaches wherein the off-chip memory is DRAM ([0052] fig. 1 processors 102 memory  device 120 DRAM). 

As per claim 3, Sridharan teaches as a result of scheduling the memory access instructions ([0081] scheduling and management tasks [0055] each processor core, access to one or more internal/shared cache unit [0057] memory controllers, manage, access, external memory), the DHUs effectively each have a dedicated port to the off-chip memory ([0095] data port, memory access mechanism [0117] execution units, interconnect via data port, perform memory access).  
Knowles teaches remaining claim elements of dedicated port (col 5 lines each computing unit, processing unit, memory having at least two memory ports, assign port to input/output region).

As per claim 4, Sridharan teaches executing the MLA instructions, assuming that data retrieved from off-chip memory is available ([0070] graphics core, set of graphics execution resources, perform, machine learning and artificial intelligence acceleration logic [0071] processing, instructions, graphics core array [0306] issue instructions to a set of processing engines [0080] graphics core, on-package DRAM, SoC interface 537, enable communication with fixed function devices within the SoC, enable the use of and/or implements global memory [0310] graphics multiprocessor, access, off-chip global memory).  

As per claim 5, Sridharan teaches wherein executing the MLA instructions occurs without first confirming availability of the data retrieved from off-chip memory ([0071] processing, instructions, graphics core array [0306] issue instructions to a set of processing engines [0080] graphics core, on-package DRAM, SoC interface 537, enable communication with fixed function devices within the SoC, enable the use of and/or implements global memory [0310] graphics multiprocessor, access, off-chip global memory).  

As per claim 6, Sridharan teaches executing the memory access instructions, assuming that access to the off-chip memory is available ([0080] graphics core, on-package DRAM, SoC interface 537, enable communication with fixed function devices within the SoC, enable the use of and/or implements global memory [0310] graphics multiprocessor, access, off-chip global memory).  

As per claim 7, Sridharan teaches wherein executing the memory access instructions occurs without run-time contention resolution, arbitration or congestion avoidance for access to the off-chip memory ([0080] graphics core, on-package DRAM, SoC interface 537, enable communication with fixed function devices within the SoC, enable the use of and/or implements global memory [0310] graphics multiprocessor, access, off-chip global memory).  
Knowles teaches remaining claim elements of memory access without run-time contention resolution, arbitration or congestion avoidance (col 9 lines 23-40 tasklet, causally scheduled, deterministic program behavior free from deadlocks i.e. no contention).

As per claim 8, Sridharan teaches wherein the DHUs comprise processing elements (PEs) and storage elements (SEs) (fig. 5 graphic processor core 500 execution unit array 502 [0096] fig. 6B graphics execution unit 608 FPU, ALU, registers); and the MLA instructions ([0070] graphics core, set of graphics execution resources, perform, machine learning and artificial intelligence acceleration logic) comprise (a) compute instructions for execution by the PEs for executing computations in the machine learning network ([0068] graphics execution units to process the 3D and media threads [0070] graphics core, set of graphics execution resources, perform, machine learning and artificial intelligence acceleration logic [0201] machine learning compute operations, performed, compute nodes) and (b) data transfer instructions for execution by the PEs and/or SEs for data transfer between PEs and/or SEs ([0201] data transfer, machine learning computations, multiple nodes [0315] instructions, access, local, shared, global address space); and the memory access instructions are for execution by SEs ([0315] instructions, access, local, shared, global address space).

As per claim 9, Sridharan teaches statically scheduling the MLA instructions comprises statically scheduling instructions within each of a plurality of blocks of MLA instructions ([0081] perform various scheduling and management tasks for the graphics core 500 workload scheduling [0167] deploying neural networks for machine learning [0077] graphics core, blocks of general purpose and fixed function logic), wherein each block of MLA instructions comprises a set of (i) compute instructions ([0077] blocks of general purpose and fixed function logic) and (ii) corresponding data transfer instructions that transfer data used by the compute instructions ([0079] fixed function block includes media pipeline, decoding, encoding, pre-processing, and/or post-processing of multimedia data, including image and video data); and scheduling the memory access instructions comprises, for each block of MLA instructions ([0081] perform various scheduling and management tasks for the graphics core 500 workload scheduling): scheduling the corresponding memory access instructions that retrieve data used by the set of MLA instructions in the block ([0080] graphics core, on-package DRAM, SoC interface 537, enable communication with fixed function devices within the SoC, enable the use of and/or implements global memory [0081] scheduling and management tasks, graphics core [0310] graphics multiprocessor, access, off-chip global memory [0079] fixed function block includes media pipeline, decoding, encoding, pre-processing, and/or post-processing of multimedia data, including image and video data).

Asghar teaches the remaining claim elements of statically scheduled instructions ([0035] scheduler, static, control scheduling).

As per claim 10, Sridharan teaches wherein the computer program does not expressly include the schedule for executing ([0294] scheduler, allocate, work to cluster i.e. separate scheduler for scheduling and schedule is not included [0155] global scheduler  ) the MLA instructions and memory access instructions ([0070] graphics core, set of graphics execution resources, perform, machine learning and artificial intelligence acceleration logic [0071] processing, instructions, graphics core array [0306] issue instructions to a set of processing engines, [0080] graphics core, on-package DRAM, SoC interface 537, enable communication with fixed function devices within the SoC, enable the use of and/or implements global memory [0310] graphics multiprocessor, access, off-chip global memory).  

As per claim 11, Sridharan teaches statically scheduling the MLA instructions ([0081] perform various scheduling and management tasks for the graphics core 500 workload scheduling [0167] deploying neural networks for machine learning [0077] graphics core, blocks of general purpose and fixed function logic) is based on a known duration for the MLA instructions (col 9 lines 40-60 tasklets, amount of time to complete, different tasklets, differing complexity, take different amounts of time to complete); and is further based on a known topology of data transfer paths between different DHUs ([0162] selecting a network topology [0245] communication schedule, network topology, communication pattern fig. 16A fully connected CNN layers), and further comprises scheduling the MLA instructions in a manner that avoids collisions and a need for arbitrations for the data transfer paths ([0081] perform various scheduling and management tasks for the graphics core 500 workload scheduling [0167] deploying neural networks for machine learning).  
Knowles teaches remaining claim elements of scheduling in a manner that avoids collisions and a need for arbitrations for the data transfer paths (col 9 lines 23-40 tasklet, causally scheduled, deterministic program behavior free from deadlocks i.e. no contention).
Asghar teaches the remaining claim elements of statically scheduled instructions ([0035] scheduler, static, control scheduling).

As per claim 12, Sridharan teaches wherein statically scheduling the MLA instructions comprises: determining a duration for the MLA instructions, wherein the duration is independent of any run-time conditions (col 9 lines 40-60 tasklets, amount of time to complete, different tasklets, differing complexity, take different amounts of time to complete); and statically scheduling the MLA instructions based on the determined durations ([0081] perform various scheduling and management tasks for the graphics core 500 workload scheduling [0167] deploying neural networks for machine learning [0077] graphics core, blocks of general purpose and fixed function logic).
Asghar teaches the remaining claim elements of statically scheduled instructions ([0035] scheduler, static, control scheduling).


Claims 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Sridharan in view of Knowles and further in view of Asghar, as applied to above claims.

As per claim 13, Sridharan teaches the invention substantially as claimed including a method for generating a computer program ([0379] computer program) comprising (a) deterministic instructions for execution by a plurality of interconnected data handling units ([0070] graphics core, set of graphics execution resources, perform, machine learning and artificial intelligence acceleration logic [0071] processing, instructions, graphics core array [0306] issue instructions to a set of processing engines); and (b) memory access instructions for retrieving data used by the deterministic instructions ([0080] graphics core, on-package DRAM, SoC interface 537, enable communication with fixed function devices within the SoC, enable the use of and/or implements global memory [0310] graphics multiprocessor, access, off-chip global memory), the data stored in remote memory (fig. 4 graphics processing engine 410 from memory i.e. off-chip memory [0080] SoC, use of and/or implements global memory atomics [0310] multiprocessor, off-chip global memory); the method comprising: 
scheduling the deterministic instructions according to a static schedule ([0155] scheduler, distribute execution threads, commands, set of compute clusters [0277] instructions, sequence/serialize, scheduler, set of operations and/or micro-operations); and scheduling the memory access instructions so that (x) there is no contention among memory access instructions for access to the remote memory ([0080] graphics core, on-package DRAM, SoC interface 537, enable communication with fixed function devices within the SoC, enable the use of and/or implements global memory [0310] graphics multiprocessor, access, off-chip global memory), and (y) the memory access instructions are scheduled early enough that the retrieved data is available before the retrieved data is used by the deterministic instructions executing according to the static schedule.

Sridharan doesn’t specifically teach a program comprising deterministic instructions, scheduling the deterministic instructions so that (x) there is no contention among memory access instructions, and (y) the memory access instructions are scheduled early enough that the retrieved data is available before the retrieved data is used by the deterministic instructions executing according to the static schedule.

Knowles, however, teaches a program comprising deterministic instructions (col 9 lines 35-38 guaranteed deterministic program behavior col 10 lines 5-10 deterministic program), scheduling the deterministic instructions so that (x) there is no contention among memory access instructions (col 9 lines 23-40 tasklet, causally scheduled, deterministic program behavior free from deadlocks i.e. no contention), and (y) the memory access instructions are scheduled early enough that the retrieved data is available before the retrieved data is used by the deterministic instructions executing according to the static schedule (col 9 lines 23-31 tasklets, vertices, must be causally scheduled i.e. scheduled so that, for every tasklet which has a non-pipelined edge directed to another tasklet, the former is completed in its entirety before the latter begins; col 18 lines 57-67 codelet, output/input values, distributed in memory, retrieved by the codelet).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sridharan with the teachings of Knowles (causally scheduling a deterministic program, providing guaranteed deterministic behavior without deadlock, and completing a tasklet so that its output is available as input before the latter tasklet begins) to improve efficiency and to allow a program comprising deterministic instructions, with the deterministic instructions scheduled so that (x) there is no contention among memory access instructions, and (y) the memory access instructions are scheduled early enough that the retrieved data is available before the retrieved data is used by the deterministic instructions executing according to the schedule, as in the instant invention.

Sridharan and Knowles, in combination, do not specifically teach a static schedule of instructions.

Asghar, however, teaches a static schedule of instructions ([0035] scheduler, static, control scheduling).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sridharan and Knowles with the teachings of Asghar of a static scheduler controlling scheduling, to improve efficiency and add a static schedule of instructions to the method of Sridharan and Knowles, as in the instant invention.

As per claim 14, Sridharan teaches retrieving data from the remote memory requires substantially more cycles than executing deterministic instructions ([0309] processing, performed, clock cycles).
Asghar teaches the remaining claim elements of retrieving data requiring substantially more cycles than executing ([0014] overall processing time may be reduced [0043] different network standards utilize different bandwidth, read/write access based on speed associated with bandwidth).


Claims 15-16, 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sridharan in view of Asghar, as applied to above claims.

As per claim 15, Sridharan teaches the invention substantially as claimed including a system from implementing a machine learning network ([0167] deploying neural networks for machine learning), the system comprising: 
off-chip memory (fig. 4 graphics processing engine 410 from memory i.e. off-chip memory [0080] SoC, use of and/or implements global memory atomics [0310] multiprocessor, off-chip global memory); and 
a machine learning accelerator (MLA) comprising a mesh of interconnected data handling units (DHUs) implemented on a semiconductor die ([0080] fig. 5 graphics processor core 500  within SoC [0081] execution units arrays 502 [0084] machine learning acceleration logic [0085] graphics sub-core include multiple EU arrays 502 inter-thread communication 503 [0070] Each graphics core includes a set of graphics execution resources that includes general purpose and graphics specific execution logic to perform graphics and compute operations, as well as fixed function texture processing and/or machine learning and artificial intelligence acceleration logic); 
the DHTUs executing (a) statically scheduled MLA instructions for implementing the machine learning network ([0070] graphics core, set of graphics execution resources, perform, machine learning and artificial intelligence acceleration logic [0081] scheduling and management tasks, graphics core, graphics microcontroller 538 perform workloads scheduling on the various graphics parallel engines with execution unit arrays 502 504 [0167] parallel processing, training and deploying neural networks for machine learning), and (b) memory access instructions for retrieving data from the off-chip memory for use by the MLA instructions ([0080] graphics core, on-package DRAM, SoC interface 537, enable communication with fixed function devices within the SoC, enable the use of and/or implements global memory [0310] graphics multiprocessor, access, off-chip global memory); 
wherein the DHTUs share access to the off-chip memory ([0080] global memory, shared, between the graphics core 500 [0310] graphics multiprocessor, access off-chip global memory, local processor memory and/or system memory), but the memory access instructions are executed according to a schedule without a need for run-time contention resolution among the DHTUs for access to the off-chip memory ([0063] graphics core, discrete graphics processing unit, processor, processing core, memory interface to access memory, local memory/system memory [0089] execution units, memory access [0095] data port, memory access mechanism [0117] [0310] multiple instance, share common instructions and data).

Sridharan does not specifically teach statically scheduled instructions, or memory access according to a schedule without a need for run-time contention resolution.

Asghar, however, teaches statically scheduled instructions ([0035] scheduler, static), and memory access according to a schedule without a need for run-time contention resolution ([0035] control scheduling, avoid collisions, accessing, shared memory, static i.e. no run-time contention resolution required).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sridharan with the teachings of Asghar of a static scheduler controlling memory access scheduling to avoid collisions, to improve efficiency and add statically scheduled instructions, and memory access according to a schedule without a need for run-time contention resolution, to the system of Sridharan, as in the instant invention.

As per claim 16, Sridharan teaches the off-chip memory is DRAM ([0052] fig. 1 memory device 120), and the DHTUs contain SRAM ([0147] fig. 12 IP cores, processors, SRAM memory devices 1265).

As per claim 18, Sridharan teaches wherein the system is configured to execute the MLA instructions without circuitry for confirming availability of the data retrieved from off-chip memory ([0070] graphics core, set of graphics execution resources, perform, machine learning and artificial intelligence acceleration logic [0080] global memory, shared, between the graphics core 500 [0310] graphics multiprocessor, access off-chip global memory, local processor memory and/or system memory i.e. there is no circuit for confirming availability of the data).

As per claim 19, Asghar teaches wherein the system is configured to execute the memory access instructions without circuitry for run-time contention resolution, arbitration or congestion avoidance for access to the off-chip memory ([0035] control scheduling, avoid collisions, accessing, shared memory, static i.e. no run-time contention resolution required).
  
As per claim 20, Sridharan teaches wherein the DHTUs comprise processing elements (PEs) and storage elements (SEs) (fig. 5 graphic processor core 500 execution unit array 502 [0096] fig. 6B graphics execution unit 608 FPU, ALU, registers); and the MLA instructions ([0070] graphics core, set of graphics execution resources, perform, machine learning and artificial intelligence acceleration logic) comprise (a) compute instructions for execution by the PEs for executing computations in the machine learning network ([0068] graphics execution units to process the 3D and media threads [0070] graphics core, set of graphics execution resources, perform, machine learning and artificial intelligence acceleration logic [0201] machine learning compute operations, performed, compute nodes) and (b) data transfer instructions for execution by the PEs and/or SEs for data transfer between PEs and/or SEs ([0201] data transfer, machine learning computations, multiple nodes [0315] instructions, access, local, shared, global address space); and the memory access instructions are for execution by SEs ([0315] instructions, access, local, shared, global address space).


Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Sridharan in view of Asghar, as applied to above claims, and further in view of Nicol et al. (US 2019/0138373 A1, hereafter Nicol).

As per claim 17, Sridharan teaches the system comprising a DMA controller ([0051] memory controller 116), wherein the DHTUs have direct memory access to the DRAM ([0052] memory device 120, DRAM, memory controller 116 processor 108).

Sridharan and Asghar, in combination, do not specifically teach DMA controller and direct memory access to the DRAM.
Nicol, however, teaches DMA controller and direct memory access to the DRAM ([0066] DMA controller [0067] processor to gain direct access to the data RAMs, DMA transfers).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sridharan and Asghar with the teachings of Nicol of a DMA controller and a processor gaining direct access to data RAM for data transfer, to improve efficiency and add a DMA controller and direct memory access to the DRAM to the system of Sridharan and Asghar, as in the instant invention.


Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 

Esmaeilzadeh, V; Hadi	US-20190287017-A1
Ross; Jonathan		US-10685295-B1
Sakharshete; Swapnil P.	US-20210026686-A1
Ghosh; Tapabrata		US-20210181974-A1
LO; Yun-Chen		US-20210173648-A1
Kazakov; Maxim V.		US-20210374607-A1
Khaitan; Harshit		US-20220092408-A1

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABU ZAR GHAFFARI whose telephone number is (571)270-3799. The examiner can normally be reached Monday-Thursday 9:00 - 17:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai AN can be reached on 571-272-3756. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ABU ZAR GHAFFARI
Primary Examiner
Art Unit 2195



/ABU ZAR GHAFFARI/Primary Examiner, Art Unit 2195