DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an interview with Attorney Thomas W. Kelton on 07/13/2022.
The application has been amended as follows: 
1.	(Currently Amended) A computer system comprising:
	a computer subsystem configured to act as a work accelerator, and
	a gateway connected to the computer subsystem, the gateway enabling data transfer to the computer subsystem from external storage in relation to pre-compiled data exchange synchronization points attained by the computer subsystem, which act as barriers between compute phases and exchange phases of the computer subsystem, wherein a plurality of processing units of the computer subsystem are configured to:
in response to entering a first of the compute phases, perform computations using data received from the gateway during a preceding one of the exchange phases; and
in response to entering a first of the exchange phases, exchange data with the gateway, wherein the computer subsystem comprises a plurality of memories associated with the plurality of processing units, at least one of the memories including a first compiled code sequence comprising at least one instruction executable by at least one of the plurality of processing units to:
	pull first data from a gateway transfer memory of the gateway during the first of the exchange phases, by issuing at least one read request to the gateway, and in response to a first of the pre-compiled data exchange synchronization points attained by the computer subsystem,
	wherein the first compiled code sequence comprises a sync instruction which, when executed by at least one of the plurality of processing units, causes the computer subsystem to participate in the first of the pre-compiled data exchange synchronization points,
	wherein the gateway comprises at least one processor configured to perform at least one operation to pre-load a first portion of the first data from a [[the]] first memory of the gateway to the gateway transfer memory in advance of execution of the sync instruction and, in response to the at least one read request, load a remaining portion of the first data into the gateway transfer memory from the first memory at a same time that the first portion of the first data is being pulled from the gateway transfer memory during the first of the exchange phases.

2.	(Previously presented) The computer system as claimed in claim 1, wherein the first data belongs to a plurality of streams.

3.	(Previously presented) The computer system as claimed in claim 2, wherein the gateway transfer memory comprises a plurality of buffers, wherein each of the buffers is configured to store data belonging to an associated one of the plurality of streams.

4.	(Previously presented) The computer system as claimed in claim 3, wherein each of the buffers is a virtual data buffer, wherein at least one of the virtual data buffers store data in a physically discontiguous space in the gateway transfer memory. 

5.	(Previously presented) The computer system as claimed in claim 2, wherein the first compiled code sequence is configured to cause only one of the plurality of processing units to issue read requests to pull data of a first of the plurality of streams from the gateway transfer memory.

6.	(Previously presented) The computer system as claimed in claim 1, wherein the at least one processor is configured to, in advance of the first of the pre-compiled data exchange synchronization points attained by the computer subsystem, pre-load the first data to be pulled from the gateway transfer memory in response to each of a plurality of upcoming pre-compiled data exchange synchronization points attained by the computer subsystem.

7.	(Canceled) 

8.	(Canceled) 

9.	(Currently Amended) The computer system as claimed in claim 1 [[7]], wherein the at least one read request comprises at least one of:
an address of the first memory; and
a number of bytes to be pulled from the gateway transfer memory.

10.	(Previously presented) The computer system as claimed in claim 1, wherein the first compiled code sequence comprises at least one instruction executable by the computer subsystem to pull second data from the first memory in response to the first of the pre-compiled data exchange synchronization points attained by the computer subsystem.

11.	(Previously presented) The computer system as claimed in claim 10, wherein the at least one processor is configured to pre-load data of a first data stream from the first memory to the gateway transfer memory in advance of the first of the pre-compiled data exchange synchronization points attained by the computer subsystem, wherein the second data comprises data of a second data stream.

12.	(Previously presented) The computer system as claimed in claim 10, wherein the at least one processor of the gateway is configured to:
check whether memory availability requirements are met for pre-loading the first data and the second data into the gateway transfer memory.

13.	(Previously presented) The computer system as claimed in claim 1, wherein the at least one processor of the gateway comprises a field programmable gate array.

14.	(Previously presented) The computer system as claimed in claim 1, wherein the gateway comprises at least one instruction memory configured to store a second compiled code sequence expressing the at least one operation, wherein the first and second compiled code sequences are generated as a related set at compile time.

15.	(Previously presented) The computer system as claimed in claim 1, wherein the gateway comprises a streaming engine configured to execute a set of data transfer instructions to stream data through the gateway from the external storage to the computer subsystem, wherein the streaming engine comprises the at least one processor.

16.	(Previously presented) The computer system as claimed in claim 1, wherein the computer subsystem is configured to, in response to attaining the first of the pre-compiled data exchange synchronization points, transmit a synchronization request to the gateway,
	wherein the gateway is configured to, in response to receiving the synchronization request, transmit a synchronization acknowledgment to the computer subsystem,
	wherein the computer subsystem is configured to pull the first data from the gateway transfer memory in response to receiving the synchronization acknowledgement.

17.	(Previously presented) The computer system as claimed in claim 16, wherein the gateway is configured to:
	store a number of credits indicating availability of data for transfer to the computer subsystem at each of the pre-compiled data exchange synchronization points; and
	transmit the synchronization acknowledgment to the computer subsystem in response to determining that the number of credits comprises a non-zero number of credits.

18.	(Previously presented) The computer system as claimed in claim 1, wherein the gateway is configured to interface the computer subsystem with a host to enable the computer subsystem to act as a work accelerator to the host, wherein the computer system comprises an accelerator interface configured to connect the computer subsystem to the gateway to enable the data transfer from the gateway to the computer subsystem.

19.	(Currently Amended) A method performed by a system having a gateway connected to a work accelerator, wherein the work accelerator is implemented on a chip and the gateway is external to the chip, the gateway enabling transfer of data to the work accelerator from external storage in relation to data exchange synchronization points attained by the work accelerator, the data exchange synchronization points acting as barriers between compute phases and exchange phases of the work accelerator, the method comprising:
in response to entering a first of the compute phases, performing computations using data received from the gateway during a preceding one of the exchange phases; and
	in response to entering a first of the exchange phases, exchanging data with the gateway, including issuing at least one read request to the gateway,
	pre-loading a first portion of first data from a [[the]] first memory in the gateway to a gateway transfer memory in advance of receipt from the work accelerator of a synchronisation request corresponding to a first one of the data exchange synchronization points;
receiving the synchronization request at the gateway from the work accelerator after the first one of the data exchange synchronization points is attained and responding to the synchronization request with a synchronization acknowledgement and, in response to the at least one read request, loading a remaining portion of the first data into the gateway transfer memory from the first memory at a same time that the first portion of the first data is being pulled from the gateway transfer memory during the first of the exchange phases; and
	at the work accelerator, executing an instruction to pull the first data from the gateway transfer memory to the work accelerator after the receiving the synchronization acknowledgement and during the first of the exchange phases. 

20.	(Currently Amended) The method as claimed in claim 19, wherein executing the [[an]] instruction to pull comprises:
	pulling the first data via remote direct memory access (RDMA).

21.	(Previously presented) The method as claimed in claim 19, wherein the first data belongs to a plurality of streams.

22.	(Previously presented) The method as claimed in claim 19, wherein the gateway transfer memory comprises a plurality of buffers, the method further comprising: 
	each of the buffers storing data belonging to an associated one of a plurality of streams.

23.	(Previously presented) The method as claimed in claim 22, wherein each of the buffers comprises a virtual data buffer, wherein at least one of the virtual data buffers stores data in a physically discontiguous space in the gateway transfer memory.

24.	(Previously presented) The method as claimed in claim 19, further comprising:
	pre-loading further data to the gateway transfer memory in response an upcoming subsequent data exchange synchronization point.

25.	(Canceled)

26.	(Currently Amended) The method as claimed in claim 19 [[25]], wherein the at least one read request comprises an item selected from a list consisting of:
	a memory address; and
	a number of bytes to be pulled from the gateway transfer memory.

27.	(Currently Amended) The method as claimed in claim 19, further comprising :
	pulling second data from the first memory to the work accelerator in response to determining that memory availability requirements are not met for pre-loading the second data. 

28.	(Currently Amended) The method as claimed in claim 19, further comprising:
	storing N credits indicating availability of data for transfer to the work accelerator at each of the data exchange synchronization points; and
	wherein transferring N comprises a non-zero number of credits.

29.	(Currently Amended) A plurality of non-transitory machine-readable media having stored thereon instructions for performing a method for enabling data transfer from a gateway to a work accelerator in relation to data exchange synchronization points that act as barriers between compute phases and exchange phases of the work accelerator, the machine-readable media comprising machine executable code which when executed by at least one machine, causes the machine to:
in response to entering a first compute phase, perform computations using data received from the gateway during a preceding exchange phase;
in response to entering a first exchange phase, exchange data with the gateway including issuing at least one read request to the gateway;
	pre-load a first portion of first data from a [[the]] first memory of the gateway to a second memory of the gateway in advance of receipt from the work accelerator of a synchronisation request corresponding to a first one of the data exchange synchronization points;
	receive the synchronization request from the work accelerator after the first one of the data exchange synchronization points is attained and generate a synchronization acknowledgement and, in response to the at least one read request, load a remaining portion of the first data into the second memory of the gateway from the first memory of the gateway at a same time that the first portion of the first data is being pulled from the second memory of the gateway during the first exchange phase; and
	pull the first data, by the work accelerator, from the second memory after the synchronization acknowledgement and during the first exchange phase. 

30.	(Previously presented) The non-transitory machine-readable media of claim 29, further comprising machine executable code, which causes the machine to:
	pre-load further data to the second memory in response an upcoming subsequent data exchange synchronization point.

31.	(Canceled)

32.	(Currently Amended) The non-transitory machine-readable media of claim 29 [[31]], wherein the at least one read request comprises an item selected from a list consisting of:
	a memory address; and
	a number of bytes to be pulled from the second memory.

33.	(Currently Amended) The non-transitory machine-readable media of claim 29, further comprising machine executable code, which causes the machine to[[ ]]:
	pull[[ing]] second data from the first memory to the work accelerator in response to determining that memory availability requirements are not met for pre-loading the second data.

34.	(Currently Amended) The non-transitory machine-readable media of claim 29, further comprising machine executable code, which causes the machine to:
	store N credits indicating availability of data for transfer to the work accelerator at each data exchange synchronization point; and
	wherein allowing the work accelerator to pull N comprises a non-zero number of credits.

35.	(Currently Amended) The computer system as claimed in claim 1, wherein the gateway transfer memory comprises a plurality of buffers, and the gateway comprises: 
a streaming engine comprising the at least one processor; 
at least one instruction memory configured to store a second compiled code sequence executable by the at least one processor of the streaming engine to pre-load at least some [[of]] data from the first buffer of the plurality of buffers in advance of execution of the sync instruction by the at least one of the plurality of processing units,
wherein the first compiled code sequence and the second compiled code sequence[[s]] are generated as a related set at compile time to make at least one buffer in a predetermined order.



Reasons for Allowance
Claims 1-6, 9-24, 26-30, and 32-35 are allowed.
The following is an examiner’s statement of reasons for allowance: 
The known prior art of record, taken alone or in combination, was not found to teach, in combination with other limitations in the claims, a gateway connected to a computer subsystem enabling data transfer in relation to pre-compiled data exchange synchronization points which act as barriers between compute and exchange phases, in response to a pre-compiled data exchange synchronization point attained by the computer subsystem pulling data from a gateway transfer memory of the gateway during a first exchange phase by issuing at least one read request to the gateway, wherein the gateway pre-loads a first portion of first data from a first memory of the gateway to the gateway transfer memory in advance of execution of a sync instruction and loads a remaining portion of the first data into the gateway transfer memory from the first memory at a same time that the first portion of the first data is being pulled from the gateway transfer memory during the first exchange phase, as required by claims 1, 19, and 29.
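For illustration only, the allowed combination of pre-loading a first portion before the sync instruction and loading the remainder while the first portion is being pulled can be sketched as follows. This is a minimal single-threaded model, not the applicant's implementation: the class Gateway, the function run_exchange_phase, and all field names are hypothetical, and the "same time" overlap is approximated by loading one chunk in the same step in which one chunk is pulled.

```python
from collections import deque


class Gateway:
    """Hypothetical model of a gateway with a first memory and a transfer memory."""

    def __init__(self, first_memory, preload_count):
        self.first_memory = list(first_memory)  # source of the "first data"
        self.transfer_memory = deque()          # memory the accelerator pulls from
        self.preload_count = preload_count      # size of the pre-loaded first portion
        self.next_to_load = 0                   # index of the next chunk to load

    def _load_one(self):
        # Move one chunk from the first memory into the transfer memory, if any remain.
        if self.next_to_load < len(self.first_memory):
            self.transfer_memory.append(self.first_memory[self.next_to_load])
            self.next_to_load += 1

    def preload(self):
        # Performed before the accelerator executes its sync instruction.
        for _ in range(self.preload_count):
            self._load_one()

    def sync_request(self):
        # Acknowledge the synchronization request (credit checks are omitted here).
        return True

    def read_request(self):
        # Serve one chunk to the accelerator and, in the same step, load one
        # chunk of the remaining portion from the first memory.
        chunk = self.transfer_memory.popleft()
        self._load_one()
        return chunk


def run_exchange_phase(gateway, total_chunks):
    # Accelerator side: reach the sync point, wait for the ack, then pull.
    if not gateway.sync_request():
        raise RuntimeError("synchronization not acknowledged")
    return [gateway.read_request() for _ in range(total_chunks)]


gateway = Gateway(first_memory=range(8), preload_count=3)
gateway.preload()
preloaded = list(gateway.transfer_memory)
pulled = run_exchange_phase(gateway, total_chunks=8)
```

In this sketch only three of the eight chunks are resident in the transfer memory when the exchange phase begins; the remaining five arrive while earlier chunks are being pulled, which is the overlap that the cited references were not found to teach in combination with the sync instruction.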
The closest prior art of record was found to be:
US 2012/0198214 (hereinafter, Gadre) which teaches a computer subsystem (Fig. 2 208), a gateway (Fig. 3A MMU 328), a gateway transfer memory (Fig. 3A L1.5 cache 335), and a first memory (Fig. 3B L2 cache 350). However, Gadre does not teach the gateway transferring data to the computer subsystem in relation to pre-compiled data exchange synchronization points or pre-loading data from L2 in advance of execution of a sync instruction causing the computer subsystem to participate in a first pre-compiled data exchange synchronization point as required by claims 1, 19, and 29.
US 5,659,713 (hereinafter, Goodwin) which teaches a controller that refills a stream buffer in response to the stream buffer indicating an empty condition (col 9 lines 20-25). However, Goodwin does not teach refilling the stream buffer at a same time that data is being pulled from the stream buffer, or the stream buffer being pre-loaded with a first portion of data in advance of a sync instruction as required by claims 1, 19, and 29.
US 5,239,387 (hereinafter, Stein) which teaches a buffer that is filled at the same time it is emptied (col 3 lines 42-45). However, Stein does not teach the buffer being pre-loaded with a first portion of data in advance of a sync instruction and loaded with a remaining portion of the data at a same time that the first portion is being pulled during a first exchange phase.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KASIM ALLI whose telephone number is (571) 270-1476. The examiner can normally be reached Monday - Friday, 9am - 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at (571) 270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KASIM ALLI/Examiner, Art Unit 2183
/JYOTI MEHTA/Supervisory Patent Examiner, Art Unit 2182