EXAMINER’S AMENDMENT


The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

An examiner's amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it must be submitted no later than the payment of the issue fee.

Authorization for this examiner’s amendment was given in a telephone interview with Attorney Mr. Thedford I. Hitaffer, Reg. No. 38,490, on 05/18/2022.


Please amend claims 1, 3, 5-6, 13-14 and 18 as follows:

1. (Currently amended) A method of managing task dependencies at runtime in a parallel computing system of a hardware processing system, said parallel computing system comprising a multi-core processor running runtime software, a hardware acceleration processor, a communication module, and a gateway, the method comprising:
	initializing the parallel computing system;
	allocating data buffers in system memory for each thread running in each multi-core processing element;
	sending system memory address and length of buffers used in a communication to the hardware acceleration processor using buffered and asynchronous communication;
	the hardware acceleration processor directly accessing the buffers bypassing the threads running in the multi-core processing elements, the multi-core processing elements being parallel to one another;
	the hardware acceleration processor reading a new task buffer comprising information of a new task and upon sensing a memory full condition or a memory conflict in a dedicated local memory attached to the hardware acceleration processor, instructing the gateway to stop receiving new tasks;
	the hardware acceleration processor processing a last read task and dependencies of the last read task; and
	having finished processing the last read task and the dependencies of the last read task, memory space of the hardware acceleration processor is freed and processing continues with a next task, and
	where the tasks are nested tasks, and where at least one of the new tasks or the last read task is a first child task, the method further comprising:
	if the first child task has been processed, the runtime software allowing the hardware acceleration processor to continue processing tasks;
	if the first child task has not been processed, waiting until a threshold time is reached or checking the new task buffer status before instructing the thread to lock the new task buffer and remove child tasks from the buffer, including at least the first child task;
	the runtime software reconstructing the new task buffer by updating all corresponding pointers if the new task buffer has any remaining tasks that have been created by other tasks;
	the thread executing tasks in order or submitting them as a whole with their child dependencies to a software task dependency graph manager; and 
	the runtime software reverting to the hardware acceleration processor for task and dependency management when the memory full or the memory conflict signal is cleared.


3. (Currently amended) The method according to claim 1, wherein the hardware acceleration processor comprises a dedicated local memory for storing task dependencies, a dedicated local memory for storing task producer and creator threads, and a dedicated local memory for storing task identification (ID) and number of dependencies, and a memory for using once at least one of the dedicated local memories becomes full or gets into memory conflicts, the method further comprising using a memory full signal to alter execution of the flow of tasks until memory space of the hardware acceleration processor is freed.

5. (Currently amended) The method according to claim 1, further comprising:
	receiving at the gateway, metadata and dependencies of the new task, said task having been sent from a thread controlled by the runtime software running at the interconnected multi-core processor;
	informing the gateway of free memory space in a dedicated local task memory in the hardware acceleration processor; 
	the gateway distributing the new task to an internal task reservation module and to an internal dependence chain tracker module, each being a part of the hardware acceleration processor;
	the hardware acceleration processor keeping track of task processing in the multi-core processors, task dependencies, dedicated local memory and free memory space of the hardware acceleration processor.

6. (Currently amended) A hardware acceleration processor for runtime task dependency management in a multi-core processor, said multi-core processor comprising interconnected processing elements, where one of the multi-core processing elements is a master and the other multi-core processing elements are slaves, and said hardware acceleration processor comprising:
	data communication logic to communicate with the master interconnected processing elements of the multi-core processor instead of with each one of the slave interconnected processing elements, a system memory, and other peripheral modules;
	a gateway element reading new and finished tasks from a plurality of threads running in the processing elements of the multi-core processor through communications buffers and providing the information to internal components of the hardware acceleration processor;
	dedicated local memory to store data characterizing tasks and task dependencies managed by the hardware acceleration processor; 
	the control logic to resolve deadlocks in multi-task parallel operation, said control logic enforcing task dependencies and retaining task data, and
	where the control logic for resolving deadlocks in multi-task parallel operation is in communication with runtime software running at the multi-core processor, said processor further comprising logic:
	when a memory full condition arises in the dedicated local memory of the hardware acceleration processor, the runtime software to check that the first child task has been successfully read by said processor, and to allow default operation with hardware runtime task dependency management, or if the first child task has not been read successfully and after a time threshold has been reached or the memory full condition has been checked in the new task buffer, the runtime software to allow the threads to lock the new task buffer, remove all its children, reconstruct the buffer by updating corresponding pointers if the new task buffer has any remaining tasks that have been created by other running tasks, and either execute child tasks in order by a thread without allowing them to create other tasks or submit them as a whole to a software task dependency graph manager, which maintains graphs of tasks with dependencies in separate task dependence graphs and schedules when the tasks are executed;
	when the memory full condition in the dedicated local memory of the hardware acceleration processor is cleared, the runtime software reverts to the hardware acceleration processor.

13. (Currently amended) The hardware acceleration processor according to claim 6, wherein the hardware acceleration processor comprises:
a set of processors and a set of accelerators;
bypass logic to directly submit ready tasks to the set of accelerators.

14. (Currently amended) A non-transitory computer program product that causes a hardware acceleration processor to perform hardware-based runtime task dependency management in a multi-core processor interconnected with the hardware acceleration processor, the non-transitory computer program product having instructions to:
	receive at a gateway, metadata and dependencies of a new task, said task having been sent from a thread controlled by runtime software running at the interconnected multi-core processor;
	cause the gateway to sense free memory space in a dedicated local task memory in the hardware acceleration processor;
	cause the gateway to distribute the new task to an internal task reservation module and to an internal dependence chain tracker module in the hardware acceleration processor; 
	cause the hardware acceleration processor to keep track of task processing in the multi-core processors, task dependencies, dedicated local memory and free memory space of the hardware acceleration processor;
	cause the hardware acceleration processor to detect an internal dedicated memory full condition, and to instruct the gateway to stop receiving any new tasks from runtime software controlling runtime of the multi-core processors;
	resolve the memory full condition and any deadlocks created by the memory full condition;
	execute software to select software dependency management depending on detection of the memory full condition and any deadlocks the memory full condition creates; and 
	execute software to use hardware dependency management after the runtime software detects memory full condition is over, and
	where the tasks are nested tasks, and where at least one of the new tasks or the last read task is a first child task, further comprising:
	if the first child task has been processed, the runtime software allowing the hardware acceleration processor to continue processing tasks;
	if the first child task has not been processed, waiting until a threshold time is reached or checking the new task buffer status before instructing the thread to lock the new task buffer and remove child tasks, including at least the first child task;
	the runtime software reconstructing the new task buffer by updating all corresponding pointers if the new task buffer has any remaining tasks that have been created by other tasks;
	the thread executing tasks in order or submitting them as a whole with their child dependencies to a software task dependency graph manager; and 
	the runtime software reverting to the hardware acceleration processor for task and dependency management when the memory full or the memory conflict signal is cleared.


18. (Currently amended) A non-transitory computer program product that causes a hardware acceleration processor to perform hardware-based runtime task dependency management in a multi-core processor interconnected with the hardware acceleration processor, the non-transitory computer program product having instructions to:
	receive at a gateway, metadata and dependencies of a new task, said task having been sent from a thread controlled by runtime software running at the interconnected multi-core processor;
	cause the gateway to sense free memory space in a dedicated local task memory in the hardware acceleration processor;
	cause the gateway to distribute the new task to an internal task reservation module and to an internal dependence chain tracker module in the hardware acceleration processor; 
	cause the hardware acceleration processor to keep track of task processing in the multi-core processors, task dependencies, dedicated local memory and free memory space of the hardware acceleration processor;
	cause the hardware acceleration processor to detect an internal dedicated memory full condition, and to instruct the gateway to stop receiving any new tasks from runtime software controlling runtime of the multi-core processors;
	resolve the memory full condition and any deadlocks created by the memory full condition;
	execute software to select between hardware or software dependency management depending on detection of the memory full condition and any deadlocks the memory full condition creates; and 
	execute software to use hardware dependency management after the runtime software detects memory full condition is over, and
	where the control logic for resolving deadlocks in multi-task parallel operation is in communication with runtime software running at the multi-core processor, said processor further comprising logic:
			when a memory full condition arises in the dedicated local memory of the hardware acceleration processor, the runtime software to check that the first child task has been successfully read by said processor, and to allow default operation with hardware runtime task dependency management, or if the first child task has not been read successfully and after a time threshold has been reached or the memory full condition has been checked in the new task buffer, the runtime software to allow the threads to lock the new task buffer, remove all its children, reconstruct the buffer by updating corresponding pointers if the new task buffer has any remaining tasks that have been created by other running tasks, and either execute child tasks in order by a thread without allowing them to create other tasks or submit them as a whole to a software task dependency graph manager, which maintains graphs of tasks with dependencies in separate task dependence graphs and schedules when the tasks are executed;
	when the memory full condition in the dedicated local memory of the hardware acceleration processor is cleared, the runtime software reverts to the hardware acceleration processor.


Reasons for Allowance


The following is an examiner’s statement of reasons for allowance:

Interpreting the claims in light of the specification, the examiner finds that the claimed invention is patentably distinct from the prior art of record. The prior art of record does not expressly teach or render obvious the invention as recited in the amended independent claims.

Raman (US 2016/0217016 A1) teaches a method of managing task dependencies at runtime in a parallel computing system of a hardware processing system comprising a multi-core processor running runtime software, a hardware acceleration processor, a communication module, and a gateway, the method comprising: initializing the parallel computing system; allocating data buffers in system memory for each thread running in each multi-core processing element; sending memory address and length of buffers used in a communication to the hardware acceleration processor; the hardware acceleration processor directly accessing the buffers bypassing the threads running in the multi-core processing elements, the multi-core processing elements being parallel to one another; the hardware acceleration processor reading a new task buffer and upon sensing a memory full condition or a memory conflict in a dedicated local memory attached to the hardware acceleration processor; the hardware acceleration processor processing dependencies of a last read task; and having finished dependency processing of the last read task, processing continues with a next task.
Pappalardo et al. (US 2008/0294871 A1) teaches a multidimensional processor architecture, including sending data using buffered and asynchronous communication.
Le Grand (US 9,058,678 B1) teaches a system and method for reducing the complexity of performing broad-phase collision detection on GPUs, including instructing the gateway to stop receiving new tasks if no core is asserting a ready signal.
Busaba et al. (US 2015/0378899 A1) teaches a transactional execution processor having a co-processor accelerator sharing a higher-level cache, and freeing memory space after the transaction ends successfully.

The combination of the prior art of record does not expressly teach or render obvious the limitations of “if the first child task has been processed, the runtime software allowing the hardware acceleration processor to continue processing tasks; and if the first child task has not been processed, waiting until a threshold time is reached or checking the new task buffer status before instructing the thread to lock the new task buffer and remove child tasks from the buffer, including at least the first child task; the runtime software reconstructing the new task buffer by updating all corresponding pointers if the new task buffer has any remaining tasks that have been created by other tasks; the thread executing tasks in order or submitting them as a whole with their child dependencies to a software task dependency graph manager; and the runtime software reverting to the hardware acceleration processor for task and dependency management when the memory full or the memory conflict signal is cleared, wherein the tasks are nested tasks”, when taken in the context of the claims as a whole, as recited in independent claims 1, 6, 14 and 18.
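
For orientation only, the sequence of steps in the limitation quoted above can be sketched as follows. This is a minimal illustration under stated assumptions: the names `Task`, `handle_memory_full`, `wait_for_threshold` and `software_manager` are hypothetical and do not appear in the claims or the specification, and rewriting a Python list stands in for "updating all corresponding pointers". It is not the applicant's implementation and is not a construction of the claims.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    task_id: int
    parent_id: Optional[int] = None  # None for a top-level task (assumed field)


def handle_memory_full(
    new_task_buffer: List[Task],
    buffer_lock: threading.Lock,
    parent_id: int,
    first_child_processed: bool,
    wait_for_threshold: Callable[[], None],
    software_manager: Callable[[List[Task]], None],
) -> str:
    """Hypothetical runtime-side reaction to a memory-full signal."""
    if first_child_processed:
        # First child already processed: hardware keeps managing tasks.
        return "hardware"

    # Otherwise wait until the threshold time is reached (a buffer-status
    # check could be substituted here) before touching the buffer.
    wait_for_threshold()

    with buffer_lock:
        # Remove the child tasks of the given parent, including the first child.
        children = [t for t in new_task_buffer if t.parent_id == parent_id]
        # Rebuild the buffer in place so that only tasks created by other
        # tasks remain.
        new_task_buffer[:] = [
            t for t in new_task_buffer if t.parent_id != parent_id
        ]

    # Submit the removed children as a whole to the software manager.
    software_manager(children)
    return "software"
```

In this sketch the caller would switch back to hardware management once the memory full or memory conflict signal is cleared; that transition is not modelled here.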


Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABU ZAR GHAFFARI whose telephone number is (571)270-3799.  The examiner can normally be reached on Monday-Thursday 9:00 - 17:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai AN, can be reached at 571-272-3756.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


ABU ZAR GHAFFARI
Primary Examiner
Art Unit 2195


/ABU ZAR GHAFFARI/Primary Examiner, Art Unit 2195