DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending in this application.

Information Disclosure Statement
The IDS filed on 09/10/2019 has been considered. 

Specification
The disclosure is objected to because of the following informalities: paragraph [0032] recites “preemptive utilize” but it should be “preemptively utilize”.  
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
As per claims 1, 10, and 16 (line numbers refer to claim 1):

	Line 5 recites “identifying, by the CAD, a set of memory pages needed for completing the set of work elements” but it is unclear how the CAD identifies which memory pages are needed. 

Claims 2-9, 11-15, and 17-20 depend from claims 1, 10, and 16, respectively, so they are rejected for the same reasons as claims 1, 10, and 16 above.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5, 7, 10-12, 14, 16-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Hou et al.  (US 9870256 B2 hereinafter Hou) in view of Tian et al. (US 10853118 B2 hereinafter Tian).

As per claim 1, Hou teaches the claimed invention substantially as claimed including a computer-implemented method (abstract lines 1-2 Provided is a hardware accelerator and method, central processing unit, and computing device) comprising: 
retrieving, by a coherent accelerator device (CAD), a set of work elements needed for completion from a work queue (Col. 3 lines 59-62 when a hardware accelerator receives a task E, there are still other tasks A, B, C, D existing in the hardware accelerator that have not been completed; Col. 6 lines 21-22 The import detector 10354 detects a new task imported into a queue in the task accelerating unit); 
determining, by the CAD, a length of time required to complete the set of work elements (Col. 3 lines 43-45 As shown in FIG. 5, as compared to FIG. 1, there is a task time prediction unit 1035 added in the hardware accelerator 103; Col. 3 lines 62-65 The task E needs two minutes to complete its execution, and tasks A, B, C, D need ten minutes to complete their execution. Thus, the total waiting time of new task E is 12 mins=10 mins+2 mins; Col. 6 lines 30-32 The calculation writer 10355 stores therein the time required to complete all tasks that have not yet been completed in the task accelerating unit); and 
communicating, by the CAD, the length of time required to complete the set of work elements to a memory (Col. 4 lines 50-60 The task loader 10321 sends the task execution time to the accumulator 10352, which stores therein the time required to complete all tasks that have not yet been completed at present in the task accelerating unit. The accumulator 10352 accumulates the task execution time to the accumulation result it stored for notifying the newly accumulated result to the hardware thread as the total waiting time of the new task. The accumulator 10352 returns the total waiting time of the new task to the task loader 10321, which in turn returns it to the specified address associated with the hardware thread via a bus interface; The CAD communicates the length of time because the accumulator, which notifies the hardware thread the length of time via a bus, is within the hardware accelerator.).
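For illustration only, the waiting-time accumulation described in the cited passages of Hou can be sketched as follows. This is a minimal sketch and not Hou's implementation: the class and method names are hypothetical, and the individual durations assigned to tasks A through D are assumed (the cited passage gives only their ten-minute total).

```python
class WaitingTimeAccumulator:
    """Hypothetical model of the accumulator described in the cited passages."""

    def __init__(self):
        # Time (in minutes) required to complete all tasks not yet completed.
        self.pending_minutes = 0

    def enqueue(self, task_minutes):
        """Accumulate a new task's execution time and return its total waiting time."""
        self.pending_minutes += task_minutes
        return self.pending_minutes

    def complete(self, task_minutes):
        """Subtract a finished task's execution time from the stored result."""
        self.pending_minutes -= task_minutes
        return self.pending_minutes


acc = WaitingTimeAccumulator()
for minutes in (4, 3, 2, 1):   # tasks A, B, C, D: assumed split of the 10-minute total
    acc.enqueue(minutes)
total_wait_e = acc.enqueue(2)  # task E needs 2 minutes: 10 + 2 = 12
```

Under these assumptions the value returned for task E matches the twelve-minute total waiting time given in the cited example.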

Hou fails to specifically teach identifying, by the CAD, a set of memory pages needed for completing the set of work elements; and communicating, by the CAD, the set of memory pages required to complete the set of work elements to a virtual memory manager.

However, Tian teaches identifying, by the CAD, a set of memory pages needed for completing the set of work elements (abstract lines 13-16 the command parser to responsively update the first shadow page table responsive to determining a set of page table entries predicted to be used based on the analysis of the working set of commands; claim 3 a page request agent to determine memory pages which will be required for the GPU to execute one or more of the commands in the working set of commands; Col. 28 lines 12-15 hypervisor 1910 may construct SGGTTs on demand, which may include all the to-be-used translations for graphics memory virtual addresses from different GPU components' owner VMs); and 
communicating, by the CAD, the set of memory pages required to complete the set of work elements to a virtual memory manager (Fig. 14, 19; Col. 28 lines 36-44 In various embodiments, command parser 1918 may scan a command from a VM and determine if the command contains memory operands. If yes, the command parser may read the related graphics memory space mappings, e.g., from a GTT for the VM, and then write it into a workload specific portion of the SGGTT. After the whole command buffer of a workload gets scanned, the SGGTT that holds memory address space mappings associated with this workload may be generated or updated; Col. 19 lines 59-63 A virtualization stub module 1411 running in the hypervisor 1410 extends memory management to include extended page tables (EPT) 1414 for the user VMs 1431-1432 and a privileged virtual memory management unit (PVMMU) 1412; claim 5 the page request agent is to…directly invoke a hypervisor memory management unit (MMU) to retrieve ...; The hypervisor is involved in virtual memory management and reads required memory addresses from the GTT of a GPU; therefore, required memory pages are communicated from the CAD to the virtual memory manager.).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Hou with the teachings of Tian because Tian’s teaching of determining which memory pages are required for a plurality of commands prevents page faults thus increasing performance (see Tian, Col. 33 lines 45-52 a Page-Request-Agent (PRA) 2008 can be utilized to speculatively trigger a CPU page fault or directly invoke the hypervisor IOMMU 2010 to allocate/swap-in required pages to the shadow page tables 2004, before the commands are executed by the GPU 2000. This can remove a significant amount of unnecessary IOMMU page faults and thus increase the overall performance.).
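For illustration only, the command-scanning behavior described in the cited passages of Tian (scanning each command for memory operands and copying the related mappings into a workload-specific table) can be sketched as follows. This is a minimal sketch under stated assumptions, not Tian's implementation: the function name, the dictionary-based tables, and the command format are all hypothetical.

```python
def build_shadow_table(command_buffer, guest_table):
    """Collect the translations a workload will use (hypothetical helper).

    command_buffer: list of dicts, each optionally holding "memory_operands"
    guest_table:    dict mapping virtual page -> physical page (stands in for a GTT)
    """
    shadow_table = {}
    for command in command_buffer:
        # Commands without memory operands contribute nothing.
        for virtual_page in command.get("memory_operands", []):
            if virtual_page in guest_table:
                shadow_table[virtual_page] = guest_table[virtual_page]
    return shadow_table


guest_table = {0x10: 0xA0, 0x11: 0xA1, 0x12: 0xA2}
commands = [
    {"op": "draw", "memory_operands": [0x10, 0x12]},
    {"op": "nop"},  # no memory operands
]
shadow = build_shadow_table(commands, guest_table)
```

In this sketch only the pages actually referenced by the scanned commands end up in the resulting table, which is the sense in which the set of needed pages is identified before execution.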
	
As per claim 2, Hou and Tian teach the computer-implemented method of claim 1. Hou specifically teaches wherein communicating the length of time required to complete the set of work elements comprises: storing, by the CAD, a working set list comprising the length of time required to complete the set of work elements in a save area (Fig. 6; Col. 4 lines 50-60 The task loader 10321 sends the task execution time to the accumulator 10352, which stores therein the time required to complete all tasks that have not yet been completed at present in the task accelerating unit. The accumulator 10352 accumulates the task execution time to the accumulation result it stored for notifying the newly accumulated result to the hardware thread as the total waiting time of the new task. The accumulator 10352 returns the total waiting time of the new task to the task loader 10321, which in turn returns it to the specified address associated with the hardware thread via a bus interface).
Additionally, Tian teaches wherein communicating the set of memory pages required to complete the set of work elements comprises: storing a working set list comprising the set of memory pages required to complete the set of work elements in a co-processor save area, wherein the co-processor save area is accessible by the virtual memory manager (Col. 32 lines 48-52 the CMD parser 2005 records the working set (i.e. a list of memory pages) associated with each submitted command buffer which are provided to shadow page tables 2004 (i.e., storing those PTEs which are utilized by the working set); Col. 33 lines 45-49 a Page-Request-Agent (PRA) 2008 can be utilized to speculatively trigger a CPU page fault or directly invoke the hypervisor IOMMU 2010 to allocate/swap-in required pages to the shadow page tables 2004; Col. 31 lines 26-27 a comprehensive shadow page table needs to track all the page table pages; Col. 33 line 33 the CPU page table is shared with the GPU 2000; Col. 28 lines 36-44 In various embodiments, command parser 1918 may scan a command from a VM and determine if the command contains memory operands. If yes, the command parser may read the related graphics memory space mappings, e.g., from a GTT for the VM, and then write it into a workload specific portion of the SGGTT. After the whole command buffer of a workload gets scanned, the SGGTT that holds memory address space mappings associated with this workload may be generated or updated).

As per claim 3, Hou and Tian teach the computer-implemented method of claim 1. Hou specifically teaches further comprising: updating, by the CAD, the length of time required to complete the set of work elements when one or more work elements have been completed (Fig. 6; Col. 7 lines 14-20 storing the time required to complete all tasks that have not yet been completed as an accumulation result, accumulating the evaluated task execution time to the [...] subtracting the corresponding task execution time from the stored accumulation result after completing the task.).
Additionally, Tian teaches updating the set of memory pages needed for completing the set of work elements (Col. 28 lines 41-44 After the whole command buffer of a workload gets scanned, the SGGTT that holds memory address space mappings associated with this workload may be generated or updated; Col. 34 lines 4-5 the GPU scheduler performs lazy updates to the shadow page table based on the working set of commands; claim 3 a page request agent to determine memory pages which will be required for the GPU to execute one or more of the commands in the working set of commands).
	
As per claim 5, Hou and Tian teach the computer-implemented method of claim 1. Hou specifically teaches wherein the set of work elements comprises one or more current work elements and one or more future work elements (Col. 6 lines 30-32 The calculation writer 10355 stores therein the time required to complete all tasks that have not yet been completed (as future) in the task accelerating unit; Col. 6 line 44 an accelerating task is completed (as current)).

As per claim 7, Hou and Tian teach the computer-implemented method of claim 5. Tian specifically teaches wherein the set of memory pages comprises current working memory pages and future working memory pages for completing the set of work elements (col. 29 lines 51-56 GPU scheduler 1912 may integrate the in-executing and to-be-executed graphic memory working sets together. In some embodiments, a resulting SGGTT for the in-executing and to-be-executed graphic memory working sets for the particular render engine may also be generated and stored; Col. 29 lines 64-65 hypervisor 1910 may write corresponding SGGTT .

As per claim 10, it is a system claim of claim 1, so it is rejected for the same reasons as claim 1 above. Additionally, Hou teaches a coherent computer system, comprising: a processor; a coherent accelerator device (CAD); and a computer-readable storage medium communicatively coupled to the processor and the CAD, storing program instructions which, when executed by the CAD, cause the CAD to perform a method (Fig. 4; abstract lines 1-2 Provided is a hardware accelerator and method, central processing unit, and computing device; Col. 3 lines 18-21 As shown in FIG. 4, the computer system 400 may include: CPU (Central Processing Unit) 401, RAM (Random Access Memory) 402, ROM (Read Only Memory) 403; Col. 3 lines 41-42 The hardware accelerator 103 and hardware thread 101 shown in FIG. 5 are included in the CPU shown in FIG. 4; Col. 8 lines 7-10 aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.).

As per claims 11, 12, and 14, they are system claims of claims 2, 3, and 5, so they are rejected for the same reasons as claims 2, 3, and 5 above.

As per claim 16, it is a computer program product claim of claim 1, so it is rejected for the same reasons as claim 1 above. Additionally, Hou teaches a computer program product comprising a computer-readable storage medium having program instructions embodied therewith, wherein the computer-readable storage medium is not a transitory signal per se, the program instructions executable by a coherent accelerator device (CAD) to cause the CAD to perform a method (Fig. 4; abstract lines 1-2 Provided is a hardware accelerator and method, central processing unit, and computing device; Col. 3 lines 18-21 As shown in FIG. 4, the computer system 400 may include: CPU (Central Processing Unit) 401, RAM (Random Access Memory) 402, ROM (Read Only Memory) 403; Col. 3 lines 41-42 The hardware accelerator 103 and hardware thread 101 shown in FIG. 5 are included in the CPU shown in FIG. 4; Col. 8 lines 7-10 aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.).

As per claims 17, 18, and 20, they are computer program product claims of claims 2, 3, and 5, so they are rejected for the same reasons as claims 2, 3, and 5 above. 

Claims 4, 13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Hou and Tian, as applied to claims 1, 10, and 16 above, in view of Roy et al. (US 20210349826 A1 hereinafter Roy).

As per claim 4, Hou and Tian teach the computer-implemented method of claim 1. Tian specifically teaches indicating the set of memory pages needed to complete the set of work elements (claim 3 a page request agent to determine memory pages which will be required for the GPU to execute one or more of the commands in the working set of commands).

Hou and Tian fail to specifically teach that a sliding window is used to indicate the set of memory pages.

However, Roy teaches a sliding window is used to indicate the set of memory pages ([0054] lines 3-5 the first memory location range being selected by a sliding address window, in which the memory locations of the first memory location range are addressable).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Hou and Tian with the teachings of Roy because Roy’s teaching of a sliding window that is mapped to multiple physical memory addresses allows for memory to seem larger (see Roy, [0204-0205] Instead of declaring the memory to be 4 times larger, it would be necessary to declare it even greater, which is generally not possible, because it corresponds to memory configurations not supported by the DRAM memory controller. As illustrated in FIG. 3, a solution is to use an access window 305, called sliding window, of which the location 304 in the PIM memory 302 (and possibly the size) can be configured via interface registers, this sliding window being mapped many times in a large range of physical addresses 306 called multiple window.).

As per claims 13 and 19, they are system and computer program product claims of claim 4, so they are rejected for the same reasons as claim 4 above.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Hou and Tian, as applied to claim 1 above, in view of Tang et al. (US 20200379807 A1 hereinafter Tang).

As per claim 6, Hou and Tian teach the computer-implemented method of claim 5. Hou specifically teaches the set of work elements with current work elements and future work elements (Col. 6 lines 30-32 The calculation writer 10355 stores therein the time required to complete all tasks that have not yet been completed (as future) in the task accelerating unit; Col. 6 line 44 an accelerating task is completed (as current)).

Hou and Tian fail to teach wherein a work element descriptor groups the set of work elements into current work elements and future work elements.

However, Tang teaches wherein a work element descriptor groups the set of work elements into current work elements and future work elements (Fig. 2; [0021] lines 6-12 The workload model here comprises a job description and associations between future workloads associated with the job description. With example implementations of the present disclosure, not only a current workload may be determined based on a group of jobs received currently, but also a future possible workload may be determined based on the workload model.).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Hou and Tian with the teachings of Tang because Tang’s teaching of grouping current and future workloads allows for jobs to be managed more effectively and reduce latency (see Tang, abstract lines 12-16 The group of jobs in the processing system are managed based on the current workload and the future workload. With the .

Claims 8 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Hou and Tian, as applied to claims 1 and 10 above, in view of Sevigny (US 9772878 B2).

As per claim 8, Hou and Tian teach the computer-implemented method of claim 1. Tian specifically teaches the virtual memory manager and does not utilize the page until the length of time required to complete the set of work elements decrements to zero (Col. 19 lines 59-63 A virtualization stub module 1411 running in the hypervisor 1410 extends memory management to include extended page tables (EPT) 1414 for the user VMs 1431-1432 and a privileged virtual memory management unit (PVMMU) 1412; Col. 25 lines 18-23 In one embodiment, the batch buffer pages are write-protected, and the commands are audited before submitting to the GPU for execution, to close the attack window. The write-protection is applied per page on demand, and is removed after the execution of commands in this page is completed by the GPU; Col. 31 line 65-Col. 32 line 1 Prediction of the current GPU working set can greatly reduce the write protection faults, because there is no need to write-protect PTEs which are not directly related to the current working set).

Hou and Tian fail to teach memory manager does not utilize the set of memory pages until the length of time required to complete the set of work elements decrements to zero.

Sevigny teaches memory manager does not utilize the set of memory pages until the length of time required to complete the set of work elements decrements to zero (Col. 2 line 66-Col. 3 line 2 The job scheduler 130 employs conventional software memory locks (e.g., mutexes), for example, to manage access by processors to a common memory location; Col. 12 lines 17-20 If the job counter for the job group 242 is zero, then the job scheduler 230 atomically increments a generation counter for that job group 242 at operation 344 and notifies the client that the job group 242 is finished at operation 346; Col. 12 lines 29-38 Referring now to FIG. 3C, the job scheduler 230 recycles job group containers (e.g., the memory region of an empty, completed job group 242). More specifically, in the example embodiment, the job scheduler 230 checks the state of the job group container at operation 352. If, at operation 354, the associated job group 242 is not finished, then the container is held at operation 356, and the job scheduler cycles to operation 352. If the job group is finished, then the job scheduler 230 puts the job group container on a recycling stack for later reuse at operation 358.).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Hou and Tian with the teachings of Sevigny because Sevigny’s teaching of allowing reuse of a memory region only after the job group using the memory region has finished increases efficiency (see Sevigny, Col. 5 lines 63-67 Recycling a job group 242 improves the efficiency of the job scheduler 230 since the creation (e.g., memory allocation) of a new job group would require an OS system call, and would incur an associated latency).
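For illustration only, the hold-until-finished behavior described in the cited passages of Sevigny (a job counter reaching zero, a generation counter being incremented, and the job group container being recycled only once the group is finished) can be sketched as follows. This is a minimal, single-threaded sketch and not Sevigny's implementation: the class, attribute, and function names are hypothetical, and the atomic operations mentioned in the cited text are not modeled.

```python
class JobGroup:
    """Hypothetical job group whose memory region is reusable only when finished."""

    def __init__(self, job_count):
        self.job_counter = job_count  # jobs still outstanding
        self.generation = 0           # bumped once when the group finishes

    def finish_job(self):
        """Record completion of one job; bump the generation when none remain."""
        self.job_counter -= 1
        if self.job_counter == 0:
            self.generation += 1

    @property
    def finished(self):
        return self.job_counter == 0


def try_recycle(group, recycling_stack):
    """Place the container on the recycling stack only if the group is finished."""
    if not group.finished:
        return False                  # container is held
    recycling_stack.append(group)
    return True


stack = []
group = JobGroup(job_count=2)
held = try_recycle(group, stack)      # still two jobs outstanding
group.finish_job()
group.finish_job()
recycled = try_recycle(group, stack)  # counter is now zero
```

In this sketch the first recycle attempt is refused while jobs remain, and the second succeeds once the counter has decremented to zero.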

As per claim 15, it is a system claim of claim 8, so it is rejected for the same reasons as claim 8 above.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Hou and Tian, as applied to claim 1 above, in view of Hasegawa et al. (JP2002244869A hereinafter Hasegawa).
The claim mappings are made with a translation of JP2002244869A.

As per claim 9, Hou and Tian teach the computer-implemented method of claim 1. Tian specifically teaches the virtual memory manager (Col. 19 lines 59-63 A virtualization stub module 1411 running in the hypervisor 1410 extends memory management to include extended page tables (EPT) 1414 for the user VMs 1431-1432 and a privileged virtual memory management unit (PVMMU) 1412).

Hou and Tian fail to teach wherein the memory manager ignores the communicating and utilizes the set of memory pages prior to the length of time required to complete the set of work elements decrementing to zero.

However, Hasegawa teaches wherein the memory manager ignores the communicating and utilizes the set of memory pages prior to the length of time required to complete the set of work elements decrementing to zero ([0001] 13 lines 1-3 The present invention relates to a memory management device, and more particularly to a memory management device capable of determining a job group for page stealing based on the ...; [0014] 146 lines 6-7 page stealing is performed from the development job with low importance, and the actual page is assigned to the job B, so that the response of the job A job with high importance is not adversely affected; [0016] lines 9-10 The size a307, the page steal count 308 indicating the number of actual pages stealed from the job group; [0017] 183 lines 4-7 In the job management table 320, a NEXT pointer b321 for queuing another job management table 320, a memory size b322 indicating the number of actual pages possessed by the job, and a memory management table 330 for managing the actual pages possessed are provided; [0024] 263 lines 3-14 calculates the target value of the number of real pages to be stealed in order to secure the free real page (Step 801). In step 802, the job group having the lowest importance is selected, and it is determined whether or not the free main storage target value is achieved by selecting the job group… . In step 805, the memory supply instruction unit 113 is instructed to page steal the number of pages indicated by the steel instruction value 406 from the job group obtained in step 802; [0030] 375 lines 1-2 a group of important production jobs and a group of less important development jobs are operated simultaneously in one computer system; Hasegawa teaches utilizing the set of memory pages prior to the length of time required to complete the set of work elements decrementing to zero because memory pages allocated to a job group are stolen before the job group has finished executing.).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Hou and Tian with the teachings of Hasegawa because Hasegawa’s teaching of page stealing allows for jobs without adequate pages to steal pages from low priority jobs (see Hasegawa, [0014] 146 lines 3-8 As shown in the upper part of the figure, when the actual page usage of the job of business B system increases and the actual 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HSING CHUN LIN whose telephone number is (571)272-8522.  The examiner can normally be reached on Mon - Fri 9AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai An can be reached on (571)272-3756.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to 
/MENG AI T AN/Supervisory Patent Examiner, Art Unit 2195                                                                                                                                                                                                        



/H.L./Examiner, Art Unit 2195