DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event a determination of the status of the application as subject to AIA 35 U.S.C. 102, 103, and 112 (or as subject to pre-AIA 35 U.S.C. 102, 103, and 112) is incorrect, any correction of the statutory basis for a rejection will not be considered a new ground of rejection if the prior art relied upon and/or the rationale supporting the rejection would be the same under either status.

Notice of Claim Interpretation
Claims in this application are not interpreted under 35 U.S.C. 112(f) unless otherwise noted in an office action.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 4 January 2021, 12 January 2021, and 18 January 2021 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Specification
The disclosure is objected to because of the following informalities: “connect to the uncore 180” in paragraph 0050 should be --connect to the core 120--.  Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5, 11, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Loh et al. (US 2013/0297906) in view of Jeddeloh (US 7,620,789).
In regards to claim 1, Loh teaches a processing system for scheduling, comprising:
at least one core, used to develop a plurality of tasks (Core 225, figure 2);
a plurality of accelerator function units (AFUs) (“The staged memory scheduler 245 can also be employed in other system organizations with more than one GPU 230, with any number of CPUs 220, with direct memory access (DMA) engines, peripheral devices, hardware accelerators, or any plurality of computing devices that send requests to the memory 240.”, paragraph 0048), wherein each of the AFUs is used to execute at least one task of the tasks correspondingly, and the task corresponds to a plurality of memory access requests (“Each request from a given source (e.g., processing unit or thread) is initially inserted into its respective source queue 310 upon arrival at the staged memory scheduler 245. A batch is generally designated as one or more memory requests from the same source that access the same memory row. … For example, the CPU cores 225 may have one batch size and/or age threshold, while 
a memory access unit (staged memory scheduler 245, figure 2), comprising:
a plurality of schedulers, wherein each of the schedulers corresponds to each one of the AFUs (“The batch unit 300 includes one source queue 310 for each processing unit source that can issue memory requests to access the system memory 240 (i.e., a CPU core 225, GPU 230, and/or other processing units).”, paragraph 0029), used to schedule the memory access requests based on the sequence in which the memory access requests were received from the corresponding AFU (“If batches are still tied after evaluating bank level parallelism, the batch scheduler 320 prioritizes the oldest batch.”, paragraph 0033); and
a pipeline resource, used to receive and execute the memory access requests transmitted by the schedulers (“The memory command scheduler 330 includes one bank queue 340 per bank in the system memory 240 (e.g., eight banks/FIFOs per rank for DDR3). The batch scheduler 320 places the memory requests directly into the bank queues 340. Note that because batches are moved into the bank queues 340 one batch at a time, any row-buffer locality within a batch is preserved within a particular bank queue 340. At this point, any higher-level policy decisions have already been made by the batch scheduler 320, so the memory command scheduler 330 can focus on issuing low-level memory commands and ensuring DDR protocol compliance.”, paragraph 0036), and after executing the memory access requests, to transmit execution results of 
Loh fails to teach transmitting the execution results in the sequence in which the memory access requests were received from the corresponding AFU.  Jeddeloh teaches transmitting execution results in the sequence in which the memory access requests were received from the corresponding requester (“After execution, data read from the memory device by the execution of the read-type memory access requests are transferred to the respective requesters in the order in which the read requests were originally received.”, abstract).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Loh with Jeddeloh so that the execution results are transmitted in the sequence in which the memory access requests were received from the corresponding AFU, in order to simplify the design of the functional unit.
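For illustration, the in-order return behavior relied upon from Jeddeloh can be sketched as follows. This is a minimal sketch under stated assumptions and does not describe the actual circuitry of either reference: the class name `InOrderScheduler` and the methods `receive`, `complete`, and `drain` are hypothetical. Requests may finish execution in any order, but results are released to the requester only in the order in which the requests were received.

```python
from collections import deque


class InOrderScheduler:
    """Hypothetical per-source scheduler (illustrative names only).

    Requests may complete out of order, but results are handed back
    strictly in the order in which the requests were received.
    """

    def __init__(self):
        self._pending = deque()  # request ids, oldest first
        self._done = {}          # request id -> execution result

    def receive(self, req_id):
        # Record the arrival order of the request.
        self._pending.append(req_id)

    def complete(self, req_id, result):
        # The pipeline reports a finished request, possibly out of order.
        self._done[req_id] = result

    def drain(self):
        # Release results only while the oldest pending request is finished.
        released = []
        while self._pending and self._pending[0] in self._done:
            req_id = self._pending.popleft()
            released.append((req_id, self._done.pop(req_id)))
        return released


sched = InOrderScheduler()
for rid in ("r0", "r1", "r2"):
    sched.receive(rid)
sched.complete("r2", "data2")
sched.complete("r1", "data1")
early = sched.drain()   # r0 has not finished, so nothing is released yet
sched.complete("r0", "data0")
late = sched.drain()    # all three are released in arrival order
```

In this sketch a result for a younger request is simply held until every older request from the same source has completed, which is one straightforward way to obtain the ordering quoted from Jeddeloh's abstract.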
In regards to claim 11, Loh teaches a memory access method for scheduling, for use in a processing system that includes at least one core, a plurality of accelerator function units (AFU), and a memory access unit, wherein the memory access unit comprises a plurality of schedulers and a pipeline resource, the memory access method comprising:
arranging the core to develop a plurality of tasks (Core 225, figure 2);
arranging each of the AFUs to execute at least one task of the tasks correspondingly, and the task corresponds to a plurality of memory access requests 
arranging each of the schedulers corresponding to each one of the AFUs (“The batch unit 300 includes one source queue 310 for each processing unit source that can issue memory requests to access the system memory 240 (i.e., a CPU core 225, GPU 230, and/or other processing units).”, paragraph 0029), used to schedule the memory access requests based on the sequence in which the memory access requests were received from the corresponding AFU (“If batches are still tied after evaluating bank level parallelism, the batch scheduler 320 prioritizes the oldest batch.”, paragraph 0033);
receiving and executing the memory access requests transmitted by the schedulers (“The memory command scheduler 330 includes one bank queue 340 per bank in the system memory 240 (e.g., eight banks/FIFOs per rank for DDR3). The batch scheduler 320 places the memory requests directly into the bank queues 340. Note that because batches are moved into the bank queues 340 one batch at a time, any row-
transmitting execution results of the memory access requests to the corresponding AFU through each of the schedulers after executing the memory access requests (“The memory command scheduler is operable to receive the selected batch from the batch scheduler and issue the memory requests in the selected batch to a memory interfacing with the memory controller.”, abstract).
Loh fails to teach transmitting the execution results in the sequence in which the memory access requests were received from the corresponding AFU after executing the memory access requests.  Jeddeloh teaches transmitting execution results in the sequence in which the memory access requests were received from the corresponding requester after executing the memory access requests (“After execution, data read from the memory device by the execution of the read-type memory access requests are transferred to the respective requesters in the order in which the read requests were originally received.”, abstract).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Loh with Jeddeloh so that, after the memory access requests are executed, the execution results are transmitted in the sequence in which the memory access requests were received from the corresponding AFU, in order to simplify the design of the functional unit.
In regards to claims 5 and 15, Loh further teaches that the memory access unit further comprises:
an arbitrator, selecting one of the schedulers using a round-robin method at each clock period and transmitting one of the memory access requests of the AFU corresponding to the selected scheduler to the pipeline resource (“In a round-robin approach, the batch scheduler 320 cycles through each of the per-source source queues 310, ensuring that high memory-intensity applications receive adequate service.”, paragraph 0033), and
executing the memory access request through the pipeline resource to read and write the data related to the task (“The memory command scheduler 330 includes one bank queue 340 per bank in the system memory 240 (e.g., eight banks/FIFOs per rank for DDR3). The batch scheduler 320 places the memory requests directly into the bank queues 340. Note that because batches are moved into the bank queues 340 one batch at a time, any row-buffer locality within a batch is preserved within a particular bank queue 340. At this point, any higher-level policy decisions have already been made by the batch scheduler 320, so the memory command scheduler 330 can focus on issuing low-level memory commands and ensuring DDR protocol compliance.”, paragraph 0036).
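For illustration, the round-robin selection recited in claims 5 and 15 can be sketched as follows. This is a minimal sketch under stated assumptions rather than Loh's implementation: the function name `round_robin_arbitrate` is hypothetical, each scheduler is modeled as a simple queue, one request is issued per clock period, and empty queues are skipped (the skipping policy is an assumption, not something stated in the cited passage).

```python
from collections import deque


def round_robin_arbitrate(queues, cycles):
    """Issue at most one request per clock period, rotating over the queues.

    `queues` is a list of deques, one per source (hypothetical model).
    Returns the issued requests as (queue_index, request) tuples.
    """
    issued = []
    n = len(queues)
    next_idx = 0  # queue that has priority in the current clock period
    for _ in range(cycles):
        for offset in range(n):
            idx = (next_idx + offset) % n
            if queues[idx]:
                issued.append((idx, queues[idx].popleft()))
                next_idx = (idx + 1) % n  # rotate priority past the winner
                break
    return issued


queues = [deque(["a0", "a1"]), deque(["b0"]), deque()]
order = round_robin_arbitrate(queues, cycles=4)
```

Rotating the priority pointer past the most recently served queue keeps any single source from monopolizing the pipeline, which corresponds to the fairness rationale in the quoted paragraph 0033.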

Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Loh et al. (US 2013/0297906) in view of Jeddeloh (US 7,620,789) and Nvidia (CUDA C Programming Guide).
In regards to claims 2 and 12, Loh in view of Jeddeloh teaches claims 1 and 11.  Loh in view of Jeddeloh fails to teach that the core and the AFUs share a plurality of virtual addresses of the processing system to access a memory through the memory .

Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Loh et al. (US 2013/0297906) in view of Jeddeloh (US 7,620,789), Nvidia (CUDA C Programming Guide), and Persson et al. (US 2015/0089495).
In regards to claim 3, Loh in view of Jeddeloh and Nvidia teaches claim 2.  Loh in view of Jeddeloh and Nvidia fails to teach a microcontroller (MCU), coupled between the core and the AFUs, wherein the core transmits an acceleration interface instruction about the task to the microcontroller, the acceleration interface instruction comprises the id.).
In regards to claim 13, Loh in view of Jeddeloh and Nvidia teaches claim 12. Loh in view of Jeddeloh and Nvidia fails to teach arranging a microcontroller between the core and the AFUs, wherein an acceleration interface instruction about the task is received by the microcontroller, the acceleration interface instruction comprises the virtual address indicating where the task is stored;
accessing the virtual address to read and analyze the task with the microcontroller; and
dispatching the task to the corresponding AFU based on the features of the task.
Persson teaches arranging a microcontroller (paragraph 0259) between the core and the AFUs, wherein an acceleration interface instruction about the task is received by the microcontroller, the acceleration interface instruction comprises the virtual address indicating where the task is stored (“Each physical register input/output interface 8 can be used for the submission and dispatch of tasks to the accelerator 12 and comprises one or more registers in which data needed to submit a task to the accelerator can be stored. In the present embodiment this data comprises a pointer to a descriptor in main memory where the data required for the task in question is stored.”, paragraph 0162);
accessing the virtual address to read and analyze the task with the microcontroller (“Each physical register input/output interface 8 can be used for the submission and dispatch of tasks to the accelerator 12 and comprises one or more registers in which data needed to submit a task to the accelerator can be stored. In the present embodiment this data comprises a pointer to a descriptor in main memory where the data required for the task in question is stored.”, paragraph 0162); and
dispatching the task to the corresponding AFU based on the features of the task (“As shown in FIG. 2, the data processing system 20 also includes an accelerator task scheduler 9 (that is part of the accelerator 12) that arbitrates between tasks submitted to the physical register input/output interfaces 8 and causes the execution unit 2 of the accelerator 12 to execute the tasks as required. Depending on the capabilities of the execution unit 2, tasks from several physical register input/output interfaces 8 may be executed in parallel, if desired.”, paragraph 0164)
such that “the dispatching of tasks to the accelerator can be achieved without, for example, the need to go through the operating system” (paragraph 0032).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Loh with Jeddeloh, Nvidia, and Persson to include arranging a microcontroller between the core and the AFUs, wherein an acceleration interface instruction about the task is received by the microcontroller, the acceleration interface instruction comprises the virtual address indicating where the task is stored;
accessing the virtual address to read and analyze the task with the microcontroller; and
dispatching the task to the corresponding AFU based on the features of the task
such that “the dispatching of tasks to the accelerator can be achieved without, for example, the need to go through the operating system” (id.).

Claims 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Loh et al. (US 2013/0297906) in view of Jeddeloh (US 7,620,789), Nvidia (CUDA C Programming Guide), Persson et al. (US 2015/0089495), and Ausavarungnirun et al. (“MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency”).
In regards to claims 4 and 14, Loh in view of Jeddeloh and Nvidia teaches claims 2 and 12.  Loh in view of Jeddeloh and Nvidia fails to teach a microcontroller, coupled between the core and the AFUs, wherein the core transmits an acceleration interface instruction about the task to the microcontroller, the acceleration interface instruction comprises a page directory base address of the task, and the page directory base address is used to index a page table, wherein the page table comprises a plurality of page table entries for storing a mapping between each of the virtual addresses and a physical address.
Persson teaches a microcontroller (paragraph 0259), coupled between the core and the AFUs, wherein the core transmits an acceleration interface instruction about the task to the microcontroller (“As shown in FIG. 2, the data processing system 20 also includes an accelerator task scheduler 9 (that is part of the accelerator 12) that arbitrates between tasks submitted to the physical register input/output interfaces 8 and causes the execution unit 2 of the accelerator 12 to execute the tasks as required. id.).
Loh in view of Jeddeloh, Nvidia, and Persson fails to teach the acceleration interface instruction comprises a page directory base address of the task, and the page directory base address is used to index a page table, wherein the page table comprises a plurality of page table entries for storing a mapping between each of the virtual addresses and a physical address.  Ausavarungnirun teaches the acceleration interface instruction comprises a page directory base address of the task, and the page directory base address is used to index a page table (“Unlike previously-proposed GPU sharing techniques that do not enable memory protection [44, 60, 79, 96, 101, 127, 141], MASK id.).

Claims 6, 7, 16, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Loh et al. (US 2013/0297906) in view of Jeddeloh (US 7,620,789) and Danilak (US 7,623,134).
In regards to claims 6 and 16, Loh in view of Jeddeloh teaches claims 1 and 11.  Loh in view of Jeddeloh fails to teach that the memory access unit uses a translation 
In regards to claims 7 and 17, Danilak further teaches that the memory access unit further comprises:
a plurality of tablewalk engines, each of which corresponds to a respective scheduler (GPU page walker 133, figure 1), wherein each of the memory access requests comprises a second identification code, when a memory access request fails to access the memory through the TLB, the pipeline resource transmits the memory access request to the corresponding tablewalk engine based on the second identification code to load a corresponding page table entry from a system memory of the processing system (“If the GPU TLB 132 does not include a translation entry for a requested virtual-to-physical translation, the GPU memory management unit 131 refers the translation request to the GPU page walker 133, which searches a GPU frame buffer page table 142 for the virtual-to-physical translation. If the GPU page walker 133 finds the requested translation, processing proceeds using the virtual-to-physical translation from the GPU frame buffer page table.”, Col. 3, line 62 - Col. 4, line 2).
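For illustration, the TLB-miss handling recited in claims 7 and 17 can be sketched as follows. This is a minimal sketch under stated assumptions and models the claim language rather than Danilak's specific hardware: the names `TablewalkEngine`, `translate`, and `engine_id` (standing in for the claimed second identification code) are hypothetical, the page size is assumed to be 4096 bytes, and the page table is modeled as a flat dictionary instead of a multi-level structure.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class TablewalkEngine:
    """Hypothetical tablewalk engine bound to one scheduler."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical frame
        self.walks = 0                # number of page table walks performed

    def walk(self, vpn):
        # Load the page table entry for the given virtual page number.
        self.walks += 1
        return self.page_table[vpn]


def translate(tlb, engines, engine_id, vaddr):
    """Translate vaddr, routing a TLB miss to the engine named by engine_id."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    key = (engine_id, vpn)
    if key not in tlb:
        # TLB miss: the identification code selects the tablewalk engine.
        tlb[key] = engines[engine_id].walk(vpn)
    return tlb[key] * PAGE_SIZE + offset


engines = {0: TablewalkEngine({1: 7}), 1: TablewalkEngine({1: 9})}
tlb = {}
first = translate(tlb, engines, 0, PAGE_SIZE + 5)   # miss, walked by engine 0
second = translate(tlb, engines, 0, PAGE_SIZE + 5)  # hit, no further walk
```

After the first translation the entry is cached, so the second request is served from the TLB and the other engine is never invoked.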

Claims 8, 9, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Loh et al. (US 2013/0297906) in view of Jeddeloh (US 7,620,789), Danilak (US 7,623,134), and Ausavarungnirun et al. (“MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency”).
In regards to claims 8 and 18, Danilak further teaches that the corresponding tablewalk engine searches the page table to load the corresponding page table entry based on the virtual address included in the memory access request (“That is, memory access requests in the virtual memory space of the GPU 130, independent of the CPU virtual memory space, are presented to the GPU memory management unit 131, which translates virtual addresses to physical addresses.”, Col. 3, lines 47-51; “If the GPU TLB id.).
In regards to claims 9 and 19, Danilak further teaches that the corresponding tablewalk engine loads the corresponding page table entry from a system memory id.).

Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Loh et al. (US 2013/0297906) in view of Jeddeloh (US 7,620,789), Danilak (US 7,623,134), Nvidia (CUDA C Programming Guide), and Ausavarungnirun et al. (“MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency”).
In regards to claims 10 and 20, Loh in view of Jeddeloh and Danilak teaches claims 7 and 17.  Loh in view of Jeddeloh and Danilak fails to teach that each of the AFUs, each of the corresponding schedulers, and each of the corresponding tablewalk engines has the same page directory base address.  Nvidia teaches that each of the AFUs, each of the corresponding schedulers, and each of the corresponding tablewalk engines has the same address space (“When the application is run as a 64-bit process, a single address space is used for the host and all the devices of compute capability 2.0 and higher.”, section 3.2.7, paragraph 1).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Loh in view of Jeddeloh, Danilak, and Nvidia such that each of the AFUs, each of the corresponding schedulers, and each of the corresponding tablewalk engines has the same address space in order to simplify application programming.  Loh in view of Jeddeloh, Danilak, and Nvidia fails to teach that the address space has the same page directory base address.  Ausavarungnirun teaches that the address space has the same page directory base address (“MASK uses per-core page table root registers (similar to the CR3 register in x86 systems [52]) to set the current address space on each core.”, .

Response to Arguments
Applicant’s arguments, see pages 12-13, filed 6 April 2021, with respect to the objections have been fully considered and are persuasive.  Therefore, the objections have been withdrawn.  However, upon further consideration, a new objection is made in view of the amendment to paragraph 0050.
Applicant's arguments filed 6 April 2021 with respect to the prior art rejections have been fully considered, but they are partially not persuasive and are partially moot because the new ground of rejection does not rely on the reference applied in the prior rejection of record for the teaching or matter specifically challenged in the argument.  With respect to the amended limitation, the arguments are moot in view of Jeddeloh.
With respect to the claimed plurality of schedulers, the Examiner disagrees.  Loh’s source queues 310 can be considered to fall within the broadest reasonable interpretation of the claimed schedulers.  The term scheduler does not denote a specific hardware structure in the way that the term memory array would.  Instead, the term scheduler is used to denote a set of logic/transistors used to implement the function of scheduling memory access requests.  In much the same way that Applicant’s 

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NATHAN SADLER whose telephone number is (571)270-7699.  The examiner can normally be reached on Monday - Friday 9am - 6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Reginald Bragdon, can be reached at (571) 272-4204.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/Nathan Sadler/
Primary Examiner, Art Unit 2139
12 April 2021