DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is responsive to Applicant’s Amendment filed on 10/12/2021.
Claims 1, 3-6, 8, 10-13, 15 and 17-20 are presented for examination. Claims 1, 3, 8, 10-12, 15 and 17-18 have been amended.
Applicant’s amendments to the specification and claims have overcome the claim objection set forth in the non-final Office action mailed 6/14/2021.

Examiner Notes
Examiner cites particular columns, paragraphs, figures and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 10/12/2021 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Objections
Claims 6, 13 and 20 are objected to because of the following informalities:
“a memory queue” at lines 2-3 of Claim 6 should be “the memory queue” (note: line 11 of Claim 1 already recites “a memory queue”).
Claims 13 and 20 are also objected to for the same reason as stated above for Claim 6.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a)  IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same,  and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.


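Claims 1, 3-6, 8, 10-13, 15 and 17-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.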

Regarding Claim 1, under the current claim language, Claim 1 requires that the claimed divide and initiate steps be performed by the claimed processor, while the claimed detect, terminate, store, create and launch steps are performed by the claimed processing resource of the parallel processor; i.e., the claimed steps/actions are performed by different processors. However, the specification provides no support for the claimed steps/actions being performed by different processors. For example, Figs. 14-15 and [00207]-[00215] of the specification, relied upon as support for the claimed steps/actions, do not include any statement that describes or implies that the claimed divide and initiate steps are performed by one processor while the claimed detect, terminate, store, create and launch steps are all performed by another, parallel processor. [00214] of the specification may describe a hardware element or a graphics processor performing the detect step; however, [00208] of the specification also describes that the operations depicted in Figs. 14-15, i.e., the claimed actions/steps, may be implemented in a graphics processor. Thus, it is reasonable to state that a processor performs all of the claimed actions/steps, or that a graphics/parallel processor performs all of the claimed actions/steps; however, it is not reasonable to state that a processor performs the claimed divide and initiate steps while all other claimed steps are performed by another processor.
Claims 3-6 are rejected because, by virtue of their dependency, they fail to cure the deficiency of their parent claim.

Regarding Claim 8, Claim 8 is rejected for the same reason set forth in the rejection of Claim 1 above.
Claims 10-13 are rejected because, by virtue of their dependency, they fail to cure the deficiency of their parent claim.

Regarding Claim 15, Claim 15 is rejected for the same reason set forth in the rejection of Claim 1 above.
Claims 17-20 are rejected because, by virtue of their dependency, they fail to cure the deficiency of their parent claim.


The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 4-5, 12 and 17-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA  the applicant regards as the invention.

Regarding Claim 4, the meaning of “the processor to: initiate …; and queue … is resolved” is not clear. Claim 4 depends on Claim 3, and Claim 3 depends on Claim 1. Currently, it is not clear whether “the processor” recited in Claim 4 refers to “the processor” at line 2 of Claim 1 or to “the parallel processor” at line 8 of Claim 1.
For the purpose of examination, the examiner interprets the claimed initiate step and queue step as being performed by “the at least one of processing resources of the parallel processor”.
Note: based on [00211] of the specification, if the claimed terminate step of Claim 1 is performed by “the at least one of processing resources of the parallel processor”, then at least the claimed queue step of Claim 4 should also be performed by “the at least one of processing resources of the parallel processor”. If Applicant disagrees with this interpretation, Applicant is requested to provide clear and exact support from the specification showing that the claimed initiate step and queue step of Claim 4 cannot be performed by the claimed parallel processor while the claimed parallel processor performs the claimed detect, terminate, store, create and launch actions/steps of Claim 1.

Claim 5 is rejected because, by virtue of its dependency, it fails to cure the deficiency of its parent claim. In addition, Claim 5 is rejected for a reason similar to that set forth in the rejection of Claim 4 above; i.e., it is not clear whether “the processor to” perform the claimed load step/action refers to “the processor” at line 2 of Claim 1 or to “the parallel processor” at line 8 of Claim 1. For the purpose of examination, the examiner interprets “the processor” at line 1 of Claim 5 as “the at least one of processing resources of the parallel processor”.

Regarding Claim 12, Claim 12 is rejected for the same reason set forth in the rejection of Claim 5 above.

Regarding Claim 19, Claim 19 is rejected for the same reason set forth in the rejection of Claim 5 above.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 6, 8, 13 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Deming et al. (US PGPUB 20140281263 A1, hereafter Deming) in view of Grossman (US PGPUB 20170004647 A1), Kacevas et al. (US PGPUB 20180047131 A1, hereafter Kacevas) and Srinivasan et al. (US PGPUB 20120036509 A1, hereafter Srinivasan).
Deming, Grossman and Kacevas were cited in the previous Office action.

Regarding Claim 1, Deming discloses: An apparatus comprising: 
a processor (see Figs. 1-5 and [0025]; “CPU 102”);

divide an execution thread of a graphics workload into a set of transactions which are to be executed atomically (see [0093]; “threads executing within the SM 310(0) each generate a stream of virtual memory transactions from the SM 310(0)”, emphasis added. In addition, see Figs. 1-3, [0020]-[0023], “the parallel processing subsystem 112 incorporates circuitry optimized for graphics and video processing, including, for example, video output circuitry, and constitutes one or more parallel processing units (PPUs) 202”. Furthermore, see [0069]; “the type of access that was attempted (e.g., read, write, or atomic), the virtual memory address for which an attempted access caused a page fault”. In certain embodiments, the threads executing in SM 310 of the parallel processing subsystem 112 are graphics workloads, and the set of transactions generated by a thread are transactions with an atomic type of access); and
initiate the execution of the thread on a parallel processor comprising a plurality of graphics processing resources by assigning the set of transactions to the plurality of graphics processing resources (see Fig. 3, [0091] and [0093]; “The PPU 202 includes any number N of first virtual memory transaction of the thread, and thus it would inherently require assigning the generated transactions to SM 310(0) of the plurality of processing resources to initiate the execution of the thread);
wherein at least one of the processing resources of the parallel processor is to:
detect a page fault in the execution of at least one transaction in the set of transactions (see Figs. 1, 3-4, [0009] and [0097]-[0098]; “if uTLB 430 is unable to map … then the uTLB 430 generates a memory access fault. The fault detector 450 processes the memory access fault—sending a fault signal”), and in response to detection, to:
terminate the execution of the at least one transaction in the set of transactions (see Figs. 1, 3-4, step 508 of Fig. 5 and [0105], “the fault detector 450 included in the replay unit 350(0) … stalls the SM 310(0), and adds the faulting virtual memory transaction to the replay buffer 460”. Stalling the SM that originally executes the faulting transaction and adding the faulting transaction to a buffer imply that the corresponding faulting transaction is terminated);
store an execution state of the thread in a memory queue (see Fig. 2 and [0097]-[0098]; “the fault detector 450 causes a fault buffer entry to be written to the fault buffer 216 of FIG. 2”, emphasis added. The fault buffer entry, i.e., the claimed execution state of the thread, is written to the fault buffer in the system memory 104 in response to detecting a page fault in the execution of the transaction).
Deming further discloses: and in response to the detection, the processor to:
create a set of instructions to resolve the page fault (see Fig. 2 and [0099]);
launch the set of instructions for execution on the CPU (see Fig. 2 and [0099]).

Deming does not disclose: the processor to perform the divide and the initiate steps/actions; 
the creation and launch steps are also performed by the at least one of the processing resources of the parallel processor, i.e., by the processing resources of the parallel processor that perform the detection of the page fault, rather than the creation and launch steps being performed by the processor;
the set of instructions to resolve the page fault is created in a command batch format, and the launch step is performed by launching the command batch in a hardware command streamer for execution on the parallel processor.

However, Grossman discloses: an apparatus comprising a parallel processor:
in response to detecting a page fault in the execution of a transaction at the parallel processor, the at least one of the processing resources of the parallel processor is to:
create a set of instructions comprising instructions to resolve the page fault; launch the set of instructions for execution on the parallel processor (see Fig. 1 and [0015]; “In response to the GPU 112 experiencing a page fault, the GPU MMU 118 can interrupt the GPU context manager 114, to initiate handling the page fault and to inform the GPU fault handler 116 or the CPU fault handler 106 of the page fault”. Also see [0025]).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the PPU fault handler 215 of CPU 102 of Deming, which handles page faults occurring on GPU/PPU 202, by including GPU fault handler 116 of GPU 
Furthermore, Kacevas discloses: an apparatus comprising a parallel processor, the apparatus creates a set of instructions in command batch format, the apparatus launches the command batch in a hardware command streamer for execution on the parallel processor (see [0065]; “graphics processor receives batches of commands via ring interconnect 502. The incoming commands are interpreted by command streamer 503 in the pipeline front-end 534”. Also see “the commands may be issued as batch of commands in a command sequence, such that the graphics processor will process the sequence of commands in an at least partially concurrent manner” from [0097]).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the mechanism by which the GPU/PPU executes a set of instructions in the combination of Deming and Grossman so that the set of instructions is formed in a command batch format that is launched and interpreted by a command streamer of the GPU for executing the command batch, i.e., the set of instructions, on the GPU, as taught by Kacevas, since it is understood that a set of instructions can be formed in a command batch format to be executed.

Thus, the combination of Deming, Grossman and Kacevas discloses: wherein at least one of the processing resources of the parallel processor is to, in response to the detection:

launch the command batch in a hardware command streamer for execution on the processor (see Fig. 2, [0099] from Deming; Fig. 1, [0015], [0025] from Grossman and [0065] from Kacevas. In the combined system, the graphics processor includes a hardware command streamer to launch and then interpret received instructions in the command batch format, and the command batch, i.e., the set of instructions, can then be executed on the graphics processor).

The combination of Deming, Grossman and Kacevas does not disclose:
the processor to perform the divide and the initiate steps/actions.
However, Srinivasan discloses: one or more transactions from a thread are routed from an initiator processor to a target processor for further processing (see [0023]; “a transaction from a thread from an initiator IP core may be routed to a multiple channel aggregate memory target IP core” and “a second transaction from the same thread from a given initiator IP core being routed to the multiple channel aggregate memory target IP core”. Also see [0025], [0031]; “Each initiator IP core such as a CPU IP core 102”, “A target core, such as an OCP slave, should normally return responses to request transactions made by the initiator core”).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the executions of dividing and initiating steps 

Regarding Claim 6, the rejection of Claim 1 is incorporated, and further the combination of Deming, Grossman, Kacevas and Srinivasan discloses: wherein: a hardware element detects the page fault and reports the page fault in a memory queue (see Figs. 1, 3-4, [0020] and [0097]-[0098] from Deming; “many graphics processing units (GPUs) are designed to perform parallel operations and computations and, thus, are considered to be a class of parallel processing unit (PPU)” and “The fault detector 450 processes the memory access fault” and “the fault detector 450 causes a fault buffer entry to be written to the fault buffer 216 of FIG. 2”. The fault detector 450 resides in replay unit 350, which is part of PPU 202, and thus PPU/GPU, i.e., the claimed hardware element, detects the page fault and reports the page fault to a queued in the in-flight buffer 440” from [0098] of Deming and “re-queues the virtual memory transaction in the replay buffer 460” from [0100] of Deming (emphasis added), it is reasonable to consider a memory buffer of Deming as a memory queue).

Regarding Claim 8, Claim 8 is a method claim corresponding to system Claim 1 and is rejected for the same reason set forth in the rejection of Claim 1 above.

Regarding Claim 13, the rejection of Claim 8 is incorporated, and further Claim 13 is a method claim corresponding to system Claim 6 and is rejected for the same reason set forth in the rejection of Claim 6 above.

Regarding Claim 15, Claim 15 is a product claim corresponding to system Claim 1 and is rejected for the same reason set forth in the rejection of Claim 1 above.

Claims 3, 10 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Deming et al. (US PGPUB 20140281263 A1, hereafter Deming) in view of Grossman (US PGPUB 20170004647 A1), Kacevas et al. (US PGPUB 20180047131 A1, hereafter Kacevas) and Srinivasan et al. (US PGPUB 20120036509 A1, hereafter Srinivasan), and further in view of Buzby et al. (US Patent 5905857 A, hereafter Buzby).
Deming, Grossman, Kacevas and Buzby were cited in the previous Office action.

Regarding Claim 3, the rejection of Claim 1 is incorporated, and further the combination of Deming, Grossman, Kacevas and Srinivasan discloses: the at least one of the processing resources to: generate a page fault signal for the at least one transaction in the set of transactions, the page fault signal comprising a thread identifier, a transaction identifier, a processor identifier, and a virtual function unit (see [0097]-[0098] from Deming; “the fault detector 450 causes a fault buffer entry to be written to the fault buffer 216 of FIG. 2”, emphasis added. Also see [0069] from Deming for details of the fault buffer entry, i.e., the claimed page fault signal for the transaction. Based on [0069], the fault buffer entry comprises at least: “an indication of a unit or thread that caused a page fault”, i.e., the claimed thread identifier; “the type of access that was attempted (e.g., read, write, or atomic)”, i.e., the claimed transaction identifier, which identifies the type of the faulted transaction; and “the virtual memory address for which an attempted access caused a page fault”, i.e., the claimed virtual function unit. In addition, see “a fault buffer 216, which includes entries written by the PPU 202 in order to inform the CPU 102 of a page fault generated by the PPU 202” from [0031] of Deming; thus it is reasonable to consider that a fault buffer entry, i.e., the claimed page fault signal, should also include information indicating that it is PPU 202, rather than the CPU (see “CPU-based page fault” at [0076] from Deming), that generated the corresponding page fault, i.e., the claimed processor identifier).

The combination of Deming, Grossman, Kacevas and Srinivasan does not disclose: discard any work performed on the at least one transaction in the set of transactions.
However, Buzby discloses: discard any work performed on the transaction having a page fault (see lines 43-47 of col. 6; “if the execution of the faulting instruction was partially 
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the page fault handling mechanism of the combination of Deming, Grossman, Kacevas and Srinivasan to include discarding the results of a faulted instruction/transaction having a page fault, as taught by Buzby, since discarding intermediate results is a well-known and well-understood way to restart a faulted instruction/transaction so that only the safestored information is used (see lines 39-50 of col. 6 from Buzby; “using the safestored information, but also a full restart when a valid copy of the missing page is brought into the private cache 3”).

Regarding Claim 10, the rejection of Claim 8 is incorporated, and further Claim 10 is a method claim corresponding to system Claim 3 and is rejected for the same reason set forth in the rejection of Claim 3 above.

Regarding Claim 17, the rejection of Claim 15 is incorporated, and further Claim 17 is a product claim corresponding to system Claim 3 and is rejected for the same reason set forth in the rejection of Claim 3 above.

Claims 4-5, 11-12 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Deming et al. (US PGPUB 20140281263 A1, hereafter Deming) in view of Grossman (US PGPUB 20170004647 A1), Kacevas et al. (US PGPUB 20180047131 A1, hereafter Kacevas), Srinivasan et al. (US PGPUB 20120036509 A1, hereafter Srinivasan) and Buzby et al. (US Patent 5905857 A, hereafter Buzby), and further in view of Lee.
Deming, Grossman, Kacevas, Buzby and Lee were cited in the previous Office action.

Regarding Claim 4, the rejection of Claim 3 is incorporated, and further the combination of Deming, Grossman, Kacevas, Srinivasan and Buzby discloses: the processor to initiate execution of a new thread (see [0022], [0029] and [0093] from Deming; multiple threads are executed, and thus the system of Deming inherently includes the action of initiating execution of a new thread); and re-execute the transaction for execution after the page fault is resolved (see step 512 of Fig. 5 from Deming; “the replay unit 350(0) waits for the CPU 102 to signal that one or more faults have been resolved via the replay signal. Upon receiving the replay signal, the replay unit 350(0) invalidates the uTLB 430 and re-executes the virtual memory transactions that are stored in the replay buffer 460”).

The combination of Deming, Grossman, Kacevas, Srinivasan and Buzby does not disclose: the at least one transaction in the set of transactions for execution is queued after the page fault is resolved.
However, Lee discloses: a fault handling mechanism comprises to queue a program command for execution after the fault is resolved (see [0009]; “a recovery operation to a program command corresponding to the program fail, re-queues a recovered program command in the first queue and resumes providing the queued commands from the first queue”, emphasis added).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the queuing fault command/transaction having fault 

Regarding Claim 5, the rejection of Claim 4 is incorporated, and further the combination of Deming, Grossman, Kacevas, Srinivasan, Buzby and Lee discloses: the processor to: load a subsequent transaction for execution (see steps 506-502 of Fig. 5 and [0104] from Deming; “the method 500 returns to step 502”. The method returns from step 506 to step 502 to receive/load the next, i.e., subsequent, memory transaction of the thread for execution. Also see steps 514-516-502 of Fig. 5 and [0107] from Deming; “causes the SM 310(0) to resume issuing virtual memory transactions from the SM 310(0), and the method 500 returns to step 502”. The SM 310(0) begins to load/issue a next/subsequent transaction for execution).

Regarding Claim 11, the rejection of Claim 10 is incorporated, and further Claim 11 is a method claim corresponding to system Claim 4 and is rejected for the same reason set forth in the rejection of Claim 4 above.

Regarding Claim 12, the rejection of Claim 11 is incorporated, and further Claim 12 is a method claim corresponding to system Claim 5 and is rejected for the same reason set forth in the rejection of Claim 5 above.

Regarding Claim 18, the rejection of Claim 17 is incorporated, and further Claim 18 is a product claim corresponding to system Claim 4 and is rejected for the same reason set forth in the rejection of Claim 4 above.

Regarding Claim 19, the rejection of Claim 18 is incorporated, and further Claim 19 is a product claim corresponding to system Claim 5 and is rejected for the same reason set forth in the rejection of Claim 5 above.

Regarding Claim 20, the rejection of Claim 19 is incorporated, and further Claim 20 is a product claim corresponding to system Claim 6 and is rejected for the same reason set forth in the rejection of Claim 6 above.

Response to Arguments
Applicant’s arguments, filed 10/12/2021, with respect to the rejections of Claims 1, 3-6, 8, 10-13, 15 and 17-20 under 35 U.S.C. 103 have been fully considered. New grounds of rejection are made based on the amended limitations.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Duesterwald et al. (US PGPUB 20180108105 A1) discloses: a graphics processing unit to divide a first data block into a set of sub-blocks (see [0037]; “a policy can be a processing policy in which the first data block is divided by the first graphics processing unit 102 into a set of sub-blocks”). The first and second graphics processing units can each receive a respective portion of a data block of a data matrix from a CPU and perform further processing on that portion (see Fig. 1 and [0035]-[0036] and [0039]).
Price et al. (US PGPUB 20170236244 A1) discloses: a GPU receives tasks from a host processor and partitions a task into subtasks, then the GPU distributes the subtasks for execution to other execution units of the GPU (see [0187]).
Li et al. (US PGPUB 20210064425 A1) discloses: a thread block scheduler splits a received task into a plurality of thread blocks and distributes the plurality of thread blocks to a plurality of computing cores for parallel computing (see [0076]).
Jia et al. (US PGPUB 20180075605 A1) discloses: a CPU host divides real-time input images into a plurality of image partitions, and multiple GPUs perform image processing algorithms on the plurality of image partitions (see [0008]).

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZHI CHEN whose telephone number is (571)272-0805.  The examiner can normally be reached on Monday-Friday 9:30AM-5PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emerson Puente can be reached on (571)272-3652.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.



/Zhi Chen/
Patent Examiner, AU2196

/EMERSON C PUENTE/Supervisory Patent Examiner, Art Unit 2196