DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the application filed on 4/14/2019, wherein claims 1-20 are pending.

Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged. Applicant has not complied with one or more conditions for receiving the benefit of an earlier filing date under 35 U.S.C. 120 as follows:
The later-filed application must be an application for a patent for an invention which is also disclosed in the prior application (the parent or original nonprovisional application or provisional application). The disclosure of the invention in the parent application and in the later-filed application must be sufficient to comply with the requirements of 35 U.S.C. 112(a) or the first paragraph of pre-AIA 35 U.S.C. 112, except for the best mode requirement. See Transco Products, Inc. v. Performance Contracting, Inc., 38 F.3d 551, 32 USPQ2d 1077 (Fed. Cir. 1994).
The disclosure of the prior-filed application, Application No. 15164848, fails to provide adequate support or enablement in the manner provided by 35 U.S.C. 112(a) or pre-AIA  35 U.S.C. 112, first paragraph for one or more claims of this application.  In particular, it does not provide support for “a scheduler comprising plurality of stages, … wherein each stage includes physical hardware implemented using digital logic gates, .
In addition, the disclosure of the prior-filed application, Application No. 16/270766, fails to provide adequate support or enablement in the manner provided by 35 USC 112(a) or pre-AIA  35 USC 112, first paragraph for one or more claims of this application.  In particular, it does not provide support for “…the output command buffer allocator and initializer further comprise an output command buffer write pointer update to update a write pointer (WP) during the clearing of output command buffer, further the .
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 13 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The following claim terms are unclear and indefinite:
As for claim 13, it is unclear what is meant by “at the same time” and what the basis is for determining complete operation. In particular, the examiner notes that the claim limitation is merely repeated in the specification, with no teaching of how to ensure that the plurality of processors complete operation at the same time or how to detect it. Consequently, it is entirely unclear what the basis is for “at the same time”. Is it based on the last instruction executed? Is it based on the output buffer having finished filling? Is it based on a pointer status update? Is it based on setting a status table, which would raise the additional issue of where this is determined? Who


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 6-14, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Duluk, JR. et al. (US PGPUB 2014/0176588) (hereafter as Duluk), in view of Burke (US PGPUB 2018/0300933).

As for claim 1, Duluk teaches a graph stream processing system (Abstract, “…graphics processing unit…”), comprising:
a plurality of graph streaming processors operative to process a plurality of threads (paragraph 41, “…one or more…SM…each SM…configured to process one or more thread groups…”), wherein each of the plurality of threads include a set of instructions operating on the plurality of graph streaming processors and operating on a set of input data and producing output data (paragraph 41, “…receive instructions...[to execute on] SM 310…” and paragraph 44, “…the input data set a thread is to process…an output dataset a thread is to produce or write…”); and
a scheduler comprising plurality of stages (paragraph 41, the pipeline managed by the pipeline manager is understood as different stages of a pipeline, wherein each SM processes one or more thread groups using a warp scheduler within the SM. Thus, the scheduler comprises multiple stages; see also paragraph 4, “graphics processing pipeline that includes a sequence of graphics processing stages”), wherein each of the stages is coupled to an input command buffer and an output command buffer (Paragraphs 41, 35. The specification of the current application teaches that the output command buffer of a stage is the input command buffer of the next stage and that the command buffer can function to forward information required by every stage from the input command buffer to the output command buffer (specification, paragraphs 88 and 89). Thus, under BRI, command
wherein each stage includes physical hardware implemented using digital logic gates, operative to schedule each of the threads, each stage comprising of a command parser, a thread generator and a thread scheduler (Fig. 3 – Warp Scheduler and Instruction Unit 312, and paragraphs 41-42. The current application’s parser, generator, and scheduler are functional units of an overall scheduler for a stage (see Fig. 1B), and are not separate executables; thus, they are understood to include functions performed within an overall scheduler unit. Here, “…warp scheduler and instruction unit 312 receives instructions and constants from the instruction L1 cache 370 and controls local register file 304 and SM310 functional units according to the instructions and constants…” “the series of instructions transmitted to a particular GPC 208 constitutes a thread…the collection of a certain number of concurrently executing threads across the parallel processing engines…is referred to herein as a “warp”…”),

wherein the thread scheduler, coupled to the thread generator dispatches the plurality of threads for operating on the plurality of graph streaming processors, with one or more threads running one or more code blocks on different input data and producing different output data (paragraph 42, “…warp…executing the same program on different input data…”).

Duluk teaches receiving instructions which are then executed as threads on the SMs; thus, it would have been obvious that Duluk would have a functional unit coupled to the command parser that operates to generate the plurality of threads to execute on the SM, because doing so enables execution of threads on hardware from instructions as taught by the prior art. However, in the interest of compact prosecution, the Examiner notes that Duluk does not explicitly state a thread generator coupled to the command parser that operates to generate the plurality of threads.
However, Burke teaches a known method of instruction execution for graphics including the thread generator coupled to the command parser operate to generate the plurality of threads (paragraph 74, “…the instruction unit 254 can dispatch instructions as thread group (e.g., warps), with each thread of the thread group assigned to a different execution unit within GPGPU core 262…”). This known technique is applicable to the system of Duluk as they both share characteristics and capabilities, namely, they are directed to instruction execution as thread groups/warps in multi-core GPUs.


As for claim 2, Burke also teaches wherein the plurality of graph streaming processors simultaneously operate on a plurality of threads of different stages (paragraph 213, output of one stage can be sent to another stage/unit that already exists. Thus, a plurality of executing threads for different stages clearly coexist, and are not limited to being deployed sequentially. See, e.g., paragraphs 176 and 215).

As for claim 3, Burke also teaches the scheduler further comprising an output command buffer allocator and initializer to manage output command buffer size [return buffer size] (paragraph 305) and clearing of output command buffer before scheduling a thread for processing by the plurality of graph streaming processors (paragraph 301.  

As for claim 6, Duluk also teaches the plurality of graph streaming processors operating on a thread generate write commands to update the output command buffer (paragraphs 64-65, the local vertex buffer is written to, and the data is subsequently transferred. Thus, it is inherent that there is a trigger to update (i.e., write to) the output command buffer).

As for claim 7, Burke teaches wherein the plurality of graph streaming processors complete operation on at least one thread of a first stage before the thread scheduler can dispatch threads from a second stage for operation, wherein operations on the threads of the second stage start after the operations on the at least one thread of the first stage (paragraph 304. Pipeline synchronization, in which data from one or more instructions is cleared before processing a next set of commands (the next stage’s actions), is understood as waiting for complete operation on at least one thread of a first stage).

As for claim 8, Duluk also teaches wherein the commands to generate threads for the second stage is computed by the plurality of graph streaming processors operating on the at least one of threads of the first stage (paragraph 39. Tasks generating one or more child processing tasks is understood as generating tasks that follow the task, which are understood as a different stage).

As for claim 9, Burke also teaches the graph streaming processor system further comprising a compiler to generate the one or more code blocks for operating on the plurality of graph streaming processors (paragraph 152, “compiler”).

As for claim 10, Burke also teaches the compiler provides input commands to initiate processing of the graph streaming processor system (paragraph 316. Pre-compilation is understood as providing the instructions that the graph stream processor system processes. The start of a compiled program is understood as commands to initiate processing).

As for claim 11, Duluk teaches a method of graph stream processing (Abstract, “…graphics processing unit…”), comprising:
processing, by a plurality of graph streaming processors, a plurality of threads (paragraph 41, “…one or more…SM…each SM…configured to process one or more thread groups…”), wherein each of the plurality of threads include a set of instructions operating on the plurality of graph streaming processors operating on a set of input data and producing output data (paragraph 41, “…receive instructions...[to execute on] SM 310…” and paragraph 44, “…the input data set a thread is to process…an output dataset a thread is to produce or write…”); and
scheduling the plurality of threads by a scheduler (paragraph 41, the pipeline managed by the pipeline manager is understood as different stages of a pipeline, wherein each SM processes one or more thread groups using a warp scheduler within the SM. Both the pipeline manager and each stage’s warp scheduler are part of the scheduler that schedules the plurality of threads),

further comprising:

dispatching by the thread scheduler, one or more threads for operating on the plurality of graph streaming processors, with each thread running one or more code blocks on different input data and producing different output data (paragraph 42, “…warp…executing the same program on different input data…”).

Duluk teaches receiving instructions which are then executed as threads on the SMs; thus, it would have been obvious that Duluk would have a functional unit coupled to the command parser that operates to generate the plurality of threads to execute on the SM, because doing so enables execution of threads on hardware from instructions as taught by the prior art. However, in the interest of compact prosecution, the Examiner notes that Duluk does not explicitly state that the scheduler generates the one or more threads.
However, Burke teaches a known method of instruction execution for graphics including generating by a thread scheduler, one or more threads (paragraph 74, “…the instruction unit 254 can dispatch instructions as thread group (e.g., warps), with each thread of the thread group assigned to a different execution unit within GPGPU core 262…”). This known technique is applicable to the system of Duluk as they both share characteristics and capabilities, namely, they are directed to instruction execution as thread groups/warps in multi-core GPUs.
One of ordinary skill in the art before the effective filing date of the application would have recognized that applying the known technique of Burke would have yielded predictable results and resulted in an improved system.  It would have been recognized 

As for claim 12, Burke also teaches two or more stages of the plurality of stages operate simultaneously (paragraph 301, “…sequence…will process…in at least partial concurrence…”).

As for claim 13, wherein the plurality of graph streaming processors complete operation on a plurality of threads of a node corresponding to a stage at the same time (paragraph 291, “…entire geometric objects…” The specification does not explain the meaning of “complete operation corresponding to a stage at the same time…” as discussed above; alternatively, see paragraph 284, “including synchronization instructions…”).

As for claims 14 and 17-20, they contain limitations similar to those of claims 3 and 6-9, respectively. Thus, they are rejected under the same rationales.

Claims 4-5 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Duluk and Burke, further in view of Mizrahi et al. (US PGPUB 2016/0291982).
As for claim 4, Duluk and Burke teach that thread processors return after processing a dispatched task, and teach synchronization of data between stages; thus, it would have been obvious that the system can perform status reporting and completion indication. Nevertheless, in the interest of compact prosecution, the examiner notes that they do not explicitly teach a write pointer update and indicating a completion pointer for a next stage.
However, Mizrahi teaches a known method of execution of instruction sequences on processor threads including thread processors [2nd hardware thread] including an output command buffer write pointer update to update a write pointer (WP) during the clearing of output command buffer, further the write pointer indicating a completion pointer for a next stage (Fig. 5 – steps 114-118 and paragraphs 97-98. “if the instruction is found to be the last write operation…thread signals the last write, at a LWI signaling step 118...”). This known technique is applicable to the system of Duluk and Burke as they share characteristics and capabilities, namely, they are directed to dependency-based task dispatch and execution on a plurality of processing cores.
One of ordinary skill in the art before the effective filing date of the application would have recognized that applying the known technique of Mizrahi would have yielded predictable results and resulted in an improved system.  It would have been recognized that applying the technique of Mizrahi to the teachings of Duluk and Burke would have yielded predictable results because the level of ordinary skill in the art 

As for claim 5, Duluk teaches a plurality of graph streaming processors (Abstract). Mizrahi teaches that processors update the completion pointer after completing operation on a thread (paragraph 98).

As for claims 15-16, they contain limitations similar to those of claims 4-5 above. Thus, they are rejected under the same rationales.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN X LU whose telephone number is (571)270-1233.  The examiner can normally be reached on M-F 10am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lewis Bullock, can be reached on (571)272-3759. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KEVIN X LU/
Examiner, Art Unit 2199

/LEWIS A BULLOCK JR/
Supervisory Patent Examiner, Art Unit 2199