DETAILED ACTION

This action is in response to communications: Amendment filed June 15, 2022.

Claims 1-20 are pending in this case.  Claims 8-10, 17, and 18 have been newly amended.  No claims have been newly added or cancelled.  This action is made FINAL.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections

Claim 10 is objected to because of the following informalities:  
Claim 10 has been amended to recite, “…providing, from the at least SPI, the work items…” but should recite, “…providing, from the at least one SPI, the work items…”
Appropriate correction is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Goel et al. (US 2012/0017062) in view of Nickolls et al. (US 2011/0074802).

As to claim 1, Goel et al. disclose an apparatus (Figure 1, computer system 100) comprising: at least one shader processor input (SPI) (shader core 106) configured to provide work items from a thread group (e.g. thread group as “wavefront”) for execution on at least one shader engine (e.g. one or more of single instruction multiple data (SIMD) processing units 112 of shader core 106) ([0023] notes that each SIMD processing unit is assigned a “wavefront” of threads as one or more threads from a thread group; [0029] notes that shader core 106 includes a plurality of processing units configured to execute instructions, such as shader programs, and configured to execute a plurality of threads, and thus may be considered “shader engines”); and a command processor (e.g. command processor 105 with wavefront dispatch module 130) configured to selectively dispatch the work items to the at least one SPI (e.g. shader core 106) based on a size of the thread group (e.g. size of wavefront) and a format of cache lines of a cache implemented in the at least one shader engine (e.g. availability of local memories 113 of the shader core) ([0028] notes that the command processor receives instructions from control processor 101, interprets the instructions, and issues appropriate instructions to components of graphics processing device 102, including shader core 106, where the issued instructions initiate a sequence of thread groups; [0034] and [0035] note that wavefront dispatch module 130 is configured to assign a sequence of wavefronts of threads to the processing units 112 (of shader core 106), where the wavefront dispatch module 130 includes logic for determining the memory available in the local memory of each processing unit, the sequence of thread wavefronts to be dispatched to each processing unit, and the size of the wavefront that is dispatched to each processing unit; see Figures 2-4 and associated text).
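
For illustration only, the kind of dispatch logic attributed above to wavefront dispatch module 130 (considering both the size of a wavefront and the memory available in the local memory of each processing unit) might be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Goel et al.: the class names, the byte-based memory accounting, and the first-fit placement policy are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingUnit:
    """Hypothetical stand-in for one SIMD processing unit."""
    unit_id: int
    free_local_memory: int  # assumed to be tracked in bytes
    assigned: list = field(default_factory=list)


@dataclass
class Wavefront:
    """Hypothetical wavefront: a group of threads and its per-thread memory need."""
    wavefront_id: int
    num_threads: int       # the "size" of the wavefront
    bytes_per_thread: int  # assumed per-thread local memory requirement

    @property
    def memory_needed(self) -> int:
        return self.num_threads * self.bytes_per_thread


def dispatch(wavefronts, units):
    """Place each wavefront, in sequence, on the first unit with enough free
    local memory (first-fit is an assumption). Returns ids that did not fit."""
    deferred = []
    for wf in wavefronts:
        target = next(
            (u for u in units if u.free_local_memory >= wf.memory_needed), None
        )
        if target is None:
            deferred.append(wf.wavefront_id)
            continue
        target.free_local_memory -= wf.memory_needed
        target.assigned.append(wf.wavefront_id)
    return deferred


units = [ProcessingUnit(0, 4096), ProcessingUnit(1, 8192)]
wavefronts = [
    Wavefront(0, 64, 32),  # needs 2048 bytes
    Wavefront(1, 64, 64),  # needs 4096 bytes
    Wavefront(2, 64, 64),  # needs 4096 bytes
    Wavefront(3, 64, 64),  # needs 4096 bytes
]
deferred = dispatch(wavefronts, units)
```

In this sketch the placement decision depends only on wavefront size and memory availability; it does not model any format of cache lines.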

As noted above, Goel et al. disclose a “command processor” (e.g. command processor 105 with wavefront dispatch module 130) that dispatches sequences of wavefronts of threads to the processing units 112 of shader core 106 based on the size of the wavefronts and the memory requirements of the local memory of shader core 106, where the local memory may include a cache (see [0027]).  However, Goel et al. do not expressly disclose that the memory requirements include a “format of cache lines of a cache.”

Nickolls et al. disclose a command processor (e.g. Figure 3, warp scheduler and instruction unit 312 of streaming multiprocessor (SPM) 310) configured to selectively dispatch the work items to the at least one SPI (e.g. one or more SPMs 310) ([0054] notes that warp scheduler and instruction unit 312 receives instructions and constants from the instruction L1 cache and controls local register file 304 and SPM 310 functional units according to the instructions and constants) based on a size of the thread group (e.g. size of cooperative thread array (CTA) as a “warp” or “thread group”) and a format of cache lines of a cache implemented in the at least one shader engine ([0042] and [0043] note that a thread group, e.g. a CTA, may have a particular size that may be determined by the programmer and the amount of hardware resources available to the CTA; [0060] notes that the PPUs (which comprise SPM 310) are configured to read from and write to surfaces that are multi-dimensional arrays of formatted data stored in graphics memory; Figure 5B illustrates cache line 505 within a multi-dimensional surface block linear formatted graphics surface; [0074] and [0075] note that multi-dimensional formatted surfaces may be accessed as textures, render targets, and arrays utilizing surface load, store, reduce, and atomic surface instructions, where [0062] notes that these surface access instructions are executed by the load-store units (LSUs) of each SPM 310).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Goel et al.’s system and method of dispatching wavefronts based on the size of wavefronts and memory requirements to also incorporate memory requirements associated with multi-dimensional arrays of formatted data stored in memory, as described in Nickolls et al., to efficiently support various applications, thus enhancing the performance of the system (see [0062] of Nickolls et al.).

Claim 10 is similar in scope to claim 1 above, and is therefore rejected under a similar rationale.

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Nickolls et al. (US 2011/0074802) in view of Appu et al. (US 10,102,149).
 
As to claim 19, Nickolls et al. disclose a method, comprising: multidimensional blocks that correspond to a format of cache lines of a cache (Figure 5B, cache line 505 of cache) implemented in at least one shader engine (e.g. streaming multiprocessor (SPM) 310) ([0060] notes that the PPUs (which comprise SPM 310) are configured to read from and write to surfaces that are multi-dimensional arrays of formatted data stored in graphics memory; Figure 5B illustrates cache line 505 within a multi-dimensional surface block linear formatted graphics surface; [0074] and [0075] note that multi-dimensional formatted surfaces may be accessed as textures, render targets, and arrays utilizing surface load, store, reduce, and atomic surface instructions); and dispatching thread groups (e.g. via warp scheduler and instruction unit 312 of SPM 310) including the work items in the multidimensional blocks to the at least one shader engine for execution ([0054] notes that warp scheduler and instruction unit 312 receives instructions and constants from the instruction L1 cache and controls local register file 304 and SPM 310 functional units according to the instructions and constants, where [0062] notes that these surface access instructions are executed by the load-store units (LSUs) of each SPM 310).

Nickolls et al. disclose cooperative thread arrays as “thread groups” or “warps” being assigned to a number of processing engines of each SPM, where the size of a CTA may be determined by the programmer and the amount of hardware resources available to the CTA (see [0042] and [0043]).  However, Nickolls et al. do not specifically disclose “adding work items having consecutive indices to multidimensional blocks…”

Appu et al. disclose adding work items having consecutive indices (column 24, lines 20-26 note that a thread group refers to a plurality of threads that are grouped with ordered (e.g. sequential or consecutive) thread indexes).
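
As a rough illustration of grouping threads by ordered (consecutive) thread indexes in the sense of the cited passage, the following minimal sketch partitions a run of thread indexes into fixed-size groups. The function name and the fixed group size are assumptions made for illustration; the sketch is not drawn from Appu et al. and does not model multidimensional blocks or cache line formats.

```python
def group_consecutive(num_threads, group_size):
    """Partition thread indexes 0..num_threads-1 into groups of consecutive
    indexes. The last group may be smaller if the count does not divide evenly."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return [
        list(range(start, min(start + group_size, num_threads)))
        for start in range(0, num_threads, group_size)
    ]


groups = group_consecutive(10, 4)
# groups == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```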

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Nickolls et al.’s CTAs, as thread groups or warps, to be grouped according to ordered thread indexes as taught by Appu et al., yielding predictable results, without changing the scope of the invention as noted above.

Allowable Subject Matter

Claims 2-9, 11-18, and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter:  The prior art fails to teach or suggest the limitations as recited by dependent claims 2, 11, and 20, where dependent claims 3-9 are indicated allowable for depending upon indicated allowable claim 2, and dependent claims 12-18 are indicated allowable for depending upon indicated allowable claim 11.

Response to Arguments

Applicant's arguments filed June 15, 2022 have been fully considered but they are not persuasive.  Regarding independent claims 1 and 10, Applicant argues on pages 7-9 of the Amendment filed that the prior art of record fails to teach the limitations of the claims.  More specifically, Applicant argues that “…The rejection is improper and should be withdrawn for at least two reasons. First, neither Goel nor Nickolls discloses the claim features of a command processor configured to selectively dispatch the work items to the at least one SPI based, at least in part, on “a format of the cache lines of a cache implemented in the at least one shader engine.” Second, neither of the references disclose selectively dispatching the work items to the at least one SPI based on a size of the thread group in combination with the first point highlighted above, i.e., based on a format of the cache lines of a cache implemented in the at least one shader engine. With respect to the first reason, the Office Action admits on page 4 that “Goel does not specifically express the memory requirements includes a ‘format of cache lines of a cache’” after stating Goel discloses that sequences of wavefronts of threads may be dispatched based on “the size of wavefronts and memory requirements of a local memory core 106”. Id. With respect to the memory requirements, Goel discloses that the sequences of wavefronts may be assigned to processing units based the memory available in the local memory of each processing unit. Goel at [0034]. One of ordinary skill in the art would understand that dispatching work items based on the availability of a memory resource and dispatching work items based on the format of a memory resource are not the same and are patentably distinct. In sum, Goel fails to disclose a command processor that selectively dispatches work items based, at least in part, on the format of cache lines of a cache. 
The Office Action attempts to use Nickolls to cure the deficiencies of Goel. Nickolls fails to cure these deficiencies…Furthermore, as none of the references individually disclose selectively dispatching the work items to the at least one SPI based on a format of cache lines of a cache implemented in the at least one shader engine, it logically follows that the references fail to disclose that the selective dispatching is additionally based on the “size of the thread group.” Put differently, since the references fail to disclose that the selective dispatching is based on a first criterion, the references cannot disclose that the selective dispatching is based on the first criterion and a second criterion. At least for this reason, the rejection to claim 1 is improper and should be withdrawn...” (second thru fourth paragraphs of page 8 and second paragraph of page 9 of the Amendment filed).
In reply, as noted in the rejection above, Goel et al. describe that the command processor receives instructions from control processor 101, interprets the instructions, and issues appropriate instructions to components of graphics processing device 102, including shader core 106, where the issued instructions initiate a sequence of thread groups.  The wavefront dispatch module 130 is configured to assign a sequence of wavefronts of threads to the processing units 112 (of shader core 106), where the wavefront dispatch module 130 includes logic for determining the memory available in the local memory of each processing unit, the sequence of thread wavefronts to be dispatched to each processing unit, and the size of the wavefront that is dispatched to each processing unit.  Therefore, it is noted that Goel et al. disclose selectively dispatching…based on a size of the thread group, but do not explicitly describe that the dispatching is also based on the “format of cache lines of cache.”  However, as further noted above, Goel et al. disclose that the dispatching is based on more than the size of the thread group alone, and also takes into account memory requirements, and thus on at least two conditions or criteria, similar to that of claim 1.  Although Goel et al. describe that the memory requirements may include the availability of memory, the dispatching still takes into account “memory,” where the claimed “cache” is a form of memory.  Nickolls et al. further describe that thread groups, e.g. CTAs, may have a particular size that may be determined by the programmer and the amount of hardware resources available to the CTA, where the PPUs (which comprise SPM 310) are configured to read from and write to surfaces that are multi-dimensional arrays of formatted data stored in graphics memory.  Nickolls et al. further describe the memory resources as a cache, where Figure 5B illustrates cache line 505 within a multi-dimensional surface block linear formatted graphics surface, and the multi-dimensional formatted surfaces may be accessed as textures, render targets, and arrays utilizing surface load, store, reduce, and atomic surface instructions, where these surface access instructions are executed by the load-store units (LSUs) of each SPM 310.  Therefore, in the combination of Goel et al. with Nickolls et al., the memory requirements of Goel et al. may be modified to specifically include the “format of cache lines of cache” as described in Nickolls et al., because the cache lines have a specific layout for storing data as noted above, thus teaching the limitations of the claims.
Regarding independent claim 19, Applicant argues on pages 10 and 11 of the Amendment filed that the prior art of record fails to teach the limitations of the claim.  More specifically, Applicant argues that “…With respect to the first reason, neither Nickolls nor Appu discloses “adding work items having consecutive indices to multidimensional blocks....” The Office Action at page 6 admits that Nickols does not specifically disclose “adding work items having consecutive indices to multidimensional blocks...” and turns to Appu to cure this deficiency. The Office Action cites Appu Column 24, lines 20-26 which discloses, inter alia: “...a “thread group” refers to a plurality of threads that are grouped with ordered (e.g., sequential or consecutive) thread indexes.” While Appu discloses that threads may be grouped with ordered thread indices, Appu is silent as to “adding work items having consecutive indices to multidimensional blocks...” (emphasis added), i.e., Appu does not disclose the entire feature that the Office Action admits is not disclosed in Nickolls. At least for this reason, the rejection is improper should be withdrawn.  With respect to the second reason, the rejection improperly bifurcates the language of the claim feature “adding work items having consecutive indices to multidimensional blocks that correspond to a format of cache lines of a cache implemented in at least one shader engine” into multiple pieces and evaluates the invention in a piecemeal manner in direct contradiction with the evaluation of the claim as a whole as required under 35 USC §103. In doing so, the Office Action engages in a clear application of impermissible hindsight reasoning that relies on information gleaned solely from the Applicant’s specification. 
The MPEP and well-established case law clearly articulate that “impermissible hindsight must be avoided and the legal conclusion must be reached on the basis of the facts gleaned from the prior art” (MPEP §2142) and “[a]ny judgement on obviousness is in a sense necessarily a reconstruction based on hindsight reasoning, but so long as it takes into account only knowledge which was within the level of ordinary skill in the art at the time the claimed invention was made and does not include knowledge gleaned only from applicant’s disclosure, such a reconstruction is proper’” (MPEP § 2145(X)(A), quoting In re McLaughlin, 443 F.2d 1392, 1395 (CCPA 1971) (emphasis added). In the present case, the Office Action relies on flawed reasoning to allegedly support the combination of Nickolls and Appu. The feature of “adding work items having consecutive indices to multidimensional blocks that correspond to a format of a cache line” is absent from both Nickolls and Appu and is only present in the Applicant’s own specification. Due to the lack of disclosure of this feature in the cited references and the fact that this feature is only present on the record in Applicant’s specification, it logically follows that any attempt to explain that Nickolls or Appu discloses this feature has been improperly gleaned from the Applicant’s specification and that the combination of Nickolls and Appu is an exercise of impermissible hindsight. At least for this reason, the rejection is improper should be withdrawn…” (last paragraph of page 10 continued to second paragraph of page 11).
In reply, as noted in the rejection above, Nickolls et al. describe that the PPUs (which comprise SPM 310) are configured to read from and write to surfaces that are multi-dimensional arrays of formatted data stored in graphics memory, more specifically, in cache, where Figure 5B illustrates cache line 505 within a multi-dimensional surface block linear formatted graphics surface, and the multi-dimensional formatted surfaces may be accessed as textures, render targets, and arrays utilizing surface load, store, reduce, and atomic surface instructions.  Therefore, Nickolls et al. specifically write to surfaces in cache, and thus to multidimensional blocks.  However, it is noted that Nickolls et al. do not explicitly describe that the work items have “consecutive indices.”  Appu et al. further describe that a thread group refers to a plurality of threads that are grouped with ordered (e.g. sequential or consecutive) thread indexes, which is thus considered “adding work items having consecutive indices.”  Therefore, Nickolls et al. as modified by Appu et al. teach “…adding work items having consecutive indices to multidimensional blocks...” as recited.
In response to applicant's argument that the examiner's conclusion of obviousness is based upon improper hindsight reasoning, it must be recognized that any judgment on obviousness is in a sense necessarily a reconstruction based upon hindsight reasoning.  But so long as it takes into account only knowledge which was within the level of ordinary skill at the time the claimed invention was made, and does not include knowledge gleaned only from the applicant's disclosure, such a reconstruction is proper.  See In re McLaughlin, 443 F.2d 1392, 170 USPQ 209 (CCPA 1971).

Conclusion

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACINTA M CRAWFORD whose telephone number is (571)270-1539. The examiner can normally be reached 9:00 a.m. to 5:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached at (571)272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/JACINTA M CRAWFORD/Primary Examiner, Art Unit 2612