DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 02/08/2021 has been entered.

Information Disclosure Statement
The information disclosure statement (IDS) submitted has been considered by the examiner.

Response to Arguments
Applicant's arguments filed 02/08/2021 have been fully considered. Claims 1, 4-9, 12-17 and 20 are pending. The amendment to the specification has been acknowledged.

Applicant’s arguments with respect to claim(s) 1, 4-9, 12-17 and 20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Objections
Claims 6 and 14 are objected to because of the following informalities: in line 2 of each claim, “processing data from [[a]] GPU tile of the plurality of GPU tiles…” should be changed to “processing data from the GPU tile of the plurality of GPU tiles”.  Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-6, 8-9, 12-14, 16-17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over VanReenen et al. (US Publication Number 2018/0350036 A1, hereinafter “VanReenen”) and Frascati et al. (US Publication Number 2014/0267259, hereinafter “Frascati”), further in view of Arunkumar et al. (NPL “MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability”, 2017, hereinafter “Arunkumar”).

(1) regarding claim 1:
As shown in fig. 1, VanReenen disclosed an apparatus (computing device 2, fig. 1) comprising: 
a memory (memory 10, fig. 1) for storage of data (para. [0040], note that system memory 10 may store program modules and/or instructions), the data including geometric data for graphic processing (para. [0067], note that additional processing stages such as domain shaders, tessellation, and/or geometry shaders can be added to GPU 12, and there is efficient overlap of binning and rendering due to time-separated geometry and rasterization processing (e.g., GPU 12 may render one image while performing binning on the next surface).), the memory including a distributed data structure (para. [0039], note that memory controller 8 facilitates the transfer of data going into and out of system memory 10); and 
one or more processors including a graphics processing unit (GPU) to process data (GPU 12, fig. 1), wherein the GPU includes a plurality of GPU tiles on a substrate (para. [0042], note that GPU 12 may include a plurality of processing elements that are configured to operate on multiple vertices or pixels in a parallel manner. Also see para. [0043], note that GPU 12 may, in some instances, be integrated into a motherboard of computing device 2. In other instances, GPU 12 may be present on a graphics card that is installed in a port in the motherboard of computing device 2 or may be otherwise incorporated within a peripheral device configured to interoperate with computing device 2. In further instances, GPU 12 may be located on the same microchip as CPU 6, forming a system on a chip (SoC).). 
VanReenen disclosed most of the subject matter as described above except for specifically teaching each GPU tile being a chiplet having a respective tile-based storage, wherein, upon a set of geometric data being assigned to each of a plurality of screen tiles, the apparatus is to transfer the set of geometric data assigned to each of the plurality of screen tiles from the distributed data structure to a tile-based storage of a respective GPU tile of the plurality of GPU tiles, and wherein each set of geometric data is to be processed locally in the respective GPU tile of the plurality of GPU tiles to which the set of geometric data was transferred.
However, Frascati disclosed wherein, upon a set of geometric data being assigned to each of a plurality of screen tiles, the apparatus is to transfer the set of geometric data assigned to each of the plurality of screen tiles from the distributed data structure to a tile-based storage of a respective GPU tile of the plurality of GPU tiles (page 323, section 3.2, para. [0002], note that tile-based rendering may address the above-mentioned issues by subdividing a render target into a plurality of sub-regions (e.g., tiles or bins), and performing a separate rendering pass for each of the sub-regions. Each of the sub-regions may correspond to a subset of the pixels in the render target (e.g., a 16×16 tile of pixels). During each of the rendering passes, all of the image data associated with the corresponding sub-region may be rendered, which may include rendering each of the primitives that contributes pixel data to the sub-region. A high-bandwidth, on-chip memory that is large enough to store the data for a single sub-region of the render target may be used as a local render target for each of rendering passes), and wherein each set of geometric data is to be processed locally in the respective GPU tile of the plurality of GPU tiles to which the set of geometric data was transferred (para. [0032], note that by performing separate rendering passes on a per-tile basis, tile-based rendering schemes may be able to allow a high-bandwidth, on-chip memory to be used for merging rasterized image data even in area-limited applications that do not allow for large on-chip memories).
At the time of filing for the invention, it would have been obvious to a person of ordinary skill in the art to provide wherein, upon a set of geometric data being assigned to each of a plurality of screen tiles, the apparatus is to transfer the set of geometric data assigned to each of the plurality of screen tiles from the distributed data structure to a tile-based storage of a respective GPU tile of the plurality of GPU tiles, and wherein each set of geometric data is to be processed locally in the respective GPU tile of the plurality of GPU tiles to which the set of geometric data was transferred. The suggestion/motivation for doing so would have been that, by using bounding regions and data generated by at least one tessellation processing stage of an on-chip, tessellation-enabled graphics rendering pipeline to generate the binning data that is used to perform tile-based rendering, the performance of an on-chip, tessellation-enabled, tile-based rendering GPU may, in some cases, be improved without sacrificing the quality of the resulting rendered image (para. [0005]). Therefore, it would have been obvious to combine VanReenen and Frascati to obtain the invention as specified in claim 1.
In addition, Arunkumar teaches each GPU tile being a chiplet having a respective tile-based storage (fig. 1, page 321, para. [0002], note that the Multi-Chip Module GPU (MCM-GPU) architecture enables continued GPU performance scaling despite the slowdown of transistor scaling and photoreticle limitations. Our proposal aggregates multiple GPU).
At the time of filing for the invention, it would have been obvious to a person of ordinary skill in the art to provide each GPU tile being a chiplet having a respective tile-based storage. The suggestion/motivation for doing so would have been to optimize the MCM-GPU design using three architectural innovations targeted at improving locality and minimizing inter-GPM communication: (i) hardware caches to capture remote traffic in the local GPM, (ii) distributed and batched co-operative thread array (CTA) scheduling to better leverage inter-CTA locality within a GPM, and (iii) first touch page allocation policy to minimize inter-GPM traffic (para. [0005]). Therefore, it would have been obvious to combine VanReenen, Frascati and Arunkumar to obtain the invention as specified in claim 1.

(2) regarding claim 4:
 VanReenen further disclosed the apparatus of claim 1, further comprising a display, wherein the apparatus is further to pull geometric data from a GPU tile of the plurality of GPU tiles for the display without consolidating the geometric data of the plurality of GPU tiles (para. [0103], note that GPU 12 may initially store the image content in its local memory 14. GPU 12 may then output the image content from local memory 14 and store it to system memory 10 (28). In the outputting of the image content, GPU 12 may not change the size of the image content or otherwise increase the size of the image content. This allows GPU 12 to limit the amount of image content that needs to be stored in system memory 10). 

(3) regarding claim 5:
VanReenen further disclosed the apparatus of claim 1, wherein the apparatus is to collect all triangles mapping to one or more screen tiles in a single render pass (para. [0206], note that rendering pass circuitry 72 may repeat these operations for each portion, and generate image content for each portion. In some examples, only a single rendering pass may be needed, where a rendering pass refers to the rendering of image content in all of the portions. Also see para. [0032]).

(4) regarding claim 6:
VanReenen further disclosed the apparatus of claim 1, further comprising a mesh shader to operate with the plurality of GPU tiles (para. [0067], note that as one example, additional processing stages such as domain shaders, tessellation, and/or geometry shaders can be added to GPU 12), wherein the mesh shader is to process data from a GPU tile of the plurality of GPU tiles without transfer of data across GPU tiles (para. [0209], note that in texture mapping the smaller-sized image content from system memory 10 to the mesh, texture circuit 82 may receive instructions that cause texture circuit 82 to map vertices to re-scale the image content, and also shift or rotate (e.g., warp) the image content. In this way, the warping may be done as part of the resizing). 

(5) regarding claim 8:
VanReenen further disclosed the apparatus of claim 6, further comprising a stream out circuit to read out mesh data from the mesh shader and write the data to memory in a structure of arrays for compression of the mesh data (para. [0104], note that GPU 12 may warp the image content (34). In some examples, GPU 12 may first resize the smaller-sized image content to the size of the portion with which it is associated to ensure there are no holes in the image, and then warp (e.g., rotate or shift) the resized image content to account for any changes in the movement of the user's head since the time when the image content information was requested. In some examples, GPU 12 may first warp the smaller-sized image content and then resize, or warp and resize the image content together).

The proposed rejection over VanReenen, Frascati and Arunkumar, as explained for apparatus claims 1, 4-6 and 8, renders obvious the steps of the non-transitory computer-readable medium (para. [0040]) of claims 9, 12-14 and 16 and the method (fig. 2) of claims 17 and 20, because these steps occur in the operation of the proposed combination as discussed above. Thus, arguments similar to those presented above for claims 1, 4-6 and 8 are equally applicable to claims 9, 12-14, 16-17 and 20.

Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over VanReenen, Frascati and Arunkumar, further in view of Fielding et al. (US Publication Number 2019/0080504 A1, hereinafter “Fielding”).

(1) regarding claims 7 and 15:
VanReenen disclosed most of the subject matter as described above except for specifically teaching wherein the apparatus is to provide tile-based immediate mode rendering (TBIMR) with the mesh shader.
However, Fielding disclosed wherein the apparatus is to provide tile-based immediate mode rendering (TBIMR) with the mesh shader (para. [0023], note that the GPU is a tile-based renderer. The GPU therefore produces tiles of a render output data array to be generated. The render output data array may be an output frame. Tile-based rendering differs from immediate mode rendering in that, rather than the entire render output being processed in one go, the render output is divided into a plurality of smaller sub-regions (or `areas`). Those sub-regions are referred to herein as tiles. Each tile is rendered separately. Also see para. [0028], the graphics processing pipeline 200 includes a number of stages, including a vertex shader 202, a hull shader 204). 
At the time of filing for the invention, it would have been obvious to a person of ordinary skill in the art to provide wherein the apparatus is to provide tile-based immediate mode rendering (TBIMR) with the mesh shader. The suggestion/motivation for doing so would have been to automatically select a rendering mode for use by a graphics processing unit (GPU) to render graphics data for display (para. [0005]). Therefore, it would have been obvious to combine VanReenen, Frascati, Arunkumar and Fielding to obtain the invention as specified in claims 7 and 15.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Gruber et al. (US Publication Number 2013/0241938 A1) disclosed visibility-based state updates in graphical processing units.

Hui et al. (US Publication Number 2016/0247310 A1) disclosed systems and methods for reducing memory bandwidth using low quality tiles.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Hilina K Demeter whose telephone number is (571) 270-1676.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Benny Tieu, can be reached at (571) 272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HILINA K DEMETER/Primary Examiner, Art Unit 2674