DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claim 10 is objected to because of the following informalities: Claim 10 recites “the plurality of copy engines”. It should read “the plurality of sub-copy engines” to be consistent with claim 6, from which claim 10 depends. Appropriate correction is required.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 6, 11, and 16 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Liang et al. (US 2016/0321774 A1).
Regarding claim 1, Liang teaches:
An apparatus (FIG. 1, 100) comprising: 
copy engine circuitry (FIG. 2, 101) including a sub-block generator (FIG. 2, GPU 114 and MIF 104) to perform a fast clear of a surface (“GPU 114 may include a 2D dispatch processor 140, 2D sub-engines 142A-D (“2D sub-engines 142”), … 2D dispatch processor 140 is a processor within GPU 114 that may be configured to instruct and/or otherwise control the operation of 2D sub-engines 142 to perform a variety of 2D operations on a surface. Example 2D operations include …clear operations, … A clear operation may be used to initialize a surface by assigning a particular color value to the pixels of the surface…. The 2D operations discussed above may involve reads from memory (e.g., read-only operations), writes to memory (e.g., write-only operations), and/or both read and write operations (e.g., operations that involve both reading from, and writing to, memory).” [0066]) by dividing the surface into one or more blocks and dividing the one or more blocks into a plurality of sub-blocks having a plurality of sizes. ([0087] teaches how 2D dispatch processor 140 decides whether and how to divide a surface based on the storage mode and the type of operation: “2D dispatch processor 140 next determines if the surface is stored in a macro-tiling or tiling storage mode. If either macro-tiling or tiling storage mode is used, 2D dispatch processor 140 next determines the type of operation to be performed on the surface. If the operation is a clear or BLT operation, the surface is not sub-divided, such as is shown by surface 170. For any other type of operation, 2D dispatch processor 140 then determines the width and/or height of the surface to determine a sub-division.” [0052] teaches the specific dividing operation: “Surface 120 may be arranged as a two dimensional (2D) array of pixel values, and GPU 114 may instruct MIF 104 to store the surface 120 or a number of surfaces in a linear, tiled or macro-tiled storage mode in memory system 107. … In a macro-tiling storage mode, pixel data is arranged into blocks, each block including multiple rectangular sub-blocks (i.e., tiles within tiles).” [0053]: “GPU driver 116 may transmit instructions to cause GPU 114 to store each tile of the surface, referred to as surface tile, which is then stored in memory system 107, where a tile encompasses M×N pixel values of the surface.”)
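For illustration only, the block and sub-block arrangement quoted from Liang [0052]-[0053] ("tiles within tiles") can be sketched as below. This is a minimal sketch and not Liang's implementation: the function names, the 64×64 block size, the 16×16 tile size, and the clipping of tiles at the surface edge are all assumptions made for the example.

```python
from typing import Dict, Iterator, List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def split(rect: Rect, w: int, h: int) -> Iterator[Rect]:
    """Yield w-by-h pieces covering rect; pieces at the right/bottom edge are clipped."""
    x0, y0, rw, rh = rect
    for y in range(y0, y0 + rh, h):
        for x in range(x0, x0 + rw, w):
            yield (x, y, min(w, x0 + rw - x), min(h, y0 + rh - y))


def macro_tile(surface_w: int, surface_h: int,
               block: Tuple[int, int] = (64, 64),
               tile: Tuple[int, int] = (16, 16)) -> Dict[Rect, List[Rect]]:
    """Map each block of the surface to the tiles (sub-blocks) inside it.

    The block and tile sizes are hypothetical defaults, not values from Liang.
    """
    layout: Dict[Rect, List[Rect]] = {}
    for blk in split((0, 0, surface_w, surface_h), *block):
        layout[blk] = list(split(blk, *tile))
    return layout


# A 100x64 surface yields one full block and one clipped block.
layout = macro_tile(100, 64)
tile_sizes = {(w, h) for tiles in layout.values() for (_, _, w, h) in tiles}
```

In this sketch the clipped block at the right edge produces narrower tiles, so `tile_sizes` contains more than one size. Whether Liang's tiles are clipped in this way is not stated in the quoted passages.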

Regarding claim 6, Liang teaches:
The apparatus of claim 1, further comprising a plurality of sub-copy engines (FIG. 4, 2D sub-engines 142A-142D) to operate in parallel to process the plurality of sub-blocks and perform memory accesses. (“[0062] Also, storing pixel values in the interleaved storage manner may allow GPU 114 to store pixel values via MIF 104 in memory system 107 in parallel. For example, GPU 114 may be able to store pixel values via MIF 104 in section 0 of portion 132A of physical page 132 within memory unit 108A via memory controller 106A at the same time or substantially the same time (e.g., in parallel) that GPU 114 may be able to store pixel values via MIF 104 in section 1 of portion 132B of physical page 132 within memory unit 108B via memory controller 106B.”. Specifically, “[0067] In the example of FIG. 4, four 2D sub-engines 142 and four caches 144 are depicted. In this example, four sections (or sub-primitives of a surface) may be operated on in parallel by 2D sub-engines 142.”)
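For illustration only, the parallel operation described in Liang [0067] (four sections of a surface operated on by four sub-engines) can be sketched as below. This is a sketch under stated assumptions: worker threads stand in for the hardware 2D sub-engines, the surface is split into horizontal bands of rows, and the names `clear_section` and `parallel_clear` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def clear_section(surface: List[List[int]], rows: range, value: int) -> int:
    """Assign `value` to every pixel in the given rows; return the number of rows cleared."""
    for y in rows:
        surface[y] = [value] * len(surface[y])
    return len(rows)


def parallel_clear(surface: List[List[int]], value: int, engines: int = 4) -> List[int]:
    """Split the surface into up to `engines` row bands and clear them concurrently."""
    height = len(surface)
    step = -(-height // engines)  # ceiling division: rows per section
    sections = [range(s, min(s + step, height)) for s in range(0, height, step)]
    with ThreadPoolExecutor(max_workers=engines) as pool:
        return list(pool.map(lambda rows: clear_section(surface, rows, value), sections))


surface = [[1] * 8 for _ in range(10)]
rows_per_engine = parallel_clear(surface, 0)
```

Each section touches a disjoint set of rows, so the workers never write to the same row.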

Claim 11 recites limitations similar to those of claim 1 and is therefore rejected under the same rationale.

Regarding claim 16, Liang teaches:
A system to facilitate copying surface data, comprising: a memory to store surface data (FIG. 2, memory system 107 stores the data of surface 120). The remainder of claim 16 recites limitations similar to those of claim 1 and is therefore rejected under the same rationale.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-5, 12-15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Liang in view of Newburn (US 2017/0262174 A1).
Regarding claim 2, Liang teaches:
The apparatus of claim 1, 
However, Liang does not teach, but Newburn teaches:
wherein each sub-block is walked with different strides to generate a minimum number of metadata update write cycles. ([0225], “shifting the order in which data is accessed, or its layout in memory, can often help with avoiding dependence conflicts or adapting to nested loops which have offsetted indices. Embodiments of the present disclosure may skew data within any number of different dimensions. The lower and upper fields are used to define the boundaries of elements of interest, and elements can wrap around these. The stride field can have positive or negative values, and those values can extend beyond the [lower, upper] range. This allows wrapping in one or more dimensions.” [0135], “a buffer according to various embodiments of the disclosure may comprise a data object which collects a set of meta-data for a describable chunk of memory and having a member for each buffer property.” Because the dependence conflicts are reduced, the information describing the status of the memory needs to be updated less often. [0227]-[0231] teach pseudo code that accesses different data blocks using read and write operands. For each data block, depending on the dimensions of the data block, the source operand descriptor and the destination operand descriptor differ in location, which corresponds to walking with different strides.)
Liang teaches accessing sub-block data using copy engines, but does not teach the details of how the data is read or accessed. Newburn teaches a specific method that allows different walk strides during memory access to reduce memory metadata updates, in order to improve system operation efficiency.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined the teachings of Liang with the specific teachings of Newburn to allow different walk strides during memory access to reduce memory metadata updates, in order to improve system operation efficiency.
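For illustration only, the wrap-around strided access described in the quoted portion of Newburn [0225] can be sketched as below. This is one possible reading of the quoted passage, not Newburn's pseudo code: the function name `strided_walk`, the inclusive `[lower, upper]` range, and the use of modular arithmetic to implement wrapping are assumptions.

```python
from typing import Iterator


def strided_walk(lower: int, upper: int, stride: int, count: int) -> Iterator[int]:
    """Yield `count` indices in [lower, upper], starting at `lower` and stepping by `stride`.

    The stride may be negative or larger than the range; the index wraps around
    the [lower, upper] boundaries (wrapping via modulo is an assumption of this sketch).
    """
    span = upper - lower + 1
    pos = lower
    for _ in range(count):
        yield pos
        pos = lower + (pos - lower + stride) % span


forward = list(strided_walk(0, 7, 3, 8))    # positive stride with wrap-around
backward = list(strided_walk(0, 7, -1, 4))  # negative stride
oversized = list(strided_walk(0, 7, 10, 4)) # stride larger than the range
```

Walking the same eight elements with different strides visits them in different orders, which is the sense in which one block may be "walked with different strides" in this sketch.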

Regarding claim 3, Liang in view of Newburn teaches:
The apparatus of claim 2, wherein the copy engine circuitry generates a largest possible region having a multiple of predetermined block size. (Liang [0080] teaches dividing the surface into sub-primitive regions in consideration of the cache size of each sub-engine; that is, the sub-primitive size is close to the size of the corresponding cache so as to fully utilize the caches' capacity. The multiple of predetermined block size corresponds to the multiple sizes of the caches. “In a further example of the disclosure, 2D dispatch processor 140 may further consider the size of caches 144 when determining the number and arrangement of sub-primitives when dividing a surface. For example, 2D dispatch processor 140 may be configured to divide a surface into sub-primitives such that the resultant size of the sub-primitives makes efficient use of caches 144. That is, if the size of the data in bytes of the sub-primitives is small relative to the size of caches 144, such a division would result in poor cache efficiency. For example, if a surface were divided into four sub-primitives and each of the sub-primitives included pixel data that was less than half the size of a cache 144 for a respective 2D sub-engine 142, cache efficiency would be poor. That is, more than half of each cache 144 would be unused for each operation. In this scenario, a division of the surface into two sub-primitives would be more optimal, as each of the two caches 144 that are used would be more fully utilized. Furthermore, only 2D sub-engines 142 would be needed, freeing up any additional 2D sub-engines 142 to perform additional operations.”)

Regarding claim 4, Liang in view of Newburn teaches:
The apparatus of claim 3, wherein the copy engine circuitry further finds aligned horizontal and vertical boundaries of the region. (Liang [0082], “In the example of FIG. 7, surface 160 is divided into four vertical sub-primitives where the boundary of each of the sub-primitives falls on the boundary line of one channel of a two-channel memory system (denoted by the A and B). In this example, two of 2D sub-engines 142 would begin scanning in memory channel A, while the other two 2D sub-engines 142 would begin scanning in memory channel B. It should be noted that the example of surface 160 is an optimal example, and such results may not be achieved for every surface size.” FIG. 8 shows that the horizontal and vertical boundaries are aligned.)

Regarding claim 5, Liang in view of Newburn teaches:
The apparatus of claim 4, wherein the copy engine circuitry updates metadata associated with the region via a single write command. (Newburn [0135] teaches updating metadata regarding memory. “As introduced above, a buffer according to various embodiments of the disclosure may comprise a data object which collects a set of meta-data for a describable chunk of memory and having a member for each buffer property. Buffer properties may have a default value, which can be overridden upon buffer creation. A subset of properties may be mutable, that is, they may be updated after buffer creation. For example, a buffer may be pinned while it is being used for a DMA, and later unpinned so that different memory can be pinned. Some properties may be immutable, such as the kind of memory (HBM vs. non-volatile) used to hold a given instance of the buffer.” It is well known that a single write command can be used to update a value in program code, and the Examiner takes Official Notice of this fact. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined the teachings of Newburn with this well-known knowledge to use a single write command to update metadata. The benefit would be a simple and clear coding style for updating metadata. Liang teaches memory access using a plurality of copy engines. Newburn teaches metadata related to memory access and also metadata updating. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined the teachings of Liang with the specific teachings of Newburn to manage memory access more effectively.)

Claims 12-15 recite limitations similar to those of claims 2-5, respectively, and are therefore rejected under the same respective rationale.

Claims 17-20 recite limitations similar to those of claims 2-5, respectively, and are therefore rejected under the same respective rationale.

Claims 7-10 are rejected under 35 U.S.C. 103 as being unpatentable over Liang in view of Duncan et al. (US 2014/0109102 A1).
Regarding claim 7, Liang teaches:
The apparatus of claim 6, further 
However, Liang does not teach, but Duncan teaches:
comprising a scheduler to receive the plurality of sub-blocks and schedule the plurality of surface data sub-blocks for processing. ([0068], “The copy engines 450 may execute concurrently with the processing cluster array 230. In order to perform copy operations,” [0071], “First, the long latency period could be addressed by adding pre-emption capabilities to each individual processing engine of PPU 202(0) (e.g., copy engine 450(0), copy engine 450(1), GPC 208(0), etc.),” [0075], “In contrast, host interface 206 of FIG. 4 implements a hardware pre-processor 410 that subdivides the copy operation into multiple subtasks associated with copy operations for small chunks of the block of memory specified by the initial copy operation. For example, a copy operation may request a large 256 MB block of memory to be copied from PP memory 204 to system memory 104. The pre-processor 410 transmits a copy command to copy engine 450(0) for the first 4 kB of memory of the 256 MB block of memory. The pre-processor 410 then modifies the original copy operation by incrementing the starting memory address included in the copy command by 4 kB and stores the modified copy command until copy engine 450(0) has finished executing the first 4 kB copy operation.”)
Liang teaches sub-engines that process sub-blocks of data in parallel. However, Liang does not explicitly teach what controls the work of the sub-engines. Duncan teaches a scheduler that receives data and schedules sub-engines to process the data using certain strategies.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined the teachings of Liang with the specific teachings of Duncan to have a scheduler to effectively schedule the work of sub-engines. The benefit would be to improve the work efficiency of sub-engines.
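For illustration only, the subdivision described in the quoted portion of Duncan [0075] (a 256 MB copy issued as successive 4 kB copy commands, with the starting address incremented by 4 kB each time) can be sketched as below. This is a sketch and not Duncan's hardware pre-processor: the function name `subdivide_copy`, the example addresses, and the treatment of "kB" and "MB" as binary units (1024 and 1024² bytes) are assumptions.

```python
from typing import Iterator, Tuple

KIB = 1024          # assumption: "4 kB" is read as 4096 bytes
MIB = 1024 * KIB    # assumption: "256 MB" is read as 256 * 1024 * 1024 bytes


def subdivide_copy(src: int, dst: int, size: int,
                   chunk: int = 4 * KIB) -> Iterator[Tuple[int, int, int]]:
    """Lazily yield (src_addr, dst_addr, length) subtasks that together cover `size` bytes.

    Each subtask is produced only when requested, loosely mirroring a pre-processor
    that holds the modified command until the previous chunk has finished.
    """
    offset = 0
    while offset < size:
        length = min(chunk, size - offset)
        yield (src + offset, dst + offset, length)
        offset += length


# Hypothetical source and destination addresses.
subtasks = list(subdivide_copy(0x1000_0000, 0x8000_0000, 256 * MIB))
```

Under these assumptions the 256 MB request becomes 65,536 subtasks of 4,096 bytes each, and a higher-priority task can be interleaved between any two of them.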

Regarding claim 8, Liang in view of Duncan teaches:
The apparatus of claim 7, wherein the scheduler schedules the sub-blocks based on a current sub-block processing load at each of the plurality of sub-copy engines. (Duncan [0077], “In one embodiment, pre-processor 410 tracks each of the pending operations in an ordered list 420 arranged according to priority. Pre-processor 410 is configured to schedule the highest priority pending operation. In another embodiment, host interface 206 includes a number of FIFOs (not shown), each FIFO associated with a given priority level. As host interface 206 receives tasks, the tasks are added to the particular FIFO associated with that tasks priority level. Pre-processor 410 then selects the next pending task in the highest priority FIFO that includes at least one pending task to schedule on the available processing engine.” [0068], “The copy engines 450 may execute concurrently with the processing cluster array 230. In order to perform copy operations,” [0071], “First, the long latency period could be addressed by adding pre-emption capabilities to each individual processing engine of PPU 202(0) (e.g., copy engine 450(0), copy engine 450(1), GPC 208(0), etc.),” The combination set forth for claim 7 is incorporated here.)

Regarding claim 9, Liang in view of Duncan teaches:
The apparatus of claim 8, wherein each of the plurality of sub-copy engines comprises a plurality of sub-buffers and wherein a sub-buffer is assigned to a sub-block upon the sub-block being scheduled to a sub-copy engine. (Liang FIG. 4, [0068] “Returning to FIG. 4, 2D sub-engines 142 may be configured to read and/or store pixel data associated with a surface in memory system 107. In one example, 2D sub-engines 142 store the pixel data through MIF 104. MIF 104 may be accessed through bus interface 146. For storage operations, bus interface 146 collects pixel data from all of 2D sub-engines 142 and sends the pixel data to MIF 104 for storage. For read operations, bus interface 146 forwards read requests to MIF 104 and receives the pixel data back in response to the read requests. Bus interface 146 may then route the retrieved pixel data to the appropriate one of caches 144 for further operation by a respective one of 2D sub-engines 142.”) 

Regarding claim 10, Liang in view of Duncan teaches:
The apparatus of claim 9, further comprising a queue to queue the surface data sub-blocks for transmission to the plurality of copy engines. (Duncan [0077], “In one embodiment, pre-processor 410 tracks each of the pending operations in an ordered list 420 arranged according to priority. Pre-processor 410 is configured to schedule the highest priority pending operation. In another embodiment, host interface 206 includes a number of FIFOs (not shown), each FIFO associated with a given priority level. As host interface 206 receives tasks, the tasks are added to the particular FIFO associated with that tasks priority level. Pre-processor 410 then selects the next pending task in the highest priority FIFO that includes at least one pending task to schedule on the available processing engine.” Liang teaches a plurality of sub-copy engines to process sub-blocks of data, but does not explicitly teach how to schedule and arrange the workload. Duncan teaches using a queue to arrange the workload. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined the teachings of Liang with the specific teachings of Duncan to have a scheduler to effectively schedule the work of sub-engines. The benefit would be to improve the work efficiency of sub-engines.)
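For illustration only, the per-priority FIFO arrangement described in the quoted portion of Duncan [0077] can be sketched as below. This is a sketch under stated assumptions and not Duncan's host interface: the class name `PriorityFifos`, its method names, and the convention that index 0 is the highest priority level are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional


class PriorityFifos:
    """One FIFO per priority level; index 0 is the highest priority (an assumption)."""

    def __init__(self, levels: int) -> None:
        self.fifos: List[Deque[str]] = [deque() for _ in range(levels)]

    def submit(self, task: str, priority: int) -> None:
        """Append the task to the FIFO associated with its priority level."""
        self.fifos[priority].append(task)

    def next_task(self) -> Optional[str]:
        """Return the oldest task from the highest-priority non-empty FIFO, or None."""
        for fifo in self.fifos:
            if fifo:
                return fifo.popleft()
        return None


queues = PriorityFifos(levels=3)
for name, prio in [("a", 2), ("b", 0), ("c", 2), ("d", 0)]:
    queues.submit(name, prio)
order = [queues.next_task() for _ in range(5)]
```

Tasks at the same priority level leave in arrival order, and a lower level is served only when every higher level is empty.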



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YANNA WU whose telephone number is (571)270-0725. The examiner can normally be reached Monday-Thursday 8:00-5:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YANNA WU/Primary Examiner, Art Unit 2611