DETAILED ACTION
Re Application No. 17/190671, this action responds to the claims filed 03/03/2021.
Claims 1-20 are presented for examination.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claim 8 is objected to for the following informalities: the language “in response to the TLB miss in the for the virtual to physical memory address translation data” (lines 1-3).  This limitation is grammatically incorrect, and appears to contain typographical errors.

Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 5-7 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.  Re the language “the first circuitry” (e.g. claim 5, line 1), there is insufficient antecedent basis for this limitation in the claim: there are a plurality of “first circuitry”, one for each processor, so it is unclear which one is “the first circuitry”.

Appropriate correction is required.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-2, 5-6, and 11-13 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Hendry et al (US 2011/0252200 A1).

Re claim 1, Hendry discloses the following:
A heterogeneous computing device comprising: a graphics processor coupled with an application processor (Fig. 3, CPU 34, GPU 36).  The heterogeneous computing device contains multiple processors coupled over an interface, including a graphics processor and a general-purpose CPU (application processor);
wherein the graphics processor and the application processor each include a cache memory (Fig. 3, caches 48 and 58).  The respective processors (e.g. graphics and application processors) include various levels of cache memory;
and first circuitry to perform virtual to physical memory address translation; and (Fig. 3, MMU/TLBs 52 and 62; ¶ 32).  The MMUs/TLBs (first circuitry) are integrated into each of the processors (Fig. 3), and perform virtual to physical memory translation (¶ 32);
wherein the first circuitry is to manage cache coherency state concurrently with virtual to physical memory address translation and (Fig. 3, cache coherence components 54 and 64; ¶ 32-34).  The cache coherence components (along with the MMU/TLBs, collectively first circuitry) are used to maintain cache coherency.  The coherency is maintained over time; accordingly, while translation is being performed, coherence is also being maintained, so the coherency state is managed “concurrently” with the translation (¶ 33-34).  Furthermore, translation and coherency management can both occur during an attempt to access data (concurrently) (¶ 32).  Additionally, the MMUs/TLBs and cache coherence components may be integrated into the same component (¶ 37);
concurrently with an address translation associated with a write to a virtual memory address of a block of memory, the first circuitry is to: determine a cache coherency state for a cache line associated with the block of memory based on cache coherency metadata stored in association with virtual to physical memory address translation for the block of memory; and (Figs. 4 and 9-10; ¶ 32 and 43).  In response to an attempt to access memory, the MMU/TLB performs virtual to physical translation, and, as part of the process (concurrently), looks up the ownership (coherency) metadata using the physical address (Figs. 4 and 9-10; ¶ 32).  Furthermore, an access may be associated with writing a cache line into the shared memory (a write to a virtual memory address of a block of memory) (¶ 43);
based on the cache coherency state, transfer coherency ownership for the cache line between the graphics processor and the application processor (Figs. 9-10).  Based on the ownership information (cache coherency state), ownership of a cache line is transferred from the CPU to GPU (Fig. 9), or from the GPU to CPU (Fig. 10);
wherein the first circuitry of each of the graphics processor and the application processor is to transfer coherency ownership via a response sent between the graphics processor and application processor (Figs. 9-10).  The transfer of coherency ownership occurs via a request for ownership, either from the CPU (Fig. 9) or the GPU (Fig. 10).  The requested processor indicates (via a response) that the region has been evicted, and is now free (Fig. 9, step 136; Fig. 10, step 156).
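For illustration only, the lookup described in the claim 1 mapping above (translation followed, as part of the same access, by an ownership lookup keyed on the physical address) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Hendry: the 4 KiB page size, the dictionary-based tables, and the names `translate_and_check`, `tlb`, and `ownership` are all hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; not taken from the record


def translate_and_check(tlb, ownership, virtual_address, requester):
    """Translate an address and report whether ownership must be transferred.

    tlb:       hypothetical map of virtual page number -> physical page number
    ownership: hypothetical map of physical page number -> "CPU" or "GPU"
    """
    virtual_page, offset = divmod(virtual_address, PAGE_SIZE)
    physical_page = tlb[virtual_page]          # virtual to physical translation
    physical_address = physical_page * PAGE_SIZE + offset
    current_owner = ownership[physical_page]   # ownership lookup by physical page
    return physical_address, current_owner != requester


tlb = {2: 7}
ownership = {7: "CPU"}
gpu_result = translate_and_check(tlb, ownership, 2 * PAGE_SIZE + 16, "GPU")
cpu_result = translate_and_check(tlb, ownership, 2 * PAGE_SIZE, "CPU")
```

In this sketch a GPU access to a CPU-owned page reports that a transfer is needed, while a CPU access to the same page does not.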

Re claim 2, Hendry discloses the device of claim 1, and further discloses that based on the response sent between graphics processor and application processor, the first circuitry of each of the graphics processor and the application processor is to set coherency ownership to indicate ownership for the graphics processor and clear coherency ownership for the application processor or clear coherency ownership for the graphics processor and set coherency ownership for the application processor (¶ 45 and 48).  The cache coherence components respectively set and clear their ownership information in order to reflect the transfer of ownership between the CPU and GPU.

Re claim 5, Hendry discloses the device of claim 1, and further discloses that the first circuitry includes a translation lookaside buffer (TLB) to cache a virtual to physical address translation for the block of memory and cache coherency state associated with the block of memory (Fig. 3, MMU/TLB 52/62, cache coherence component 54/64; ¶ 32-33).  The MMU/TLB and cache coherence components (collectively first circuitry) include translations from virtual to physical addresses for memory, as well as cache coherency information (cache coherency state) (Fig. 3, MMU/TLB 52/62, cache coherence component 54/64; ¶ 32).  The cache coherence components may be part of the MMU/TLB (¶ 33).

Re claim 6, Hendry discloses the device of claim 5, and further discloses that the TLB is to store the cache coherency state in a coherency buffer and the first circuitry is to determine the cache coherency state associated with the block of memory based on an entry in the coherency buffer (Figs. 3-4; ¶ 32-34).  The cache coherency component (coherency buffer) includes cache coherency state information; it can be part of the MMU/TLB.

Re claim 11, Hendry discloses the device of claim 1; accordingly, it also discloses a method implementing that device, as in claim 11.

Re claim 12, Hendry discloses the method of claim 11, and further discloses the following:
the memory write request is by the graphics processor and transferring coherency ownership associated with the block of memory includes (Fig. 9, step 122; ¶ 43).  The GPU (graphics processor) sends a request to access the cache line (Fig. 9, step 122).  The access may include writing data (¶ 43);
sending, by a first address translation circuit associated with the graphics processor, a request to a second address translation circuit associated with the application processor, wherein the request is to obtain coherency ownership for the cache lines associated with the block of memory (Fig. 3; Fig. 9, steps 128 and 132).  The MMU/TLB of the GPU (first address translation circuit) sends a request for ownership of the data to the MMU/TLB of the CPU (second address translation circuit);
receiving, at the first address translation circuit, from the second address translation circuit, a response to the request, wherein the response is a transfer ownership response; and (Fig. 9, step 136).  The MMU/TLB of the CPU (second address translation circuit) sends a response indicating that the region of memory is free, thus indicating that transfer of ownership can proceed;
after receiving the response at the first address translation circuit, setting coherency ownership in the cache coherency metadata of the first address circuit to indicate ownership for the graphics processor and clearing coherency ownership for the application processor in the cache coherency metadata of the first translation circuit (Fig. 9, step 138; ¶ 45 and 48).  See claim 2 above.

Re claim 13, Hendry discloses the method of claim 12, and further discloses the following:
receiving the request to obtain coherency ownership at the second address translation circuit (Fig. 9, step 132).  The MMU/TLB at the CPU receives the request from the GPU;
setting coherency ownership in the cache coherency metadata of the second translation circuit to indicate ownership for the graphics processor and clearing coherency ownership for the application processor in the cache coherency metadata; and (¶ 45 and 48).  Both the CPU and GPU set and clear their coherency metadata to indicate the correct ownership;
sending, by the second address translation circuit, to the first address translation circuit, the transfer ownership response (Fig. 9, step 136).  The MMU/TLB sends the indication that the memory is free (transfer ownership response).
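For illustration only, the request/response exchange mapped to claims 12 and 13 above can be sketched as follows. This is a minimal sketch under stated assumptions, not Hendry's implementation: the class `TranslationCircuit`, the string `"TRANSFER_OWNERSHIP"`, and the per-circuit `owner_of` table are hypothetical names introduced here.

```python
class TranslationCircuit:
    """Hypothetical per-processor address translation circuit."""

    def __init__(self, name):
        self.name = name
        self.owner_of = {}  # block id -> name of the owning processor
        self.peer = None    # the other processor's translation circuit

    def handle_ownership_request(self, block, requester):
        # Claim 13 side: record the requester as owner (which clears this
        # processor's ownership), then send the transfer ownership response.
        self.owner_of[block] = requester
        return "TRANSFER_OWNERSHIP"

    def write(self, block):
        # Claim 12 side: if this processor does not own the block, request
        # ownership from the peer and update local metadata on the response.
        if self.owner_of.get(block) != self.name:
            response = self.peer.handle_ownership_request(block, self.name)
            if response == "TRANSFER_OWNERSHIP":
                self.owner_of[block] = self.name
        return self.owner_of[block]


cpu = TranslationCircuit("CPU")
gpu = TranslationCircuit("GPU")
cpu.peer, gpu.peer = gpu, cpu
cpu.owner_of["block0"] = "CPU"
gpu.owner_of["block0"] = "CPU"
owner_after_write = gpu.write("block0")
```

After the GPU write, both circuits' metadata name the GPU as owner, which corresponds to the set and clear operations discussed for claim 2.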

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3-4 are rejected under 35 U.S.C. 103 as being unpatentable over Hendry in view of Alexander et al (US 2017/0286303 A1).

Re claim 3, Hendry discloses the device of claim 2, and further discloses that the data of the block of memory is stored in one or more cache lines of the cache memory of the graphics processor and […] stored in one or more cache lines of the cache memory of the application processor (Figs. 9-10; ¶ 42).  As noted above, the data of the block of memory is first stored in the cache of the CPU or GPU, and then transferred to the GPU or CPU.  Furthermore, when ownership is to be transferred, rather than actually evicting data from the transferring processor, the cached location can merely be marked for eviction, and evicted later (¶ 42).  

As noted above, Hendry discloses that rather than immediately evicting data from a cache during the transfer process, it can instead be marked for subsequent eviction (¶ 42); accordingly, one having ordinary skill in the art would understand that this could produce situations where the data is not actually evicted until some time after the data has been loaded into the other processor’s cache.  However, Hendry does not explicitly state that this is the case; accordingly, in the interest of furthering compact prosecution, Examiner has provided Alexander.

Alexander discloses that the data of the block of memory is stored in one or more cache lines of [a first cache] and concurrently stored in one or more cache lines of [a second cache] (¶ 22 and 43).  When data is to be promoted from an L2 cache (second cache) to an L1 cache (first cache), the data in the L2 can be marked for eviction, but eviction delayed until after the data is returned to a fill buffer (¶ 43), which is part of the L1 cache (¶ 22).  Accordingly, for a nonzero amount of time, the data will be in both the L2 cache and the fill buffer of an L1 cache (i.e. concurrently).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention (AIA ) to modify the eviction marking of Hendry to delay actual eviction until data has been transferred to the receiving cache, as in Alexander, because Alexander suggests that this would avoid an unnecessary prefetch (¶ 26 and 43).
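For illustration only, the delayed eviction relied upon in the combination above (mark for eviction, fill the receiving cache, evict afterwards) can be sketched as follows. This is a minimal sketch under stated assumptions: the caches are modeled as plain dictionaries, and the function name `promote_with_delayed_eviction` is hypothetical and appears in neither reference.

```python
def promote_with_delayed_eviction(source_cache, dest_cache, line):
    """Copy a line into dest_cache and evict it from source_cache afterwards.

    Returns True if the line was resident in both caches at the same time.
    """
    marked_for_eviction = {line}             # mark only; do not evict yet
    dest_cache[line] = source_cache[line]    # fill the receiving cache
    in_both = line in source_cache and line in dest_cache
    for marked in marked_for_eviction:       # deferred eviction
        del source_cache[marked]
    return in_both


source = {0x40: b"data"}
dest = {}
was_in_both = promote_with_delayed_eviction(source, dest, 0x40)
```

Between the fill and the deferred eviction the line is resident in both caches, which is the window the rejection treats as "concurrently stored".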

Re claim 4, Hendry and Alexander disclose the device of claim 3, and Hendry further discloses that the cache lines that store data for the block of memory have a cache coherency state configured based on which of the graphics processor and application processor has coherency ownership of the block of memory (Fig. 4).  The cache coherence components indicate whether memory addresses, including cache lines, are owned by the GPU or CPU.

Claims 7-10 and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hendry in view of Basu et al (US 2017/0337136 A1).

Re claim 7, Hendry discloses the device of claim 6, but does not specifically disclose a page walker in response to a TLB miss.

Basu discloses that the first circuitry includes a page walker to walk a page table in response to a TLB miss for virtual to physical memory address translation data for the block of memory (¶ 84).  When there is a TLB miss, the memory management unit (first circuitry) is configured to walk a page table (i.e. page walker), and retrieve the relevant virtual-to-physical memory address translation data.

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention (AIA ) to integrate the page walk functionality of Basu into the cache miss handling of Hendry, because Basu suggests that utilizing a page table walk allows for virtual-to-physical translation, and can be searched entry by entry to find page table entry information that is not present in the TLB (¶ 19-20).

Re claim 8, Hendry and Basu disclose the device of claim 7, and Basu further discloses the following:
the page walker, in response to the TLB miss in the for the virtual to physical memory address translation data for the block of memory, is to: read the virtual to physical memory address translation data for the block of memory and the cache coherency metadata for the block of memory from the page table (Figs. 2-3; ¶ 84).  If there is a TLB miss, the relevant translation, along with the metadata, which includes a cache coherency indicator, is fetched from the page table;
write the virtual to physical memory address translation data for the block of memory to a page table entry of the TLB; and write the cache coherency metadata for the block of memory to the entry in the coherency buffer (Figs. 4-5; ¶ 65).  The entry from the page table, which includes the translation and coherency metadata, can be stored in the TLB (TLB/coherency buffer).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention (AIA ) to combine Hendry and Basu, for the reasons noted in claim 7 above.
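For illustration only, the TLB miss handling mapped to claims 7 and 8 above (walk the page table, then write the translation to the TLB and the coherency metadata to the coherency buffer) can be sketched as follows. This is a minimal sketch under stated assumptions, not Basu's implementation: the page table is flattened to a single dictionary, and the names `lookup`, `coherency_buffer`, and `"GPU_OWNED"` are hypothetical.

```python
def lookup(tlb, coherency_buffer, page_table, virtual_page):
    """Return (physical_page, coherency_state), walking the table on a miss.

    page_table: hypothetical map of virtual page -> (physical page, state)
    """
    if virtual_page not in tlb:  # TLB miss
        # Page walk, flattened here to a single dictionary lookup.
        physical_page, state = page_table[virtual_page]
        tlb[virtual_page] = physical_page       # write translation to the TLB
        coherency_buffer[virtual_page] = state  # write metadata to the buffer
    return tlb[virtual_page], coherency_buffer[virtual_page]


page_table = {5: (9, "GPU_OWNED")}
tlb = {}
coherency_buffer = {}
first = lookup(tlb, coherency_buffer, page_table, 5)
```

A second call for the same page would hit in the TLB and return the cached translation and state without consulting the page table.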

Re claim 9, Hendry and Basu disclose the device of claim 8, and Hendry further discloses that the [device] is to store the coherency metadata for the block of memory at sub-page granularity (¶ 20).  Cache coherence can be maintained at a line-level granularity (sub-page granularity).

Basu discloses that the page table is to store the cache coherency metadata for the block of memory (Figs. 2-3).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention (AIA ) to combine Hendry and Basu, for the reasons noted in claim 7 above.

Re claim 10, Hendry and Basu disclose the device of claim 9, and Hendry further discloses that the [device] is to store the cache coherency metadata for the block of memory at cache line granularity (¶ 20).  Line-level granularity is both sub-page granularity, and also cache line granularity.

Basu discloses that the page table is to store the cache coherency metadata for the block of memory (Figs. 2-3).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention (AIA ) to combine Hendry and Basu, for the reasons noted in claim 7 above.
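For illustration only, the relationship between sub-page and cache line granularity discussed for claims 9 and 10 can be shown with a short worked example. This is a minimal sketch under stated assumptions: the 4096-byte page, the 64-byte cache line, and the one-bit-per-line ownership bitmap are hypothetical choices made here and are not taken from either reference.

```python
PAGE_SIZE = 4096                          # assumed page size in bytes
LINE_SIZE = 64                            # assumed cache line size in bytes
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE   # 64 lines per page


def line_index(address):
    """Index of the cache line within its page."""
    return (address % PAGE_SIZE) // LINE_SIZE


def set_gpu_owned(bitmap, address):
    """Return the page's ownership bitmap with this address's line marked."""
    return bitmap | (1 << line_index(address))


def is_gpu_owned(bitmap, address):
    """True if the line holding this address is marked GPU-owned."""
    return bool((bitmap >> line_index(address)) & 1)


bitmap = set_gpu_owned(0, PAGE_SIZE + 130)  # byte 130 of a page: line 2
```

Under these assumptions one page carries 64 independent ownership bits, so metadata kept per line is necessarily finer than per page.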

Re claim 14, Hendry discloses the method of claim 13, and further discloses evicting modified/invalid data from the cache of a CPU (application processor) (¶ 45 and 48); while it is well known in the art that eviction involves flushing dirty cache lines, since it is not explicitly stated, in the interest of furthering compact prosecution, Examiner has provided Basu.

Basu discloses flushing one or more dirty cache lines associated with the block of memory from the cache memory of the application processor (¶ 27).  Dirty copies of data are returned (flushed) to the memory from either the CPU (application processor) or GPU.

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention (AIA ) to integrate the flushing of dirty data from Basu into the cache coherency of Hendry, because Basu suggests that flushing dirty data “ensures that no data and/or incorrect copies of page table entries (with outdated metadata) are held by processors of other types” (¶ 27).

Re claim 15, Hendry and Basu disclose the method of claim 14, and Hendry further discloses updating the cache lines associated with the block of memory in the cache memory of the graphics processor (Fig. 9, steps 128 and 130).  After acquiring ownership, the GPU can access the lines from memory and load them into the cache.

Re claim 16, Hendry and Basu disclose the method of claim 15, and Basu further discloses the following:
performing the virtual to physical memory address translation includes: determining that a translation lookaside buffer (TLB) of the graphics processor lacks information to perform the virtual to physical memory address translation (¶ 84).  See claim 7 above;
accessing a page table of the heterogeneous processing system to retrieve a page table entry for the virtual memory address associated with the memory write request; and (Figs. 2-3; ¶ 84).  See claim 8 above;
storing the page table entry and the cache coherency metadata associated with the page table entry in the TLB of the graphics processor (Figs. 4-5; ¶ 65).  See claim 8 above.

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention (AIA ) to combine Hendry and Basu, for the reasons noted in claim 7 above.

Re claim 17, Hendry discloses the following:
A heterogeneous processing system comprising: a graphics processor including a first cache memory and a first circuit (Fig. 3, GPU 36, caches 58 and 60, MMU/TLB 62, cache coherence component 64).  See claim 1 above;
the first circuit to perform virtual to physical memory address translation and manage cache coherency state (Fig. 3, MMU/TLB 62, cache coherence component 64; ¶ 32-34).  See claim 1 above;
an application processor including a second cache memory and a second circuit (Fig. 3, CPU 34, caches 48 and 50, MMU/TLB 52, cache coherence component 54).  See claim 1 above;
the second circuit configured to perform virtual to physical memory address translation and manage cache coherency state (Fig. 3, MMU/TLB 52, cache coherence component 54; ¶ 32-34). See claim 1 above;
system memory coupled to the […] cache memory (Fig. 3, shared memory 42).  The CPU/GPU, and their respective caches, are connected to a shared memory;
the cache coherency state including coherency ownership information for one or more cache lines associated with the block of memory; and (Fig. 4).  See claim 1 above;
wherein in association with a write request to the block of memory by the application processor or the graphics processor, the first circuit or the second circuit is to: perform a virtual to physical memory address translation for a virtual memory address of the block of memory; and (Figs. 4 and 9-10; ¶ 32 and 43).  See claim 1 above;
concurrently with the virtual to physical memory address translation, determine the cache coherency state associated with the block of memory based on the cache coherency state; and (Figs. 4 and 9-10; ¶ 32 and 43).  See claim 1 above;
based on the cache coherency state, transfer cache coherency ownership associated with the block of memory between the graphics processor and the application processor (Figs. 9-10).  See claim 1 above;
wherein the first circuit and the second circuit are to transfer coherency ownership via a response sent between the graphics processor and application processor (Figs. 9-10).  See claim 1 above.

Hendry discloses a shared memory; while it could broadly be considered a “third cache” that is shared between the CPU and GPU, it is not explicitly indicated as such.  Furthermore, Hendry does not disclose a page table.  Accordingly, Examiner has provided Basu.

Basu discloses the following:
a third cache memory coupled to the graphics processor and the application processor, the third cache memory to cache a block of memory shared between the graphics processor and the application processor; and (Fig. 1, L3 cache 120; ¶ 38).  The L3 cache is shared between the GPU and CPU; accordingly, blocks that are stored there are shared between the GPU and CPU;
system memory coupled to the third cache memory, the system memory to store a page table including a virtual to physical memory address mapping and a cache coherency state, the cache coherency state including cache coherency ownership information for one or more cache lines associated with the block of memory; and (Fig. 1, memory 106, page table 124; Fig. 2, entry 200; Fig. 3, cache coherency metadata indicator 306).  The system memory is shared between the CPU and GPU, and is coupled to the L3 cache over a bus (Fig. 1).  The system memory stores a page table (Fig. 1, page table 124), and the page table contains entries including virtual to physical address translations, as well as metadata (Fig. 2, entry 200).  The metadata includes cache coherency information for one or more lines (Fig. 3, cache coherency metadata indicator 306).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention (AIA ) to combine Hendry and Basu, for the reasons noted in claim 7 above.

Re claim 18, Hendry and Basu disclose the system of claim 17, and Hendry further discloses that the [device] is to store the coherency metadata for the block of memory at sub-page granularity (¶ 20).  See claim 9 above.

Basu discloses that the page table is to store the cache coherency metadata for the block of memory (Figs. 2-3).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention (AIA ) to combine Hendry and Basu, for the reasons noted in claim 7 above.

Re claim 19, Hendry and Basu disclose the system of claim 17, and Hendry further discloses that the [device] is to store the cache coherency metadata for the block of memory at cache line granularity (¶ 20).  Cache coherence can be maintained at a line-level granularity, which is also cache line granularity; see claim 10 above.

Basu discloses that the page table is to store the cache coherency metadata for the block of memory (Figs. 2-3).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention (AIA ) to combine Hendry and Basu, for the reasons noted in claim 7 above.

Re claim 20, Hendry and Basu disclose the system of claim 19, and Basu further discloses:
the write request to the block of memory is by the graphics processor and to perform a virtual to physical memory address translation in association with the write request includes to: (¶ 25).  The GPU is making the write request, and the request involves performing a translation;
determine that a translation lookaside buffer (TLB) of the graphics processor lacks information to perform the virtual to physical memory address translation (¶ 84).  When there is a TLB miss (TLB of the relevant processor, i.e. the graphics processor, lacks a virtual to physical translation), a page table walk is performed;
access the page table to retrieve a page table entry for the virtual memory address associated with the write request; and (Figs. 2-3; ¶ 84).  See claim 8 above;
store the page table entry and the cache coherency state associated with the page table entry in the TLB of the graphics processor (Figs. 4-5; ¶ 65).  See claim 8 above.

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention (AIA ) to combine Hendry and Basu, for the reasons noted in claim 7 above.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CRAIG S GOLDSCHMIDT whose telephone number is (571)270-3489. The examiner can normally be reached M-F 10-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi, can be reached at (571)270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CRAIG S GOLDSCHMIDT/
Primary Examiner, Art Unit 2132