DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
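(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 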
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2-3 and 5-16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. 
Claim 2 recites: “Apparatus according to claim 1, in which the control circuitry is configured to control the transfer of at least the subset of the translation data in response to initiation of execution by the second processing device of a processing task in a virtual address space associated with a given processing task executed by the first processing device prior to the transfer.”  This language only differs from language in claim 1 by the underlined language.  Claim 1 now recites “a second, different, processing device” and “the second, different processing device”.  It is not clear whether the language of claim 2 was meant to be deleted or if this language is meant to refer to another “the second processing device” which is distinguished from “the second different processing device” in claim 1.  Note also that there would be an antecedent basis issue and that the duplicate language of claims 1 and 2 would also be indefinite because it would not be clear what is being required.  
All dependent claims are rejected as containing the material of the claims from which they depend. 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-13 and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Lustig (TLB Improvements for Chip Multiprocessors: Inter-Core Cooperative Prefetchers and Shared Last-Level TLBs) and Bhowmik (US 2013/0173882).
1. Apparatus comprising: 
two or more processing devices each having an associated translation lookaside buffer to store translation data defining address translations between virtual and physical memory addresses, (“Translation Lookaside Buffers (TLBs) are performance-critical structures used to cache address translation information for virtual memory systems. Since every instruction requires at least one translation[.]”  Lustig page 1, introduction.  
Lustig does not expressly state that the address translation information for virtual memory systems is between virtual and physical memory addresses.
Bhowmik teaches: “A translation lookaside buffer (TLB) configured for use in a multiple operating system environment includes a plurality of storage locations, each storage location being configured to store a page translation entry configured to relate a virtual address range to a physical address range, each page translation entry having an address space identifier (ASID) associated with an operating system.”  Bhowmik Abstract.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Bhowmik with that of Lustig because doing so speeds up translation between the virtual address space made available to programs (to avoid the need for complex management in programs) and the physical address space used by physical memory chips.) each address translation being associated with a respective virtual address space; and control circuitry to control the transfer of at least a subset of the translation data from the translation lookaside buffer associated with a first processing device to the translation lookaside buffer associated with a second, different, processing device; and (“Leader-Follower prefetching exploits the fact that in ICS-heavy benchmarks, if a core (the leader) TLB misses on a particular virtual page entry, other cores (the followers) will also typically TLB miss on the same virtual page eventually. Since the leader would already have found the appropriate translation, we can prevent the followers from missing on this entry by pushing it into the followers’ TLBs.”  Lustig page 8, section 4.3.  See also Lustig page 9, figure 14.) 
control the transfer of at least the subset of the translation data in response to initiation of execution by the second, different processing device of a processing task in a virtual address space, the virtual address space being associated with a given processing task executed by the first processing device prior to the transfer. (Lustig teaches: “Like many uniprocessor TLB prefetching studies, we do not prefetch entries directly into the TLB, but instead insert them into a small, separate Prefetch Buffer (PB) which is looked up concurrently with the TLB. This helps mitigate the challenge of prefetching into the TLB too early and displacing useful information. Each PB entry maintains a Valid bit and a Prefetch Type bit (to indicate whether the entry arose from Leader-Follower or Distance-based Cross-Core prefetching) in addition to the translation entry (virtual page, physical page, context ID etc.). On a PB entry hit, the particular entry is removed from the PB and inserted into the TLB.”  Lustig page 8, section 4.3.1.)
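For illustration only, the Leader-Follower push and Prefetch Buffer (PB) behavior quoted from Lustig above can be modeled with the minimal sketch below. The class name, method name, and dictionary-based storage are hypothetical and do not appear in Lustig; capacity limits, eviction, and the Valid and Prefetch Type bits are omitted, and the concurrent TLB/PB lookup is modeled sequentially.

```python
class Core:
    """Hypothetical model of one core with a private TLB and Prefetch Buffer."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.tlb = {}  # virtual page -> physical page
        self.pb = {}   # prefetched entries: virtual page -> physical page

    def translate(self, vpage, page_table, followers=()):
        # Hardware looks up the TLB and PB concurrently; modeled in sequence here.
        if vpage in self.tlb:
            return self.tlb[vpage], "tlb_hit"
        if vpage in self.pb:
            # PB hit: the entry is removed from the PB and inserted into the TLB.
            self.tlb[vpage] = self.pb.pop(vpage)
            return self.tlb[vpage], "pb_hit"
        # Miss in both: walk the page table and refill this core's TLB.
        ppage = page_table[vpage]
        self.tlb[vpage] = ppage
        # The leader pushes the translation into the followers' PBs.
        for follower in followers:
            if follower is not self:
                follower.pb[vpage] = ppage
        return ppage, "miss"
```

In this model the entry reaches the follower's PB when the leader misses, and it reaches the follower's TLB only when the follower itself later accesses the same virtual page.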
2. Apparatus according to claim 1, in which the control circuitry is configured to 
control the transfer of at least the subset of the translation data in response to initiation of execution by the second processing device of a processing task in a virtual address space associated with a given processing task executed by the first processing device prior to the transfer.  (“Like many uniprocessor TLB prefetching studies, we do not prefetch entries directly into the TLB, but instead insert them into a small, separate Prefetch Buffer (PB) which is looked up concurrently with the TLB. This helps mitigate the challenge of prefetching into the TLB too early and displacing useful information. Each PB entry maintains a Valid bit and a Prefetch Type bit (to indicate whether the entry arose from Leader-Follower or Distance-based Cross-Core prefetching) in addition to the translation entry (virtual page, physical page, context ID etc.). On a PB entry hit, the particular entry is removed from the PB and inserted into the TLB.”  Lustig page 8, section 4.3.1.)
3. Apparatus according to claim 2, in which the given processing task is selected from the list consisting of: 
a processing task most recently executed by the first processing device; and a processing task currently executed by the first processing device.  (“Case 2: Suppose instead that core 1 sees a D-TLB and PB miss (step 2a). In response, the page table is walked and the translation is located and refilled into the D-TLB. In step 2b, this translation is also prefetched or pushed into PBs of the other cores, with the aim of eliminating future ICS misses on the other cores. In step 2b, at PB insertion time, a check is made to see if the pushed entry already exists. If so, the entry is brought to the head of the PB.”  Lustig page 8, continued onto page 9.) 
4. Apparatus according to claim 1, in which 
the apparatus is configured to generate a prediction of initiation of execution by the second processing device of a processing task in a virtual address space associated with a given processing task executed by the first processing device prior to the transfer, and to control the transfer of at least the subset of the translation data in response to the prediction.  (“Case 2: Suppose instead that core 1 sees a D-TLB and PB miss (step 2a). In response, the page table is walked and the translation is located and refilled into the D-TLB. In step 2b, this translation is also prefetched or pushed into PBs of the other cores, with the aim of eliminating future ICS misses on the other cores. In step 2b, at PB insertion time, a check is made to see if the pushed entry already exists. If so, the entry is brought to the head of the PB.”  Lustig page 8, continued onto page 9. “We mitigate harmful and useless prefetches by incorporating confidence estimation. To do so, we add a CPU Number field to each PB entry. The CPU Number tracks the leader core responsible for the prefetch of each entry. In addition, as shown in Figure 15, each core maintains confidence counters, one for every other core in the system. Therefore, in our example with an N-core CMP, core 0 has saturating counters for cores 1 to N-1.”  Lustig page 9, section 4.3.2, fourth paragraph.  See also the following three paragraphs showing the steps taken in carrying out the prediction, not pasted here for brevity.  Note also that prefetching is in general based on the concept of locality which is a prediction of future usage.)
5. Apparatus according to claim 2, in which: 
the control circuitry is configured to control the transfer of translation data relating to the virtual address space associated with the given processing task.  (See rejection of claim 2.  Note that a page entry hit in a TLB results from a page access (e.g. load/store operations).)
6. Apparatus according to claim 2, in which: 
each virtual address space has an address space identifier; and the control circuitry is configured to maintain task data to indicate address space identifiers of processing tasks executed by the processing devices.  (Bhowmik teaches: “each page translation entry having an address space identifier (ASID)”  Bhowmik paragraph 0005. “The ASID may be a multi-bit field having a plurality of values.”  Bhowmik paragraph 0006.  “In a multi-operating system environment each host or guest operating system is assigned a unique ASID. Each page translation entry includes a virtual address 124, 134, a physical address 126, 136 and an ASID 128, 138. Each page translation entry 122, 132 is therefore uniquely associated with the specific host or guest operating system that created the entry.”  Bhowmik paragraph 0031.  See also Bhowmik figure 1.  Note that guest operating systems or other applications using a processor would be understood to be running tasks.)
7. Apparatus according to claim 6, in which 
the control circuitry is configured to identify from the task data, for the address space identifier of the virtual address space associated with the given processing task, one or more other processing elements which most recently executed the given processing task and to control the transfer of translation data from the translation lookaside buffer associated with one of the identified processing elements.  (With respect to claim interpretation, note that the language “for the address space identifier of the virtual address space associated with the given processing task” is written as an intended use because it does not require any specific structural limitations or steps to be performed. See MPEP §§ 2103 and 2111.04.  “We mitigate harmful and useless prefetches by incorporating confidence estimation. To do so, we add a CPU Number field to each PB entry. The CPU Number tracks the leader core responsible for the prefetch of each entry. In addition, as shown in Figure 15, each core maintains confidence counters, one for every other core in the system. Therefore, in our example with an N-core CMP, core 0 has saturating counters for cores 1 to N-1. The figure illustrates three cases of operation for confidence-based Leader-Follower prefetching: Case 1: Suppose that core 0 sees a PB hit (step 1a). As in the baseline case, step 1b removes the PB entry and inserts it into the D-TLB. In addition, we check, with the Prefetch Type bit, if the entry had been prefetched based on the Leader-Follower scheme. If so, we identify the initiating core (from the CPU number). In our example, this is core 1. Therefore, in step 1c, a message is sent to increment core 1’s confidence counter corresponding to core 0 since we are now more confident that prefetches where core 1 is the leader and core 0 is the follower are indeed useful. Case 2: Suppose instead (step 2a) that core 1 sees a D-TLB and PB miss. In response, the page table is walked and the D-TLB refilled. 
Then, in step 2b, core 1’s confidence counters are checked to decide which follower cores to push the translation to. We prefetch to a follower if its B-bit confidence counter is greater or equal to 2^(B−1). In our example, core 1’s counter corresponding to core 0 is above this value, and hence step 2c pushes the translation into core 0’s PB. At the same time, since core 1 itself missed in its PB, we need to increase the rate of prefetching to it. Step 2d therefore sends messages to all other cores so that core 1’s confidence counters in the other cores are incremented. Case 3: Consider the third case in which a PB entry is evicted from core N-1 without being used (step 3a). Since this corresponds to a bad prefetch, we send a message to the core that initiated this entry (step 3b), in this case core 1. There, core 1’s counter corresponding to core N-1 is decremented, decreasing bad prefetching.”  Lustig page, section 4.3.2.)
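For illustration only, the three cases of confidence-based Leader-Follower prefetching quoted above can be sketched as follows. All names are hypothetical and do not appear in Lustig. The initial counter value of 0 is an assumption, since the quoted passage does not state one. The threshold is taken to be 2^(B-1) for a B-bit saturating counter.

```python
class ConfidenceCore:
    """Hypothetical per-core state: one saturating counter per other core."""

    def __init__(self, core_id, num_cores, bits=2, initial=0):
        self.core_id = core_id
        self.max_count = (1 << bits) - 1   # saturation limit of a B-bit counter
        self.threshold = 1 << (bits - 1)   # 2^(B-1)
        # Assumption: counters start at `initial`; the source gives no start value.
        self.confidence = {c: initial for c in range(num_cores) if c != core_id}

    def bump(self, other_id):
        self.confidence[other_id] = min(self.max_count, self.confidence[other_id] + 1)

    def drop(self, other_id):
        self.confidence[other_id] = max(0, self.confidence[other_id] - 1)

    def followers_to_push(self):
        return sorted(c for c, v in self.confidence.items() if v >= self.threshold)


def on_pb_hit(cores, follower_id, leader_id):
    # Case 1: a useful prefetch raises the leader's counter for this follower.
    cores[leader_id].bump(follower_id)


def on_miss(cores, leader_id):
    # Case 2: choose followers at or above the threshold (steps 2b and 2c),
    # then raise the missing core's counter held in every other core (step 2d).
    targets = cores[leader_id].followers_to_push()
    for core in cores:
        if core.core_id != leader_id:
            core.bump(leader_id)
    return targets


def on_unused_eviction(cores, follower_id, leader_id):
    # Case 3: an unused prefetch lowers the leader's counter for this follower.
    cores[leader_id].drop(follower_id)
```

With 2-bit counters the threshold is 2, so a leader in this model pushes to a given follower only after that follower has reported at least two useful prefetches more than unused ones.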
8. Apparatus according to claim 7, in which, 
when the control circuitry identifies more than one processing element, the control circuitry is configured to control the transfer of at least a subset of the translation data from the translation lookaside buffer associated with a physically closest one of the identified processing devices to the translation lookaside buffer associated with a second, different, processing device. (See rejection of claim 7.  Note that a teaching of controlling of transfers between all processors also teaches control of transfers between the closest processors.)
9. Apparatus according to claim 2, in which 
the second processing device is configured to detect a processing event and in response to the detection, to request the control circuitry to control the transfer of at least a subset of the translation data from the translation lookaside buffer associated with the first processing device to the translation lookaside buffer associated with the second processing device. (See rejection of claim 2, which teaches a page miss in the TLB (hit in the buffer) resulting in moving the TLB entry of the first processor from the buffer to the TLB of the second processor.  Note that a page entry hit in a TLB results from a page access (e.g. load/store operations).)
10. Apparatus according to claim 9, in which the processing event is an event selected from the list consisting of: 
a translation lookaside buffer miss at the second processing device, following initiation of execution of the given processing task to the second processing device; a request for the second processing device to start processing of the given processing task; execution by the second processing device of a program instruction to request the control circuitry to control the transfer; and detection of a change in one or more control registers of the second processing device associated with a change in one or both of processing task and virtual address space.  (“Like many uniprocessor TLB prefetching studies, we do not prefetch entries directly into the TLB, but instead insert them into a small, separate Prefetch Buffer (PB) which is looked up concurrently with the TLB. This helps mitigate the challenge of prefetching into the TLB too early and displacing useful information. Each PB entry maintains a Valid bit and a Prefetch Type bit (to indicate whether the entry arose from Leader-Follower or Distance-based Cross-Core prefetching) in addition to the translation entry (virtual page, physical page, context ID etc.). On a PB entry hit, the particular entry is removed from the PB and inserted into the TLB.”  Lustig page 8, section 4.3.1.  Note that the hit in the PB also describes a miss in the TLB.)
11. Apparatus according to claim 2 in which 
the first processing device is configured to detect a processing event and in response to the detection, to request the control circuitry to control the transfer of at least a subset of the translation data from the translation lookaside buffer associated with the first processing device to the translation lookaside buffer associated with the second processing device.  (“Leader-Follower prefetching exploits the fact that in ICS-heavy benchmarks, if a core (the leader) TLB misses on a particular virtual page entry, other cores (the followers) will also typically TLB miss on the same virtual page eventually. Since the leader would already have found the appropriate translation, we can prevent the followers from missing on this entry by pushing it into the followers’ TLBs.”  Lustig page 8, section 4.3.)
12. Apparatus according to claim 11, 
in which the processing event comprises execution by the first processing device of a program instruction to request the control circuitry to control the transfer.  (“Case 2: Suppose instead that core 1 sees a D-TLB and PB miss (step 2a). In response, the page table is walked and the translation is located and refilled into the D-TLB. In step 2b, this translation is also prefetched or pushed into PBs of the other cores, with the aim of eliminating future ICS misses on the other cores.”  Lustig page 9.  See also Lustig figure 14.)
13. Apparatus according to claim 2, in which 
the subset of the translation data comprises at least one or more selected from the list consisting of: translation data defining an address translation of a stack pointer address; translation data defining an address translation of a program counter address; translation data defining an address translation of a link register address; and a most-recently used subset of translation data relating to the virtual address space associated with the given processing task.  (“Case 2: Suppose instead that core 1 sees a D-TLB and PB miss (step 2a). In response, the page table is walked and the translation is located and refilled into the D-TLB. In step 2b, this translation is also prefetched or pushed into PBs of the other cores, with the aim of eliminating future ICS misses on the other cores.”  Lustig page 9.  See also Lustig figure 14.  Note that an entry which is moved directly in response to return from a table walk is pushed when it is the most recently used entry.)
15. Apparatus according to claim 1, comprising 
interconnect circuitry connected to the first and second processing devices, the control circuitry being configured to control the transfer of the translation data via the interconnect circuitry.  (See rejection of claim 1.  See also Lustig figures 14-18.  A person of ordinary skill in the art would understand interconnect/control circuitry to interconnect/control a data transfer between TLBs.  Note also that the prior art is presumed operable.)
16. Apparatus according to claim 1, comprising 
a hierarchy of translation lookaside buffer storage including the respective translation lookaside buffers associated with each processing device and a higher level translation lookaside buffer further from each processing device than said respective translation lookaside buffers (“Figure 18 presents a CMP with private, per-core L1 TLBs backed by an SLL L2 TLB. While this example uses just one level of per-core private TLBs, further levels may be readily accommodated (for example, each core could maintain two levels of per-core private TLB followed by an L3 SLL TLB). As with last-level caches, the SLL TLB is accessed when there is a miss in any of the L1 TLBs. The SLL TLB strives for inclusion with the L1 TLB, so that entries that are accessed by one core are available to others. Figure 18 shows the SLL TLB residing in a central location, accessible by all the cores. While this centralized approach is a possible implementation, we discuss this and other implementation issues in Section 5.2. SLL TLBs enjoy two orthogonal benefits. First, they exploit inter-core sharing in parallel programs. Specifically, a core’s TLB miss brings an entry into the SLL TLB so that subsequent L2 misses on the same entry from other cores are eliminated.” Lustig page 12, section 5.1, first paragraph.  See also Lustig figure 18.)
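For illustration only, the hierarchy quoted from Lustig above (private per-core L1 TLBs backed by a shared last-level (SLL) TLB) can be sketched as follows. The names are hypothetical and do not appear in Lustig. Capacity, replacement, and the implementation issues deferred to Lustig Section 5.2 are not modeled, and inclusion is approximated by filling the SLL TLB on every page table walk.

```python
class L1Core:
    """Hypothetical core with a private L1 TLB backed by a shared SLL TLB."""

    def __init__(self, shared_tlb):
        self.l1 = {}            # private L1 TLB: virtual page -> physical page
        self.sll = shared_tlb   # one dict shared by all cores stands in for the SLL TLB

    def translate(self, vpage, page_table):
        if vpage in self.l1:
            return self.l1[vpage], "l1_hit"
        # An L1 miss accesses the shared last-level TLB.
        if vpage in self.sll:
            self.l1[vpage] = self.sll[vpage]
            return self.l1[vpage], "sll_hit"
        # Miss at both levels: walk the page table and fill both levels,
        # so the entry becomes available to the other cores.
        ppage = page_table[vpage]
        self.sll[vpage] = ppage
        self.l1[vpage] = ppage
        return ppage, "walk"
```

In this model a page table walk by one core lets a second core satisfy its first access to the same page from the shared level.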
17. Apparatus comprising: 
two or more processing means each having an associated translation lookaside buffer means for storing translation data defining address translations between virtual and physical memory addresses, each address translation being associated with a respective virtual address space; means for controlling the transfer of at least a subset of the translation data from the translation lookaside buffer means associated with a first processing means to the translation lookaside buffer means associated with a second, different, processing means; and means for controlling the transfer of at least the subset of the translation data in response to initiation of execution by the second, different processing means of a processing task in a virtual address space, the virtual address space being associated with a given processing task executed by the first processing means prior to the transfer. (See rejection of claim 1.)
18. A method comprising: 
storing translation data defining address translations between virtual and physical memory addresses, each address translation being associated with a respective virtual address space, in respective translation lookaside buffers associated with two or more processing devices; controlling the transfer of at least a subset of the translation data from the translation lookaside buffer associated with a first processor to the translation lookaside buffer associated with a second, different, processor; and controlling the transfer of at least the subset of the translation data in response to initiation of execution by the second, different processor of a processing task in a virtual address space, the virtual address space being associated with a given processing task executed by the first processor prior to the transfer. (See rejection of claim 1.)
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Lustig, Bhowmik, and Landsberg (Analyzing and Optimizing TLB-Induced Thread Migration Costs on Linux/ARM, 2018).
14. Apparatus according to claim 2, 
in which the subset of the translation data comprises all translation data held by the translation lookaside buffer of the first processing device relating to the virtual address space associated with the given processing task.  (The previously cited art does not expressly teach migrating all TLB entries associated with a given processing task.  
Landsberg teaches: “As already stated, thread migration comes with a performance penalty due to cold caches, resulting in an increased number of cache misses but cannot be avoided sometimes. TLB migration is supposed to reduce this penalty by copying the relevant TLB entries to the TLB of the thread’s new CPU. The relevance of these entries is determined by characteristics such as validity, associated address space and whether or not it was accessed since the last migration, if available. Overall, a TLB migration consists of the following steps: 1. Extracting all entries from the local TLB because there is no way to tell which ones are relevant in advance (at least on the Cortex-A7). 2. Filtering out all entries which are irrelevant for the current thread, e.g., invalid entries or entries for other processes are dispensable. 3. Storing the remaining entries in memory so that they can be restored in the future. 4. Populating the new core’s TLB by using the information stored in the memory the step before.”  Landsberg page 7.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Landsberg with the previously cited art because migrating all the relevant entries to the target processor reduces TLB misses.)
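For illustration only, the four TLB migration steps quoted from Landsberg above can be sketched as follows. The names are hypothetical and do not appear in Landsberg. Relevance is reduced here to validity and matching address space. The "accessed since the last migration" characteristic is omitted because the quoted passage describes it as available only on some hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TlbEntry:
    """Hypothetical TLB entry tagged with an address space identifier."""
    vpage: int
    ppage: int
    asid: int
    valid: bool = True


def migrate_tlb(source_tlb, target_tlb, thread_asid):
    # Step 1: extract all entries from the source core's TLB.
    extracted = list(source_tlb)
    # Step 2: filter out invalid entries and entries for other address spaces.
    relevant = [e for e in extracted if e.valid and e.asid == thread_asid]
    # Step 3: store the remaining entries in memory for later restoration.
    saved = list(relevant)
    # Step 4: populate the new core's TLB from the stored entries.
    target_tlb.extend(saved)
    return saved
```

In this model every valid entry belonging to the migrating thread's address space is carried to the target core, and entries of other address spaces stay behind.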


Response to Arguments
Applicant's arguments filed 11/30/2021 have been fully considered but they are not persuasive.
Rejections under § 112(a)/(b):
All rejections from the previous action under these sections are withdrawn in response to claim amendments.
Rejections under § 103:
Applicant’s point that Lustig teaches pushing the TLB entries to a second core is noted, and there may be support for amendments limiting the claims to a method/system in which the TLB entry only leaves the TLB of the first processor in response to initiation by a second processor.  The claim language, however, reads on “control the transfer of at least the subset of the translation data in response to initiation of execution by the second, different processing device of a processing task in a virtual address space, the virtual address space being associated with a given processing task executed by the first processing device prior to the transfer” (to the second processor, not prior to the entry leaving the TLB of the first processor) because an intermediate buffer is used between the TLBs and the transfer from the intermediate buffer the rest of the way to the second processor’s TLB occurs in response to a miss associated with a virtual address.  See rejection above.  
  





Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 





Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL M KNIGHT whose telephone number is (571)272-8646.  The examiner can normally be reached on Monday - Friday 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Reginald Bragdon, can be reached on 571-272-4204.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


PAUL M. KNIGHT
Examiner
Art Unit 2139



/PAUL M KNIGHT/Examiner, Art Unit 2139