DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
All information disclosure statements are in compliance with the provisions of 37 C.F.R. § 1.97.  Accordingly, they have been considered.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5, 7-13, 16-17, and 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over Santhanakrishnan (US 2007/0094453) and Eisen (US 2006/0161762).
Claim 1.  A method for memory request throttling in a processing system, the method comprising: 
determining, at a latency tracker of a cache of the processing system, an access latency metric representing an average access latency for memory requests for a processor core of the processing system;  (Santhanakrishnan teaches: “A discussion of a dynamic configuration for a prefetcher is proposed. For example, a thread specific latency metric is calculated and provides dynamic feedback to the software on a per thread basis via the configuration and status registers.”  Santhanakrishnan abstract.  “As previously discussed, the setting of the aggressiveness index may depend on the latency monitor metric. For example, one set of registers stores different latency trip points. The prefetcher will change behavior as the observed average latency crosses the trip points.”  Santhanakrishnan paragraph 0033.  Note also that “for a processor core” is not a specific structural limitation and does not require any steps to be performed.  See MPEP §§ 2111.04 and 2103.)
determining, at a prefetch accuracy tracker of the cache, a prefetch accuracy metric representing an accuracy of a prefetcher of a cache associated with the processor core; and (“At the context switch decision block, a time slice analysis is performed. The time slice analysis is based at least in part on implementation specific parameters, some embodiments of which are prefetcher accuracy and load latencies.” Santhanakrishnan paragraph 0046.  Note that prefetching refers to bringing data into a cache.  See also Santhanakrishnan figure 4A.)
setting, at a throttle controller of the cache, a throttle level for a thread executing at the processor core by modifying a maximum prefetch distance of the prefetcher based on at least one of the access latency metric and the prefetch accuracy metric, wherein the throttle level sets a maximum number of pending memory request entries of a buffer that are available to the processor core.  (“For example, a thread specific latency metric is calculated and provides dynamic feedback to the software on a per thread basis via the configuration and status registers.”  Santhanakrishnan Abstract.  “Subsequently, the new thread is scheduled and the prefetcher is parameterized according to the previously discussed latency monitor metric and aggressiveness index that is stored in the configuration and status register (described earlier in connection with FIG. 4). [0046] At the context switch decision block, a time slice analysis is performed. The time slice analysis is based at least in part on implementation specific parameters, some embodiments of which are prefetcher accuracy and load latencies. In addition, system parameters such as utilizations are also provided to the operating system. In typical operating system controlled systems, this information can be used by the OS in order to study the performance of the prefetcher in the particular time slice. This information in association with past behavior gives the OS an ability to predict the effectiveness of the prefetcher in the next time slice. The OS can then either increase the aggressiveness index of the prefetcher during the next time slice in case it deems such or decrease it otherwise.”  Santhanakrishnan paragraph 0045 – 0046.  See also Santhanakrishnan figure 4A showing the prefetch control working with the cache.  “One example of a typical prefetch control block is depicted in FIG. 1. A queue 102 stores a fixed number of cache lines from the cache 106, the fixed number of cache lines based on control from the prefetch control block 104.”  Santhanakrishnan paragraph 0006.)
Santhanakrishnan does not teach setting a maximum number of entries in a buffer available to the processor core to modify the prefetch distance.
Eisen teaches: “The resource control applied may be the number of instruction fetches allocated to the thread or the number of execution time slices. Alternatively, or in combination, the size of a prefetch instruction storage allocated to the thread may be limited. The control condition may be comparison of the number of correct or incorrect speculations to a threshold, comparison of the number of correct to incorrect speculations, or a more complex evaluator such as the size of a ratio of incorrect to total speculations.”  Eisen Abstract.  “[0010] The processor includes a control unit that reduces execution resources allocated to a hardware thread in response to determining that speculative execution of instructions is proceeding poorly. The limited resources may include one or more of: instruction fetches or time slices (limiting the amount of processing power allocated to the thread), hardware thread priority and/or reducing the size of the prefetched instruction storage for the thread (thus reducing processing for that thread by throttling the available instruction queue). The condition for determining when to apply reduction of resources to a thread may be comparison of a number of correct or incorrect speculated branches to a threshold, comparison of the number of correct branches to incorrect branches, or computation of more sophisticated evaluators of speculation such as ratio of correct or incorrect speculations to the total number of speculations.” Eisen paragraph 0010.  “L1 Icache 20 provides loading of instruction streams in conjunction with instruction fetch unit IFU 16, which prefetches instructions and may include storage for speculative loading of instruction streams. IFU 16 contains one or more instruction queues (IQ) 28 that can be depth-controlled for individual threads in accordance with an embodiment of the invention.” Eisen paragraph 0022.  
“With the throttling mechanism of the present invention, fewer instructions are likely to be dispatched speculatively for poorly predicted branch. Therefore, there are fewer speculative instructions present in the pipeline for the particular thread when the branch condition is resolved.” Eisen paragraph 0024.  See also Eisen figure 2.
The combination would have been obvious under rationale (D): applying a known technique to a known device (method, or product) ready for improvement to yield predictable results. The prior art contained a "base" device (method, or product) upon which the claimed invention can be seen as an "improvement" (the improvement being that more resources are allocated for other threads and/or non-speculative executions).  The prior art contained a known technique that is applicable to the base device (method, or product) (the use of a queue to reduce prefetch distance is applicable to the base device). One of ordinary skill in the art would have recognized that applying the known technique would have yielded predictable results and resulted in an improved system. See MPEP § 2143(I)(D).
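For illustration only, the control condition quoted from Eisen (comparing a ratio of incorrect to total speculations against a threshold and then limiting the instruction queue available to the thread) can be sketched as follows. This sketch is not taken from Eisen; the function names, the queue depths, and the threshold value are hypothetical and chosen only to make the example concrete.

```python
def mispredict_ratio(incorrect, total):
    """Ratio of incorrect speculations to total speculations (0.0 if none yet)."""
    return incorrect / total if total else 0.0


def queue_depth_for(incorrect, total, full_depth=8, reduced_depth=2, threshold=0.25):
    """Pick the instruction queue depth allowed for a thread.

    full_depth, reduced_depth, and threshold are assumed values, not figures
    from the reference. A thread that speculates poorly gets the smaller queue.
    """
    if mispredict_ratio(incorrect, total) > threshold:
        return reduced_depth
    return full_depth
```

Under these assumed values, a thread with 1 incorrect speculation out of 10 keeps the full queue, while a thread with 5 out of 10 is limited to the reduced depth.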
Claim 2.    The method of claim 1, wherein setting the throttle level for the thread executing at the processor core further comprises: 
modifying an aggressiveness of the prefetcher based on at least one of the access latency metric and the prefetch accuracy metric.  (“The parameterized prefetcher allows for different amounts of prefetching based on an index value. For example, in one embodiment, a two bit aggressiveness index defines the amount of prefetching, such as, the number of cache lines to prefetch. The two bit aggressiveness index ranges from a binary value of zero that indicates no prefetching to a binary value of three that indicates maximum prefetching.”  Santhanakrishnan paragraph 0031.  “As previously discussed, the setting of the aggressiveness index may depend on the latency monitor metric. For example, one set of registers stores different latency trip points. The prefetcher will change behavior as the observed average latency crosses the trip points.”  Santhanakrishnan paragraph 0033.  See also rejection of claim 1 citing paragraphs 0045 and 0046 of Santhanakrishnan.)
Claim 3.  The method of claim 2, wherein 
the maximum number of pending memory requests is a maximum number of missed information buffer (MIB) entries available for use by the processor core.  (See rejection of claim 1.  Denoting the buffer a “missed information buffer (MIB)” does not require any specific structural limitation (e.g. it does not require the material of paragraph 0015 and 0029 to be read into the claims).  See MPEP §§ 2103 and 2111.04.)  
Claim 5.    The method of claim 2, wherein modifying the aggressiveness of the prefetcher further comprises: 
selectively disabling the prefetcher.  (“The two bit aggressiveness index ranges from a binary value of zero that indicates no prefetching to a binary value of three that indicates maximum prefetching.”  Santhanakrishnan paragraph 0031)
Claim 7.    The method of claim 1, wherein: 
determining the prefetch accuracy metric comprises determining the prefetch accuracy metric for a specified thread executing at the processor core; (“FIG. 5 is a method for a flowchart that represents a software's perspective as utilized by one embodiment of the claimed subject matter. [0045] The depicted flowchart illustrates how a thread is scheduled for processing with the ability to parameterize the prefetcher and perform a time slice analysis. As the new thread is to be processed for scheduling, it enters a wait state. Subsequently, the new thread is scheduled and the prefetcher is parameterized according to the previously discussed latency monitor metric and aggressiveness index that is stored in the configuration and status register (described earlier in connection with FIG. 4). [0046] At the context switch decision block, a time slice analysis is performed. The time slice analysis is based at least in part on implementation specific parameters, some embodiments of which are prefetcher accuracy and load latencies. In addition, system parameters such as utilizations are also provided to the operating system. In typical operating system controlled systems, this information can be used by the OS in order to study the performance of the prefetcher in the particular time slice. This information in association with past behavior gives the OS an ability to predict the effectiveness of the prefetcher in the next time slice. The OS can then either increase the aggressiveness index of the prefetcher during the next time slice in case it deems such or decrease it otherwise.”  Santhanakrishnan paragraph 0044 – 0046.  See also Santhanakrishnan figure 4A showing the prefetch control working with the cache.) and setting the throttle level for the thread executing at the processor core comprises throttling a rate of memory access requests issuable by the processor core for the specified thread.  
(“[0046] At the context switch decision block, a time slice analysis is performed. The time slice analysis is based at least in part on implementation specific parameters, some embodiments of which are prefetcher accuracy and load latencies. In addition, system parameters such as utilizations are also provided to the operating system. In typical operating system controlled systems, this information can be used by the OS in order to study the performance of the prefetcher in the particular time slice. This information in association with past behavior gives the OS an ability to predict the effectiveness of the prefetcher in the next time slice. The OS can then either increase the aggressiveness index of the prefetcher during the next time slice in case it deems such or decrease it otherwise.”  Santhanakrishnan paragraph 0044 – 0046.)
Claim 8.    The method of claim 1, wherein determining the access latency metric comprises: 
sampling a plurality of memory requests issued to a local memory associated with the processor core to generate a sample set of memory requests; measuring, for each memory request of the sample set, a corresponding access latency for fulfilling the memory request; and determining the access latency metric based on an averaging of the access latencies measured for the sample set of memory requests.  (“For example, one set of registers stores different latency trip points. The prefetcher will change behavior as the observed average latency crosses the trip points.”  Santhanakrishnan paragraph 0033.  “FIG. 4B is one embodiment of a method for calculating the thread specific metric. The latency monitor analyzes latency, (such as, non-prefetcher load), in the system on a per thread basis and provides feedback to the dynamically adjusted prefetcher. For example, in one embodiment, the latency monitor samples a finite number (N) of a predetermined transaction type (in one embodiment, demand-load transactions), depicted in an execution block 410. For each demand-load transaction, the number of cycles between transaction dispatch and completion is recorded and added to a thread specific accumulator, depicted in an execution block 412.”  Santhanakrishnan paragraph 0038.  “[0040] Subsequently, once all N loads have been sampled, the value of the accumulator is divided by N, depicted in an execution block 414. [0041] Thus, the resulting value represents average load latency in the system and this metric could be used to select the number of cache lines to be prefetched.” Santhanakrishnan paragraphs 0040 – 0041.)
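For illustration only, the sampling and averaging described in the quoted paragraphs 0038 through 0041 (record the cycles for each of N sampled loads, add them to an accumulator, divide by N) can be sketched as follows, together with a comparison against latency trip points as in paragraph 0033. The function names and the trip-point values are hypothetical, and the assumption that higher latency lowers the aggressiveness index is made here only for the example; the reference states only that behavior changes as trip points are crossed.

```python
def average_load_latency(latencies_in_cycles):
    """Average the dispatch-to-completion cycle counts of the N sampled loads."""
    accumulator = 0
    for cycles in latencies_in_cycles:
        accumulator += cycles  # per-thread accumulator
    return accumulator / len(latencies_in_cycles)


# Hypothetical trip points in cycles; these values are assumptions.
TRIP_POINTS = [100, 200, 300]


def aggressiveness_for(avg_latency, trip_points=TRIP_POINTS):
    """Map an average latency to a two-bit index (3 = maximum, 0 = no prefetching).

    Assumed policy: each trip point crossed lowers the index by one.
    """
    crossed = sum(1 for t in trip_points if avg_latency >= t)
    return 3 - crossed
```

With these assumed trip points, three sampled loads of 100, 200, and 300 cycles average to 200 cycles, which crosses two trip points and yields an index of 1.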
Claim 9.    The method of claim 1, wherein setting the throttle level for the thread executing at the processor core comprises: 
accessing a data structure representing a plurality of throttle levels, each throttle level representing a corresponding modification to at least one of: a maximum number of pending memory transactions available and a level of prefetcher aggressiveness, (“The parameterized prefetcher allows for different amounts of prefetching based on an index value. For example, in one embodiment, a two bit aggressiveness index defines the amount of prefetching, such as, the number of cache lines to prefetch. The two bit aggressiveness index ranges from a binary value of zero that indicates no prefetching to a binary value of three that indicates maximum prefetching.”  Santhanakrishnan paragraph 0031.  Note also that when more lines are brought into a cache, more “pending memory transactions” are “available”.) and each throttle level being associated with at least one of a corresponding latency threshold and a corresponding prefetch accuracy threshold; (“In one embodiment, the amount of cache lines that are prefetched also depends on the latency monitor metric (calculation of the metric is discussed in connection with FIG. 4B) that is analyzed on a per thread basis”  Santhanakrishnan paragraph 0032.) and selecting a throttle level to implement for the (“The configuration and status registers provide information about the system. One such piece of information will be the average latency as observed by the latency monitor. In one embodiment, the average latency is set to the exact value of the latency monitor metric. In contrast, for another embodiment, the average latency could be a latency index to represent a range of latency values.  The prefetcher can also provide information about how well it is doing, such as, an efficiency index (, e.g. a derivative based on the number of times a prefetched line is actually used).” Santhanakrishnan paragraph 0037.  Note the number of times a prefetched line is used is a measure of accuracy of the prefetcher.)
Claim 10.    A processing system, comprising: 
an interconnect fabric coupleable to a local memory; and at least one compute cluster coupled to the interconnect fabric, the compute cluster comprising: a processor core; and a cache hierarchy comprising: a plurality of caches; (See Santhanakrishnan figure 2.) a throttle controller configured to set a throttle level for a thread executing at the processor core based on at least one of an access latency metric and a prefetch accuracy metric wherein the throttle level sets a maximum number of pending memory request entries of a buffer that are available to the processor core; (See rejection of claim 1.) wherein the access latency metric represents an average access latency for memory requests for the processor core; (See rejection of claim 1.) and wherein the prefetch accuracy metric represents an accuracy of a prefetcher of a cache of the cache hierarchy.  (See rejection of claim 1.)
Claim 11.  The processing system of claim 10, wherein the throttle controller is further configured to set the throttle level for the thread executing at the processor core by: 
modifying an aggressiveness of the prefetcher based on at least one of the access latency metric and the prefetch accuracy metric. (See rejection of claim 2.)
Claim 12.  The processing system of claim 11, wherein 
the maximum number of pending memory requests is a maximum number of missed information buffer (MIB) entries available for use by the processor core.  (See rejection of claim 3.)
Claim 13.    The processing system of claim 11, wherein the throttle controller is configured to modify the aggressiveness of the prefetcher by at least one of: 
modifying a maximum prefetch distance of the prefetcher; modifying a minimum prefetch confidence of the prefetcher; and selectively disabling the prefetcher. (See rejection of claim 5.)
Claim 16.    The processing system of claim 10, wherein the cache hierarchy further comprises: 
a prefetch accuracy tracker configured to determine the prefetch accuracy metric for a specified thread executing at the processor core; (See rejection of claim 7.) and wherein the throttle controller is configured to set the throttle level for the specified thread. (See rejection of claim 7.)
Claim 17.    The processing system of claim 10, wherein the cache hierarchy further comprises: a latency tracker to determine the access latency metric by: 
sampling a plurality of memory requests issued to a local memory associated with the processor core to generate a sample set of memory requests; measuring, for each memory request of the sample set, a corresponding access latency for fulfilling the memory request; and determining the access latency metric based on an averaging of the access latencies measured for the sample set of memory requests. (See rejection of claim 8.)
Claim 21. (New) The method of claim 1, wherein 
the throttle level modifies the maximum number of pending memory request entries of a buffer that are available to a processor core.  (See rejection of claim 1.)
Claim 22. (New) The processing system of claim 10, wherein 
the throttle level modifies the maximum number of pending memory requests available for the processor core.  (See rejection of claim 21.)
Claims 4, 6, 14, 15, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Santhanakrishnan, Eisen, and Rafacz (US 2014/0108740).
Claim 4.  The method of claim 2, wherein modifying the aggressiveness of the prefetcher further comprises  
modifying a minimum prefetch confidence of the prefetcher.  (“For example, in one embodiment, a two bit aggressiveness index defines the amount of prefetching, such as, the number of cache lines to prefetch.”  Santhanakrishnan paragraph 0031.  “For example, in one embodiment, the latency monitor samples a finite number (N) of a predetermined transaction type (in one embodiment, demand-load transactions), depicted in an execution block 410. For each demand-load transaction, the number of cycles between transaction dispatch and completion is recorded and added to a thread specific accumulator, depicted in an execution block 412.”  Santhanakrishnan paragraph 0038. 
Santhanakrishnan does not expressly teach modifying the prefetcher distance or minimum confidence.
Rafacz teaches: “In some embodiments the prefetch accuracy is estimated by the prefetcher 107 based on other information such as confidence information stored at the prefetcher 107.”  Rafacz paragraph 0022. “In some embodiments, the prefetch throttle 105 throttles prefetching by changing other prefetch parameters, such as confidence thresholds of the prefetcher 107. Thus, for example, the prefetcher 107 can determine whether to issue a memory access based on a confidence level that an access pattern has been detected. The prefetch throttle 105 can throttle prefetching by increasing the confidence threshold that triggers issuance of a memory access by the prefetcher 107, thereby reducing the number of memory accesses issued by the prefetcher 107.”  Rafacz paragraph 0028.
The combination including Rafacz would have been obvious to one of ordinary skill in the art before the effective filing date because modifying prefetching based on confidence can reduce the number of memory accesses.)
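For illustration only, the confidence-threshold throttling quoted from Rafacz paragraph 0028 can be sketched as follows. The sketch is not taken from Rafacz; the function names, the integer confidence scale, and the step size are hypothetical.

```python
def issued_prefetches(candidate_confidences, confidence_threshold):
    """Return the candidates whose confidence meets the threshold and so are issued."""
    return [c for c in candidate_confidences if c >= confidence_threshold]


def throttle(confidence_threshold, step=1, maximum=7):
    """Raise the threshold by an assumed step, capped at an assumed maximum."""
    return min(confidence_threshold + step, maximum)
```

With candidate confidences of 1, 3, 5, and 7, a threshold of 3 issues three prefetches; after one throttle step the threshold is 4 and only two are issued, which reflects the reduction in memory accesses that the reference describes.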
Claim 6.    The method of claim 1, wherein setting the throttle level for the thread executing at the processor core constrains use of bandwidth between a compute complex of the processing system and local memory based on a maximum bandwidth utilization metric.
(Note that the natural outcome of throttling communication between the processor and memory is a reduction in bandwidth.  However, for completeness, a secondary reference will be brought in to teach this aspect.  The combination of previously cited art does not expressly state that bandwidth is limited when prefetching is throttled.  
Rafacz teaches: “A processing system monitors memory bandwidth available to transfer data from memory to a cache. In addition, the processing system monitors a prefetching accuracy for prefetched data. If the amount of available memory bandwidth is low and the prefetching accuracy is also low, prefetching can be throttled by reducing the amount of data prefetched. The prefetching can be throttled by changing the frequency of prefetching, prefetching depth, prefetching confidence levels, and the like.”  Rafacz Abstract.  “FIGS. 1-4 illustrate techniques to improve processing efficiency by throttling the prefetching of data to a cache based both on available memory bandwidth and on prefetching accuracy. In some embodiments, as prefetching operations impact the available bandwidth of a memory, a processing system monitors the available memory bandwidth and a prefetching accuracy of a prefetcher and throttles the prefetcher accordingly. The processing system determines the prefetch accuracy by determining, relative to the total amount of (prefetched data that is stored in the cache, how much prefetched data is retrieved from the cache. As such, a relatively inaccurate prefetcher may be throttled while memory bandwidth is at a premium, thus freeing memory bandwidth for higher-priority accesses, while also being permitted to prefetch at a greater frequency when there is relatively abundant available memory bandwidth as the impact of inaccurate prefetching is lower at such times.”  Rafacz paragraph 0009.  “To illustrate, if the memory bandwidth of the processing system is 10 GB/s, and data is currently being transferred to and from the memory at 4 GB per second, there is 6 GB/s of available bandwidth. That is, the processing system has the capacity to transfer an addition 6 GB/s to/from memory. Memory bandwidth is consumed both by memory access requests generated by executing programs and by prefetching data from the memory based on the generated memory access requests. 
Accordingly, by throttling prefetching when available memory bandwidth and prefetching accuracy are both low, available memory bandwidth can be more usefully made available to an executing program, thereby enhancing processing system efficiency.”  Rafacz paragraph 0011.  See also Rafacz paragraph 0024 including the table in that paragraph.
It would have been obvious to one of ordinary skill in the art to combine the teaching of Rafacz before the effective filing date because throttling prefetching when bandwidth is low saves the available bandwidth for other memory accesses (e.g. read misses).)
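For illustration only, the arithmetic in the quoted Rafacz paragraph 0011 (10 GB/s total, 4 GB/s in use, 6 GB/s available) and the condition in the Rafacz abstract (throttle when available bandwidth and prefetching accuracy are both low) can be sketched as follows. The function names and both cutoff values are hypothetical; the reference does not give them.

```python
def available_bandwidth(total_gbps, in_use_gbps):
    """Available bandwidth is total bandwidth minus the bandwidth currently in use."""
    return total_gbps - in_use_gbps


def should_throttle(available_gbps, accuracy, min_available_gbps=2.0, min_accuracy=0.5):
    """Throttle only when BOTH available bandwidth and accuracy are low.

    min_available_gbps and min_accuracy are assumed cutoffs for this sketch.
    """
    return available_gbps < min_available_gbps and accuracy < min_accuracy
```

In the quoted example, available_bandwidth(10, 4) gives 6 GB/s, so under the assumed cutoffs an inaccurate prefetcher would not yet be throttled; it would be throttled once available bandwidth fell below the assumed 2 GB/s cutoff.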
Claim 14.  The processing system of claim 11, wherein the throttle controller is further configured to modify the aggressiveness of the prefetcher by: 
modifying a minimum prefetch confidence of the prefetcher.  (“For example, in one embodiment, a two bit aggressiveness index defines the amount of prefetching, such as, the number of cache lines to prefetch.”  Santhanakrishnan paragraph 0031.  “For example, in one embodiment, the latency monitor samples a finite number (N) of a predetermined transaction type (in one embodiment, demand-load transactions), depicted in an execution block 410. For each demand-load transaction, the number of cycles between transaction dispatch and completion is recorded and added to a thread specific accumulator, depicted in an execution block 412.”  Santhanakrishnan paragraph 0038. 
Santhanakrishnan does not expressly teach modifying the prefetcher confidence.
Rafacz teaches: “In some embodiments the prefetch accuracy is estimated by the prefetcher 107 based on other information such as confidence information stored at the prefetcher 107.”  Rafacz paragraph 0022. “In some embodiments, the prefetch throttle 105 throttles prefetching by changing other prefetch parameters, such as confidence thresholds of the prefetcher 107. Thus, for example, the prefetcher 107 can determine whether to issue a memory access based on a confidence level that an access pattern has been detected. The prefetch throttle 105 can throttle prefetching by increasing the confidence threshold that triggers issuance of a memory access by the prefetcher 107, thereby reducing the number of memory accesses issued by the prefetcher 107.”  Rafacz paragraph 0028.
The combination including Rafacz would have been obvious to one of ordinary skill in the art before the effective filing date because modifying prefetching based on confidence can reduce the number of memory accesses.)
Claim 15.    The processing system of claim 10, wherein the throttle controller is further configured to set the throttle level for the thread executing at the processor core by: 
constraining use of bandwidth between the computer cluster and a local memory associated with the processing system based on a maximum bandwidth utilization metric.  (See rejection of claim 6.)
Claim 18.    A method for throttling memory bandwidth utilization in a processing system, the method comprising: 
executing a software application at a processor core of a compute cluster of the processing system, the software application including at least one instruction to configure the processor core (“Traditionally, prefetching solutions have either been implemented in hardware or software. For example, hardware prefetching solutions typically scan for patterns and inserts prefetch transactions in the system (using utilization-based throttling mechanisms). In contrast, software explicitly generates prefetches or provides hints to the hardware instructions or hints inserted into the application.  However, both approaches have severe limitations. Hardware penalizes the system even if the utilization of the system is high due to useful prefetches, in contrast, software prefetching, adversely impacts application portability and has undesirable ISA (Instruction Set Architecture) effects. Furthermore, as processors evolve into multi core configurations that support multi-threading, simultaneous execution of heterogeneous workloads for a multi-threaded computer system exacerbates the problem.”  Santhanakrishnan paragraph 0005.  “this proposal allows for a thread aware hardware prefetcher that could be dynamically configured by software. The proposed prefetcher utilizes a parameterized prefetcher, a thread-wise latency monitor, and configuration and status registers. This proposal supports one or all of the different types of prefetching behaviors, such as, throttling prefetching when system resource utilization is high, task-specific prefetching profiles, and software-managed prefetcher adaptation that allows a single thread to have different prefetching profiles in different parts of its code. Furthermore, the hardware prefetcher provides dynamic feedback to the software on a per thread basis, via the configuration and status registers. 
Thus, the software can optionally use the information from the registers to dynamically configure the prefetching behavior and allows the software to be able to both query the performance and configure the prefetcher.”  Santhanakrishnan paragraph 0019.) to set a target memory utilization bandwidth constraint for a thread of the software application; monitoring an actual memory utilization bandwidth of the thread at the compute cluster; incrementally modifying a throttling level set for the thread until the monitored actual memory utilization bandwidth meets the target memory utilization bandwidth; (Santhanakrishnan does not expressly teach monitoring memory bandwidth.  
Rafacz teaches: “A processing system monitors memory bandwidth available to transfer data from memory to a cache. In addition, the processing system monitors a prefetching accuracy for prefetched data. If the amount of available memory bandwidth is low and the prefetching accuracy is also low, prefetching can be throttled by reducing the amount of data prefetched. The prefetching can be throttled by changing the frequency of prefetching, prefetching depth, prefetching confidence levels, and the like.”  Rafacz Abstract.  “FIGS. 1-4 illustrate techniques to improve processing efficiency by throttling the prefetching of data to a cache based both on available memory bandwidth and on prefetching accuracy. In some embodiments, as prefetching operations impact the available bandwidth of a memory, a processing system monitors the available memory bandwidth and a prefetching accuracy of a prefetcher and throttles the prefetcher accordingly. The processing system determines the prefetch accuracy by determining, relative to the total amount of (prefetched data that is stored in the cache, how much prefetched data is retrieved from the cache. As such, a relatively inaccurate prefetcher may be throttled while memory bandwidth is at a premium, thus freeing memory bandwidth for higher-priority accesses, while also being permitted to prefetch at a greater frequency when there is relatively abundant available memory bandwidth as the impact of inaccurate prefetching is lower at such times.”  Rafacz paragraph 0009.  “To illustrate, if the memory bandwidth of the processing system is 10 GB/s, and data is currently being transferred to and from the memory at 4 GB per second, there is 6 GB/s of available bandwidth. That is, the processing system has the capacity to transfer an addition 6 GB/s to/from memory. Memory bandwidth is consumed both by memory access requests generated by executing programs and by prefetching data from the memory based on the generated memory access requests. 
Accordingly, by throttling prefetching when available memory bandwidth and prefetching accuracy are both low, available memory bandwidth can be more usefully made available to an executing program, thereby enhancing processing system efficiency.”  Rafacz paragraph 0011.  See also Rafacz paragraph 0024 including the table in that paragraph.
It would have been obvious to one of ordinary skill in the art, before the effective filing date, to combine the teaching of Rafacz with that of Santhanakrishnan because throttling prefetching when bandwidth is low saves the available bandwidth for other memory accesses (e.g., read misses).) and wherein the throttling level configures at least one of: an aggressiveness of a prefetcher of a cache associated with the thread; and a maximum number of pending transactions available for the thread. (“The parameterized prefetcher allows for different amounts of prefetching based on an index value. For example, in one embodiment, a two bit aggressiveness index defines the amount of prefetching, such as, the number of cache lines to prefetch. The two bit aggressiveness index ranges from a binary value of zero that indicates no prefetching to a binary value of three that indicates maximum prefetching. In this embodiment, the binary value of three for the index indicates prefetching up to ten cache lines, the binary value of two indicates prefetching up to eight cache lines, and the binary value of one indicates prefetching up to six cache lines.” Santhanakrishnan paragraph 0031.)
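For illustration only, the combined teaching mapped to claim 18 above can be pictured with the minimal sketch below. It is not taken from either reference's implementation. The index-to-cache-line mapping uses the values quoted from Santhanakrishnan paragraph 0031, and the available-bandwidth arithmetic follows the example quoted from Rafacz paragraph 0011. The function names, the stepping policy (lower the index by one until the measured bandwidth no longer exceeds the target), and the linear bandwidth model in the usage lines are assumptions made for this sketch.

```python
# Illustrative sketch only. Function names, the one-step-at-a-time policy,
# and the bandwidth model in the usage lines are assumptions, not taken
# from the cited references.

# Two-bit aggressiveness index -> maximum cache lines to prefetch
# (values as quoted from Santhanakrishnan paragraph 0031).
PREFETCH_LINES = {0: 0, 1: 6, 2: 8, 3: 10}


def available_bandwidth(total_gbps, used_gbps):
    """Available bandwidth is total bandwidth minus bandwidth in use."""
    return total_gbps - used_gbps


def throttle_until_met(index, measure_gbps, target_gbps):
    """Lower the aggressiveness index one step at a time until the
    measured bandwidth no longer exceeds the target, or until the
    index reaches zero (no prefetching).

    measure_gbps is a caller-supplied function (hypothetical here) that
    returns the monitored bandwidth for a given index.
    """
    while index > 0 and measure_gbps(index) > target_gbps:
        index -= 1
    return index


# Usage with an assumed linear model: 2.0 GB/s of demand traffic plus
# 0.3 GB/s for each cache line the prefetcher is allowed to fetch.
model = lambda i: 2.0 + 0.3 * PREFETCH_LINES[i]
final_index = throttle_until_met(3, model, target_gbps=4.0)
print(final_index, PREFETCH_LINES[final_index])
```

Under the assumed model, the index steps down from three until the modeled bandwidth falls at or below the 4.0 GB/s target. A real system would substitute a hardware bandwidth monitor for the model.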
Claim 19.    The method of claim 18, wherein the throttling level configures the aggressiveness of the prefetcher by at least one of: 
modifying a prefetch distance of the prefetcher; and modifying a minimum prefetch confidence of the prefetcher.  (See rejection of claims 1 and 4 teaching both limitations.  It is noted that the limitations are in the alternative, but art teaching both is referenced in the interest of compact prosecution.)
Claim 20.    The method of claim 18, wherein the throttling level configures the aggressiveness of the prefetcher by: 
selectively disabling the prefetcher.  (See rejection of claim 5.)


Response to Arguments
Applicant's arguments filed 02/01/202 have been fully considered but they are not persuasive.
Rejections under § 112:
All rejections under this section from the previous action are withdrawn.
Rejections under § 103:
Applicant points out that the previously cited art fails to teach the amended claims.  The search turned up a new reference rendering the amended claim obvious (in combination with the other references).  Note that the details of the MIB cache in paragraph 0029 are not expressly claimed.  This is not to be taken as an indication of allowable subject matter, but material from that paragraph does not appear to be in the cited references (e.g. the MIB cache being a buffer storing missed requests at the L2 cache). 




Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Title: Method and apparatus for sharing instruction scheduling resources among a plurality of execution threads in a multi-threaded processor architecture
Document I.D.: US 9606800 B1
Reason Included: "In FIG. 3C, schedule queues 220, in which each entry can be used for any thread, have limits on how many entries can be used by a single thread. As the legend indicates, a single thread may be able to use all of the entries of the schedule queue except for a single entry. The upper threshold on how many entries a given thread can use may be equal to the total number of entries in the schedule queue minus one, as shown here, or may be lower. Although shown in FIG. 3C as being the same, each of the schedule queues 220 may have a different size and may have different thresholds set for how many entries can be used by a single thread." Paragraph 0028. "In FIG. 5B, a limit is enforced on the schedule queue so that a single thread does not fill up the entire schedule queue. Therefore, at 420, if there is an open location in the assigned schedule queue, control transfers to 450. At 450, control determines whether thread usage of the assigned schedule queue is at an upper limit. If so, control returns to 420; otherwise, control continues at 424." Paragraph 0036.

Title: MITIGATION OF THREAD HOGS ON A THREADED PROCESSOR USING A GENERAL LOAD/STORE TIMEOUT COUNTER
Document I.D.: US 20130297910 A1
Reason Included: "For the shared storage resources 110 and 120, statically allocating an equal portion, or number of queue entries, to each thread may provide good performance, in part by avoiding starvation. The enforced fairness provided by this partitioning may also reduce the amount of complex circuitry used in sophisticated fetch policies, routing logic, or other." Paragraph 0024.



Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL M KNIGHT whose telephone number is (571)272-8646.  The examiner can normally be reached on Monday - Friday 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Reginald Bragdon, can be reached on 571-272-4204.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


PAUL M. KNIGHT
Examiner
Art Unit 2139



/PAUL M KNIGHT/Examiner, Art Unit 2139