DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments and Arguments
The present Office Action is in response to Applicant’s Response after Final Action of September 13, 2021, hereinafter “Reply”, and Request for Continued Examination (RCE) of October 5, 2021.  Claims 1, 6-8, 11, 15-18, and 20 have been amended.  No claims have been cancelled or added.  Claims 1-20 remain pending in the application.
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submissions filed on October 5, 2021 have been entered.
The Reply has been fully considered, with the Examiner’s response set forth below.
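(1)	In view of the amendment to the claims, Applicant’s arguments with respect to independent claims 1, 11, and 20 and dependent claims thereof have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.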
(2) 	Another iteration of claim analysis has been made due to the amendment to the claims in the Reply. Refer to the corresponding sections of the claim analysis below for details. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
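3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.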


Claims 1-2, 4, 6, 8-9, 11-12, 15, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Fedorova et al. (US 2011/0246995 A1), hereinafter “Fedorova”, in view of Zhu et al. (US 2013/0185433 A1), hereinafter “Zhu”.

	Regarding claim 1, Fedorova teaches:
A computer-implemented method for executing workloads on processors, the method comprising: 
computing a forecasted amount of processor use for each workload included in a first plurality of workloads using a trained machine-learning model (FIG. 3; “[0035] … the instructions per cycle (IPC, a standard performance metric for a thread scheduler) for a thread”; “[0044] Embodiments of the present invention involve a rule-based predictive forecasting system that uses the thread metrics and/or resource-usage characterizations of multiple threads to predict [computing] changes in IPC [forecasted amount of processor use] for possible thread-scheduling decisions. This forecasting system can use measured metrics to predict [computing] which threads will cause relatively more “stress” (e.g., contention) for the cache/memory hierarchy and how each particular thread [workload] will respond to different co-runners. The forecasting system can then use this information to group threads together across fuzzy rulebase can be tuned during operation of the forecasting system based on predicted and observed performance. … parameters in the forecasting system can be updated after every data point (e.g., for every set of measured metrics) to configure and continuously adjust the statistical model of the system based on changes in behavior.”; “[0052] Note also that a wide range of other statistical techniques and models may be used to track data points and to predict changes in the IPC of threads. 
For instance, the forecasting system may also make use of neural network techniques.”; “[0054] One example involves training and using a forecasting system that includes a fuzzy rulebase [trained machine-learning model] to make scheduling decisions for a four-core processor that can simultaneously execute four threads [workloads]”; a rule-based predictive forecasting system that uses the thread metrics and/or resource-usage characterizations of multiple threads to predict [computing] changes in IPC (instructions per cycle) [forecasted amount of processor use] for possible thread-scheduling decisions, and the forecasting system can use measured metrics to predict [computing] which threads will cause relatively more “stress” (e.g., contention) for the cache/memory hierarchy and how each particular thread [workload] will respond to different co-runners);  
based on the forecasted amounts of processor use, computing a numeric performance cost estimate associated with an estimated level of cache interference arising from executing the first plurality of workloads on a first plurality of processors (FIGs. 2-3; “[0006] … depending on their cache access characteristics, two threads that share a cache may interfere with each others' cache data and cause pipeline resource contention that can reduce the performance of both threads. Also, a “cache-intensive” thread with a high cache miss rate may negatively affect a second “cache-sensitive” thread that re-uses cache data and dramatically suffers in performance when this cache data is displaced by other threads”; “[0034] … Scheduling efforts may need to account for threads that have different levels of cache access “intensity” and “sensitivity.” Cache intensity relates to frequency of cache accesses; “cache-intensive” threads tend to perform more cache accesses (e.g., regularly access cache data) than non-cache-intensive threads. Cache sensitivity relates to the cache re-use behavior of a thread. The cache accesses of threads that are “cache-sensitive” tend to heavily re-use cached data, and thus the thread will suffer in performance if the thread's cached data has been displaced by other threads”); and 
determining at least one processor assignment based on the numeric performance cost estimate (FIGs. 2-3; “[0039] Embodiments of the present invention involve techniques for improving the way in which threads are scheduled in a CMT processor. An automated system learns performance degradation in the measured IPC values of two threads when they are scheduled on processor cores that share a common cache.”),  
wherein at least one processor included in the first plurality of processors is subsequently configured to execute at least a portion of a first workload included in the first plurality of workloads based on the at least one processor assignment (FIGs. 2-3; “[0069] … the described techniques can be applied to systems in which each processor core can simultaneously execute more than two threads”; “[0070] … An automated rule-based forecasting system learns how to predict the performance degradation in the measured instruction throughput of two threads when they are scheduled on processor cores that share a common resource. The forecasting system then uses such cache-aware predictions to schedule threads into beneficial groupings, thereby improving the overall instruction throughput for the computing environment”).  

	Fedorova teaches a performance cost estimate but does not expressly teach that the performance cost estimate is numeric.

	However, Zhu teaches:
	a numeric performance cost estimate (FIGs. 1, 12; “[0032] … the WPPI system 102 uses the performance interference model 130 to automatically determine the workload types (118, 120, 112, 154) that may be optimally executed using the same degradation [numeric performance cost estimate] in performance that may be represented in terms of a percentage [numeric] of performance interference 164 that results from executing multiple workloads together) for multiple workloads (118, 120, 112, 154) analyzed for execution together using the hardware infrastructure resources (144, 146, 148, 150, 160)”; [0034]; “[0086] FIG. 12 shows performance interference model validation 1200 used by the WPPI system 102 to calculate degradation 1202 [numeric performance cost estimate] of consolidated workloads 1204. The WPPI system 102 may execute a recognized workload (e.g., SPECWeb2005) as a background noise workload while forecasting the performance degradation (e.g., dilation factor) [numeric performance cost estimate] after consolidating each application from the test suite of recognized workloads”).

	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Fedorova to incorporate the teachings of Zhu, combining the system of Fedorova, which facilitates scheduling threads in a multi-threaded processor with multiple processor cores, with the system of Zhu, which estimates and manages resource consumption by generating, using a performance interference model, a workload dilation factor that forecasts performance degradation of a workload type.  Doing so with the system of Fedorova would improve workload performance.  (Zhu, [0001])
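
For illustration only, the percentage-based degradation described in the quoted passages of Zhu can be sketched as follows. The sketch assumes that degradation is the relative drop in throughput between running a workload alone and running it consolidated with other workloads. That definition, the function name `degradation_percent`, and the sample values are hypothetical and are not taken from Zhu.

```python
def degradation_percent(solo_throughput: float, consolidated_throughput: float) -> float:
    """Relative throughput loss, in percent, caused by consolidation.

    Assumed definition for illustration; not taken from Zhu.
    """
    if solo_throughput <= 0:
        raise ValueError("solo_throughput must be positive")
    return (solo_throughput - consolidated_throughput) / solo_throughput * 100.0


# Hypothetical measurements: requests per second alone vs. co-located.
example_degradation = degradation_percent(200.0, 150.0)
```

Under these assumed values, a single number of this kind is what the rejection treats as a numeric performance cost estimate.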



Further regarding claim 20, Fedorova further teaches:
one or more memories storing instructions (FIG. 5; “[0081] … memory 504”); and 
one or more processors that are coupled to the one or more memories and, when executing the instructions (FIG. 5; “[0083] … some or all of the operations of measurement mechanism 512, prediction mechanism 514, and/or scheduling mechanism 516 may be performed using general-purpose circuits in processor 502 that are configured using processor instructions”).

Regarding claim 2, the combination of Fedorova and Zhu teaches the computer-implemented method of claim 1.
	
Fedorova further teaches:
wherein the first workload comprises a container, an execution thread, or a task (FIGs. 2-3; “[0069] … the described techniques can be applied to systems in which each processor core can simultaneously execute more than two threads”).  

	Regarding claim 4, the combination of Fedorova and Zhu teaches the computer-implemented method of claim 1.

Fedorova further teaches:
wherein computing the forecasted amount of processor use for the first workload comprises: 
computing at least one time-series feature based on a measured amount of processor use associated with executing the first workload on the first plurality of processors (“[0065] … The characteristics of a thread may change over time due to changes in workload or other factors (e.g., changes in applications, user patterns, and/or time-of-day behavior [time-series feature]), but such changes are simply measured and then input into the forecasting system as additional data points”); 
computing at least one contextual feature based on metadata associated with the first workload (“[0065] … The characteristics [metadata] of a thread may change over time due to changes in workload or other factors (e.g., changes in applications [contextual feature], user patterns, and/or time-of-day behavior), but such changes are simply measured and then input into the forecasting system as additional data points”); and 
inputting the at least one time-series feature and the at least one contextual feature into the trained machine-learning model (“[0054] One example involves training and using a forecasting system that includes a fuzzy rulebase [trained machine-learning model] to make scheduling decisions for a four-core processor that can simultaneously execute four threads”; “[0065] … The characteristics [metadata] of a thread may change over time due to changes in workload or other factors (e.g., changes in applications [contextual feature], user patterns, and/or time-of-day behavior [time-series feature]), but such changes are simply measured and then input into the forecasting system as additional data points”).  

Regarding claim 11, the claimed media comprises substantially the same steps or elements as those in claims 1 and 4.  Accordingly, the claim is also rejected for the same reasons as set forth for those in claims 1 and 4 above.

Regarding claim 6, the combination of Fedorova and Zhu teaches the computer-implemented method of claim 1.

	Fedorova further teaches:
wherein the first plurality of processors are partitioned into at least a first subset of processors that share a first lowest-level cache (LLC) and a second subset of processors that share a second LLC, and wherein computing the numeric performance cost estimate comprises estimating a cache interference cost based on an imbalance in predicted pressures between the first LLC and the second LLC (FIG. 2; [0006]; “[0046] … two threads (thread 1 200 and thread 2 202) execute on processor core 104 [first subset of processors] and share an L1 cache 110 [first LLC], while two other threads (thread 3 204 and thread 4 206) execute on processor core 106 [second subset of processors], and share a second different L1 cache 110 [second LLC]. When considering a scheduling change that swaps thread 2 202 and thread 3 204, the forecasting system would need to consider the predicted change in IPC for all four threads”).  

Regarding claim 15, the claimed media comprises substantially the same steps or elements as those in claim 6.  Accordingly, the claim is also rejected for the same reasons as set forth for those in claim 6 above.

Regarding claim 8, the combination of Fedorova and Zhu teaches the computer-implemented method of claim 1.

	Fedorova further teaches:
wherein computing the numeric performance cost estimate comprises estimating a cache sharing cost resulting from sharing at least one of a level one cache memory and a level two cache memory between a first execution thread associated with the first workload and a second execution thread associated with a second workload included in the first plurality of workloads (FIGs. 1-3; “[0063] … if the cached data of threads 1 and 2 is frequently displaced, their instruction throughput could drop dramatically. Thus, an alternate heuristic might group threads 1 and 2 together to share a cache [level one cache memory], with threads 3 and 4 sharing the second cache. However, such a grouping might also lead to problems if the combined working sets of threads 1 and 2 interfere with each other (e.g., displace each other from their shared cache)”).  

Regarding claim 9, the combination of Fedorova and Zhu teaches the computer-implemented method of claim 1.

	Fedorova further teaches:
for each workload included in a second plurality of workloads, acquiring a set of attributes associated with the workload and a measured amount of processor use associated with executing at least a portion of the workload on at least one processor included in a second plurality of processors (“[0044] Embodiments of the present invention involve a rule-based predictive forecasting system that uses the thread metrics and/or resource-usage characterizations of multiple threads to predict changes in IPC for possible thread-scheduling decisions”; “[0065] … The characteristics of a thread may change over time due to changes in workload or other factors (e.g., changes in applications, user patterns, and/or time-of-day behavior [attributes]), but such changes are simply measured and then input into the forecasting system as additional data points”); and 
executing one or more machine-learning algorithms to generate the trained machine-learning model based on the measured amounts of processor use and the sets of attributes (“[0054] One example involves training and using a forecasting system that includes a fuzzy rulebase [trained machine-learning model] to make scheduling decisions for a four-core processor that can simultaneously execute four threads”; “[0065] … The characteristics of a thread may change over time due to changes in workload or other factors (e.g., changes in applications, user patterns, and/or time-of-day behavior [attributes]), but such changes are simply measured and then input into the forecasting system as additional data points”).  

Regarding claim 12, the combination of Fedorova and Zhu teaches the one or more non-transitory computer readable media of claim 11.

	Fedorova further teaches:
computing the at least one feature associated with the first workload based on a measured amount of processor use associated with executing the first workload on the plurality of processors (“[0065] … The characteristics of a thread may change over time due to changes in workload or other factors (e.g., changes in applications, user patterns, and/or time-of-day behavior [the at least one feature]), but such changes are simply measured and then input into the forecasting system as additional data points”).  

Regarding claim 18, the combination of Fedorova and Zhu teaches the one or more non-transitory computer readable media of claim 11.

	Fedorova further teaches:
	wherein computing the numeric performance cost estimate comprises estimating a performance cost resulting from executing a first thread associated with the first workload on a first processor included in the plurality of processors and subsequently executing a second thread associated with a second workload included in the plurality of workloads on the first processor (FIG. 2; “[0046] FIG. 2 illustrates an exemplary potential scheduling change for the exemplary computing device 100 of FIG. 1. In FIG. 2, two threads (thread 1 200 [first thread] and thread 2 202 [second thread]) execute on processor core 104 [first processor] and share an L1 cache 110”; “[0069] … In an architecture where more than two cores are present, the described techniques may additionally involve choosing two cores between which possible migrations should be evaluated (e.g., by choosing cores randomly, or by sequentially choosing all cores and considering only their neighboring cores for possible migrations)”).

Regarding claim 19, the combination of Fedorova and Zhu teaches the one or more non-transitory computer readable media of claim 11.

	Fedorova further teaches:
wherein the forecasted amounts of processor use comprise predicted amounts of processor use associated with a statistical level of confidence (FIG. 2; “[0044] Embodiments of the present invention involve a rule-based predictive forecasting system that uses the thread metrics and/or resource-usage characterizations of multiple threads to predict changes in IPC for possible thread-scheduling decisions. … fuzzy rulebase can be tuned during operation of the forecasting system based on predicted and observed performance. For instance, the system may first use the fuzzy rulebase to make a set of predictions, and then, after making a thread-scheduling change: 1) compare the observed performance changes with the predicted performance changes, and 2) update the parameters of the fuzzy rulebase to more accurately reflect the observed performance changes. For example, a parameter may be updated using a formula such as: $P_{\text{new}} = P_{\text{old}} + \alpha \cdot E \cdot W$, where the learning rate α, amount of error E, and weight W for the error are used to determine how much to adjust the value of the parameter (P) in response to an observed value that differs from the predicted value. Hence, parameters in the forecasting system can be updated after every data point (e.g., for every set of measured metrics) to configure and continuously adjust the statistical model of the system based on changes in behavior.”).
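
For illustration only, the parameter update formula quoted from Fedorova above (new parameter equals old parameter plus learning rate times error times weight) can be sketched as follows. The function name `update_parameter`, the choice of error as observed minus predicted IPC, and all numeric values are assumptions made for this sketch and do not appear in Fedorova.

```python
def update_parameter(p_old: float, alpha: float, error: float, weight: float) -> float:
    """Apply the quoted update rule: P_new = P_old + alpha * E * W."""
    return p_old + alpha * error * weight


# Hypothetical values: the forecast overshot the observed IPC by 0.20.
predicted_ipc = 1.20
observed_ipc = 1.00
error = observed_ipc - predicted_ipc  # assumed sign convention: observed minus predicted

p_new = update_parameter(p_old=0.50, alpha=0.1, error=error, weight=1.0)
```

With these assumed inputs the parameter moves slightly downward, from 0.50 toward roughly 0.48, which reflects the per-data-point adjustment the quoted passage describes.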

	Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Fedorova et al. (US 2011/0246995 A1), hereinafter “Fedorova”, in view of Zhu et al. (US 2013/0185433 A1), hereinafter “Zhu”, as applied to claim 1 above, and further in view of McKenney et al. (US 6,615,316 B1), hereinafter “McKenney”.

	Regarding claim 3, the combination of Fedorova and Zhu teaches the computer-implemented method of claim 1.

	The combination of Fedorova and Zhu does not teach wherein the first plurality of processors are included in a non-uniform memory access multiprocessor instance.

However, McKenney teaches:
wherein the first plurality of processors are included in a non-uniform memory access multiprocessor instance (Col. 1, ln. 56-60, “Although all of the memory modules are globally accessible, a processor can access local memory on its node faster than remote memory on other nodes. Because the memory access time differs based on memory location, such systems are also called non-uniform memory access (NUMA) machines”).

	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Fedorova to incorporate the teachings of McKenney to provide a system that facilitates scheduling threads in a multi-threaded processor with multiple processor cores of Fedorova, with a computer system of McKenney that may be useful in a NUMA machine using a method of estimating cache warmth used for process scheduling, relocating cache, memory management, movement and/or relocation.  Doing so with the system of Fedorova would provide an improved method of estimating cache warmth, and manipulating the state of the system based upon the lifetime of the cache line. Accordingly, an efficient yet accurate mathematical model is desirable for incorporating hardware counter 

Claims 5, 10, and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Fedorova et al. (US 2011/0246995 A1), hereinafter “Fedorova”, in view of Zhu et al. (US 2013/0185433 A1), hereinafter “Zhu”, as applied to claims 1 and 11 above, and further in view of Chandra et al. (US 2019/0266015 A1), hereinafter “Chandra”.

	Regarding claim 5, the combination of Fedorova and Zhu teaches the computer-implemented method of claim 1.

	The combination of Fedorova and Zhu does not teach wherein determining the at least one processor assignment comprises executing one or more integer programming operations based on a first binary assignment matrix to generate a second binary assignment matrix that specifies the at least one processor assignment.

However, Chandra teaches:
wherein determining the at least one processor assignment comprises executing one or more integer programming operations based on a first binary assignment matrix to generate a second binary assignment matrix that specifies the at least one processor assignment (“[0056] For the task of DNN workload to core matching, Ci,j is defined as the output of the profiler 110, which given current core first binary assignment matrix], where ci,j is the cost of DNN workload i to run on core j. [0058] {Xij} is the resulting binary matrix, where xi,j=1 if and only if ith worker is assigned to jth job. [0059] Σj=1 NXij=1∀i∈1, N one core to one DNN workload assignment. [0060] Σj=1 NXij=1∀j∈1, N one DNN workload to one core assignment. [0061] Σi=1 NΣj=1 N=CijXij=1⇒min total cost function. [second binary assignment matrix]”; “[0105] In the case of a cloud core allocation for the DNNi workload, the CPUi and Ci refer to the cloud processing core utilization and cloud core allocated respectively. The above optimization problem may be an integer programming problem with non-linear constraints”).  

	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Fedorova to incorporate the teachings of Chandra to provide a system that facilitates scheduling threads in a multi-threaded processor with multiple processor cores of Fedorova, with a system for scheduling neural network workloads of Chandra.  Doing so with the system of Fedorova would provide a structure of Deep Neural Networks (DNNs) that is used to more efficiently schedule workloads.  (Chandra, [0018])
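
For illustration only, the one-to-one workload-to-core assignment formulation quoted from Chandra above (a cost matrix, a binary result matrix in which each workload receives exactly one core and each core exactly one workload, and a minimized total cost) can be sketched as follows. The sketch uses exhaustive search over permutations rather than an integer programming solver, and the function name `solve_assignment` and the cost values are hypothetical; none of this is taken from Chandra.

```python
from itertools import permutations
from typing import List, Tuple


def solve_assignment(cost: List[List[float]]) -> Tuple[List[List[int]], float]:
    """Return (binary assignment matrix X, minimum total cost) for a square cost matrix.

    X[i][j] == 1 means workload i runs on core j. Every row and every column
    of X contains exactly one 1. Exhaustive search is practical only for small N.
    """
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )
    x = [[1 if best[i] == j else 0 for j in range(n)] for i in range(n)]
    total = sum(cost[i][best[i]] for i in range(n))
    return x, total


# Hypothetical costs: cost[i][j] is the cost of running workload i on core j.
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
assignment, total_cost = solve_assignment(cost)
```

For the assumed cost matrix, the search selects core 1 for workload 0, core 0 for workload 1, and core 2 for workload 2, for a total cost of 5.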

Regarding claim 14, the claimed media comprises substantially the same steps or elements as those in claim 5.  Accordingly, the claim is also rejected for the same reasons as set forth for those in claim 5 above.

	Regarding claim 10, the combination of Fedorova and Zhu teaches the computer-implemented method of claim 1.

	The combination of Fedorova and Zhu does not teach wherein the trained machine-learning model comprises a conditional regression model.

However, Chandra teaches:
wherein the trained machine-learning model comprises a conditional regression model (“[0042] … The profiler may use a machine learning algorithm, such as a linear regression model, to create a model 210 for a DNN model. Once learned, the performance model 210 may be used to determine an estimate runtime 220 and memory usage 222 of the DNN workload based on the input parameters 200 to the DNN workload”).  

	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Fedorova to incorporate the teachings of Chandra to provide a system that facilitates scheduling threads in a multi-threaded processor with multiple processor cores of Fedorova, with a system for scheduling neural network workloads of Chandra using a machine learning algorithm, such as a linear regression model.  Doing so with the system of Fedorova would provide a structure of Deep Neural Networks (DNNs) that is used to more efficiently schedule workloads.  (Chandra, [0018])
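
For illustration only, a linear regression model of the general kind mentioned in the quoted passage of Chandra, which estimates runtime from an input parameter, can be sketched with a one-feature ordinary least squares fit. The function names `fit_line` and `predict`, the single input feature, and the training data are hypothetical and are not taken from Chandra. The sketch shows plain linear regression only.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Estimate runtime for a new input parameter value."""
    return slope * x + intercept


# Hypothetical profile data: input size vs. measured runtime in milliseconds.
sizes = [1.0, 2.0, 3.0, 4.0]
runtimes = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(sizes, runtimes)
estimated_runtime = predict(slope, intercept, 5.0)
```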

Regarding claim 13, the combination of Fedorova and Zhu teaches the one or more non-transitory computer readable media of claim 11.

	The combination of Fedorova and Zhu does not teach computing the at least one feature associated with the first workload based on at least one of a number of requested processors associated with the first workload, an amount of requested memory associated with the first workload, a name of a software application associated with the first workload, and a user identifier associated with the first workload.

However, Chandra teaches:
computing the at least one feature associated with the first workload based on at least one of a number of requested processors associated with the first workload, an amount of requested memory associated with the first workload, a name of a software application associated with the first workload, and a user identifier associated with the first workload (“[0054] … to allocate cores, the allocator 106 assigns each DNN workload to a specific core 116A-116D based on current CPU utilization and DNN model specifications. By using the profiler 110, the allocator 108 has greater knowledge of how each DNN workload is going to behave across various core allocations and may optimize for the same. By using the profiler 110, how much time [the at least one feature] a DNN workload will take on each core given its percentage utilization may be estimated”; “[0055] … the mathematical model used by the allocator 108 is defined as: Let ci,j be the cost of assigning the ith core to a number of requested processors] and m is number of DNN workloads”).

	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Fedorova to incorporate the teachings of Chandra to provide a system that facilitates scheduling threads in a multi-threaded processor with multiple processor cores of Fedorova, with a system for scheduling neural network workloads of Chandra using the profiler 110 to estimate how much time a DNN workload will take on each core given its percentage utilization.  Doing so with the system of Fedorova would provide a structure of Deep Neural Networks (DNNs) that is used to more efficiently schedule workloads.  (Chandra, [0018])

Claims 7 and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Fedorova et al. (US 2011/0246995 A1), hereinafter “Fedorova”, in view of Zhu et al. (US 2013/0185433 A1), hereinafter “Zhu”, as applied to claims 1 and 11 above, and further in view of Zmora et al. (US 2018/0307624 A1), hereinafter “Zmora”.

	Regarding claim 7, the combination of Fedorova and Zhu teaches the computer-implemented method of claim 1.

	Fedorova further teaches:
wherein the numeric performance cost estimate is computed based on a cost function that includes a plurality of terms for balancing the forecasted amounts of processor use across a plurality of lowest-level caches (LLCs) shared by the first plurality of processors and a plurality of additional caches shared by the first plurality of processors, and wherein the plurality of additional caches are at one or more cache levels that are higher than a cache level associated with the plurality of lowest-level caches (FIGs. 1-3; “[0032] … Computing device 100 can include a CMT processor 102 with two or more processor cores 104-106 [first plurality of processors], each of which includes a processor core pipeline 108 and a level-one (L1) cache 110 [lowest-level caches (LLCs)]. Processor cores 104-106 share a level-two (L2) cache 112 [additional caches]”; “[0049] … the forecasting system may involve a linear combination of basis functions of the form F(x) [cost function] = ∑ pi * wi(x), where pi (where i=1 . . . , N) are tunable parameters that are adjusted in the course of learning, i is the number of rules, x is a k-dimensional vector of parameters, and wi(x) is a function which provides weights for the parameters”; a plurality of terms is considered to be pi * wi(x), where pi (where i=1 . . . , N); a level-two (L2) cache 112 [additional caches] is higher than a level-one (L1) cache 110 [lowest-level caches (LLCs)] since the level-two (L2) cache 112 [additional caches] is farther from the processor cores 104-106 [first plurality of processors]).  
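
For illustration only, the linear combination of basis functions quoted from Fedorova above, in which tunable parameters are multiplied by weight functions of an input vector and summed, can be sketched as follows. The function name `forecast`, the two weight functions, and the numeric values are hypothetical and are not taken from Fedorova.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def forecast(params: Sequence[float],
             weight_fns: Sequence[Callable[[Vector], float]],
             x: Vector) -> float:
    """Evaluate F(x) as the sum over i of p_i * w_i(x)."""
    if len(params) != len(weight_fns):
        raise ValueError("one tunable parameter is required per weight function")
    return sum(p * w(x) for p, w in zip(params, weight_fns))


# Hypothetical rulebase with two terms over a two-dimensional input.
weight_fns = [
    lambda x: x[0],         # assumed weight function for the first rule
    lambda x: x[0] * x[1],  # assumed weight function for the second rule
]
params = [0.5, 2.0]
value = forecast(params, weight_fns, (2.0, 3.0))
```

Each product of a parameter and a weight function corresponds to one of the terms that the rejection maps to the claimed plurality of terms.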
	
Regarding claim 16, the combination of Fedorova and Zhu teaches the one or more non-transitory computer readable media of claim 11.

Fedorova further teaches:
wherein each processor included in the first plurality of processors is associated with a different set of one or more cache memories, and wherein computing the numeric performance cost estimate comprises estimating a cache interference cost based on an imbalance in predicted pressures across the sets of one or more cache memories (FIGs. 2-3; “[0006] … depending on their cache access characteristics, two threads that share a cache may interfere with each others' cache data and cause pipeline resource contention that can reduce the performance of both threads”; “[0044] Embodiments of the present invention involve a rule-based predictive forecasting system that uses the thread metrics and/or resource-usage characterizations of multiple threads to predict changes in IPC for possible thread-scheduling decisions. This forecasting system can use measured metrics to predict which threads will cause relatively more “stress” (e.g., contention) for the cache/memory hierarchy and how each particular thread will respond to different co-runners”; “[0046] … two threads (thread 1 200 and thread 2 202) execute on processor core 104 and share an L1 cache 110 [first LLC], while two other threads (thread 3 204 and thread 4 206) execute on processor core 106, and share a second different L1 cache 110 [second LLC]. When considering a scheduling change that swaps thread 2 202 and thread 3 204, the forecasting system would need to consider the predicted change in IPC for all four threads”).  

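	The combination of Fedorova and Zhu does not teach wherein each processor included in the first plurality of processors is associated with a different set of one or more cache memories.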

However, Zmora teaches:
wherein each processor included in the first plurality of processors is associated with a different set of one or more cache memories (FIG. 16; “[0199] … one or more processors 1602”; “[0202] … the processor 1602 includes cache memory 1604. Depending on the architecture, the processor 1602 can have a single internal cache or multiple levels of internal cache”).

	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Fedorova and Zhu to incorporate the teachings of Zmora, combining the system of Fedorova, which facilitates scheduling threads in a multi-threaded processor with multiple processor cores with a shared cache, with the electronic device of Zmora, which has one or more processors, each with a single internal cache or multiple levels of internal cache.  Doing so with the system of Fedorova would implement a context-based eviction policy in order to resolve cache line conflicts.  (Zmora, Abstract)

Regarding claim 17, the combination of Fedorova teaches the one or more non-transitory computer readable media of claim 11.

Fedorova further teaches:
 wherein computing the numeric performance cost estimate comprises estimating a cross-socket memory access cost resulting from executing a first execution thread associated with the first workload on a first processor included in the first subset of processors while also executing a second execution thread associated with the first workload on a second processor included in the second subset of processors (FIG. 2; “[0046] FIG. 2 illustrates an exemplary potential scheduling change for the exemplary computing device 100 of FIG. 1. In FIG. 2, two threads (thread 1 200 and thread 2 202) [first execution thread] execute on processor core 104 [first processor] and share an L1 cache 110, while two other threads (thread 3 204 and thread 4 206) [second execution thread] execute on processor core 106 [second processor], and share a second different L1 cache 110. When considering a scheduling change that swaps thread 2 202 and thread 3 204, the forecasting system would need to consider the predicted change in IPC for all four threads. For instance, in order to estimate the impact of such a swap on the IPC1 of thread 1 200, the forecasting system may consider: the cache miss rate MPI1 and cache access rate API1 for thread 1 200; the cache miss rate MPI2 and cache access rate API2 for thread 2 202; and the cache miss rate MPI3 and cache access rate API3 for thread 3 204”).  

	The combination of Fedorova does not teach wherein the plurality of processors are partitioned into at least a first subset of processors that are included in a first socket and a second subset of processors that are included in a second socket.

However, Zmora teaches:
wherein the plurality of processors are partitioned into at least a first subset of processors that are included in a first socket and a second subset of processors that are included in a second socket (FIG. 1; “[0042] … Some embodiments may include two or more sets of processor(s) 102 attached via multiple sockets, which can couple with two or more instances of the parallel processor(s) 112”).

	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Fedorova to incorporate the teachings of Zmora to provide a system that facilitates scheduling threads in a multi-threaded processor with multiple processor cores with a shared cache of Fedorova, with an electronic device of Zmora having one or more processors, each with a single internal cache or multiple levels of internal cache, attached via multiple sockets.  Doing so with the system of Fedorova would implement a context-based eviction policy in order to resolve the cache line conflict.  (Zmora, Abstract)

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Mars et al. discloses a first indicator of a first number of cache misses to a cache memory of a multicore processor for a first application over a first time period is .
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Tong B Vo whose telephone number is (571)272-7568.  The examiner can normally be reached on M-F 8:00 AM - 4:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Rones, can be reached at (571)272-4085.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  

/T.B.V./Patent Examiner, Art Unit 2136

/CHARLES RONES/Supervisory Patent Examiner, Art Unit 2136