Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to claims filed 10/29/2021.
Claims 1-20 are pending.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of copending Application No. 16/259,244 (reference application) as exemplified in the table below. Although the claims at issue are not identical, they are not patentably distinct from each other because in many aspects ‘244 is narrower in scope and anticipates the instant application. Further, in a few aspects, ‘244 performs the same functionality but in a slightly different context. For example, whereas the instant application trains the dynamic system model in consideration of interference caused by additional workloads, ‘244 considers the interference (without using this term) of resource use and actions of simulated workloads in making adjustments to the domain model. Therefore, both the instant . This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.
Instant Application (claim 18):
18. An apparatus, comprising: a memory; and at least one processing device, coupled to the memory, operative to implement the following steps:
obtaining a dynamic system model based on a relation between an amount of at least one resource for a plurality of iterative workloads and at least one predefined service metric;
obtaining an instantaneous value of the at least one predefined service metric;
applying to a given controller associated with a given one of the plurality of iterative workloads: (i) the dynamic system model, (ii) an interference effect that accounts for interference of an allocation of resources to one or more additional iterative workloads of the plurality of iterative workloads on a performance of the given one of the plurality of iterative workloads, and
(iii) a difference between the instantaneous value of the at least one predefined service metric and a target value for the at least one predefined service metric, wherein the given controller determines an adjustment to the amount of the at least one resource for the given one of the plurality of iterative workloads based at least in part on the difference; and
initiating, by the given controller, an application of the determined adjustment to the amount of the at least one resource to the given one of the plurality of iterative workloads,

Application No. 16/259,244:
obtaining (i) a specification of an iterative workload comprising a plurality of states of the iterative workload and a set of available actions for one or more of the plurality of states, and (ii) a domain model of the iterative workload that relates an amount of resources allocated in training data with one or more service metrics,
wherein a duration of one simulated iteration using said domain model of the
adjusting weights of at least one reinforcement learning agent by performing iteration steps for each simulated iteration of the iterative workload and then using variables observed during the simulated iteration to refine the at least one reinforcement learning agent; and
determining, by the at least one reinforcement learning agent implemented using at least one processing device, a dynamic resource allocation policy for the iterative workload, wherein the iteration steps for each simulated iteration of the iterative workload comprise:
(a) employing the at least one reinforcement learning agent to select an action from the set of available actions for a current state, obtain a reward for the
wherein the step of adjusting weights of the at least one reinforcement learning agent employs a reward metric based on a difference between a desired service metric and a measured service metric.
(b) updating, by the at least one reinforcement learning agent, a function that evaluates a quality of a plurality of state-action combinations; and
(c) repeating the employing and updating steps with a new allocation of resources

Similar mappings between the remaining claims would have been obvious to a person having ordinary skill in the art but have been omitted for the sake of brevity.

Claims 1-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-17 and 19-21 of copending Application No. 16/400,289 (reference application) in view of the disclosure of 16/400,289 (US 2020/0348979 A1) as exemplified in the table below. Although the claims at issue are not identical, they are not patentably distinct from each other because in some aspects ‘289 is narrower in scope and anticipates the instant application. Further, the instant application takes into consideration an interference effect of one or more additional iterative workloads of the plurality of iterative workloads on the given one of the plurality of iterative workloads, which is not explicitly claimed in ‘289; however, the disclosure of ‘289 (‘979) discloses at ¶ [0057] “Assuming that the SLA metric to be controlled is the execution time (et=T), one can feedback the amount of time t it takes to complete an epoch and compare this time to the desired time per epoch, which is T/n. If an epoch took longer than T/n to finish, more resources might me be needed. On the other hand, if the time t is significantly smaller than T/n, this indicates that the job does not need the amount of resources currently allocated to the job and reducing the allocation can decrease costs and even make room for additional jobs to run”. A person having ordinary skill in the art prior to the effective filing date of the claimed invention would . This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.
Instant Application (claim 1):
1. A method, comprising:
obtaining a dynamic system model based on a relation between an amount of at least one resource for a plurality of iterative workloads and at least one predefined service metric;
obtaining an instantaneous value of the at least one predefined service metric;
applying to a given controller associated with a given one of the plurality of iterative workloads: (i) the dynamic system model, and (iii) a difference between the instantaneous value of the at least one predefined service metric and a target value for the at least one predefined service metric, wherein the given controller determines an adjustment to the amount of the at least one resource for the given one of the plurality of iterative workloads based at least in part on the difference; and
(ii) an interference effect that accounts for interference of an allocation of resources to one or more additional iterative workloads of the plurality of iterative workloads on the given one of the plurality of iterative workloads on a performance of the given one of the plurality of iterative workloads, and
initiating, by the given controller, an application of the determined adjustment to the amount of the at least one resource to the given one of the plurality of iterative workloads;
wherein the method is performed by at least one processing device comprising a processor coupled to a memory.

Application No. 16/400,289:
obtaining a dynamic system model that relates (i) an amount of at least one resource provided by an execution environment that executes one or more iterative workloads and
(ii) at least one predefined service metric indicating a level of service provided by the execution environment for the one or more iterative workloads, wherein one or more parameters of the obtained dynamic system model are learned for a plurality of iterations of the one or more iterative workloads; obtaining an instantaneous value of the at least one predefined service metric; and
applying to a controller: (i) the dynamic system model for a given iteration of the plurality of iterations of the one or more iterative workloads, and (ii) a difference between the instantaneous value for the given iteration of the at least one predefined service metric and a target value for the at least one predefined service metric, wherein the controller determines an adjustment, based at least in part on the difference, to the amount of the at least one resource
‘979 ¶ [0057] “Assuming that the SLA metric to be controlled is the execution time (et=T), one can feedback the amount of time t it takes to complete an epoch and compare this time to the desired time per epoch, which is T/n. If an epoch took longer than T/n to finish, more resources might me be needed. On the
to be applied in the execution environment for the one or more iterative workloads,
wherein the method is performed by at least one processing device comprising a processor coupled to a memory.

Similar mappings between the remaining claims would have been obvious to a person having ordinary skill in the art but have been omitted for the sake of brevity.
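
For illustration only, the per-epoch feedback described in the quoted ¶ [0057] of ‘979 can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not taken from either application: the function name adjust_allocation, the one-unit step size, the floor of one unit, and the 10% band standing in for "significantly smaller" are all hypothetical.

```python
def adjust_allocation(current_units, epoch_time, total_time_target, num_epochs,
                      tolerance=0.1, min_units=1):
    """Return a new resource amount after one epoch (hypothetical helper).

    total_time_target is T and num_epochs is n, so the desired time per
    epoch is T/n. The step size and tolerance band are assumptions.
    """
    target_per_epoch = total_time_target / num_epochs   # T/n
    error = epoch_time - target_per_epoch               # positive: epoch ran long
    if error > 0:
        # Epoch took longer than T/n: more resources might be needed.
        return current_units + 1
    if error < -tolerance * target_per_epoch:
        # Epoch was significantly faster than T/n: release one unit.
        return max(min_units, current_units - 1)
    # Close enough to the target: leave the allocation unchanged.
    return current_units
```

With T = 100 and n = 10, an epoch that takes 12 time units raises an allocation of 4 units to 5, while an epoch that takes 5 time units lowers it to 3.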

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-9, 11-12, 14-15, 17-18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Chandra et al. Pub. No. US 2019/0266015 A1 (hereafter Chandra) in view of Dirac et al. Pat. No. US 10,257,275 B1 (hereafter Dirac).

With regard to Claim 1, Chandra teaches a method, comprising: obtaining a dynamic system model (DNN models in at least Fig. 1 and receiving workloads in at least Fig. 11) based on a relation between an amount of at least one resource (The profiler 110 uses training data to generate a performance model for a DNN model. This performance model is used to determine predicted resource requirements for an instance of the DNN model, e.g., a DNN workload. Creating a performance model may be a supervised learning problem, where given a value of Sr, Ci, Bs, and Pc, the performance model predicts the time taken and peak RAM usage of a DNN workload in at least ¶ [0047] and ¶ [0042]) for a plurality of workloads (Systems, methods, and computer-executable instructions for scheduling neural network workloads on an edge device. A performance model for each neural network model is received in at least abstract and scheduling workloads in at least Fig. 11) and at least one predefined service metric (FIGS. 4A, 4B, and 4C are graphs showing training data used for the profiler in accordance with respective examples. FIG. 4A illustrates maximum 402 and minimum 404 run time values for varying batch sizes. FIG. 4B illustrates maximum 412 and minimum 414 values of run time for varying precision values. FIG. 4C illustrates the run time for various DNN models for varying CPU core usage. These figures show that each metric may play a crucial role in controlling the time and memory taken by a DNN model for execution in at least ¶ [0047] and ¶ [0042]);
obtaining an instantaneous value of the at least one predefined service metric (The core allocator allocates of one or more cores 116A-116D to each DNN workload based on resource requirement of a DNN workload and current system utilization. The DNN parameter allocator takes the input from the allocator 106 and profiler 110 to assign various DNN parameters to each of the DNN workloads to maximize a specified optimization criteria … The allocator 106 uses the learned performance model profiler for each DNN model and current system utilization. The allocator 106 formulates allocation of DNN workloads as an optimization problem for assigning system resource and DNN parameters to each DNN workload while maximizing the specified optimization criteria and following the constraints that arise from hardware limitations in at least ¶ [0035] – [0036]);
applying to a given controller associated with a given one of the plurality of workloads (DNN model applied to controllers in at least Fig. 1, allocator, scheduler, profiler, et cetera): (i) the dynamic system model (DNN models in at least Fig. 1 and DNN model received and applied to controllers in at least Fig. 11), (ii) an interference effect that accounts for interference of an allocation of resources to one or more additional workloads of the plurality of workloads (At 1140, image streams are received. The image streams are mapped to DNN workloads. At 1150, the DNN workloads are scheduled to be executed. At 1160, the DNN workloads are executed at the scheduled time on the assigned processing core. In addition, the corresponding image stream is provided to the DNN workload. After a period of time, the scheduling process may run again to continue to schedule any uncompleted DNN workload as well as new DNN workloads in at least ¶ [0138] and Fig. 11) on a performance of the given one of the plurality of workloads (to optimize the core allocations and the parameter allocations, the DNN parameters may be randomly initialized. For given DNN parameters, the optimal core allocation scheme is calculated for DNN workloads. Then, given the core allocation scheme, the optimal DNN parameters are determined. The optimal core allocation is then determined again, this time using the optimized DNN parameters. Using the latest core allocation, the DNN parameters may be optimized again. This process repeats until there is convergence. The core and DNN parameter allocators are described in greater detail below in at least ¶ [0053]), and
(iii) parameters for optimizing the instantaneous value of the at least one predefined service metric and a target value for the at least one predefined service metric, wherein the given controller determines an optimization to the amount of the at least one resource for the given one of the plurality of workloads based at least in part on the parameters; and initiating, by the given controller, an application of the determined optimization to the amount of the at least one resource to the given one of the plurality of workloads (to optimize the core allocations and the parameter allocations, the DNN parameters may be randomly initialized. For given DNN parameters, the optimal core allocation scheme is calculated 
wherein the method is performed by at least one processing device of the given controller, wherein the at least one processing device comprises a processor coupled to a memory (Computing device 1200 may include a hardware processor 1202 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a hardware processor core, or any combination thereof), a main memory 1204 and a static memory 1206, some or all of which may communicate with each other via a link (e.g., bus) 1208 in at least ¶ [0140] and  a computer-readable storage media or machine-readable storage media may include any medium that is capable of storing, encoding, or carrying instructions for execution by the computing device 1200 and that cause the computing device 1200 to perform any one or more of the techniques of the present disclosure, or that is capable of storing, encoding or carrying data structures used by or associated with such instructions in at least ¶ [0144]).
Chandra teaches utilizing deep neural network models for optimizing resource allocations to workloads in view of instantaneous utilization and target values based on metrics. However, although Chandra clearly iterates to refine optimization (see at least ¶ [0053]) and accommodate additional workloads (see at least ¶ [0138] and Fig. 11), Chandra does not specifically recite that the workloads themselves are iterative. Further, Chandra teaches corrective adjustment (i.e., optimizing iteratively, see again the aforementioned citations) but does not specifically teach that these optimized corrections are pursuant to a difference between the instantaneous value and a target value.
However, in analogous art Dirac teaches iterative workloads (The optimizer implements a plurality of iterations of execution of the model, interleaved with observation collection intervals. During a given observation collection interval, tunable parameter settings suggested by the previous model execution iteration are used in the execution environment, and the observations collected during the interval are used as inputs for the next model execution iteration. When an optimization goal is attained, the tunable settings that led to achieving the goal are stored in at least abstract and col. 4 lines 49-59)
a difference between the instantaneous value of the at least one predefined service metric and a target value for the at least one predefined service metric (The optimization tool may determine a definition of, or a formula for, an objective function which is to be maximized or minimized with respect to the optimization target execution environment in various embodiments. An objective function may be identified in various ways in different embodiments, e.g., based on input from the client or based on the types of applications being run. Objective functions may sometimes be referred to as “loss functions”. While a number of different types of performance-related execution results (such as throughput, response times, consumed CPU-minutes, network bandwidth used, etc.) may be collected from the execution environment in some cases, a scalar objective function (an optimization objective expressible and computable as a single value) may be used in various embodiments for Bayesian optimization using Gaussian processes. The value of the objective function for a given  wherein the given controller determines an adjustment to the amount of the at least one resource for the given one of the plurality of iterative workloads based at least in part on the difference (In at least some embodiments, the optimization tool may determine the boundaries (e.g., start and end times) of the observation collection intervals as discussed below. The iterations of model execution and observation collection may be repeated until either the targeted extremum of the objective function has been attained (at least to within some tolerance limit), or until the resources available for the optimization task have been exhausted. 
The combination of tunable parameter settings that correspond to the attainment of the optimization goal (or the combination of tunable parameter settings that came closest to the optimization goal) may be stored in various embodiments, e.g., in a persistent storage repository accessible to the optimization tool and/or in a knowledge base in at least col. 4 line 59 – col. 5 line 6 and col. 15 lines 39-55);
initiating, by the given controller, an application of the determined adjustment to the amount of the at least one resource to the given one of the plurality of iterative workloads (The service 620 may select the appropriate resources for the execution environment corresponding to each of the computation requests 615, 
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the iterative workloads and corrective adjustment corresponding to a difference between the instantaneous value and a target value of Dirac with the systems and methods of Chandra, resulting in a system in which the workloads of Chandra are iterative as in Dirac and the corrective optimization of Chandra corresponds to a difference between the instantaneous value and a target value as in Dirac. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of iterating on the optimization such that the parameters may be tuned to further increase performance and efficiency (see at least Dirac col. 15 lines 39-55 and abstract).
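
For illustration only, the alternating procedure cited from Chandra ¶ [0053] (random initialization of the DNN parameters, then repeated re-optimization of the core allocation and the DNN parameters until convergence) can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not Chandra's implementation: the function name alternate_optimize, the discrete option lists, the caller-supplied cost function, and the max_rounds cap are all hypothetical.

```python
import random


def alternate_optimize(cost, core_options, param_options, seed=0, max_rounds=100):
    """Alternate between choosing cores and choosing DNN parameters.

    cost(cores, params) is a hypothetical caller-supplied objective to be
    minimized; it stands in for whatever criterion the scheduler uses.
    """
    rng = random.Random(seed)
    params = rng.choice(param_options)   # DNN parameters randomly initialized
    cores = None
    for _ in range(max_rounds):
        # Best core allocation for the current DNN parameters.
        new_cores = min(core_options, key=lambda c: cost(c, params))
        # Best DNN parameters for that core allocation.
        new_params = min(param_options, key=lambda p: cost(new_cores, p))
        if new_cores == cores and new_params == params:
            break                        # nothing changed: converged
        cores, params = new_cores, new_params
    return cores, params
```

With a toy cost such as (cores - 3)**2 + (params - 2)**2, the loop settles on 3 cores and parameter value 2 after two rounds.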

With regard to Claim 3, Dirac teaches wherein the adjustment to the amount of the at least one resource for the given one of the plurality of iterative workloads is determined by substantially minimizing the difference (An optimization goal 822 may indicate whether the objective function is to be maximized or minimized, or more generally the triggering conditions which are to be used to determine whether the optimization task has been completed successfully in at least col. 16 lines 9-16 and in at least col. 3 lines 32-55, col. 4 line 59 – col. 5 line 6 and col. 15 lines 39-55).
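
For illustration only, the stopping behavior cited from Dirac (iterations repeat until the objective reaches its goal to within some tolerance limit, or until the resources available for the optimization task are exhausted, with the closest settings retained) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it walks a fixed list of candidate settings rather than performing Bayesian optimization using Gaussian processes, and the names tune, evaluate, candidates, and budget are hypothetical.

```python
def tune(evaluate, candidates, target, tolerance, budget):
    """Try candidate settings until the goal is met or the budget runs out.

    evaluate(setting) is a hypothetical function returning the observed
    value of a scalar objective for one observation collection interval.
    Returns the setting that came closest to the target and its gap.
    """
    best_setting, best_gap = None, float("inf")
    for used, setting in enumerate(candidates):
        if used >= budget:
            break                          # optimization resources exhausted
        gap = abs(evaluate(setting) - target)
        if gap < best_gap:
            best_setting, best_gap = setting, gap
        if gap <= tolerance:
            break                          # goal attained within tolerance
    return best_setting, best_gap
```

The returned pair corresponds to the stored combination of settings that attained, or came closest to, the optimization goal.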

With regard to Claim 4, Chandra teaches wherein the obtained system model is one or more of: derived from a relation between an amount of at least one resource added and the predefined service level metric (The profiler 110 uses training data to generate a performance model for a DNN model. This performance model is used to determine predicted resource requirements for an instance of the DNN model, e.g., a DNN workload. Creating a performance model may be a supervised learning problem, where given a value of Sr, Ci, Bs, and Pc, the performance model predicts the time taken and peak RAM usage of a DNN workload in at least ¶ [0047] and ¶ [0042] and FIGS. 4A, 4B, and 4C are graphs showing training data used for the profiler in accordance with respective examples. FIG. 4A illustrates maximum 402 and minimum 404 run time values for varying batch sizes. FIG. 4B illustrates maximum 412 and minimum 414 values of run time for varying precision values. FIG. 4C illustrates the run time for various DNN models for varying CPU core usage. These figures show that each metric may play a crucial role in controlling the time and memory taken by a DNN model for execution in at least ¶ [0047] and ¶ [0042]) and predefined based on the relation between the amount of the at least one resource added (An easy but inefficient way to achieve the scheduling objective is via exhaustive profiling which involves executing all possible configurations for all available DNN models and find the best configuration for each model in at least ¶ [0040]).

With regard to Claim 5, Chandra teaches wherein the obtained system model is updated over time based on an amount of at least one resource added and the one or more predefined service metrics (to optimize the core allocations and the 

With regard to Claim 6, Chandra teaches wherein the given one of the plurality of iterative workloads comprises a training of a Deep Neural Network (The profiler 110 uses training data to generate a performance model for a DNN model. This performance model is used to determine predicted resource requirements for an instance of the DNN model, e.g., a DNN workload. Creating a performance model may be a supervised learning problem, where given a value of Sr, Ci, Bs, and Pc, the performance model predicts the time taken and peak RAM usage of a DNN workload in at least ¶ [0047] and ¶ [0042]).

With regard to Claim 7, Chandra teaches wherein the at least one resource comprises one or more of a number of processing cores in a computer processor, a number of processing cores in a graphics processing unit (The resource requirements may include how much how much time the DNN model takes to run on a number of CPU cores or GPU cores under different utilizations. In an example, the profiler 110 uses a machine learning technique for estimating the resource requirements of each DNN model to avoid profiling running all possible scenarios. The profiler 110 learns the dependency of tunable DNN parameters such as sampling rate, batch size, precision, and system resources such as CPU, GPU, memory utilizations on the performance throughput of DNN models in at least ¶ [0033] and The framework 100 includes an allocator 106 that uses the profiler 104 to handle system resource allocation for the DNN workloads that will be executed on available cores 116A, 116B, 116C, and 116D. The allocator 106 includes two subcomponents: a core allocator and a DNN parameter allocator. The core allocator allocates of one or more cores 116A-116D to each DNN workload based on resource requirement of a DNN workload and current system utilization in at least ¶ [0035]), an amount of memory (The profiler 110 keeps track of various system resources such as CPU, GPU and memory usage while varying various DNN parameters in at least ¶ [0041] – [0042]) and an amount of network bandwidth (the objective function may be any weighted combinations of the above defined objectives. As an example, the objective function that is being minimized may be: α*max(Ti)+β*Cost where Ti is the time taken by the DNNi workload; Cost is the cloud cost including bandwidth in at least ¶ [0086] and The scheduler 108 may 

With regard to Claim 8, Dirac teaches wherein the determination of the adjustment to the amount of the at least one resource for the given one of the plurality of iterative workloads is performed substantially in parallel with an execution of the plurality of iterative workloads (a given optimization coordinator 130A may work on orchestrating the optimization of several different execution environments 150, e.g., either in parallel or sequentially in at least col. 6 lines 8-31).

With regard to Claim 9, Chandra teaches wherein the interference effect of the one or more of the plurality of iterative workloads on the given one of the plurality of iterative workloads is determined in a sequence (At 1140, image streams are received. The image streams are mapped to DNN workloads. At 1150, the DNN workloads are scheduled to be executed. At 1160, the DNN workloads are executed at the scheduled time on the assigned processing core. In addition, the corresponding image stream is provided to the DNN workload. After a period of time, the scheduling process may run again to continue to schedule any uncompleted DNN workload as well as new DNN workloads in at least ¶ [0138] and Fig. 11).

With regard to Claim 11, Chandra teaches wherein a newly deployed workload is added to the sequence (At 1140, image streams are received. The image streams 

With regard to Claim 12, Chandra teaches a computer program product, comprising a non-transitory machine-readable storage medium having encoded therein executable code of one or more software programs, wherein the one or more software programs when executed by at least one processing device perform the following steps (the storage device 1216 may include a computing-readable (or machine-readable) storage media 1222, on which is stored one or more sets of data structures or instructions 1224 (e.g., software) embodying or utilized by any one or more of the techniques or functions described herein in at least ¶ [0140] – [0144]):
obtaining a dynamic system model (DNN models in at least Fig. 1 and receiving workloads in at least Fig. 11) based on a relation between an amount of at least one resource (The profiler 110 uses training data to generate a performance model for a DNN model. This performance model is used to determine predicted resource requirements for an instance of the DNN model, e.g., a DNN workload. Creating a performance model may be a supervised learning problem, where given a value of Sr, Ci, Bs, and Pc, the performance model predicts the time taken and peak RAM usage of a DNN workload in at least ¶ [0047] and ¶ [0042]) for a plurality of workloads (Systems, methods, and computer-executable instructions for scheduling neural network workloads on an edge device. A performance model for each neural network model is received in at least abstract and scheduling workloads in at least Fig. 11) and at least one predefined service metric (FIGS. 4A, 4B, and 4C are graphs showing training data used for the profiler in accordance with respective examples. FIG. 4A illustrates maximum 402 and minimum 404 run time values for varying batch sizes. FIG. 4B illustrates maximum 412 and minimum 414 values of run time for varying precision values. FIG. 4C illustrates the run time for various DNN models for varying CPU core usage. These figures show that each metric may play a crucial role in controlling the time and memory taken by a DNN model for execution in at least ¶ [0047] and ¶ [0042]);
obtaining an instantaneous value of the at least one predefined service metric (The core allocator allocates of one or more cores 116A-116D to each DNN workload based on resource requirement of a DNN workload and current system utilization. The DNN parameter allocator takes the input from the allocator 106 and profiler 110 to assign various DNN parameters to each of the DNN workloads to maximize a specified optimization criteria … The allocator 106 uses the learned performance model profiler for each DNN model and current system utilization. The allocator 106 formulates allocation of DNN workloads as an optimization problem for assigning system resource and DNN parameters to each DNN workload while maximizing the specified optimization criteria and following the constraints that arise from hardware limitations in at least ¶ [0035] – [0036]);
applying to a given controller associated with a given one of the plurality of workloads (DNN model applied to controllers in at least Fig. 1, allocator, scheduler, profiler, et cetera): (i) the dynamic system model (DNN models in at least Fig. 1 and DNN model received and applied to controllers in at least Fig. 11), (ii) an interference effect that accounts for interference of an allocation of resources to one or more additional workloads of the plurality of workloads (At 1140, image streams are received. The image streams are mapped to DNN workloads. At 1150, the DNN workloads are scheduled to be executed. At 1160, the DNN workloads are executed at the scheduled time on the assigned processing core. In addition, the corresponding image stream is provided to the DNN workload. After a period of time, the scheduling process may run again to continue to schedule any uncompleted DNN workload as well as new DNN workloads in at least ¶ [0138] and Fig. 11) on a performance of the given one of the plurality of workloads (to optimize the core allocations and the parameter allocations, the DNN parameters may be randomly initialized. For given DNN parameters, the optimal core allocation scheme is calculated for DNN workloads. Then, given the core allocation scheme, the optimal DNN parameters are determined. The optimal core allocation is then determined again, this time using the optimized DNN parameters. Using the latest core allocation, the DNN parameters may be optimized again. This process repeats until there is convergence. The core and DNN parameter allocators are described in greater detail below in at least ¶ [0053]), and
(iii) parameters for optimizing the instantaneous value of the at least one predefined service metric and a target value for the at least one predefined service metric, wherein the given controller determines an optimization to the amount of the at least one resource for the given one of the plurality of workloads based at least in part on the parameters; and initiating, by the given controller, an application of the determined optimization to the amount of the at least one resource to the given one of the plurality of workloads (to optimize the core allocations and the parameter allocations, the DNN parameters may be randomly initialized. For given DNN parameters, the optimal core allocation scheme is calculated for DNN workloads. Then, given the core allocation scheme, the optimal DNN parameters are determined. The optimal core allocation is then determined again, this time using the optimized DNN parameters. Using the latest core allocation, the DNN parameters may be optimized again in at least ¶ [0053] and Fig. 11).
Chandra teaches utilizing deep neural network models for optimizing resource allocations to workloads in view of instantaneous utilization and target values based on metrics. However, although Chandra clearly iterates to refine optimization (see at least ¶ [0053]) and accommodate additional workloads (see at least ¶ [0138] and Fig. 11), Chandra does not specifically recite that the workloads themselves are iterative. Further, Chandra teaches corrective adjustment (i.e., optimizing iteratively, see again the aforementioned citations) but does not specifically teach that these optimized corrections are pursuant to a difference between the instantaneous value and a target value.
However, in analogous art Dirac teaches iterative workloads (The optimizer implements a plurality of iterations of execution of the model, interleaved with observation collection intervals. During a given observation collection interval, tunable parameter settings suggested by the previous model execution iteration are used in the 
a difference between the instantaneous value of the at least one predefined service metric and a target value for the at least one predefined service metric (The optimization tool may determine a definition of, or a formula for, an objective function which is to be maximized or minimized with respect to the optimization target execution environment in various embodiments. An objective function may be identified in various ways in different embodiments, e.g., based on input from the client or based on the types of applications being run. Objective functions may sometimes be referred to as “loss functions”. While a number of different types of performance-related execution results (such as throughput, response times, consumed CPU-minutes, network bandwidth used, etc.) may be collected from the execution environment in some cases, a scalar objective function (an optimization objective expressible and computable as a single value) may be used in various embodiments for Bayesian optimization using Gaussian processes. The value of the objective function for a given observation collection interval may be calculated using a combination of raw execution results by the optimization tool in some embodiments. For example, an objective function such as “total cost in dollars” may be computed using a collection of raw metrics such as “CPU-minutes consumed”, “network bandwidth used”, “storage space used”, in combination with cost metrics such as “dollars per CPU-minute”, “dollars-per-megabyte-of-bandwidth”, etc in at least col. 3 lines 32-55), wherein the given controller determines an adjustment to the amount of the at least one resource for the given one of the plurality of iterative workloads based at least in part on the difference (In at least some embodiments, the optimization tool may determine the boundaries (e.g., start and end times) of the observation collection intervals as discussed below. 
The iterations of model execution and observation collection may be repeated until either the targeted extremum of the objective function has been attained (at least to within some tolerance limit), or until the resources available for the optimization task have been exhausted. The combination of tunable parameter settings that correspond to the attainment of the optimization goal (or the combination of tunable parameter settings that came closest to the optimization goal) may be stored in various embodiments, e.g., in a persistent storage repository accessible to the optimization tool and/or in a knowledge base in at least col. 4 line 59 – col. 5 line 6 and col. 15 lines 39-55); and
initiating, by the given controller, an application of the determined adjustment to the amount of the at least one resource to the given one of the plurality of iterative workloads (The service 620 may select the appropriate resources for the execution environment corresponding to each of the computation requests 615, and initiate the execution of the program code 625 indicated in the computation requests in at least col. 12 lines 46-50).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the iterative workloads and corrective adjustment corresponding to a difference between the instantaneous value and a target value of Dirac with the systems and methods of Chandra resulting in a 

With regard to Claim 14, Dirac teaches wherein the determination of the adjustment to the amount of the at least one resource for the given one of the plurality of iterative workloads is performed substantially in parallel with an execution of the plurality of iterative workloads (a given optimization coordinator 13A0 may work on orchestrating the optimization of several different execution environments 150, e.g., either in parallel or sequentially in at least col. 6 lines 8-31).

With regard to Claim 15, Chandra teaches wherein the interference effect of the one or more of the plurality of iterative workloads on the given one of the plurality of iterative workloads is determined in a sequence (At 1140, image streams are received. The image streams are mapped to DNN workloads. At 1150, the DNN workloads are scheduled to be executed. At 1160, the DNN workloads are executed at the scheduled time on the assigned processing core. In addition, the corresponding image stream is provided to the DNN workload. After a period of time, the 

With regard to Claim 17, Chandra teaches wherein a newly deployed workload is added to the sequence (At 1140, image streams are received. The image streams are mapped to DNN workloads. At 1150, the DNN workloads are scheduled to be executed. At 1160, the DNN workloads are executed at the scheduled time on the assigned processing core. In addition, the corresponding image stream is provided to the DNN workload. After a period of time, the scheduling process may run again to continue to schedule any uncompleted DNN workload as well as new DNN workloads in at least ¶ [0138] and Fig. 11).

With regard to Claim 18, an apparatus, comprising: a memory; and at least one processing device, coupled to the memory, operative to implement the following steps (Computing device 1200 may include a hardware processor 1202 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a hardware processor core, or any combination thereof), a main memory 1204 and a static memory 1206, some or all of which may communicate with each other via a link (e.g., bus) 1208 in at least ¶ [0140] – [0144]):
obtaining a dynamic system model (DNN models in at least Fig. 1 and receiving workloads in at least Fig. 11) based on a relation between an amount of at least one resource (The profiler 110 uses training data to generate a performance model for a DNN model. This performance model is used to determine predicted  for a plurality of workloads (Systems, methods, and computer-executable instructions for scheduling neural network workloads on an edge device. A performance model for each neural network model is received in at least abstract and scheduling workloads in at least Fig. 11) and at least one predefined service metric (FIGS. 4A, 4B, and 4C are graphs showing training data used for the profiler in accordance with respective examples. FIG. 4A illustrates maximum 402 and minimum 404 run time values for varying batch sizes. FIG. 4B illustrates maximum 412 and minimum 414 values of run time for varying precision values. FIG. 4C illustrates the run time for various DNN models for varying CPU core usage. These figures show that each metric may play a crucial role in controlling the time and memory taken by a DNN model for execution in at least ¶ [0047] and ¶ [0042]);
obtaining, an instantaneous value of the at least one predefined service metric (The core allocator allocates of one or more cores 116A-116D to each DNN workload based on resource requirement of a DNN workload and current system utilization. The DNN parameter allocator takes the input from the allocator 106 and profiler 110 to assign various DNN parameters to each of the DNN workloads to maximize a specified optimization criteria … The allocator 106 uses the learned performance model profiler for each DNN model and current system utilization. The allocator 106 formulates allocation of DNN workloads as an optimization problem for 
applying to a given controller associated with a given one of the plurality of workloads (DNN model applied to controllers in at least Fig. 1, allocator, scheduler, profiler, et cetera): (i) the dynamic system model (DNN models in at least Fig. 1 and DNN model received and applied to controllers in at least Fig. 11), (ii) an interference effect that accounts for interference of an allocation of resources to one or more additional workloads of the plurality of workloads (At 1140, image streams are received. The image streams are mapped to DNN workloads. At 1150, the DNN workloads are scheduled to be executed. At 1160, the DNN workloads are executed at the scheduled time on the assigned processing core. In addition, the corresponding image stream is provided to the DNN workload. After a period of time, the scheduling process may run again to continue to schedule any uncompleted DNN workload as well as new DNN workloads in at least ¶ [0138] and Fig. 11) on a performance of the given one of the plurality of workloads (to optimize the core allocations and the parameter allocations, the DNN parameters may be randomly initialized. For given DNN parameters, the optimal core allocation scheme is calculated for DNN workloads. Then, given the core allocation scheme, the optimal DNN parameters are determined. The optimal core allocation is then determined again, this time using the optimized DNN parameters. Using the latest core allocation, the DNN parameters may be optimized again. This process repeats until there is convergence. The core and DNN parameter allocators are described in greater detail below in at least ¶ [0053]), and
(iii) parameters for optimizing the instantaneous value of the at least one predefined service metric and a target value for the at least one predefined service metric, wherein the given controller determines an optimization to the amount of the at least one resource for the given one of the plurality of workloads based at least in part on the parameters; and initiating, by the given controller, an application of the determined optimization to the amount of the at least one resource to the given one of the plurality of workloads (to optimize the core allocations and the parameter allocations, the DNN parameters may be randomly initialized. For given DNN parameters, the optimal core allocation scheme is calculated for DNN workloads. Then, given the core allocation scheme, the optimal DNN parameters are determined. The optimal core allocation is then determined again, this time using the optimized DNN parameters. Using the latest core allocation, the DNN parameters may be optimized again in at least ¶ [0053] and Fig. 11).
Chandra teaches utilizing deep neural network models for optimizing resource allocations to workloads in view of instantaneous utilization and target values based on metrics. However, although Chandra clearly iterates to refine optimization (see at least ¶ [0053]) and accommodate additional workloads (see at least ¶ [0138] and Fig. 11), Chandra does not specifically recite that the workloads themselves are iterative. Further, Chandra teaches corrective adjustment (i.e., optimizing iteratively, see again the aforementioned citations) but does not specifically teach that these optimized corrections are pursuant to a difference between the instantaneous value and a target value.
However, in analogous art Dirac teaches iterative workloads (The optimizer implements a plurality of iterations of execution of the model, interleaved with observation collection intervals. During a given observation collection interval, tunable parameter settings suggested by the previous model execution iteration are used in the execution environment, and the observations collected during the interval are used as inputs for the next model execution iteration. When an optimization goal is attained, the tunable settings that led to achieving the goal are stored in at least abstract and col. 4 lines 49-59)
a difference between the instantaneous value of the at least one predefined service metric and a target value for the at least one predefined service metric (The optimization tool may determine a definition of, or a formula for, an objective function which is to be maximized or minimized with respect to the optimization target execution environment in various embodiments. An objective function may be identified in various ways in different embodiments, e.g., based on input from the client or based on the types of applications being run. Objective functions may sometimes be referred to as “loss functions”. While a number of different types of performance-related execution results (such as throughput, response times, consumed CPU-minutes, network bandwidth used, etc.) may be collected from the execution environment in some cases, a scalar objective function (an optimization objective expressible and computable as a single value) may be used in various embodiments for Bayesian optimization using Gaussian processes. The value of the objective function for a given observation collection interval may be calculated using a combination of raw execution results by the optimization tool in some embodiments. For example, an objective  wherein the given controller determines an adjustment to the amount of the at least one resource for the given one of the plurality of iterative workloads based at least in part on the difference (In at least some embodiments, the optimization tool may determine the boundaries (e.g., start and end times) of the observation collection intervals as discussed below. The iterations of model execution and observation collection may be repeated until either the targeted extremum of the objective function has been attained (at least to within some tolerance limit), or until the resources available for the optimization task have been exhausted. 
The combination of tunable parameter settings that correspond to the attainment of the optimization goal (or the combination of tunable parameter settings that came closest to the optimization goal) may be stored in various embodiments, e.g., in a persistent storage repository accessible to the optimization tool and/or in a knowledge base in at least col. 4 line 59 – col. 5 line 6 and col. 15 lines 39-55);
initiating, by the given controller, an application of the determined adjustment to the amount of the at least one resource to the given one of the plurality of iterative workloads (The service 620 may select the appropriate resources for the execution environment corresponding to each of the computation requests 615, and initiate the execution of the program code 625 indicated in the computation requests in at least col. 12 lines 46-50),


With regard to Claim 20, Dirac teaches wherein the determination of the adjustment to the amount of the at least one resource for the given one of the plurality of iterative workloads is performed substantially in parallel with an execution of the plurality of iterative workloads (a given optimization coordinator 13A0 may work on orchestrating the optimization of several different execution environments 150, e.g., either in parallel or sequentially in at least col. 6 lines 8-31).

Claims 10 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Chandra et al. Pub. No. US 2019/0266015 A1 (hereafter Chandra) in view of Dirac et al. Pat. No. US 10,257,275 B1 (hereafter Dirac) as applied to claims 1, 3-9, 11-12, 14-15, .

With regard to Claim 10, Chandra and Dirac teach the method of claim 9,
Chandra and Dirac do not specifically teach removing workloads from sequence.
However, in analogous art Calderone teaches wherein one of the plurality of iterative workloads that one or more of finished processing and failed processing is removed from the sequence (Failures result in a task and all tasks that depend on it being removed from the queue; they will be re-added and therefore automatically retried because of the convergence loop in at least ¶ [0149]).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the removing of workloads from the sequence of Calderone with the systems and methods of Chandra and Dirac, resulting in a system in which a workload as in Chandra that fails as in Calderone is removed from the sequence as in Calderone. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of increasing system efficiency by removing operations from the queue that are no longer needed (i.e., removing a workload and its dependencies upon failure) as well as for the purpose of providing for reattempting workloads that have failed (See at least Calderone ¶ [0148] – [0149]).

With regard to Claim 16, Chandra and Dirac teach the computer program product of claim 15,
Chandra and Dirac do not specifically teach removing workloads from sequence.
However, in analogous art Calderone teaches wherein one of the plurality of iterative workloads that one or more of finished processing and failed processing is removed from the sequence (Failures result in a task and all tasks that depend on it being removed from the queue; they will be re-added and therefore automatically retried because of the convergence loop in at least ¶ [0149]).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the removing of workloads from the sequence of Calderone with the systems and methods of Chandra and Dirac, resulting in a system in which a workload as in Chandra that fails as in Calderone is removed from the sequence as in Calderone. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of increasing system efficiency by removing operations from the queue that are no longer needed (i.e., removing a workload and its dependencies upon failure) as well as for the purpose of providing for reattempting workloads that have failed (See at least Calderone ¶ [0148] – [0149]).

Allowable Subject Matter
Claims 2, 13 and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims as well as overcoming any outstanding rejections under 35 U.S.C. § 101 and 112.

Response to Arguments
Applicant's arguments filed 10/29/2021 have been fully considered but they are not persuasive. Applicant argues in substance:

(a) Applicant respectfully traverses the first nonstatutory double patenting provisional rejection, at least on the grounds that copending Application No. 16/259,244 (reference application) does not claim, for example, “applying to a given controller .... (ii) an interference effect that accounts for interference of an allocation of resources to one or more additional iterative workloads of the plurality of iterative workloads on a performance of the given one of the plurality of iterative workloads,” as currently claimed. It does not appear that the Examiner addressed this limitation in the table that appears, for example, on page 13 of the rejection. Further, Applicant respectfully submits that the independent claims of the reference application recite an iterative workload, and not “a plurality of iterative workloads.”
With regard to point (a), Examiner respectfully disagrees with Applicant. First, the assertion that Examiner did not address a limitation is factually incorrect. Please refer to detailed mapping in rejection above. Second, Applicant argues that the provisional non-statutory double patenting rejection over 16/259,244 is improper as '244 pertains to a single iterative workload and the instant claims pertain to a plurality of iterative workloads. Examiner notes that '244 also pertains to utilizing the 
(b) With respect to the nonstatutory double patenting provisional rejection based on copending Application No. 16/400,289, Applicant respectfully maintains that copending Application No. 16/400,289 in view of the disclosure of 16/400,289 (US 2020/0348979 A1) also does not claim, for example, “applying to a given controller .... (ii) an interference effect that accounts for interference of an allocation of resources to one or more additional iterative workloads of the plurality of iterative workloads on a performance of the given one of the plurality of iterative workloads,” as currently claimed. While the independent claims of the ‘289 application do recite “one or more iterative workloads,” the Examiner appears to acknowledge that the claimed “interference effect” is not recited in the claims. The Examiner asserts, however, that this limitation is taught (not claimed) by 2020/0348979. Even if there was a suggestion to include the disclosed features of 2020/0348979 into the claims of the ‘289 application, par. 0057 teaches an evaluation of a completion time of an epoch and does not teach “(ii) an interference effect that accounts for interference of an allocation of resources to one or more additional iterative workloads of the plurality of iterative workloads on a performance of the given one of the plurality of iterative workloads,” as currently claimed.
With regard to point (b), Examiner respectfully disagrees with Applicant. Applicant argues that the provisional non-statutory double patenting rejection over 16/400,289 was improper as '289 does not teach “an interference effect of one or more additional iterative workloads of the plurality of iterative workloads on a performance of the given one of the plurality of iterative workloads” because the evaluation of the epoch does not teach an interference effect. Examiner disagrees. ¶ [0057] also clearly recites “Assuming that the SLA metric to be controlled is the execution time (et=T), one can feedback the amount of time t it takes to complete an epoch and compare this time to the desired time per epoch, which is T/n. If an epoch took longer than T/n to finish, more resources might be needed”. Moreover, reducing resource usage and making room for more jobs would also be an interference effect. Argument has not been found to be persuasive.
(c) Chandra does not teach applying an interference effect, as claimed, to a controller that “determines an adjustment to the amount of the at least one resource for the given one of the plurality of iterative workloads based at least in part on the difference.” Applicant respectfully submits that FIG. 11 of Chandra does not consider the impact of resource allocations of other workloads on a performance of a given workload.
With regard to point (c), Examiner respectfully disagrees with Applicant. Examiner has not relied on Chandra to teach the argued limitation; rather, for determining an adjustment based at least in part on the difference, Examiner has relied on Dirac. See detailed mapping in rejection above. Dirac teaches iteratively adjusting and optimizing, i.e., adjusting based on the difference between the current value and an optimized value. See at least Dirac col. 3 lines 32-55 and col. 4 line 59 – col. 5 line 6 and col. 15 lines 39-55. In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Argument has not been found to be persuasive.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Examiner respectfully requests, in response to this Office action, support be shown for language added to any original claims on amendment and any new claims. 

When responding to this Office Action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections.  See 37 CFR 1.111(c).

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRADLEY A TEETS whose telephone number is (571)272-3338.  The examiner can normally be reached on Monday - Friday, 6am-2pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng An, can be reached on (571)272-3756.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.




/BRADLEY A TEETS/Primary Examiner, Art Unit 2195