Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/15/2018 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-6, 9, 11-16, 19, 21-26, 29, 31-36, 39, 41-48, and 49-50 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Dimitrov et al., US 2019/0213775 A1 (“Dimitrov”).
Regarding claim 1, Dimitrov teaches a processor configured to determine a derived counter value based on a hardware performance counter, the processor comprising: input circuitry configured to input a hardware performance counter value (Dimitrov, para. 0017, “The multiprocessing unit includes performance monitoring counters (PMs), comprising logic circuits configured to measure different performance-related values in real-time. In one embodiment, PMs may be configured to monitor at least one of a memory request counter, a memory system bandwidth utilization, a memory system storage capacity utilization, a cache hit rate, a count of instructions executed per clock cycle for one or more threads of a multithreaded program, and a count of instructions executed for one or more threads of the multithreaded program.”); counter engine circuitry configured to determine the derived counter value by applying a model to the hardware performance counter value (Dimitrov, para. 0019, “A neural network subsystem receives PM values from the multiprocessing unit and may also receive one or more forms of other state data including application state, current operating parameter state, and driver cues for the multiprocessing unit.”); the counter engine circuitry comprising an artificial neural network (ANN) configured to dynamically modify the model based on the derived counter value (Dimitrov, para. 0020, “Furthermore, different portions of a given application can have different model parameters [for a neural network]. Model parameters can be loaded into the neural network subsystem prior to launching the application, and the model parameters can be updated as the application executes.”); and output circuitry configured to communicate the derived counter value to a hardware control circuit (Dimitrov, para. 0019, “The neural network subsystem generates operating parameters that are transmitted back to the multiprocessing unit. As the application progresses and PM values change during the course of application execution, the neural network responds by updating the operating parameters to tune the ongoing operation of the multiprocessing unit.”).
Regarding dependent claim 2, Dimitrov teaches the processor of claim 1, wherein the hardware control circuit comprises an operating system scheduler, a memory controller, a power manager, a data prefetcher, or a cache controller (Dimitrov, para. 0025, “[T]he one or more operating parameters include at least one of a maximum number of concurrently executing threads, a maximum number of active processing cores, a tile caching enable/disable flag, a core clock frequency, a memory interface clock frequency, and a core operating voltage.” Note: It is being interpreted that the memory interface clock frequency represents a data prefetcher).1
Regarding dependent claim 3, Dimitrov teaches the processor of claim 1, further comprising circuitry configured to dynamically change the model during operation of the processor (Dimitrov, para. 0020, “Furthermore, different portions of a given application can have different model parameters [for a neural network]. Model parameters can be loaded into the neural network subsystem prior to launching the application, and the model parameters can be updated as the application executes.”).
Regarding dependent claim 4, Dimitrov teaches the processor of claim 1, wherein the model comprises or is generated by the artificial neural network (ANN) (Dimitrov, para. 0030, fig. 1C(124), “FIG. 1C illustrates an exemplary neural network 124, configured to implement one or more aspects of one embodiment.”).
Regarding dependent claim 5, Dimitrov teaches the processor of claim 4, wherein the ANN comprises at least one of a convolutional neural network (CNN), a recurrent neural network (RNN), a fully connected neural network, or a combination of a CNN, RNN, and/or fully connected neural network (Dimitrov, para. 0030, fig. 1C(124, 150, 152, 154, 156), “FIG. 1C illustrates an exemplary neural network 124, configured to implement one or more aspects of one embodiment… [e]ach node of the first rank of neural net nodes 150 may receive each available input or a subset thereof. As shown, the first rank of A neural net nodes 150 is fully connected to a second rank of B neural net nodes 152. A third rank of C neural net nodes 154 provides outputs 156.” Note: It is being interpreted that the exemplary neural network 124 of fig. 1C represents a fully connected neural network).2
Regarding dependent claim 6, Dimitrov teaches the processor of claim 1, wherein the model comprises a user-defined function (Dimitrov, para. 0029, fig. 1C(124), “The neural network subsystem 124 can be implemented using any technically feasible techniques, including, without limitation, programming instructions executed on a processing unit that perform neural network evaluation.”).
Regarding dependent claim 9, Dimitrov teaches the processor of claim 1, wherein the derived counter value indicates a predicted memory address, a predicted power requirement, or a predicted frequency requirement (Dimitrov, para. 0025, “[T]he control unit includes a machine learning model configured to receive the performance monitor values as inputs and to update [i.e., predict] the one or more operating parameters as outputs during execution of the multithreaded application…the one or more operating parameters include at least one of a maximum number of concurrently executing threads, a maximum number of active processing cores, a tile caching enable/disable flag, a core clock frequency, a memory interface clock frequency, and a core operating voltage.”).3
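Regarding claim 11, Dimitrov teaches the processor of claim 1, further comprising circuitry configured to manage power or frequency of the processor based on the derived counter value (Dimitrov, para. 0025, “[T]he control unit includes a machine learning model configured to receive the performance monitor values as inputs and to update [i.e., predict] the one or more operating parameters as outputs during execution of the multithreaded application…the one or more operating parameters include at least one of a maximum number of concurrently executing threads, a maximum number of active processing cores, a tile caching enable/disable flag, a core clock frequency, a memory interface clock frequency, and a core operating voltage.” Note: It is being interpreted that the core clock frequency represents the frequency of the processor and the core operating voltage represents managing power).4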
Regarding claim 12, Dimitrov teaches a prediction unit implemented on a processor core and configured to determine a derived counter value based on a hardware performance counter, the processor core comprising: input circuitry configured to input a hardware performance counter value (Dimitrov, para. 0017, “The multiprocessing unit includes performance monitoring counters (PMs), comprising logic circuits configured to measure different performance-related values in real-time. In one embodiment, PMs may be configured to monitor at least one of a memory request counter, a memory system bandwidth utilization, a memory system storage capacity utilization, a cache hit rate, a count of instructions executed per clock cycle for one or more threads of a multithreaded program, and a count of instructions executed for one or more threads of the multithreaded program.”); counter engine circuitry configured to determine the derived counter value based on applying a model to the hardware performance counter value (Dimitrov, para. 0019, “A neural network subsystem receives PM values from the multiprocessing unit and may also receive one or more forms of other state data including application state, current operating parameter state, and driver cues for the multiprocessing unit.”); and output circuitry configured to communicate the derived counter value to a hardware control circuit (Dimitrov, para. 0019, “The neural network subsystem generates operating parameters that are transmitted back to the multiprocessing unit. As the application progresses and PM values change during the course of application execution, the neural network responds by updating the operating parameters to tune the ongoing operation of the multiprocessing unit.”).
Regarding claim 13, Dimitrov teaches the prediction unit of claim 12, wherein the hardware control circuit comprises an operating system scheduler, a memory controller, a power manager, a data prefetcher, or a cache controller (Dimitrov, para. 0025, “[T]he one or more operating parameters include at least one of a maximum number of concurrently executing threads, a maximum number of active processing cores, a tile caching enable/disable flag, a core clock frequency, a memory interface clock frequency, and a core operating voltage.” Note: It is being interpreted that the memory interface clock frequency represents a data prefetcher).5
Regarding claim 14, Dimitrov teaches the prediction unit of claim 12, further comprising circuitry configured to dynamically change the model during operation of the processor (Dimitrov, para. 0020, “Furthermore, different portions of a given application can have different model parameters [for a neural network]. Model parameters can be loaded into the neural network subsystem prior to launching the application, and the model parameters can be updated as the application executes.”).
Regarding claim 15, Dimitrov teaches the prediction unit of claim 12, wherein the model comprises or is generated by an artificial neural network (ANN) (Dimitrov, para. 0030, fig. 1C(124), “FIG. 1C illustrates an exemplary neural network 124, configured to implement one or more aspects of one embodiment.”).
Regarding claim 16, Dimitrov teaches the prediction unit of claim 15, wherein the ANN comprises at least one of a convolutional neural network (CNN), a recurrent neural network (RNN), a fully connected neural network, or a combination of a CNN, RNN, and/or fully connected neural network (Dimitrov, para. 0030, fig. 1C(124, 150, 152, 154, 156), “FIG. 1C illustrates an exemplary neural network 124, configured to implement one or more aspects of one embodiment… [e]ach node of the first rank of neural net nodes 150 may receive each available input or a subset thereof. As shown, the first rank of A neural net nodes 150 is fully connected to a second rank of B neural net nodes 152. A third rank of C neural net nodes 154 provides outputs 156.” Note: It is being interpreted that the exemplary neural network 124 of fig. 1C represents a fully connected neural network).6
Regarding claim 17, Dimitrov teaches the prediction unit of claim 12, wherein the model comprises a user-defined function (Dimitrov, para. 0029, fig. 1C(124), “The neural network subsystem 124 can be implemented using any technically feasible techniques, including, without limitation, programming instructions executed on a processing unit that perform neural network evaluation.”).
Regarding claim 20, Dimitrov teaches the prediction unit of claim 12, wherein the derived counter value indicates a predicted memory address, a predicted power requirement, or a predicted frequency requirement (Dimitrov, para. 0025, “[T]he control unit includes a machine learning model configured to receive the performance monitor values as inputs and to update [i.e., predict] the one or more operating parameters as outputs during execution of the multithreaded application…the one or more operating parameters include at least one of a maximum number of concurrently executing threads, a maximum number of active processing cores, a tile caching enable/disable flag, a core clock frequency, a memory interface clock frequency, and a core operating voltage.”).7
Regarding claim 22, Dimitrov teaches the prediction unit of claim 12, further comprising circuitry configured to manage power or frequency of the processor based on the derived counter value (Dimitrov, para. 0025, “[T]he control unit includes a machine learning model configured to receive the performance monitor values as inputs and to update [i.e., predict] the one or more operating parameters as outputs during execution of the multithreaded application…the one or more operating parameters include at least one of a maximum number of concurrently executing threads, a maximum number of active processing cores, a tile caching enable/disable flag, a core clock frequency, a memory interface clock frequency, and a core operating voltage.” Note: It is being interpreted that the core clock frequency represents the frequency of the processor and the core operating voltage represents managing power).8
Regarding claim 23, Dimitrov teaches a method for determining a derived counter value based on a hardware performance counter of a processor, the method comprising: inputting a hardware performance counter value to a counter engine (Dimitrov, para. 0017, “The multiprocessing unit includes performance monitoring counters (PMs), comprising logic circuits configured to measure different performance-related values in real-time. In one embodiment, PMs may be configured to monitor at least one of a memory request counter, a memory system bandwidth utilization, a memory system storage capacity utilization, a cache hit rate, a count of instructions executed per clock cycle for one or more threads of a multithreaded program, and a count of instructions executed for one or more threads of the multithreaded program.”); determining the derived counter value by applying a model to the hardware performance counter value using the counter engine (Dimitrov, para. 0019, “A neural network subsystem receives PM values from the multiprocessing unit and may also receive one or more forms of other state data including application state, current operating parameter state, and driver cues for the multiprocessing unit.”); and communicating the derived counter value to a hardware control circuit (Dimitrov, para. 0019, “The neural network subsystem generates operating parameters that are transmitted back to the multiprocessing unit. As the application progresses and PM values change during the course of application execution, the neural network responds by updating the operating parameters to tune the ongoing operation of the multiprocessing unit.”).
Regarding claim 24, Dimitrov teaches the method of claim 23, wherein the hardware control circuit comprises an operating system scheduler, a memory controller, a power manager, a data prefetcher, or a cache controller (Dimitrov, para. 0025, “[T]he one or more operating parameters include at least one of a maximum number of concurrently executing threads, a maximum number of active processing cores, a tile caching enable/disable flag, a core clock frequency, a memory interface clock frequency, and a core operating voltage.” Note: It is being interpreted that the memory interface clock frequency represents a data prefetcher).9
Regarding claim 25, Dimitrov teaches the method of claim 23, further comprising dynamically changing the model during operation of the processor (Dimitrov, para. 0020, “Furthermore, different portions of a given application can have different model parameters [for a neural network]. Model parameters can be loaded into the neural network subsystem prior to launching the application, and the model parameters can be updated as the application executes.”).
Regarding claim 26, Dimitrov teaches the method of claim 23, wherein the model comprises or is generated by an artificial neural network (ANN) (Dimitrov, para. 0030, fig. 1C(124), “FIG. 1C illustrates an exemplary neural network 124, configured to implement one or more aspects of one embodiment.”).
Regarding claim 27, Dimitrov teaches the method of claim 26, wherein the ANN comprises at least one of a convolutional neural network (CNN), a recurrent neural network (RNN), a fully connected neural network, or a combination of a CNN, RNN, and/or fully connected neural network (Dimitrov, para. 0030, fig. 1C(124, 150, 152, 154, 156), “FIG. 1C illustrates an exemplary neural network 124, configured to implement one or more aspects of one embodiment… [e]ach node of the first rank of neural net nodes 150 may receive each available input or a subset thereof. As shown, the first rank of A neural net nodes 150 is fully connected to a second rank of B neural net nodes 152. A third rank of C neural net nodes 154 provides outputs 156.” Note: It is being interpreted that the exemplary neural network 124 of fig. 1C represents a fully connected neural network).10
Regarding claim 28, Dimitrov teaches the method of claim 23, wherein the model comprises a user-defined function (Dimitrov, para. 0029, fig. 1C(124), “The neural network subsystem 124 can be implemented using any technically feasible techniques, including, without limitation, programming instructions executed on a processing unit that perform neural network evaluation.”).
Regarding claim 31, Dimitrov teaches the method of claim 23, wherein the derived counter value indicates a predicted memory address, a predicted power requirement, or a predicted frequency requirement (Dimitrov, para. 0025, “[T]he control unit includes a machine learning model configured to receive the performance monitor values as inputs and to update [i.e., predict] the one or more operating parameters as outputs during execution of the multithreaded application…the one or more operating parameters include at least one of a maximum number of concurrently executing threads, a maximum number of active processing cores, a tile caching enable/disable flag, a core clock frequency, a memory interface clock frequency, and a core operating voltage.”).11
Regarding claim 33, Dimitrov teaches the method of claim 23, further comprising determining a power or frequency of the processor based on the derived counter value (Dimitrov, para. 0025, “[T]he control unit includes a machine learning model configured to receive the performance monitor values as inputs and to update [i.e., predict] the one or more operating parameters as outputs during execution of the multithreaded application…the one or more operating parameters include at least one of a maximum number of concurrently executing threads, a maximum number of active processing cores, a tile caching enable/disable flag, a core clock frequency, a memory interface clock frequency, and a core operating voltage.” Note: It is being interpreted that the core clock frequency represents the frequency of the processor and the core operating voltage represents managing power).12

inputting a hardware performance counter value to a counter engine (Dimitrov, para. 0017, “The multiprocessing unit includes performance monitoring counters (PMs), comprising logic circuits configured to measure different performance-related values in real-time. In one embodiment, PMs may be configured to monitor at least one of a memory request counter, a memory system bandwidth utilization, a memory system storage capacity utilization, a cache hit rate, a count of instructions executed per clock cycle for one or more threads of a multithreaded program, and a count of instructions executed for one or more threads of the multithreaded program.”); determining the derived counter value by applying a model to the hardware performance counter value using the counter engine (Dimitrov, para. 0019, “A neural network subsystem receives PM values from the multiprocessing unit and may also receive one or more forms of other state data including application state, current operating parameter state, and driver cues for the multiprocessing unit.”); and communicating the derived counter value to a hardware control circuit (Dimitrov, para. 0019, “The neural network subsystem generates operating parameters that are transmitted back to the multiprocessing unit. As the application progresses and PM values change during the course of application execution, the neural network responds by updating the operating parameters to tune the ongoing operation of the multiprocessing unit.”).
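Regarding claim 35, Dimitrov teaches the instructions of claim 34, wherein the hardware control circuit comprises an operating system scheduler, a memory controller, a power manager, a data prefetcher, or a cache controller (Dimitrov, para. 0025, “[T]he one or more operating parameters include at least one of a maximum number of concurrently executing threads, a maximum number of active processing cores, a tile caching enable/disable flag, a core clock frequency, a memory interface clock frequency, and a core operating voltage.” Note: It is being interpreted that the memory interface clock frequency represents a data prefetcher).13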
Regarding claim 36, Dimitrov teaches the instructions of claim 34, further comprising instructions for dynamically changing the model during operation of the processor (Dimitrov, para. 0020, “Furthermore, different portions of a given application can have different model parameters [for a neural network]. Model parameters can be loaded into the neural network subsystem prior to launching the application, and the model parameters can be updated as the application executes.”).
Regarding claim 37, Dimitrov teaches the instructions of claim 34, wherein the model comprises or is generated by an artificial neural network (ANN) (Dimitrov, para. 0030, fig. 1C(124), “FIG. 1C illustrates an exemplary neural network 124, configured to implement one or more aspects of one embodiment.”).
Regarding claim 38, Dimitrov teaches the instructions of claim 37, wherein the ANN comprises at least one of a convolutional neural network (CNN), a recurrent neural network (RNN), a fully connected neural network, or a combination of a CNN, RNN, and/or fully connected neural network (Dimitrov, para. 0030, fig. 1C(124, 150, 152, 154, 156), “FIG. 1C illustrates an exemplary neural network 124, configured to implement one or more aspects of one embodiment… [e]ach node of the first rank of neural net nodes 150 may receive each available input or a subset thereof. As shown, the first rank of A neural net nodes 150 is fully connected to a second rank of B neural net nodes 152. A third rank of C neural net nodes 154 provides outputs 156.” Note: It is being interpreted that the exemplary neural network 124 of fig. 1C represents a fully connected neural network).14
Regarding claim 39, Dimitrov teaches the instructions of claim 34, wherein the model comprises a user-defined function (Dimitrov, para. 0029, fig. 1C(124), “The neural network subsystem 124 can be implemented using any technically feasible techniques, including, without limitation, programming instructions executed on a processing unit that perform neural network evaluation.”).
Regarding claim 42, Dimitrov teaches the instructions of claim 34, wherein the derived counter value indicates a predicted memory address, a predicted power requirement, or a predicted frequency requirement (Dimitrov, para. 0025, “[T]he control unit includes a machine learning model configured to receive the performance monitor values as inputs and to update [i.e., predict] the one or more operating parameters as outputs during execution of the multithreaded application…the one or more operating parameters include at least one of a maximum number of concurrently executing threads, a maximum number of active processing cores, a tile caching enable/disable flag, a core clock frequency, a memory interface clock frequency, and a core operating voltage.”).15
Regarding claim 44, Dimitrov teaches the instructions of claim 34, further comprising instructions for determining a power or frequency of the processor based on the derived counter value (Dimitrov, para. 0025, “[T]he control unit includes a machine learning model configured to receive the performance monitor values as inputs and to update [i.e., predict] the one or more operating parameters as outputs during execution of the multithreaded application…the one or more operating parameters include at least one of a maximum number of concurrently executing threads, a maximum number of active processing cores, a tile caching enable/disable flag, a core clock frequency, a memory interface clock frequency, and a core operating voltage.” Note: It is being interpreted that the core clock frequency represents the frequency of the processor and the core operating voltage represents managing power).16
Regarding claim 45, Dimitrov teaches a system comprising: a processor (Dimitrov, para. 0018, “Tuning the operating parameters in response to varying [performance monitoring counters] PM values can improve throughput and/or power efficiency of the multiprocessing unit. A given multiprocessing unit can include many thousands of [performance monitoring counters] PMs and multiple different operating parameters that can be changed to tune the operation of the multiprocessing unit.”); and a counter engine which comprises: input circuitry configured to input a hardware performance counter value from the processor (Dimitrov, para. 0017, “The multiprocessing unit includes performance monitoring counters (PMs), comprising logic circuits configured to measure different performance-related values in real-time. In one embodiment, PMs may be configured to monitor at least one of a memory request counter, a memory system bandwidth utilization, a memory system storage capacity utilization, a cache hit rate, a count of instructions executed per clock cycle for one or more threads of a multithreaded program, and a count of instructions executed for one or more threads of the multithreaded program.”); counter engine circuitry configured to determine a derived counter value based on applying a model to the hardware performance counter value (Dimitrov, para. 0019, “A neural network subsystem receives PM values from the multiprocessing unit and may also receive one or more forms of other state data including application state, current operating parameter state, and driver cues for the multiprocessing unit.”); and output circuitry configured to communicate the derived counter value to a hardware control circuit of the processor (Dimitrov, para. 0019, “The neural network subsystem generates operating parameters that are transmitted back to the multiprocessing unit. As the application progresses and PM values change during the course of application execution, the neural network responds by updating the operating parameters to tune the ongoing operation of the multiprocessing unit.”).
Regarding claim 46, Dimitrov teaches the system of claim 45, wherein the hardware control circuit comprises an operating system scheduler, a memory controller, a power manager, a data prefetcher, or a cache controller (Dimitrov, para. 0025, “[T]he one or more operating parameters include at least one of a maximum number of concurrently executing threads, a maximum number of active processing cores, a tile caching enable/disable flag, a core clock frequency, a memory interface clock frequency, and a core operating voltage.” Note: It is being interpreted that the memory interface clock frequency represents a data prefetcher).17
Regarding claim 47, Dimitrov teaches the system of claim 45, wherein the model comprises or is generated by an artificial neural network (ANN) (Dimitrov, para. 0030, fig. 1C(124), “FIG. 1C illustrates an exemplary neural network 124, configured to implement one or more aspects of one embodiment.”).
Regarding claim 49, Dimitrov teaches the system of claim 45, wherein the derived counter value indicates a predicted memory address, a predicted power requirement, or a predicted frequency requirement (Dimitrov, para. 0025, “[T]he control unit includes a machine learning model configured to receive the performance monitor values as inputs and to update [i.e., predict] the one or more operating parameters as outputs during execution of the multithreaded application…the one or more operating parameters include at least one of a maximum number of concurrently executing threads, a maximum number of active processing cores, a tile caching enable/disable flag, a core clock frequency, a memory interface clock frequency, and a core operating voltage.”).18
Regarding claim 50, Dimitrov teaches the system of claim 45, wherein the counter engine is disposed on the processor (Dimitrov, paras. 0026-0027, fig. 1B(110, 112, 120, 122, 114), “As shown, the processing system 110 includes a multiprocessing unit 112 and a control unit 120… [in] one embodiment, the multiprocessing unit 112 and the control unit 120 are fabricated within a common integrated circuit die, such as a GPU die… [t]he control unit 120 implements a machine learning model 122, configured to receive the monitor values 114 as inputs.”).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 7, 18, 29, and 40 are rejected under 35 U.S.C. 103 as being unpatentable over Dimitrov et al., US 2019/0213775 A1 (“Dimitrov”), in view of Gene et al., "GPGPU performance and power estimation using machine learning," 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), IEEE, 2015 (“Gene”).
Regarding dependent claim 7, Dimitrov teaches the processor of claim 1, wherein the derived counter value (Dimitrov, para. 0019, “A neural network subsystem receives PM values from the multiprocessing unit and may also receive one or more forms of other state data including application state, current operating parameter state, and driver cues for the multiprocessing unit.”).
Dimitrov does not teach: indicates a predicted execution time for a portion of a program executing on the processor. 
However, Gene teaches indicates a predicted execution time for a portion of a program executing on the processor (Gene pg. 567, sec. A Overview, fig. 2, “Once the model is constructed, it can be used to predict the performance of new kernels, from outside the training set, at any target hardware configuration within the range of the training data. To make a prediction, the kernel’s performance counter values and base execution time must first be gathered by executing it on the base hardware configuration. These are then passed to the model, along with the desired target hardware configuration, which will output a predicted execution time at that target configuration.”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Dimitrov’s processor in view of Gene to teach: indicates a predicted execution time for a portion of a program executing on the processor. The motivation to do so would be to obtain an accurate estimate of the execution time of an application running on a GPGPU when the processor reconfigures its microarchitectural parameters (Gene pg. 564, sec. I Introduction, “Graphics processing units (GPUs) have become standard devices in systems ranging from cellular phones to supercomputers. Their designs span a wide range of configurations and capabilities…[a]dding to the complexity, modern processors reconfigure themselves at runtime in order to maximize performance under tight power constraints. These designs will rapidly change core frequency and voltage…modify available bandwidth…and quickly power gate unused hardware to reduce static power usage…[w]ith this wide range of possible configurations, it is critical to rapidly analyze application performance and power. Early in the design process, architects must verify that their plan will meet performance and power goals on important applications.”).
Regarding claim 18, Dimitrov teaches the prediction unit of claim 12, wherein the derived counter value (Dimitrov, para. 0019, “A neural network subsystem receives PM values from the multiprocessing unit and may also receive one or more forms of other state data including application state, current operating parameter state, and driver cues for the multiprocessing unit.”).
Dimitrov does not teach: indicates a predicted execution time for a portion of a program executing on the processor.
However, Gene teaches indicates a predicted execution time for a portion of a program executing on the processor (Gene pg. 567, sec. A Overview, fig. 2, “Once the model is constructed, it can be used to predict the performance of new kernels, from outside the training set, at any target hardware configuration within the range of the training data. To make a prediction, the kernel’s performance counter values and base execution time must first be gathered by executing it on the base hardware configuration. These are then passed to the model, along with the desired target hardware configuration, which will output a predicted execution time at that target configuration.”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Dimitrov’s prediction unit in view of Gene to teach: indicates a predicted execution time for a portion of a program executing on the processor. The motivation to do so would be to obtain an accurate estimate of the execution time of an application running on a GPGPU when the CPU reconfigures its microarchitectural parameters (Gene pg. 564, sec. I Introduction, “Graphics processing units (GPUs) have become standard devices in systems ranging from cellular phones to supercomputers. Their designs span a wide range of configurations and capabilities…[a]dding to the complexity, modern processors reconfigure themselves at runtime in order to maximize performance under tight power constraints. These designs will rapidly change core frequency and voltage…modify available bandwidth…and quickly power gate unused hardware to reduce static power usage…[w]ith this wide range of possible configurations, it is critical to rapidly analyze application performance and power. Early in the design process, architects must verify that their plan will meet performance and power goals on important applications.”).
Regarding claim 29, Dimitrov teaches the method of claim 23, wherein the derived counter value (Dimitrov, para. 0019, “A neural network subsystem receives PM values from the multiprocessing unit and may also receive one or more forms of other state data including application state, current operating parameter state, and driver cues for the multiprocessing unit.”).
Dimitrov does not teach: indicates a predicted execution time for a portion of a program executing on the processor.
However, Gene teaches: indicates a predicted execution time for a portion of a program executing on the processor (Gene pg. 567, sec. A Overview, fig. 2, “Once the model is constructed, it can be used to predict the performance of new kernels, from outside the training set, at any target hardware configuration within the range of the training data. To make a prediction, the kernel’s performance counter values and base execution time must first be gathered by executing it on the base hardware configuration. These are then passed to the model, along with the desired target hardware configuration, which will output a predicted execution time at that target configuration.”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Dimitrov’s method in view of Gene to teach: indicates a predicted execution time for a portion of a program executing on the processor. The motivation to do so would be to obtain an accurate estimate of the execution time of an application running on a GPGPU when the CPU reconfigures its microarchitectural parameters (Gene pg. 564, sec. I Introduction, “Graphics processing units (GPUs) have become standard devices in systems ranging from cellular phones to supercomputers. Their designs span a wide range of configurations and capabilities…[a]dding to the complexity, modern processors reconfigure themselves at runtime in order to maximize performance under tight power constraints. These designs will rapidly change core frequency and voltage…modify available bandwidth…and quickly power gate unused hardware to reduce static power usage…[w]ith this wide range of possible configurations, it is critical to rapidly analyze application performance and power. Early in the design process, architects must verify that their plan will meet performance and power goals on important applications.”).
Regarding claim 40, Dimitrov teaches the instructions of claim 34, wherein the derived counter value (Dimitrov, para. 0019, “A neural network subsystem receives PM values from the multiprocessing unit and may also receive one or more forms of other state data including application state, current operating parameter state, and driver cues for the multiprocessing unit.”).
Dimitrov does not teach: indicates a predicted execution time for a portion of a program executing on the processor.
However, Gene teaches: indicates a predicted execution time for a portion of a program executing on the processor (Gene pg. 567, sec. A Overview, fig. 2, “Once the model is constructed, it can be used to predict the performance of new kernels, from outside the training set, at any target hardware configuration within the range of the training data. To make a prediction, the kernel’s performance counter values and base execution time must first be gathered by executing it on the base hardware configuration. These are then passed to the model, along with the desired target hardware configuration, which will output a predicted execution time at that target configuration.”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Dimitrov’s instructions in view of Gene to teach: indicates a predicted execution time for a portion of a program executing on the processor. The motivation to do so would be to obtain an accurate estimate of the execution time of an application running on a GPGPU when the CPU reconfigures its microarchitectural parameters (Gene pg. 564, sec. I Introduction, “Graphics processing units (GPUs) have become standard devices in systems ranging from cellular phones to supercomputers. Their designs span a wide range of configurations and capabilities…[a]dding to the complexity, modern processors reconfigure themselves at runtime in order to maximize performance under tight power constraints. These designs will rapidly change core frequency and voltage…modify available bandwidth…and quickly power gate unused hardware to reduce static power usage…[w]ith this wide range of possible configurations, it is critical to rapidly analyze application performance and power. Early in the design process, architects must verify that their plan will meet performance and power goals on important applications.”).
Regarding claim 48, Dimitrov teaches the system of claim 45, wherein the derived counter value (Dimitrov, para. 0019, “A neural network subsystem receives PM values from the multiprocessing unit and may also receive one or more forms of other state data including application state, current operating parameter state, and driver cues for the multiprocessing unit.”).
Dimitrov does not teach: indicates a predicted execution time for a portion of a program executing on the processor.
However, Gene teaches: indicates a predicted execution time for a portion of a program executing on the processor (Gene pg. 567, sec. A Overview, fig. 2, “Once the model is constructed, it can be used to predict the performance of new kernels, from outside the training set, at any target hardware configuration within the range of the training data. To make a prediction, the kernel’s performance counter values and base execution time must first be gathered by executing it on the base hardware configuration. These are then passed to the model, along with the desired target hardware configuration, which will output a predicted execution time at that target configuration.”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Dimitrov’s system in view of Gene to teach: indicates a predicted execution time for a portion of a program executing on the processor. The motivation to do so would be to obtain an accurate estimate of the execution time of an application running on a GPGPU when the CPU reconfigures its microarchitectural parameters (Gene pg. 564, sec. I Introduction, “Graphics processing units (GPUs) have become standard devices in systems ranging from cellular phones to supercomputers. Their designs span a wide range of configurations and capabilities…[a]dding to the complexity, modern processors reconfigure themselves at runtime in order to maximize performance under tight power constraints. These designs will rapidly change core frequency and voltage…modify available bandwidth…and quickly power gate unused hardware to reduce static power usage…[w]ith this wide range of possible configurations, it is critical to rapidly analyze application performance and power. Early in the design process, architects must verify that their plan will meet performance and power goals on important applications.”).

Claims 8, 19, 30 and 41 are rejected under 35 U.S.C. 103 as being unpatentable over Dimitrov et al. US 2019/0213775 A1 (“Dimitrov”) in view of Zheng, et al. "Integrating profile-ACM Transactions on Architecture and Code Optimization (TACO) 11.1 (2014) (“Zheng”).
Regarding dependent claim 8, Dimitrov teaches the processor of claim 1, further comprising based on the derived counter value (Dimitrov, para. 0019, “A neural network subsystem receives PM values from the multiprocessing unit and may also receive one or more forms of other state data including application state, current operating parameter state, and driver cues for the multiprocessing unit.”).
Dimitrov does not teach: circuitry configured to determine whether to execute a portion of a program serially or in parallel.
However, Zheng teaches: circuitry configured to determine whether to execute a portion of a program serially or in parallel (Zheng pg. 2, sec. Overview, fig. 4, fig. 5, “Our approach integrates profile-driven parallelism detection and machine-learning-based mapping into a single framework. We use profiling data to extract actual control and data dependence and enhance the corresponding static analysis with dynamic information. Subsequently, we apply an offline trained machine learning-based prediction mechanism to each parallel loop candidate and decide if and how the parallel mapping should be performed.”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Dimitrov’s processor in view of Zheng to teach: circuitry configured to determine whether to execute a portion of a program serially or in parallel. The motivation to do so would be to automate the difficult task of parallelizing sequential code rather than having expensive expert programmers do so (Zheng, pgs. 1-2, “Multicore computing systems are widely seen as the most viable means of delivering performance with increasing transistor densities…[h]owever, this potential cannot be realized unless the application has been well parallelized. Unfortunately, efficient parallelization of a sequential program is a challenging and error-prone task. It is widely acknowledged that manual parallelization by expert programmers results in the most efficient parallel implementation but is a costly and time-consuming approach. Parallelizing compiler technology, on the other hand, has the potential to greatly reduce this cost.”).
Regarding claim 19, Dimitrov teaches the prediction unit of claim 12, further comprising based on the derived counter value (Dimitrov, para. 0019, “A neural network subsystem receives PM values from the multiprocessing unit and may also receive one or more forms of other state data including application state, current operating parameter state, and driver cues for the multiprocessing unit.”).
Dimitrov does not teach: circuitry configured to determine whether to execute a portion of a program serially or in parallel.
However, Zheng teaches: circuitry configured to determine whether to execute a portion of a program serially or in parallel (Zheng pg. 2, sec. Overview, fig. 4, fig. 5, “Our approach integrates profile-driven parallelism detection and machine-learning-based mapping into a single framework. We use profiling data to extract actual control and data dependence and enhance the corresponding static analysis with dynamic information. Subsequently, we apply an offline trained machine learning-based prediction mechanism to each parallel loop candidate and decide if and how the parallel mapping should be performed.”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Dimitrov’s prediction unit in view of Zheng to teach: circuitry configured to determine whether to execute a portion of a program serially or in parallel. The motivation to do so would be to automate the difficult task of parallelizing sequential code rather than having expensive expert programmers do so (Zheng, pgs. 1-2, “Multicore computing systems are widely seen as the most viable means of delivering performance with increasing transistor densities…[h]owever, this potential cannot be realized unless the application has been well parallelized. Unfortunately, efficient parallelization of a sequential program is a challenging and error-prone task. It is widely acknowledged that manual parallelization by expert programmers results in the most efficient parallel implementation but is a costly and time-consuming approach. Parallelizing compiler technology, on the other hand, has the potential to greatly reduce this cost.”).
Regarding claim 30, Dimitrov teaches the method of claim 23, further comprising based on the derived counter value (Dimitrov, para. 0019, “A neural network subsystem receives PM values from the multiprocessing unit and may also receive one or more forms of other state data including application state, current operating parameter state, and driver cues for the multiprocessing unit.”).
Dimitrov does not teach: determining whether to execute a portion of a program serially or in parallel.
However, Zheng teaches: determining whether to execute a portion of a program serially or in parallel (Zheng pg. 2, sec. Overview, fig. 4, fig. 5, “Our approach integrates profile-driven parallelism detection and machine-learning-based mapping into a single framework. We use profiling data to extract actual control and data dependence and enhance the corresponding static analysis with dynamic information. Subsequently, we apply an offline trained machine learning-based prediction mechanism to each parallel loop candidate and decide if and how the parallel mapping should be performed.”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Dimitrov’s method in view of Zheng to teach: determining whether to execute a portion of a program serially or in parallel. The motivation to do so would be to automate the difficult task of parallelizing sequential code rather than having expensive expert programmers do so (Zheng, pgs. 1-2, “Multicore computing systems are widely seen as the most viable means of delivering performance with increasing transistor densities…[h]owever, this potential cannot be realized unless the application has been well parallelized. Unfortunately, efficient parallelization of a sequential program is a challenging and error-prone task. It is widely acknowledged that manual parallelization by expert programmers results in the most efficient parallel implementation but is a costly and time-consuming approach. Parallelizing compiler technology, on the other hand, has the potential to greatly reduce this cost.”).
Regarding claim 41, Dimitrov teaches the instructions of claim 34, further comprising based on the derived counter value (Dimitrov, para. 0019, “A neural network subsystem receives PM values from the multiprocessing unit and may also receive one or more forms of other state data including application state, current operating parameter state, and driver cues for the multiprocessing unit.”).
Dimitrov does not teach: instructions for determining whether to execute a portion of a program serially or in parallel.
However, Zheng teaches: instructions for determining whether to execute a portion of a program serially or in parallel (Zheng pg. 2, sec. Overview, fig. 4, fig. 5, “Our approach integrates profile-driven parallelism detection and machine-learning-based mapping into a single framework. We use profiling data to extract actual control and data dependence and enhance the corresponding static analysis with dynamic information. Subsequently, we apply an offline trained machine learning-based prediction mechanism to each parallel loop candidate and decide if and how the parallel mapping should be performed.”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Dimitrov’s instructions in view of Zheng to teach: instructions for determining whether to execute a portion of a program serially or in parallel. The motivation to do so would be to automate the difficult task of parallelizing sequential code rather than having expensive expert programmers do so (Zheng, pgs. 1-2, “Multicore computing systems are widely seen as the most viable means of delivering performance with increasing transistor densities…[h]owever, this potential cannot be realized unless the application has been well parallelized. Unfortunately, efficient parallelization of a sequential program is a challenging and error-prone task. It is widely acknowledged that manual parallelization by expert programmers results in the most efficient parallel implementation but is a costly and time-consuming approach. Parallelizing compiler technology, on the other hand, has the potential to greatly reduce this cost.”).

Claims 10, 21, 32 and 43 are rejected under 35 U.S.C. 103 as being unpatentable over Dimitrov et al. US 2019/0213775 A1 (“Dimitrov”) in view of Song et al. "A simplified and accurate model of power-performance efficiency on emergent GPU architectures." 2013 IEEE 27th International Symposium on Parallel and Distributed Processing. IEEE, 2013 (“Song”).
Regarding claim 10, Dimitrov teaches the processor of claim 1, further comprising based on the derived counter value (Dimitrov, para. 0019, “A neural network subsystem receives PM values from the multiprocessing unit and may also receive one or more forms of other state data including application state, current operating parameter state, and driver cues for the multiprocessing unit.”).
Dimitrov does not teach: circuitry configured to determine an address for a memory access.
However, Song teaches: circuitry configured to determine an address for a memory access (Song, pg. 683, sec. Identifying Potential Performance Bottlenecks, fig. 6, fig. 15; Fig. 15 shows that the use of global memory in (a) was optimized in (c), where global memory usage was reduced by coalescing memory accesses and using shared memory units, and then in (d), where shared memory bank conflicts were eliminated. Note: The memory optimization from (a) to (c) to (d) is interpreted as representing circuitry configured to determine an address for a memory access).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Dimitrov’s processor in view of Song to teach: circuitry configured to determine an address for a memory access. The motivation to do so would be to decrease performance bottlenecks through the effective usage of GPU memory and performance counters (Song, pg. 674, sec. I Introduction, fig. 2, “We believe GPU power models must be simpler, more accurate, and applicable to emergent systems...[f]urthermore, such models should lend themselves to use at runtime and provide enough insight to isolate both power and performance bottlenecks despite their simplicity. We propose an approach (see Fig. 2) that relies on GPU performance counter data to estimate energy use on a real system without the need of external power metering hardware or simulation.”).
Regarding claim 21, Dimitrov teaches the prediction unit of claim 12, further comprising based on the derived counter value (Dimitrov, para. 0019, “A neural network subsystem receives PM values from the multiprocessing unit and may also receive one or more forms of other state data including application state, current operating parameter state, and driver cues for the multiprocessing unit.”).
Dimitrov does not teach: circuitry configured to determine an address for a memory access.
However, Song teaches: circuitry configured to determine an address for a memory access (Song, pg. 683, sec. Identifying Potential Performance Bottlenecks, fig. 6, fig. 15; Fig. 15 shows that the use of global memory in (a) was optimized in (c), where global memory usage was reduced by coalescing memory accesses and using shared memory units, and then in (d), where shared memory bank conflicts were eliminated. Note: The memory optimization from (a) to (c) to (d) is interpreted as representing circuitry configured to determine an address for a memory access).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Dimitrov’s prediction unit in view of Song to teach: circuitry configured to determine an address for a memory access. The motivation to do so would be to decrease performance bottlenecks through the effective usage of GPU memory and performance counters (Song, pg. 674, sec. I Introduction, fig. 2, “We believe GPU power models must be simpler, more accurate, and applicable to emergent systems...[f]urthermore, such models should lend themselves to use at runtime and provide enough insight to isolate both power and performance bottlenecks despite their simplicity. We propose an approach (see Fig. 2) that relies on GPU performance counter data to estimate energy use on a real system without the need of external power metering hardware or simulation.”).
Regarding claim 32, Dimitrov teaches the method of claim 23, further comprising based on the derived counter value (Dimitrov, para. 0019, “A neural network subsystem receives PM values from the multiprocessing unit and may also receive one or more forms of other state data including application state, current operating parameter state, and driver cues for the multiprocessing unit.”).
Dimitrov does not teach: circuitry configured to determine an address for a memory access.
However, Song teaches: circuitry configured to determine an address for a memory access (Song, pg. 683, sec. Identifying Potential Performance Bottlenecks, fig. 6, fig. 15; Fig. 15 shows that the use of global memory in (a) was optimized in (c), where global memory usage was reduced by coalescing memory accesses and using shared memory units, and then in (d), where shared memory bank conflicts were eliminated. Note: The memory optimization from (a) to (c) to (d) is interpreted as representing circuitry configured to determine an address for a memory access).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Dimitrov’s method in view of Song to teach: circuitry configured to determine an address for a memory access. The motivation to do so would be to decrease performance bottlenecks through the effective usage of GPU memory and performance counters (Song, pg. 674, sec. I Introduction, fig. 2, “We believe GPU power models must be simpler, more accurate, and applicable to emergent systems...[f]urthermore, such models should lend themselves to use at runtime and provide enough insight to isolate both power and performance bottlenecks despite their simplicity. We propose an approach (see Fig. 2) that relies on GPU performance counter data to estimate energy use on a real system without the need of external power metering hardware or simulation.”).
Regarding claim 43, Dimitrov teaches the instructions of claim 34, further comprising based on the derived counter value (Dimitrov, para. 0019, “A neural network subsystem receives PM values from the multiprocessing unit and may also receive one or more forms of other state data including application state, current operating parameter state, and driver cues for the multiprocessing unit.”).
Dimitrov does not teach: instructions for determining an address for a memory access.
However, Song teaches: instructions for determining an address for a memory access (Song, pg. 683, sec. Identifying Potential Performance Bottlenecks, fig. 6, fig. 15; Fig. 15 shows that the use of global memory in (a) was optimized in (c), where global memory usage was reduced by coalescing memory accesses and using shared memory units, and then in (d), where shared memory bank conflicts were eliminated. Note: The memory optimization from (a) to (c) to (d) is interpreted as representing instructions for determining an address for a memory access).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Dimitrov’s instructions in view of Song to teach: instructions for determining an address for a memory access. The motivation to do so would be to decrease performance bottlenecks through the effective usage of GPU memory and performance counters (Song, pg. 674, sec. I Introduction, fig. 2, “We believe GPU power models must be simpler, more accurate, and applicable to emergent systems...[f]urthermore, such models should lend themselves to use at runtime and provide enough insight to isolate both power and performance bottlenecks despite their simplicity. We propose an approach (see Fig. 2) that relies on GPU performance counter data to estimate energy use on a real system without the need of external power metering hardware or simulation.”).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US 2018/0246762 A1 (details a processor optimization unit in which runtime information is collected on executing programs and analyzed using chi-square comparisons)
US 2017/0220942 A1 (details predicting application performance on hardware accelerators using a predictive model)

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ADAM CLARK STANDKE whose telephone number is (571)270-1806.  The examiner can normally be reached on 9:30AM-6PM M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

/ADAM CLARK STANDKE/Examiner, Art Unit 2122
/ERIC NILSSON/Primary Examiner, Art Unit 2122

1 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
2 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
3 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
4 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
5 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
6 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
7 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
8 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
9 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
10 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
11 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
12 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
13 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
14 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
15 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
16 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
17 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
18 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.