DETAILED ACTION

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 07/26/2021 has been entered. 

Status of Application
Claims 1-14, 16, 17, and 19-30 are pending in the present application.
 
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 08/09/2021 and 07/28/2021 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Response to Arguments
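Applicant’s arguments with respect to claims 1-14, 16, 17, and 19-30 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.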

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5, 6, 10, 20, 23, 24, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al (hereinafter Wu), U.S. Publication No. 2014/0223166, in view of Gummaraju et al (hereinafter Gummaraju), U.S. Publication No. 2013/0160016 A1, in view of Brackman, U.S. Publication No. 2012/0272224 A1.
Referring to claim 1, Wu discloses a system comprising:
a plurality of heterogeneous processing elements [fig. 1, elements 101, 102; paragraphs 16, 18];
a hardware heterogeneous scheduler [fig. 2, element 210; paragraph 41, code distribution module 210; A module as used herein may refer to hardware] to dispatch instructions for execution on one or more of the plurality of heterogeneous processing elements [paragraph 40, “code distribution module to distribute code among two cores is illustrated”], the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements [paragraphs 12, 29, 
Wu does not explicitly disclose wherein the hardware heterogeneous scheduler is to support a code type including offload.
However, Gummaraju discloses the hardware heterogeneous scheduler is to support a code type including offload [paragraphs 34, 38, fig. 2; Unified kernel scheduler 109 operates to schedule functions and processing logic on one or more types of processors available in heterogeneous computing system 100; Kernel profiler may include functionality to analyze code, such as OpenCL code, and identify code fragments that can be advantageously scheduled for execution in a CPU or alternatively on a GPU (code fragment of OpenCL code scheduled for a GPU is equivalent to “offload” code type)], in order to eliminate the burden on a programmer to determine 
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of Wu to eliminate the burden on a programmer to determine what parts of an application are executed on elements of a heterogeneous system. It is for this reason one of ordinary skill in the art would have been motivated to implement the hardware heterogeneous scheduler is to support a code type including offload.
The modified Wu discloses the hardware heterogeneous scheduler is to support a code type including offload, which is one of the code types among compiled, intrinsics, assembly, libraries, intermediate, and offload [Gummaraju, paragraphs 34, 38]. The modified Wu does not explicitly disclose the hardware heterogeneous scheduler is to support multiple code types including three or more of compiled, intrinsics, assembly, libraries, intermediate, and offload.
However, Brackman discloses the hardware heterogeneous scheduler is to support multiple code types including assembly and libraries [fig. 1, paragraph 20; OpenCL provides a framework for writing or otherwise coding programs, such as host program 22, that are capable of being executed across heterogeneous platforms; OpenCL is for writing so-called "kernels” which represent OpenCL functions that are capable of being executed by computing devices that support OpenCL; paragraph 22, In executing host program 22 in this JIT compilation framework, control unit 14 identifies kernels 24 and forwards these kernels 24 to runtime module 26. Runtime module 26 may forward kernels 24 to compiler 28. In instances where the one of compute devices hence Brackman discloses module 26 which schedules kernels to be executed on compute devices 16 (see fig. 1). These kernels, which represent OpenCL code, can include assembly code (equivalent to claimed “assembly” code type) to be executed on a compute device 16 and user-defined code which references libraries 30 (user-defined code equivalent to claimed library code type)]. 
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to reduce power consumption [Brackman, paragraph 6]. It is for this reason one of ordinary skill in the art would have been motivated to implement the hardware heterogeneous scheduler is to support multiple code types including assembly and libraries. Given that Gummaraju discloses supporting the offload code type and Brackman discloses supporting the assembly and libraries code types, the modified Wu discloses wherein the hardware heterogeneous scheduler is to support multiple code types including three or more of compiled, intrinsics, assembly, libraries, intermediate, and offload.
Referring to claim 5, the modified Wu discloses the system of claim 1, wherein the hardware heterogeneous scheduler further comprises:
a selector to select a type of processing element of the plurality of processing elements to execute the received code fragment and schedule the code fragment on a processing element of the selected type of processing elements via dispatch [Wu, paragraph 94, “dynamic core selection techniques”; fig. 2, code distribution module 210; paragraph 34, Note that concurrent or parallel execution may include execution of separate software threads on cores 101, 102 as well].
Referring to claim 6, the modified Wu discloses the system of claim 1, wherein the code fragment is one or more instructions associated with a software thread [Wu, paragraph 34, Note that concurrent or parallel execution may include execution of separate software threads on cores 101, 102 as well].
Referring to claim 10, the modified Wu discloses the system of claim 5, wherein a data parallel program phase [Wu, paragraph 34, Note that concurrent or parallel execution may include execution of separate software threads on cores 101, 102 as well] comprises data elements that are processed simultaneously using a same control flow [Wu, paragraph 34, parallel execution of separate software threads].
Referring to claim 20, the modified Wu discloses that the selection of a type of processing element of the plurality of heterogeneous processing elements is transparent to an operating system [Wu, paragraph 19].
Referring to claim 23, the modified Wu discloses the system of claim 1, wherein the plurality of heterogeneous processing elements is to share a memory address space [Wu, paragraphs 21, 27, 28].
Referring to claim 24, the modified Wu discloses the system of claim 1, wherein the hardware heterogeneous scheduler includes a binary translator that is to be executed on one of the heterogeneous processing elements [Wu, paragraphs 4, 13, 42, 46, 58].
Referring to claim 25, the modified Wu discloses the system of claim 5, wherein a default selection of a type of processing element of the plurality of heterogeneous processing elements is a latency optimized core [Wu, paragraph 42, “Often a section of code with a high-recurrence and predictable latency pattern may be optimized to be executed more efficiently on an in-order core”; “Essentially, in this example, cold code (low-recurrence) is distributed to native, OOO core 101, while hot code (high-recurrence) is distributed to software-managed, in-order core 102”].
Claims 2-3, 9, and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Wu, in view of Gummaraju, in view of Brackman, as applied to claims 1 and 5 above, and further in view of Toll et al (hereinafter Toll), U.S. Publication No. 2015/0007196 A1.
Referring to claim 2, the modified Wu discloses the system of claim 1, wherein the plurality of heterogeneous processing elements comprises an in-order processor core, an out-of-order processor core [Wu, paragraph 9].
The modified Wu does not explicitly disclose a packed data processor core.

One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide improved performance or speed and reduced power consumption. It is for this reason one of ordinary skill in the art would have been motivated to implement a packed data processor core.
Referring to claim 3, the modified Wu discloses the system of claim 2, wherein the plurality of heterogeneous processing elements further comprises an accelerator [Toll, paragraph 27, “hardware accelerators”].
Referring to claim 9, the modified Wu does not explicitly disclose the system of claim 5, wherein for a data parallel program phase the selected type of processing element is an accelerator.
However, Toll discloses wherein for a data parallel program phase the selected type of processing element is an accelerator [paragraph 45], in order to provide improved performance or speed and reduced power consumption [paragraph 37].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide improved performance or speed and reduced power consumption. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein for a data parallel program phase the selected type of processing element is an accelerator.
Referring to claim 11, the modified Wu does not explicitly disclose the system of claim 5, wherein for a thread parallel program phase the selected type of processing element is a scalar processing core.
However, Toll discloses wherein for a thread parallel program phase the selected type of processing element is a scalar processing core [paragraph 76, “The execution units 762 may perform various operations (e.g., shifts, addition, subtraction, multiplication) and on various types of data (e.g., scalar floating point, packed integer, packed floating point, vector integer, vector floating point). While some embodiments may include a number of execution units dedicated to specific functions or sets of functions, other embodiments may include only one execution unit or multiple execution units that all perform all functions”], in order to provide improved performance or speed and reduced power consumption [paragraph 37].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide improved performance or speed and reduced power consumption. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein for a thread parallel program phase the selected type of processing element is a scalar processing core.
Claims 4 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Wu, in view of Gummaraju, in view of Brackman, as applied to claims 1 and 6 above, and further in view of Breternitz et al (hereinafter Breternitz), U.S. Publication No. 2014/0359633 A1.
Referring to claim 4, the modified Wu discloses the system of claim 1, wherein the plurality of heterogeneous processing elements includes a first processing element having a first microarchitecture and a second processing element having a second microarchitecture different from the first microarchitecture [Wu, paragraph 20, architecture state registers 101a, 101b of core 1; architecture state register 102a of core 2; “Or registers 102a may instead be unique to the architecture of core 102”].
The modified Wu does not explicitly disclose the system of claim 1, wherein the hardware heterogeneous scheduler further comprises:
a program phase detector to detect a program phase of the code fragment; 	
wherein the program phase is one of a plurality of program phases, including a first phase and a second phase and the dispatch of instructions is based in part on the detected program phase.
However, Breternitz discloses a program phase detector to detect a program phase of the code fragment [paragraph 26]; 	
wherein the program phase is one of a plurality of program phases, including a first phase and a second phase and the dispatch of instructions is based in part on the detected program phase [paragraph 26, In some embodiments, threads are dynamically reassigned in response to detecting a change in phase of an application. Different phases of an application may operate more efficiently in different performance states and/or on different types of processor cores or processing nodes. A CPU-bound phase of an application may run most efficiently in a high-performance state and/or on a big core 222 (or a high-performance processing node 104), while an input/output (I/O)-bound phase of the same application may run most efficiently in a low-performance 
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide efficient processor operation to reduce cost of ownership. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein the hardware heterogeneous scheduler further comprises: a program phase detector to detect a program phase of the code fragment; wherein the program phase is one of a plurality of program phases, including a first phase and a second phase and the dispatch of instructions is based in part on the detected program phase.
Referring to claim 7, the modified Wu does not explicitly disclose the system of claim 6, wherein for a data parallel program phase the selected type of processing element is a processing core to execute single instruction, multiple data (SIMD) instructions.
However, Breternitz discloses wherein for a data parallel program phase [paragraph 26] the selected type of processing element is a processing core to execute single instruction, multiple data (SIMD) instructions [paragraphs 18, 20, “processor cores of different types”], in order to provide efficient processor operation to reduce cost of ownership [paragraph 2].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide efficient processor operation to reduce cost of ownership. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein for a data parallel program phase the selected type of processing element is a processing core to execute single instruction, multiple data (SIMD) instructions.
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Wu, in view of Gummaraju, in view of Brackman, as applied to claim 5 above, and further in view of Nordquist, U.S. Patent No. 7,447,873 B1.
Referring to claim 8, the modified Wu does not explicitly disclose the system of claim 5, wherein for a data parallel program phase the selected type of processing element is circuitry to support dense arithmetic primitives.
However, Nordquist discloses wherein for a data parallel program phase the selected type of processing element is circuitry to support dense arithmetic primitives [claim 12, “a selected one of the processing cores of one or more SIMD groups”], in order to provide a processor that can adapt to varying load while maintaining a high degree of parallelism [col. 2, lines 13-15].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide a processor that can adapt to varying load while maintaining a high degree of parallelism. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein for a data parallel program phase the selected type of processing element is circuitry to support dense arithmetic primitives.
Claims 12 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Wu, in view of Gummaraju, in view of Brackman, as applied to claim 5 above, and further in view of Chen et al (hereinafter Chen), U.S. Publication No. 2017/0315847 A1.
Referring to claim 12, the modified Wu does not explicitly disclose the system of claim 5, wherein a thread parallel program phase comprises data dependent branches that use unique control flows.
However, Chen discloses wherein a thread parallel program phase comprises data dependent branches that use unique control flows [paragraph 40, “the AF may characterize how well the application is utilizing the GPU SIMD parallel execution model. When threads within a wavefront diverge due to a data -dependent control flow statement, the wavefront serially executes each branch path taken”], in order to provide reduced overhead [paragraph 10].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide reduced overhead. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein a thread parallel program phase comprises data dependent branches that use unique control flows.
Referring to claim 19, the modified Wu does not explicitly disclose the system of claim 5, wherein the selection of a type of processing element of the plurality of heterogeneous processing elements is transparent to a user.
However, Chen discloses wherein the selection of a type of processing element of the plurality of heterogeneous processing elements is transparent to a user [paragraph 
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide reduced overhead. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein the selection of a type of processing element of the plurality of heterogeneous processing elements is transparent to a user.
Claims 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Wu, in view of Gummaraju, in view of Brackman, in view of Toll, as applied to claim 2 above, and further in view of Breternitz et al (hereinafter Breternitz), U.S. Publication No. 2014/0359633 A1.
Referring to claim 13, the modified Wu does not explicitly disclose the system of claim 2, wherein for a serial program phase the selected type of processing element is an out-of-order core.
However, Breternitz discloses wherein for a serial program phase [paragraph 26] the selected type of processing element is an out-of-order core [paragraph 26, Different phases of an application may operate more efficiently in different performance states and/or on different types of processor cores], in order to provide efficient processor operation to reduce cost of ownership [paragraph 2].

Referring to claim 14, the modified Wu does not explicitly disclose the system of claim 2, wherein for a data parallel program phase the selected type of processing element is a processing core to execute single instruction, multiple data (SIMD) instructions.
However, Breternitz discloses wherein for a data parallel program phase [paragraph 26] the selected type of processing element is a processing core to execute single instruction, multiple data (SIMD) instructions [paragraphs 18, 20, “processor cores of different types”], in order to provide efficient processor operation to reduce cost of ownership [paragraph 2].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide efficient processor operation to reduce cost of ownership. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein for a data parallel program phase the selected type of processing element is a processing core to execute single instruction, multiple data (SIMD) instructions. In .
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Wu, in view of Gummaraju, in view of Brackman, as applied to claim 1 above, and further in view of Muthiah et al (hereinafter Muthiah), U.S. Publication No. 2014/0281008 A1.
Referring to claim 16, the modified Wu does not explicitly disclose the system of claim 1, wherein the hardware heterogeneous scheduler is to emulate functionality when the selected type of processing element cannot natively handle the code fragment.
However, Muthiah discloses wherein the hardware heterogeneous scheduler is to emulate functionality when the selected type of processing element cannot natively handle the code fragment [paragraph 17, “Rapid translation or emulation of the non-native binaries using QoS criteria allows the delivery of remote application streaming to client devices with latencies consistent with local execution”], in order to minimize latency while providing latencies consistent with local execution [paragraph 17].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to minimize latency while providing latencies consistent with local execution. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein the hardware heterogeneous scheduler is to emulate functionality when the selected type of processing element cannot natively handle the code fragment.
Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Wu, in view of Gummaraju, in view of Brackman, as applied to claim 1 above, and further in view of Bugenhagen et al (hereinafter Bugenhagen), U.S. Publication No. 2015/0019713 A1.
Referring to claim 17, the modified Wu does not explicitly disclose the system of claim 1, wherein the hardware heterogeneous scheduler is to emulate functionality when a number of hardware threads available is oversubscribed.
However, Bugenhagen discloses wherein the hardware heterogeneous scheduler is to emulate functionality when a number of hardware threads available is oversubscribed [paragraph 88, Given that virtualization of test servers place test servers in an over-subscribed compute environment, and shifts the network and input/output ("I/O") cards' load from physical assets into an emulated central processing unit ("CPU")], in order to provide more robust and scalable network performance [paragraph 11].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide more robust and scalable network performance. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein the hardware heterogeneous scheduler is to emulate functionality when a number of hardware threads available is oversubscribed.
Claims 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over Wu, in view of Gummaraju, in view of Brackman, as applied to claim 1 above, and further in view of Lin et al (hereinafter Lin), U.S. Publication No. 2005/0148358 A1.
Referring to claim 21, the modified Wu does not explicitly disclose the system of claim 1, wherein the hardware heterogeneous scheduler is to present a homogeneous multiprocessor programming model to make each thread appear to a programmer as if it is executing on a scalar core.
However, Lin discloses present a homogeneous multiprocessor programming model [paragraph 40, RPU 218 may be configured according to a homogeneous multiprocessing model, to a heterogeneous multiprocessing model, or to a hybrid multiprocessing model; Conveniently, MM 212 can be adapted to be one of a homogeneous multiprocessing model, a heterogeneous multiprocessing model, and a hybrid multiprocessing model] to make each thread appear to a programmer as if it is executing on a scalar core [this is the result of presenting a homogeneous multiprocessor programming model; hence if presented, a thread appears to a programmer as if it is executing on a scalar core], in order to provide energy-efficiency and flexibility [paragraph 22].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide energy-efficiency and flexibility. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein the hardware heterogeneous scheduler is to present a homogeneous multiprocessor programming model to make each thread appear to a programmer as if it is executing on a scalar core.
Referring to claim 22, the modified Wu discloses the system of claim 21, wherein the presented homogeneous multiprocessor programming model is to present an appearance of support for a full instruction set [Lin, paragraph 22, versatile instruction set, and possibly supporting parallelism, within MM 112].
Claims 26-30 are rejected under 35 U.S.C. 103 as being unpatentable over Wu, in view of Gummaraju, in view of Brackman, as applied to claim 1 above, and further in view of Bresniker, U.S. Publication No. 2009/0037641 A1.
Referring to claim 26, the modified Wu does not explicitly disclose the system of claim 1, wherein the heterogeneous hardware scheduler to select a protocol to use on a multi-protocol bus interface for the dispatched instructions.
However, Bresniker discloses wherein the heterogeneous hardware scheduler to select a protocol to use on a multi-protocol bus interface for the dispatched instructions [paragraph 21, fig. 2], in order to expand memory capacity without lower performance levels [paragraph 2].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to expand memory capacity without lower performance levels. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein the heterogeneous hardware scheduler to select a protocol to use on a multi-protocol bus interface for the dispatched instructions.
Referring to claim 27, the modified Wu discloses the system of claim 26, wherein a first protocol supported by a multi-protocol bus interface comprises a memory interface protocol to be used to access a system memory address space [Bresniker, paragraph 16, figs. 1-2].
Referring to claim 28, the modified Wu discloses the system of claim 27, wherein a second protocol supported by the multi-protocol bus interface comprises a cache coherency protocol to maintain coherency between data stored in a local memory of the accelerator and a memory subsystem of a host processor including a host cache hierarchy and a system memory [Bresniker, paragraph 16, fig. 1].
Referring to claim 29, the modified Wu discloses the system of claim 28, wherein a third protocol supported by the multi-protocol bus interface comprises a serial link protocol supporting device discovery, register access, configuration, initialization, interrupts, direct memory access, and address translation services [Bresniker, paragraph 16, PCI-E protocol].
Referring to claim 30, the modified Wu discloses the system of claim 29, wherein the third protocol comprises the Peripheral Component Interface Express (PCIe) protocol [Bresniker, paragraph 16].

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARLEY J ABAD whose telephone number is (571)270-3425.  The examiner can normally be reached on M-Th 6:30 - 3:00 PM; Fri 7:30 - 4:00 PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Idriss Alrobaye can be reached on (571) 270-1023.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/Farley Abad/Primary Examiner, Art Unit 2181