DETAILED ACTION

Status of Application
Claims 1-14, 16-17, 19-30 are pending in the present application.
The Double Patenting rejection has been withdrawn based on applicant’s amendment.
The 35 U.S.C. 112(b) rejection has been withdrawn based on applicant’s amendment.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 01/28/2021 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
The examiner notes that the amendment to claim 1 includes subject matter from canceled claim 15. The examiner also notes that the amendment to claim 1 differs in scope from what was presented in claim 15, since claim 1 now requires support for three or more code types.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5, 6, 10, 20, and 23-25 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al (hereinafter Wu), U.S. Publication No. 2014/0223166, in view of Espy et al (hereinafter Espy), U.S. Patent No. 10,042,673 B1.
Referring to claim 1, Wu discloses a system comprising:
a plurality of heterogeneous processing elements [fig. 1, elements 101, 102; paragraphs 16, 18];
a hardware heterogeneous scheduler [fig. 2, element 210; paragraph 41, code distribution module 210; A module as used herein may refer to hardware] to dispatch instructions for execution on one or more of the plurality of heterogeneous processing elements [paragraph 40, “code distribution module to distribute code among two cores is illustrated”], the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements [paragraphs 12, 29, 
Wu does not explicitly disclose wherein the hardware heterogeneous scheduler is to support multiple code types including three or more of compiled, intrinsics, assembly, libraries, intermediate, offload, and device.
However, Espy discloses the hardware heterogeneous scheduler is to support multiple code types including three or more of compiled, intrinsics, assembly, libraries, intermediate, offload, and device [col. 1, lines 48-51, “utilizable by the scheduler to select from the plurality of heterogeneous elements of the information technology infrastructure to schedule the given application workload”; col. 6, lines 20-24, “HERE 102 can work in conjunction with scheduler 106 to effectively align available heterogeneous resources or elements of IT infrastructure 108 against requests from a diverse set of applications 104 for different application workloads”; col. 10, line 58 – col. 
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of Wu to provide efficient use of resources and improvements in productivity. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein the hardware heterogeneous scheduler is to support multiple code types including three or more of compiled, intrinsics, assembly, libraries, intermediate, offload, and device.
Referring to claim 5, the modified Wu discloses the system of claim 1, wherein the hardware heterogeneous scheduler further comprises:
a selector to select a type of processing element of the plurality of processing elements to execute the received code fragment and schedule the code fragment on a processing element of the selected type of processing elements via dispatch [Wu, paragraph 94, “dynamic core selection techniques”; fig. 2, code distribution module 210; paragraph 34, Note that concurrent or parallel execution may include execution of separate software threads on cores 101, 102 as well].
Referring to claim 6, the modified Wu discloses the system of claim 1, wherein the code fragment is one or more instructions associated with a software thread [Wu, paragraph 34, Note that concurrent or parallel execution may include execution of separate software threads on cores 101, 102 as well].
Referring to claim 10, the modified Wu discloses the system of claim 5, wherein a data parallel program phase [Wu, paragraph 34, Note that concurrent or parallel execution may include execution of separate software threads on cores 101, 102 as well] comprises data elements that are processed simultaneously using a same control flow [Wu, paragraph 34, parallel execution of separate software threads].
Referring to claim 20, the modified Wu discloses the system of claim 5, wherein the selection of a type of processing element of the plurality of heterogeneous processing elements is transparent to an operating system [Wu, paragraph 19].
Referring to claim 23, the modified Wu discloses the system of claim 1, wherein the plurality of heterogeneous processing elements is to share a memory address space [Wu, paragraphs 21, 27, 28].
Referring to claim 24, the modified Wu discloses the system of claim 1, wherein the hardware heterogeneous scheduler includes a binary translator that is to be executed on one of the heterogeneous processing elements [Wu, paragraphs 4, 13, 42, 46, 58].
Referring to claim 25, the modified Wu discloses the system of claim 5, wherein a default selection of a type of processing element of the plurality of heterogeneous processing elements is a latency optimized core [Wu, paragraph 42, “Often a section of code with a high-recurrence and predictable latency pattern may be optimized to be .
Claims 2-3, 9, and 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wu, in view of Espy, as applied to claims 1 and 5 above, and further in view of Toll et al (hereinafter Toll), U.S. Publication No. 2015/0007196 A1.
Referring to claim 2, the modified Wu discloses the system of claim 1, wherein the plurality of heterogeneous processing elements comprises an in-order processor core, an out-of-order processor core [paragraph 9].
The modified Wu does not explicitly disclose a packed data processor core.
However, Toll discloses a packed data processor core [paragraph 30, “For example, some cores may support wide packed data instructions”], in order to provide improved performance or speed and reduced power consumption [paragraph 37].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide improved performance or speed and reduced power consumption. It is for this reason one of ordinary skill in the art would have been motivated to implement a packed data processor core.
Referring to claim 3, the modified Wu discloses the system of claim 2, wherein the plurality of heterogeneous processing elements further comprises an accelerator [Toll, paragraph 27, “hardware accelerators”].
Referring to claim 9, the modified Wu does not explicitly disclose the system of claim 5, wherein for a data parallel program phase the selected type of processing element is an accelerator.
However, Toll discloses wherein for a data parallel program phase the selected type of processing element is an accelerator [paragraph 45], in order to provide improved performance or speed and reduced power consumption [paragraph 37].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide improved performance or speed and reduced power consumption. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein for a data parallel program phase the selected type of processing element is an accelerator.
Referring to claim 11, the modified Wu does not explicitly disclose the system of claim 5, wherein for a thread parallel program phase the selected type of processing element is a scalar processing core.
However, Toll discloses wherein for a thread parallel program phase the selected type of processing element is a scalar processing core [paragraph 76, “The execution units 762 may perform various operations (e.g., shifts, addition, subtraction, multiplication) and on various types of data (e.g., scalar floating point, packed integer, packed floating point, vector integer, vector floating point). While some embodiments may include a number of execution units dedicated to specific functions or sets of functions, other embodiments may include only one execution unit or multiple execution 
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide improved performance or speed and reduced power consumption. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein for a thread parallel program phase the selected type of processing element is a scalar processing core.
Claims 4 and 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wu, in view of Espy, as applied to claims 1 and 6 above, and further in view of Breternitz et al (hereinafter Breternitz), U.S. Publication No. 2014/0359633 A1.
Referring to claim 4, the modified Wu discloses the system of claim 1, wherein the plurality of heterogeneous processing elements includes a first processing element having a first microarchitecture and a second processing element having a second microarchitecture different from the first microarchitecture [Wu, paragraph 20, architecture state registers 101a, 101b of core 1; architecture state register 102a of core 2; “Or registers 102a may instead be unique to the architecture of core 102”].
The modified Wu does not explicitly disclose the system of claim 1, wherein the hardware heterogeneous scheduler further comprises:
a program phase detector to detect a program phase of the code fragment; 	
wherein the program phase is one of a plurality of program phases, including a first phase and a second phase and the dispatch of instructions is based in part on the detected program phase.

wherein the program phase is one of a plurality of program phases, including a first phase and a second phase and the dispatch of instructions is based in part on the detected program phase [paragraph 26, In some embodiments, threads are dynamically reassigned in response to detecting a change in phase of an application. Different phases of an application may operate more efficiently in different performance states and/or on different types of processor cores or processing nodes. A CPU-bound phase of an application may run most efficiently in a high-performance state and/or on a big core 222 (or a high-performance processing node 104), while an input/output (I/O)-bound phase of the same application may run most efficiently in a low-performance state and/or on a small core 224], in order to provide efficient processor operation to reduce cost of ownership [paragraph 2].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide efficient processor operation to reduce cost of ownership. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein the hardware heterogeneous scheduler further comprises: a program phase detector to detect a program phase of the code fragment; wherein the program phase is one of a plurality of program phases, including a first phase and a second phase and the dispatch of instructions is based in part on the detected program phase.
Referring to claim 7, the modified Wu does not explicitly disclose the system of claim 6, wherein for a data parallel program phase the selected type of processing element is a processing core to execute single instruction, multiple data (SIMD) instructions.
However, Breternitz discloses wherein for a data parallel program phase [paragraph 26] the selected type of processing element is a processing core to execute single instruction, multiple data (SIMD) instructions [paragraphs 18, 20, “processor cores of different types”], in order to provide efficient processor operation to reduce cost of ownership [paragraph 2].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide efficient processor operation to reduce cost of ownership. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein for a data parallel program phase the selected type of processing element is a processing core to execute single instruction, multiple data (SIMD) instructions. In addition, given the finite number of types of cores, it would have been "Obvious to try" a processing core to execute single instruction, multiple data (SIMD) instructions from a finite number of identified cores, which would provide a reasonable expectation of success.
Claim 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wu, in view of Espy, as applied to claim 5 above, and further in view of Nordquist, U.S. Patent No. 7,447,873 B1.
Referring to claim 8, the modified Wu does not explicitly disclose the system of claim 5, wherein for a data parallel program phase the selected type of processing element is circuitry to support dense arithmetic primitives.

One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide a processor that can adapt to varying load while maintaining a high degree of parallelism. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein for a data parallel program phase the selected type of processing element is circuitry to support dense arithmetic primitives.
Claims 12 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wu, in view of Espy, as applied to claim 5 above, and further in view of Chen et al (hereinafter Chen), U.S. Publication No. 2017/0315847 A1.
Referring to claim 12, the modified Wu does not explicitly disclose the system of claim 5, wherein a thread parallel program phase comprises data dependent branches that use unique control flows.
However, Chen discloses wherein a thread parallel program phase comprises data dependent branches that use unique control flows [paragraph 40, “the AF may characterize how well the application is utilizing the GPU SIMD parallel execution model. When threads within a wavefront diverge due to a data-dependent control flow statement, the wavefront serially executes each branch path taken”], in order to provide reduced overhead [paragraph 10].
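One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide reduced overhead. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein a thread parallel program phase comprises data dependent branches that use unique control flows.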
Referring to claim 19, the modified Wu does not explicitly disclose the system of claim 5, wherein the selection of a type of processing element of the plurality of heterogeneous processing elements is transparent to a user.
However, Chen discloses wherein the selection of a type of processing element of the plurality of heterogeneous processing elements is transparent to a user [paragraph 37, “The API 404 may allow end-users to register their applications with the runtime, which may give the runtime complete control over dispatching work and transferring data between the first processing unit 110 and the second processing unit 120 without requiring any programmer intervention or even rebuild of application binaries”], in order to provide reduced overhead [paragraph 10].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide reduced overhead. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein the selection of a type of processing element of the plurality of heterogeneous processing elements is transparent to a user.
Claims 13-14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wu, in view of Espy, in view of Toll, as applied to claim 2 above, and further in view of Breternitz et al (hereinafter Breternitz), U.S. Publication No. 2014/0359633 A1.
Referring to claim 13, the modified Wu does not explicitly disclose the system of claim 2, wherein for a serial program phase the selected type of processing element is an out-of-order core.
However, Breternitz discloses wherein for a serial program phase [paragraph 26] the selected type of processing element is an out-of-order core [paragraph 26, Different phases of an application may operate more efficiently in different performance states and/or on different types of processor cores], in order to provide efficient processor operation to reduce cost of ownership [paragraph 2].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide efficient processor operation to reduce cost of ownership. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein for a serial program phase the selected type of processing element is an out-of-order core. In addition, given the finite number of types of cores, it would have been "Obvious to try" an out-of-order core from a finite number of identified cores, which would provide a reasonable expectation of success.
Referring to claim 14, the modified Wu does not explicitly disclose the system of claim 2, wherein for a data parallel program phase the selected type of processing element is a processing core to execute single instruction, multiple data (SIMD) instructions.

One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide efficient processor operation to reduce cost of ownership. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein for a data parallel program phase the selected type of processing element is a processing core to execute single instruction, multiple data (SIMD) instructions. In addition, given the finite number of types of cores, it would have been "Obvious to try" a processing core to execute single instruction, multiple data (SIMD) instructions from a finite number of identified cores, which would provide a reasonable expectation of success.
Claim 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wu, in view of Espy, as applied to claim 1 above, and further in view of Muthiah et al (hereinafter Muthiah), U.S. Publication No. 2014/0281008 A1.
Referring to claim 16, the modified Wu does not explicitly disclose the system of claim 1, wherein the hardware heterogeneous scheduler is to emulate functionality when the selected type of processing element cannot natively handle the code fragment.
However, Muthiah discloses wherein the hardware heterogeneous scheduler is to emulate functionality when the selected type of processing element cannot natively handle the code fragment [paragraph 17, “Rapid translation or emulation of the non-native binaries using QoS criteria allows the delivery of remote application streaming to client devices with latencies consistent with local execution”], in order to minimize latency while providing latencies consistent with local execution [paragraph 17].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to minimize latency while providing latencies consistent with local execution. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein the hardware heterogeneous scheduler is to emulate functionality when the selected type of processing element cannot natively handle the code fragment.
Claim 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wu, in view of Espy, as applied to claim 1 above, and further in view of Bugenhagen et al (hereinafter Bugenhagen), U.S. Publication No. 2015/0019713 A1.
Referring to claim 17, the modified Wu does not explicitly disclose the system of claim 1, wherein the hardware heterogeneous scheduler is to emulate functionality when a number of hardware threads available is oversubscribed.
 However, Bugenhagen discloses wherein the hardware heterogeneous scheduler is to emulate functionality when a number of hardware threads available is oversubscribed [paragraph 88, Given that virtualization of test servers place test servers 
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide more robust and scalable network performance. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein the hardware heterogeneous scheduler is to emulate functionality when a number of hardware threads available is oversubscribed.
Claims 21-22 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wu, in view of Espy, as applied to claim 1 above, and further in view of Lin et al (hereinafter Lin), U.S. Publication No. 2005/0148358 A1.
Referring to claim 21, the modified Wu does not explicitly disclose the system of claim 1, wherein the hardware heterogeneous scheduler is to present a homogeneous multiprocessor programming model to make each thread appear to a programmer as if it is executing on a scalar core.
However, Lin discloses present a homogeneous multiprocessor programming model [paragraph 40, RPU 218 may be configured according to a homogeneous multiprocessing model, to a heterogeneous multiprocessing model, or to a hybrid multiprocessing model; Conveniently, MM 212 can be adapted to be one of a homogeneous multiprocessing model, a heterogeneous multiprocessing model, and a hybrid multiprocessing model] to make each thread appear to a programmer as if it is 
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to provide energy-efficiency and flexibility. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein the hardware heterogeneous scheduler is to present a homogeneous multiprocessor programming model to make each thread appear to a programmer as if it is executing on a scalar core.
Referring to claim 22, the modified Wu discloses the system of claim 21, wherein the presented homogeneous multiprocessor programming model is to present an appearance of support for a full instruction set [Lin, paragraph 22, versatile instruction set, and possibly supporting parallelism, within MM 112].
Claims 26-30 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wu, in view of Espy, as applied to claim 1 above, and further in view of Bresniker, U.S. Publication No. 2009/0037641 A1.
Referring to claim 26, the modified Wu does not explicitly disclose the system of claim 1, wherein the heterogeneous hardware scheduler to select a protocol to use on a multi-protocol bus interface for the dispatched instructions.
However, Bresniker discloses wherein the heterogeneous hardware scheduler to select a protocol to use on a multi-protocol bus interface for the dispatched instructions 
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the system of the modified Wu to expand memory capacity without lowering performance levels. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein the heterogeneous hardware scheduler to select a protocol to use on a multi-protocol bus interface for the dispatched instructions.
Referring to claim 27, the modified Wu discloses the system of claim 26, wherein a first protocol supported by a multi-protocol bus interface comprises a memory interface protocol to be used to access a system memory address space [Bresniker, paragraph 16, figs. 1-2].
Referring to claim 28, the modified Wu discloses the system of claim 27, wherein a second protocol supported by the multi-protocol bus interface comprises a cache coherency protocol to maintain coherency between data stored in a local memory of the accelerator and a memory subsystem of a host processor including a host cache hierarchy and a system memory [Bresniker, paragraph 16, fig. 1].
Referring to claim 29, the modified Wu discloses the system of claim 28, wherein a third protocol supported by the multi-protocol bus interface comprises a serial link protocol supporting device discovery, register access, configuration, initialization, interrupts, direct memory access, and address translation services [Bresniker, paragraph 16, PCI-E protocol].
Referring to claim 30, the modified Wu discloses the system of claim 29, wherein the third protocol comprises the Peripheral Component Interface Express (PCIe) protocol [Bresniker, paragraph 16].

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARLEY J ABAD whose telephone number is (571)270-3425.  The examiner can normally be reached on M-Th 6:30 - 3:00 PM; Fri 7:30 - 4:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Idriss Alrobaye can be reached on (571) 270-1023.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/Farley Abad/Primary Examiner, Art Unit 2181