DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is responsive to the reply filed 10 June 2021.
Claims 1-30 are pending and have been presented for examination.

Response to Arguments
Applicant's arguments filed 10 June 2021 have been fully considered but they are not persuasive.

Applicant argues (see page 9):
Such implementation of passing on the data produced by the coprocessor 4000 to the main processor 3000 in Maeda would have the shortcomings that the present application particularly points out: “[s]aving all of the state produced by the helper core may be unnecessary and wasteful of processor and memory resources if only a limited portion of it is needed by the parent core when offload processing is complete.” Specification, paragraph [0206].
As shown, the cited Maeda paragraph teaches no instruction comprising at least one operand to identify a subset of the first execution state produced by the second core. Since neither Maeda nor Lavasani teaches or suggest such instruction as the amended claim 1 requires, and Kim and Zhao are not cited for the related features, the Applicant respectfully submits that Maeda in view of Kim, Zhao, and Lavasani fails to teach or suggest amended claim 1.

.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-30 are rejected under 35 U.S.C. 103 as being unpatentable over MAEDA (U.S. Patent Application Publication #2006/0010305) in view of KIM (U.S. Patent Application Publication #2019/0163650), ZHAO (U.S. Patent Application Publication #2017/0083364) and LAVASANI (U.S. Patent Application Publication #2017/0024338).

1. MAEDA discloses A processor comprising: a plurality of cores (see [0078]: processor and co-processor; see KIM and ZHAO below); an interconnect coupling (see figure 3, multiple bus connections between main processor and co-processor); and offload circuitry to transfer work from a first core of the plurality of cores to a second core of the plurality of cores without operating system (OS) intervention (see [0063]: main processor requests the co-processor to perform an operation), wherein the second core is to produce a first execution state upon completing the offload work and to store results in a first memory location or register (see [0100]-[0101]: co-processor performs the requested operation and stores the result in a designated register; see [0219]: co-processor can store the result in a register bank in the co-processor rather than the main processor); memory to store program code and offload state components from the plurality of cores (see LAVASANI below); the second core comprising: a decoder to decode a first instruction comprising at least one operand to identify a subset of the first execution state produced by the second core (see [0098]: co-processor decodes the instruction; see [0093]: co-processor controls the ALU to perform the operation; see [0101]: main processor desires the output of the operation executed by the co-processor, this output is a subset of the execution state as the execution state would include all the registers in the co-processor, the program counter, etc.; see [0123]-[0124]: coprocessor can contain a pipeline and output multiple results into a FIFO, each result that is transferred from the FIFO would also be considered a subset of the execution state); and execution circuitry to execute the first instruction to save the subset of the first execution state identified by the first instruction to a specified region (see [0101]: result is stored in a designated register) in the memory that is to store the offload state components (see LAVASANI below).
MAEDA discloses a multi-processor system, but does not explicitly disclose a single processor with multiple cores.
ZHAO discloses a single processor with multiple cores (see [0002]-[0003]: a system on a chip {SoC} would contain multiple cores, some of which are specialized for specific tasks).  MAEDA discloses two processors, each one specialized for a specific task.  Putting these two processors together onto a single chip, such as a SoC, is known in the art and would result in a single processor with multiple cores.  Furthermore, combining multiple elements onto a single chip is beneficial in situations where the number of chips to be mounted is limited (see KIM [0003]).
	It would have been obvious, before the effective filing date of the claimed invention, to a person having ordinary skill in the art to which said subject matter pertains to modify the system disclosed by MAEDA to combine the main processor and the coprocessor into a SoC, as disclosed by ZHAO.  The motivation for making such a modification would be to reduce the number of chips that are to be mounted, which is beneficial in situations where space is limited, as taught by KIM.  MAEDA, KIM and ZHAO are analogous references as they are all directed to multiple processor/multi-core processing systems where the workload is shared among the processing elements.
	MAEDA fails to disclose memory to store program code and offload state components from the plurality of cores and saving the one or more components in the memory that is to store the offload state components.
	LAVASANI discloses memory to store program code and offload state components from the plurality of cores (see [0106]: global shared memory) and saving the one or more components in the memory that is to store the offload state components (see [0104]: context memory stores data that is the result of processing by the accelerator {this would be comparable to the co-processor storing output in the FIFO in MAEDA}; [0109],[0111] global shared memory is made coherent, this indicates that data written to the context memory as a result of the execution would be copied to the global shared memory).  The system disclosed by LAVASANI overcomes the challenges of traditional systems with an accelerator that pass control between the CPU and the accelerator (see [0004]-[0005]).  Here, instructions and data are passed through a global shared memory, and only the data that is needed by the other processors/accelerators would be copied to the global shared memory (see [0111]: a portion of the changes are not visible and therefore need to be made coherent; [0117]: bailout table is used to minimize overhead of data transfer).
	It would have been obvious, before the effective filing date of the claimed invention, to a person having ordinary skill in the art to which said subject matter pertains to modify the system disclosed by MAEDA to include a memory that stores offload components for a plurality of processor cores, as disclosed by LAVASANI.  The motivation for making such a modification would be to overcome the challenges of offloading processing to another core where execution needs to be passed back and forth between each processor, as taught by LAVASANI.  MAEDA and LAVASANI are analogous references as both references are directed to passing execution between processing elements.

(see MAEDA [0074]: operation code field).

3. The processor of claim 2 wherein the bitfield comprises one or more bits set to identify the subset of the first execution state to be saved (see MAEDA [0074]: two bits are used to define the operation requested).

4. The processor of claim 3 wherein each of the one or more bits identifies a component of the subset of the first execution state (see MAEDA [0074]: the two bits identify the operation to be performed; see [0093]: ALU is used to perform the operation).

5. The processor of claim 2 wherein the at least one operand is to identify a register in which the bitfield is to be stored (see MAEDA [0062]: register is identified for storing the result).

6. The processor of claim 2 wherein the decoder is to decode a second instruction and the execution circuitry is to execute the second instruction to transmit an offload end message to the first core informing the first core that the work is complete (see MAEDA [0101]: when the operation ends a signal is sent to the main processor).

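7. The processor of claim 6 wherein the offload end message includes an indication of the first memory location or register from which to access the results (see MAEDA [0101]: end signal includes the designated register where the result is stored).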

8. The processor of claim 6 wherein the subset of the first execution state is stored in specific registers or sets of registers (see MAEDA [0062]: registers store the data used for the operation).

9. The processor of claim 8 wherein the specific registers or sets of registers include one or more of: control/status registers, flag registers, vector registers, scalar registers, and general purpose registers (see MAEDA [0062]: registers used to store data).

10. MAEDA discloses A method comprising: transferring work from a first core of a plurality of cores to a second core of the plurality of cores without operating system (OS) intervention (see [0063]: main processor requests the co-processor to perform an operation; see KIM and ZHAO below), wherein to transfer the work, the first core is to transmit a message to the second core (see [0072]: xexec instruction is sent to the coprocessor), executing the work on the second core to completion to reach a first execution state (see [0100]-[0101]: co-processor performs the requested operation and stores the result in a designated register); storing results of the work in a first memory location or register (see [0100]-[0101]: co-processor performs the requested operation and stores the result in a designated register; see [0219]: co-processor can store the result in a register bank in the co-processor rather than the main processor); decoding and executing a first instruction on the second core (see [0098]: co-processor decodes the instruction; see [0093]: co-processor controls the ALU to perform the operation), the first instruction comprising at least one operand to identify a subset of the first execution state produced by the second core (see [0098]: co-processor decodes the instruction; see [0093]: co-processor controls the ALU to perform the operation; see [0101]: main processor desires the output of the operation executed by the co-processor, this output is a subset of the execution state as the execution state would include all the registers in the co-processor, the program counter, etc.; see [0123]-[0124]: coprocessor can contain a pipeline and output multiple results into a FIFO, each result that is transferred from the FIFO would also be considered a subset of the execution state) and to save the subset of the first execution state identified by the first instruction to a specified region in memory (see [0101]: result is stored in a designated register), that is to store offload components from the plurality of cores (see LAVASANI below).
MAEDA discloses a multi-processor system, but does not explicitly disclose a single processor with multiple cores.
ZHAO discloses a single processor with multiple cores (see [0002]-[0003]: a system on a chip {SoC} would contain multiple cores, some of which are specialized for specific tasks).  MAEDA discloses two processors, each one specialized for a specific task.  Putting these two processors together onto a single chip, such as a SoC, is known in the art and would result in a single processor with multiple cores.  Furthermore, combining multiple elements onto a single chip is beneficial in situations where the number of chips to be mounted is limited (see KIM [0003]).
	It would have been obvious, before the effective filing date of the claimed invention, to a person having ordinary skill in the art to which said subject matter pertains to modify the system disclosed by MAEDA to combine the main processor and the coprocessor into a SoC, as disclosed by ZHAO.  The motivation for making such a modification would be to reduce the number of chips that are to be mounted, which is beneficial in situations where space is limited, as taught by KIM.  MAEDA, KIM and ZHAO are analogous references as they are all directed to multiple processor/multi-core processing systems where the workload is shared among the processing elements.
MAEDA fails to disclose memory to store program code and offload state components from the plurality of cores and saving the one or more components in the memory that is to store the offload state components.
	LAVASANI discloses memory to store program code and offload state components from the plurality of cores (see [0106]: global shared memory) and saving the one or more components in the memory that is to store the offload state components (see [0104]: context memory stores data that is the result of processing by the accelerator {this would be comparable to the co-processor storing output in the FIFO in MAEDA}; [0109],[0111] global shared memory is made coherent, this indicates that data written to the context memory as a result of the execution would be copied to the global shared memory).  The system disclosed by LAVASANI overcomes the challenges of traditional systems with an accelerator that pass control between the CPU and the accelerator (see [0004]-[0005]).  Here, instructions and data are passed through a global shared memory, and only the data that is needed by the other processors/accelerators would be copied to the global shared memory (see [0111]: a portion of the changes are not visible and therefore need to be made coherent; [0117]: bailout table is used to minimize overhead of data transfer).
	It would have been obvious, before the effective filing date of the claimed invention, to a person having ordinary skill in the art to which said subject matter pertains to modify the system disclosed by MAEDA to include a memory that stores offload components for a plurality of processor cores, as disclosed by LAVASANI.  The motivation for making such a modification would be to overcome the challenges of offloading processing to another core where execution needs to be passed back and forth between each processor, as taught by LAVASANI.  MAEDA and LAVASANI are analogous references as both references are directed to passing execution between processing elements.

11. The method of claim 10 wherein the at least one operand is to include or identify a bitfield, the method comprising identifying the subset of the first execution state based on the bitfield (see MAEDA [0074]: operation code field).

12. The method of claim 11 wherein the bitfield comprises one or more bits set to identify the subset of the first execution state to be saved (see MAEDA [0074]: two bits are used to define the operation requested).

13. The method of claim 12 wherein each of the one or more bits identifies a component of the subset of the first execution state (see MAEDA [0074]: the two bits identify the operation to be performed; see [0093]: ALU is used to perform the operation).

14. The method of claim 11 wherein the at least one operand is to identify a register in which the bitfield is to be stored (see MAEDA [0062]: register is identified for storing the result).

15. The method of claim 11 further comprising: decoding and executing a second instruction to transmit an offload end message to the first core informing the first core that the offload work is complete (see MAEDA [0101]: when the operation ends a signal is sent to the main processor).

16. The method of claim 15 wherein the offload end message includes an indication of the first memory location or register from which to access the results (see MAEDA [0101]: end signal includes the designated register where the result is stored).

17. The method of claim 15 wherein the one or more components of the first execution state is stored in specific registers or sets of registers (see MAEDA [0062]: registers store the data used for the operation).

18. The method of claim 17 wherein the specific registers or sets of registers include one or more of: control/status registers, flag registers, vector registers, scalar registers, and general purpose registers (see MAEDA [0062]: registers used to store data).

19. MAEDA discloses A non-transitory machine-readable medium having program code stored thereon which, when executed by a machine, causes the machine to perform the operations of: transferring work from a first core of a plurality of cores to a second core of the plurality of cores without operating system (OS) intervention (see [0063]: main processor requests the co-processor to perform an operation; see KIM and ZHAO below), wherein to transfer the work, the first core is to transmit a message to the second core (see [0072]: xexec instruction is sent to the coprocessor), executing the work on the second core to completion to reach a first execution state (see [0100]-[0101]: co-processor performs the requested operation and stores the result in a designated register); storing results of the work in a first memory location or register (see [0100]-[0101]: co-processor performs the requested operation and stores the result in a designated register; see [0219]: co-processor can store the result in a register bank in the co-processor rather than the main processor); decoding and executing a first instruction on the second core (see [0098]: co-processor decodes the instruction; see [0093]: co-processor controls the ALU to perform the operation), the first instruction comprising at least one operand to identify a subset of the first execution state produced by the second core (see [0098]: co-processor decodes the instruction; see [0093]: co-processor controls the ALU to perform the operation; see [0101]: main processor desires the output of the operation executed by the co-processor, this output is a subset of the execution state as the execution state would include all the registers in the co-processor, the program counter, etc.; see [0123]-[0124]: coprocessor can contain a pipeline and output multiple results into a FIFO, each result that is transferred from the FIFO would also be considered a subset of the execution state) and to save the subset of the first execution state to a specified region in memory (see [0101]: result is stored in a designated register), that is to store offload components from the plurality of cores (see LAVASANI below).
MAEDA discloses a multi-processor system, but does not explicitly disclose a single processor with multiple cores.
ZHAO discloses a single processor with multiple cores (see [0002]-[0003]: a system on a chip {SoC} would contain multiple cores, some of which are specialized for specific tasks).  MAEDA discloses two processors, each one specialized for a specific task.  Putting these two processors together onto a single chip, such as a SoC, is known in the art and would result in a single processor with multiple cores.  Furthermore, combining multiple elements onto a single chip is beneficial in situations where the number of chips to be mounted is limited (see KIM [0003]).
	It would have been obvious, before the effective filing date of the claimed invention, to a person having ordinary skill in the art to which said subject matter pertains to modify the system disclosed by MAEDA to combine the main processor and the coprocessor into a SoC, as disclosed by ZHAO.  The motivation for making such a modification would be to reduce the number of chips that are to be mounted, which is beneficial in situations where space is limited, as taught by KIM.  MAEDA, KIM and ZHAO are analogous references as they are all directed to multiple processor/multi-core processing systems where the workload is shared among the processing elements.
MAEDA fails to disclose memory to store program code and offload state components from the plurality of cores and saving the one or more components in the memory that is to store the offload state components.
	LAVASANI discloses memory to store program code and offload state components from the plurality of cores (see [0106]: global shared memory) and saving the one or more components in the memory that is to store the offload state components (see [0104]: context memory stores data that is the result of processing by the accelerator {this would be comparable to the co-processor storing output in the FIFO in MAEDA}; [0109],[0111] global shared memory is made coherent, this indicates that data written to the context memory as a result of the execution would be copied to the global shared memory).  The system disclosed by LAVASANI overcomes the challenges of traditional systems with an accelerator that pass control between the CPU and the accelerator (see [0004]-[0005]).  Here, instructions and data are passed through a global shared memory, and only the data that is needed by the other processors/accelerators would be copied to the global shared memory (see [0111]: a portion of the changes are not visible and therefore need to be made coherent; [0117]: bailout table is used to minimize overhead of data transfer).
	It would have been obvious, before the effective filing date of the claimed invention, to a person having ordinary skill in the art to which said subject matter pertains to modify the system disclosed by MAEDA to include a memory that stores offload components for a plurality of processor cores, as disclosed by LAVASANI.  The motivation for making such a modification would be to overcome the challenges of offloading processing to another core where execution needs to be passed back and forth between each processor, as taught by LAVASANI.  MAEDA and LAVASANI are analogous references as both references are directed to passing execution between processing elements.


20. The machine-readable medium of claim 19 wherein the at least one operand is to include or identify a bitfield, the machine-readable medium further comprising program code to cause the machine to perform the operations of: identifying the subset of the first execution state based on the bitfield (see MAEDA [0074]: operation code field).

21. The machine-readable medium of claim 20 wherein the bitfield comprises one or more bits set to identify the subset of the first execution state to be saved (see MAEDA [0074]: two bits are used to define the operation requested).

22. The machine-readable medium of claim 21 wherein each of the one or more bits identifies a component of the subset of the first execution state (see MAEDA [0074]: the two bits identify the operation to be performed; see [0093]: ALU is used to perform the operation).

23. The machine-readable medium of claim 20 wherein the at least one operand is to identify a register in which the bitfield is to be stored (see MAEDA [0062]: register is identified for storing the result).

24. The machine-readable medium of claim 20 further comprising program code to cause the machine to perform the operations of: decoding and executing a second instruction to transmit an offload end message to the first core informing the first core that the offload work is complete (see MAEDA [0101]: when the operation ends a signal is sent to the main processor).

25. The machine-readable medium of claim 24 wherein the offload end message includes an indication of the first memory location or register from which to access the results (see MAEDA [0101]: end signal includes the designated register where the result is stored).

26. The machine-readable medium of claim 24 wherein the subset of the first execution state is stored in specific registers or sets of registers (see MAEDA [0062]: registers store the data used for the operation).

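27. The machine-readable medium of claim 26 wherein the specific registers or sets of registers include one or more of: control/status registers, flag registers, vector registers, scalar registers, and general purpose registers (see MAEDA [0062]: registers used to store data).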

28. The processor of claim 2, wherein the bitfield comprises a bitmask, setting of which indicates whether a corresponding component of the first execution state is to be saved in the specified region in the memory (see LAVASANI [0120]-[0122] bailout table comprises a bit to track variables that have been used in a write operation and that will be used later by software following continuation; [0125]: move set variables are copied to the global shared memory).

29. The processor of claim 1, wherein the subset of the first execution state identified by the at least one operand of the first instruction are ones that are to be modified during execution of the work (see MAEDA [0093]: operation indicates which data should be input to the requested operation).

30. The processor of claim 1, wherein the specified region comprises a legacy region (see LAVASANI [0127], [0147]: changes can be rolled back, this would be considered legacy data), a header region (see LAVASANI [0120]: bailout table), and an extended region (see LAVASANI [0106]: storage for shared data structures).

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWARD J DUDEK JR whose telephone number is (571)270-1030.  The examiner can normally be reached on Monday - Friday, 8:00A-4:00P.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Rones, can be reached on 571-272-4085.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/EDWARD J DUDEK JR/
Primary Examiner, Art Unit 2136