DETAILED ACTION
The present application is being examined under the pre-AIA first to invent provisions.
Claims 1-20 are presented for examination.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-20 are rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claims 1-19 of U.S. Patent No. 10,558,490 (hereinafter ‘490). Although the conflicting claims are not identical, they are not patentably distinct from each other because Claims 1-20 of the instant application define an obvious variation of the invention claimed in ‘490.
It is noted that the instant application is a later-filed continuation of ‘490. Claims 1-19 of ‘490 contain every element of Claims 1-20 of the instant application and thus anticipate the claims of the instant application. The claims of the instant application therefore are not patentably distinct from the earlier patent claims and as such are unpatentable for obviousness-type double patenting. A later application claim is not patentably distinct from an earlier claim if the later claim is anticipated by the earlier claim.
Claims 1 and 9 of the instant application are shown in the tables below alongside Claims 1 and 9 of ‘490, with the differences boldfaced for the Applicant's convenience.

| Claim 1 of Instant | Claim 1 of US Pat 10,558,490 |
|---|---|
| An apparatus, comprising: | An apparatus, comprising: |
| a) a CPU; b) an accelerator; c) a controller and | a CPU; an accelerator; a controller and |
| a plurality of order buffers coupled between said CPU and said accelerator, each of said order buffers dedicated to a different one of said CPU's threads, each one of said order buffers to hold one or more requests issued to said accelerator from its corresponding thread, said controller to control issuance of said order buffers' respective requests to said accelerator. | a plurality of order buffers coupled between said CPU and said accelerator, each of said order buffers dedicated to a different one of said CPU's threads, each one of said order buffers to hold one or more requests issued to said accelerator from its corresponding thread and status information of said request, said controller to control issuance of said order buffers' respective requests to said accelerator based on the status information, |
| | wherein the status information is one of new, executing, done, page fault, and invalid. |



| Claim 9 of Instant | Claim 9 of US Pat 10,558,490 |
|---|---|
| A method, comprising: executing first and second threads on a CPU in a core of a multiple core semiconductor chip; | A method, comprising: executing first and second threads in a core of a multiple core semiconductor chip; |
| issuing a first acceleration request from said first thread to a first order buffer that is dedicated to said first thread; | issuing a first acceleration request from said first thread to a first order buffer that is dedicated to said first thread, wherein the first order buffer holds two or more requests; |
| | identifying said first request's status as a new request in said first order buffer upon said first request being received by said first order buffer, and, adjusting a tail pointer to point to said first request's entry in said order buffer, wherein the status information is one of new, executing, done, page fault, and invalid; |
| issuing a second acceleration request from said second thread to a second order buffer that is dedicated to said second thread; | issuing a second acceleration request from said second thread to a second order buffer that is dedicated to said second thread, wherein the second order buffer holds two or more requests; |
| issuing said first acceleration request from said first order buffer to an accelerator, said accelerator processing said first request utilizing a first virtual to physical address translation scheme utilized by said first thread; and, | issuing said first acceleration request from said first order buffer to an accelerator, said accelerator processing said first request utilizing a first virtual to physical address translation scheme utilized by said first thread; and, |
| issuing said second acceleration request from said second order buffer to said accelerator, said accelerator processing said second request utilizing a second virtual to physical address translation scheme utilized by said second thread. | issuing said second acceleration request from said second order buffer to said accelerator, said accelerator processing said second request utilizing a second virtual to physical address translation scheme utilized by said second thread. |




Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of pre-AIA  35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.



Claims 1, 5, and 7-8 are rejected under 35 U.S.C. 103(a) as being unpatentable over Chang et al. (US PG Pub No. US 2012/0030421 A1) in view of Mejdrich et al. (US Pat No. 8,423,749), further in view of Gaster et al. (US PG Pub No. US 2011/0022817 A1).
Chang, Mejdrich, and Gaster were disclosed in IDS dated 04/20/2020.

Regarding claim 1, Chang teaches an apparatus, comprising: 
a) a CPU ([0002]);
b) an accelerator ([0003]);
c) a controller (Fig 6, Control Module 605) and an order buffer coupled between said CPU and said accelerator ([0009]), each one of said order buffers to hold one or more requests issued to said accelerator from its CPU ([0036]), said controller to control issuance of said order buffers' respective requests to said accelerator (Fig 6, Control Module 605).
Chang does not explicitly teach that each of said order buffers dedicated to a different one of said CPU's threads.
Mejdrich teaches allocating an output buffer for each executing thread (Fig 2; col 3 lines 12-15, wherein each hardware thread has a dedicated output buffer; col 19 lines 25-31). Therefore, Mejdrich provides for the duplication of components as opposed to sharing of a single component by each of the execution units. It would have been obvious to one of ordinary skill at the time the invention was made to modify Chang to include that each order buffer is dedicated to a different one of said CPU’s threads. One would be motivated by the desire to have separate storage for each individual thread so that each thread has its own storage as taught by Mejdrich.
The combination of Chang and Mejdrich does not teach that each core has a CPU and an accelerator.
Gaster teaches the use of hybrid cores on a single die comprising both CPU and GPU (accelerator) cores ([0005]). It would have been obvious to one of ordinary skill in the art that each core has a CPU and an accelerator. One would be motivated by the desire to realize the substantial processing capabilities that are available with hybrid cores as taught by Gaster ([0005]).
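
For illustration only, the arrangement recited in claim 1 (one order buffer per thread, with a controller deciding which buffered request is issued to the accelerator) can be modeled in software roughly as follows. This is a minimal sketch under stated assumptions: the names `OrderBufferController`, `submit`, `issue_next`, and `accelerator`, and the round-robin issuance policy, are hypothetical and are not taken from the claims or from Chang, Mejdrich, or Gaster.

```python
from collections import deque


class OrderBufferController:
    """Toy model: one FIFO order buffer per thread, drained by a controller."""

    def __init__(self, thread_ids):
        # Each order buffer is dedicated to exactly one thread.
        self.buffers = {tid: deque() for tid in thread_ids}
        self._order = list(thread_ids)
        self._cursor = 0  # round-robin position (an assumed policy)

    def submit(self, thread_id, request):
        # A thread places requests only in its own buffer.
        self.buffers[thread_id].append(request)

    def issue_next(self, accelerator):
        # Visit the buffers round-robin and issue the oldest request
        # from the first non-empty buffer encountered.
        for _ in range(len(self._order)):
            tid = self._order[self._cursor]
            self._cursor = (self._cursor + 1) % len(self._order)
            if self.buffers[tid]:
                return accelerator(tid, self.buffers[tid].popleft())
        return None  # nothing pending in any buffer


def accelerator(thread_id, request):
    # Stand-in for the accelerator: reports what it was asked to process.
    return (thread_id, request)


controller = OrderBufferController(["t0", "t1"])
controller.submit("t0", "a")
controller.submit("t0", "b")
controller.submit("t1", "c")
issued = [controller.issue_next(accelerator) for _ in range(4)]
```

In this sketch, requests from the two threads are interleaved at issuance while each thread's own requests stay in order within its dedicated buffer.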

Regarding claim 5, Chang teaches that status information of said request is stored along with said input data ([0004]; [0009]).

Regarding claim 7, Gaster teaches that said accelerator has multiple functional units so as to make said accelerator capable of executing multiple tasks simultaneously ([0027]). 

Regarding claim 8, Gaster teaches that said accelerator can execute different instances of the same task simultaneously ([0027]). 


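Claims 2-4 are rejected under 35 U.S.C. 103(a) as being unpatentable over Chang et al. (US PG Pub No. US 2012/0030421 A1) in view of Mejdrich et al. (US Pat No. 8,423,749), in view of Gaster et al. (US PG Pub No. US 2011/0022817 A1), further in view of Yamada et al. (US PG Pub No. US 2009/0189686 A1).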
Yamada was disclosed in IDS dated 04/20/2020.

Regarding claim 2, Chang, Mejdrich, and Gaster do not teach that one of said requests is composed of a pointer identifying a memory address where said request's associated input data for said accelerator can be found. 
Yamada teaches that requests can typically comprise a pointer to an address where input data can be found ([0049-50]). It would have been obvious to one of ordinary skill at the time the invention was made to include that the request is composed of a pointer identifying a memory address where said associated input data can be found. One would be motivated by the desire to enable the correct addressing of input data. 

Regarding claim 3, Yamada teaches that said request is also composed of an indicator of how large said input data is ([0049-50]). 

Regarding claim 4, the combination of Chang, Mejdrich, Gaster, and Yamada does not explicitly teach that said input data's size is specified as a number of cache lines. 
However, Official Notice is made that it is old and well known to the skilled artisan to express a measurement of input data as a number of cache lines. 
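
For illustration only, a request of the kind discussed for claims 2-4 (a pointer to the input data plus a size expressed as a number of cache lines) might be represented as follows. This is a sketch under stated assumptions: the 64-byte line size and the names `AcceleratorRequest`, `make_request`, `input_addr`, and `size_in_lines` are hypothetical and do not come from the claims or the cited references.

```python
from dataclasses import dataclass

CACHE_LINE_BYTES = 64  # assumed line size; not specified in the claims


@dataclass(frozen=True)
class AcceleratorRequest:
    input_addr: int     # pointer to where the input data can be found
    size_in_lines: int  # input data size, as a number of cache lines


def make_request(input_addr, size_in_bytes):
    # Round up so a partially filled last line is still counted.
    lines = -(-size_in_bytes // CACHE_LINE_BYTES)
    return AcceleratorRequest(input_addr, lines)


req = make_request(0x1000, 200)  # 200 bytes span 4 lines of 64 bytes
```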


Claim 6 is rejected under 35 U.S.C. 103(a) as being unpatentable over Chang et al. (US PG Pub No. US 2012/0030421 A1) in view of Mejdrich et al. (US Pat No. 8,423,749), in view of Gaster et al. (US PG Pub No. US 2011/0022817 A1), further in view of Biles et al. (U.S. 2009/0216958).
Biles was disclosed in IDS dated 04/20/2020.

Regarding claim 6, Chang, Mejdrich, and Gaster do not teach that said accelerator uses same virtual-to-physical address translations as a thread on said CPU that has requested said accelerator to perform a task. 
However, Biles discloses the limitation wherein said accelerator uses same virtual to physical address translations as CPU core of said processor that is tasked with said executing of said thread ([0033], this limitation is disclosed such that both a processor and hardware accelerator share memory management and virtual-to-physical address translation). It would have been obvious to one of ordinary skill in the art at the time the invention was made to modify Chang, Mejdrich, and Gaster by using shared memory space as taught by Biles because it would enhance the teaching of Chang, Mejdrich, and Gaster, with an efficient, low-overhead interface for operating a hardware accelerator in conjunction with a processor (as suggested by Biles, see for example paragraph [0014]).


Claim 9 is rejected under 35 U.S.C. 103(a) as being unpatentable over Ekanadham et al. (US PG Pub No. US 2012/0239904 A1) in view of Mejdrich et al. (US Pat No. 8,423,749) in view of Biles et al. (U.S. 2009/0216958). 
Ekanadham was disclosed in IDS dated 04/20/2020.

Regarding claim 9, Ekanadham teaches a method, comprising: 
executing first and second threads on a CPU in a core of a multiple core semiconductor chip ([0002]); 
issuing a first acceleration request from said first thread to an order buffer ([0002]); 
issuing a second acceleration request from said second thread to an order buffer ([0020]); 
issuing said first acceleration request from said first order buffer to an accelerator ([0020]); and, 
issuing said second acceleration request from said second order buffer to said accelerator ([0020]).
Ekanadham does not explicitly teach a first order buffer that is dedicated to said first thread and a second order buffer that is dedicated to said second thread.
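Mejdrich teaches allocating an output buffer for each executing thread (Fig 2; col 3 lines 12-15, wherein each hardware thread has a dedicated output buffer; col 19 lines 25-31). It would have been obvious to one of ordinary skill at the time the invention was made to modify Ekanadham to include that each order buffer is dedicated to a different one of said CPU’s threads. One would be motivated by the desire to have separate storage for each individual thread so that each thread has its own storage as taught by Mejdrich.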
Ekanadham and Mejdrich do not teach said accelerator processing said first request utilizing a first virtual to physical address translation scheme utilized by said first thread and said accelerator processing said second request utilizing a second virtual to physical address translation scheme utilized by said second thread.
However, Biles discloses the limitation wherein said accelerator uses same virtual to physical address translations as CPU core of said processor that is tasked with said executing of said thread ([0033], this limitation is disclosed such that both a processor and hardware accelerator share memory management and virtual-to-physical address translation). It would have been obvious to one of ordinary skill in the art at the time the invention was made to modify Ekanadham and Mejdrich by using shared memory space as taught by Biles because it would enhance the teaching of Ekanadham and Mejdrich, with an efficient, low-overhead interface for operating a hardware accelerator in conjunction with a processor (as suggested by Biles, see for example paragraph [0014]).


Claims 10-16 are rejected under 35 U.S.C. 103(a) as being unpatentable over Ekanadham et al. (US PG Pub No. US 2012/0239904 A1) in view of Mejdrich et al. (US Pat No. 8,423,749) in view of Biles et al. (U.S. 2009/0216958), further in view of Yamada et al. (US PG Pub No. US 2009/0189686 A1).

Regarding claim 10, Ekanadham, Mejdrich, and Biles do not teach that said first request contains a memory address pointer that identifies where input data for said first task can be found.
Yamada teaches that requests can typically comprise a pointer to an address where input data can be found ([0049-50]). It would have been obvious to one of ordinary skill at the time the invention was made to include that the request is composed of a pointer identifying a memory address where said associated input data can be found. One would be motivated by the desire to enable the correct addressing of input data. 

Regarding claim 11, Yamada teaches that said first request also contains an indication of how large said input data is ([0049-50]). 

Regarding claim 12, the combination of Ekanadham, Mejdrich, Biles, and Yamada does not explicitly teach that said indication is articulated as a number of cache lines.
However, Official Notice is made that it is old and well known to the skilled artisan to express a measurement of input data as a number of cache lines. 

Regarding claim 13, Yamada teaches identifying said first request's status as a new request in said first order buffer upon said first request being received by said first order buffer, and, adjusting a tail pointer to point to said first request's entry in said order buffer ([0049-50]).

Regarding claim 14, Yamada teaches adjusting a next pointer to point to said first request's entry in said order buffer when said first request is the earliest new entry in said first order buffer ([0049-50]).

Regarding claim 15, Yamada teaches changing said first request's status in said order buffer from new to executing when said first request is passed to said accelerator and adjusting a head pointer to point to said first request's entry in said first order buffer when said first request is an oldest uncompleted request in said first order buffer ([0049-50]).

Regarding claim 16, Ekanadham teaches changing said first request's status from executing to done upon said accelerator completing said first request's associated task and deleting said first request from said first order buffer ([0020]).
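
For illustration only, the status and pointer bookkeeping recited in claims 13-16 can be modeled roughly as follows. This is a simplified sketch, not the claimed hardware: the names `OrderBuffer`, `receive`, `issue`, and `complete` are hypothetical, slots are plain integers rather than physical entries, and the head pointer is recomputed after every operation (the claims recite adjusting it at issuance).

```python
from enum import Enum


class Status(Enum):
    NEW = "new"
    EXECUTING = "executing"
    DONE = "done"


class OrderBuffer:
    """Toy per-thread order buffer tracking head, next, and tail entries."""

    def __init__(self):
        self.entries = {}    # slot id -> [request, Status]
        self._next_slot = 0
        self.tail = None     # most recently received entry
        self.next = None     # earliest entry still in the NEW state
        self.head = None     # oldest entry not yet completed

    def _earliest(self, statuses):
        slots = [s for s in sorted(self.entries) if self.entries[s][1] in statuses]
        return slots[0] if slots else None

    def _refresh(self):
        self.next = self._earliest({Status.NEW})
        self.head = self._earliest({Status.NEW, Status.EXECUTING})

    def receive(self, request):
        # Claim 13: mark the request as new and point the tail at its entry.
        slot = self._next_slot
        self._next_slot += 1
        self.entries[slot] = [request, Status.NEW]
        self.tail = slot
        self._refresh()
        return slot

    def issue(self):
        # Claim 15: status goes from new to executing when passed on.
        slot = self.next
        if slot is None:
            return None
        self.entries[slot][1] = Status.EXECUTING
        self._refresh()
        return slot

    def complete(self, slot):
        # Claim 16: status goes from executing to done, then the entry is deleted.
        self.entries[slot][1] = Status.DONE
        del self.entries[slot]
        if self.tail == slot:
            self.tail = None
        self._refresh()


buf = OrderBuffer()
a = buf.receive("req-a")
b = buf.receive("req-b")
issued_slot = buf.issue()  # passes "req-a" to the (imaginary) accelerator
```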


Claims 17-20 are rejected under 35 U.S.C. 103(a) as being unpatentable over Ekanadham et al. (US PG Pub No. US 2012/0239904 A1) in view of Mejdrich et al. (US Pat No. 8,423,749) in view of Biles et al. (U.S. 2009/0216958), further in view of Keefer et al. (US PG Pub No. US 2003/0033345 A1).
Keefer was disclosed in IDS dated 04/20/2020.

Regarding claim 17, Ekanadham teaches a method, comprising: 

issuing a first acceleration request from said first thread to an order buffer ([0002]); 
issuing a second acceleration request from said second thread to an order buffer ([0020]); 
issuing said first acceleration request from said first order buffer to an accelerator ([0020]); and, 
issuing said second acceleration request from said second order buffer to said accelerator ([0020]).
Ekanadham does not explicitly teach a first order buffer that is dedicated to said first thread and a second order buffer that is dedicated to said second thread.
Mejdrich teaches allocating an output buffer for each executing thread (Fig 2; col 3 lines 12-15, wherein each hardware thread has a dedicated output buffer; col 19 lines 25-31). It would have been obvious to one of ordinary skill at the time the invention was made to modify Ekanadham to include that each order buffer is dedicated to a different one of said CPU’s threads. One would be motivated by the desire to have separate storage for each individual thread so that each thread has its own storage as taught by Mejdrich.
Ekanadham and Mejdrich do not teach said accelerator processing said first request utilizing a first virtual to physical address translation scheme utilized by said first thread and said accelerator processing said second request utilizing a second virtual to physical address translation scheme utilized by said second thread.

Ekanadham, Mejdrich and Biles do not teach switching said first thread from an active state to an inactive state and switching a third thread from an inactive state to an active state, including, replacing said first order buffer's content with requests from said first thread with requests from said third thread.
However, Keefer discloses, in response to a decision to place said thread in an inactive state, storing said context information in said allocated storage space (see for example Keefer, paragraph [0033], this limitation is disclosed such that the context of an idle thread is saved). It would have been obvious to one of ordinary skill in the art at the time of the invention to include saving context of an idle thread as taught by Keefer because it would enhance the teaching of Ekanadham, Mejdrich and Biles with an effective means of allowing scheduled higher-priority threads to execute (as suggested by Keefer, see for example paragraph [0033]).

Regarding claim 18, Biles teaches that said switching said first thread and said switching said third thread includes switching virtual to physical address translations of said first thread out of said CPU, and, switching virtual to physical address translations of said third thread into said CPU ([0033]).

Regarding claim 19, Ekanadham teaches that said accelerator processes said first and second requests simultaneously ([0040]).

Regarding claim 20, Ekanadham teaches said accelerator detects a page fault in processing said second request and writes an indication of said page fault in a block of memory address space where said second request's input data is stored ([0008-9]).

Response to Arguments
Applicant's arguments filed 10/07/2021 have been fully considered but they are not persuasive. 
Regarding claim 1, Applicant argues on pages 6-7 of Remarks:
For example, the combination does not at least describe “a controller and a plurality of order buffers coupled between said CPU and said accelerator, each of said order buffers dedicated to a different one of said CPU’s threads, each one of said order buffers to hold one or more requests issued to said accelerator from its corresponding thread, said controller to control issuance of said order buffers’ respective requests to said accelerator.” 
For example, the Office Action cites Mejdrich as allegedly describing order buffers dedicated to threads. What Mejdrich describes, and shows, in FIG. 2 are “output” buffers which are not order buffers that hold any sort of request. This is clear in that they are AFTER the execution units where no ordering would be needed. As such, there would be no reason to modify Chang to add output buffers.
Examiner disagrees. Chang was cited for teaching the use of order buffers which are used to hold requests before being sent to execution units (Chang [0009], “request queue”). However, Chang does not teach the use of a plurality of order buffers. Therefore, Mejdrich was cited for teaching the use of a plurality of buffers for each of the execution units (Mejdrich Fig 2). In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
Furthermore, Mejdrich does not explicitly define how the output buffers are used as asserted by Applicant. Mejdrich in Fig 2 simply discloses the arrangement of components and does not provide any specificity as to how the output buffer is utilized (col 3 lines 23-28). Mejdrich only discloses that each hardware thread has a set of registers, execution unit, and output buffer. 
Therefore, Mejdrich provides for the duplication of components as opposed to sharing of a single component by each of the execution units. If a technique has been used to improve one device, and a person of ordinary skill in the art would recognize that it would improve similar devices in the same way, using the technique is obvious unless its actual application is beyond his or her skill. One must ask whether the improvement is more than the predictable use of prior art elements according to their established functions. See KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398 (2007).

Regarding claims 9 and 17, Applicant argues on page 8 of Remarks:
For example, as noted above, Mejdrich deals with output buffers, not order buffers. Additionally, Biles does not describe different address translation schemes. Ekanadham is admitted as describing neither of those aspects.
Examiner disagrees. Ekanadham was cited for teaching the use of order buffers which are used to hold requests before being sent to execution units (Ekanadham [0020], “command buffers”). However, Ekanadham does not teach the use of a plurality of order buffers. Therefore, Mejdrich was cited for teaching the use of a plurality of buffers for each of the execution units (Mejdrich Fig 2). Biles teaches using address translation (Biles [0033]). In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC C WAI whose telephone number is (571)270-1012.  The examiner can normally be reached on Monday - Friday 9-5.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai An can be reached on 571-272-3756.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/ERIC C WAI/
Primary Examiner, Art Unit 2195