DETAILED ACTION
Continued Examination Under 37 CFR 1.114
1.	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 8/16/2021 has been entered.
Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner Notes
3.	The Examiner cites particular columns and line numbers in the references as applied to the claims below for the convenience of the Applicant(s). Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the Applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.
Claim Rejections - 35 USC § 103
4.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

5.	The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
6.	This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
7.	Claims 1, 2, 5, 11, 15 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over McKenzie et al. (U.S. Publication 2013/0155083) (McKenzie hereinafter) (Identified by Applicant in IDS) in view of Diard (U.S. Publication 2011/0210976) (Diard hereinafter) (Identified by Applicant in IDS).
8.	As per claim 1, McKenzie teaches a data processing method comprising:
          associating a replay log with each of a plurality of physical coprocessors, which include a first physical coprocessor; storing at least a portion of the intercepted data and command stream in the replay log associated with the first physical coprocessor [“the control virtual machine may copy, save and/or store state information associated with the GPU and/or the first virtual machine. For example, the control virtual machine may save a graphics stack, the state of one or more drivers (e.g., hardware or graphics drivers), graphics calls intercepted or queued for the GPU, GPU output to an output device, and/or buffered graphics content in a frame buffer,” ¶ 0132; GPU mapped to first physical coprocessor; “Illustrated in FIG. 2A is one embodiment of a virtualization environment. Included on a computing device 201 is a hardware layer that can include one or more physical disks 204, one or more physical devices 206, one or more physical processors 208 and a physical memory 216.” ¶ 0059; “Physical devices 206 may include graphics hardware 389, such as a video card, graphics acceleration hardware, a frame buffer device, and a graphics processing unit (GPU).” ¶ 0094, fig. 3A; “the hardware layer 210 can include a processor 208. The processor 208, in some embodiments, can be any processor, while in other embodiments the processor 208 can be any processor described herein. The processor 208 can include one or more processing cores. In other embodiments the computing device 201 can include one or more processors 208. In some embodiments, the computing device 201 can include one or more different processors, e.g. a processing unit, a graphics processing unit, or a physics engine.” ¶ 0064]; and acquiring and storing execution state information for the first physical coprocessor [“The control virtual machine may redirect an application of the first virtual machine providing the image to the GPU emulation program. The control virtual machine may provide, to a second virtual machine, access to the GPU responsive to the removal. In certain embodiments, the control virtual machine may restoring the stored state information to the GPU and redirecting the first virtual machine to the GPU.” ¶ 0007];
          selecting one of the plurality of physical coprocessors other than the first physical coprocessor as a second physical coprocessor [“The control virtual machine may select the emulation program from a plurality of emulation programs. The control virtual machine may select the emulation program based on the make, version and/or configuration of the GPU.” ¶ 0130; the emulation program executes on a hardware device and is thus interpreted as a physical coprocessor];
          reading out from the replay log associated with the first physical coprocessor the stored portion of the intercepted data and command stream and submitting the stored portion, as well as the stored execution state information, to the second physical coprocessor for the second physical coprocessor to service the data and command stream issued by the application instead of the first physical coprocessor [“The control virtual machine may store state information of a graphics processing unit (GPU) of the computing device (Step 301). The GPU may render an image from a first virtual machine. The control virtual machine may remove, from the first virtual machine, access to the GPU (Step 303). The control virtual machine may redirect the first virtual machine to a GPU emulation program (Step 305). The GPU emulation program may render the image from the first virtual machine using at least a portion of the stored state information (Step 307).” ¶ 0122, fig. 3B; “the emulation program executes in conjunction with a processor and/or other hardware to perform graphics processing. Such a processor may, for example be a central processing unit (CPU), and may not be optimized for performing graphics operations.” ¶ 0107; the emulation program executes on a hardware device and is thus interpreted as a physical coprocessor].
          McKenzie does not explicitly disclose but Diard discloses intercepting a data and command stream issued by an application, which is running on at least one processor and the plurality of physical coprocessors, to an application program interface (API) of the first coprocessor for execution of the stream on the first coprocessor [“The shim layer 125 loads and configures the DDI 130 for the first GPU 210 on the primary adapter and the DDI 135 for the second GPU 475. If there is a specified affinity for executing rendering commands from the application 110 on the second GPU 475, the shim layer 125 intercepts the rendering commands sent by the runtime API 120 to the DDI on the primary adapter 130, calls the DDI on the unattached adapter to set the commands buffers for the second GPU 475, and routes them to the driver 465 for the second GPU 475.” ¶ 0034].
          It would have been obvious to one of ordinary skill in the art, having the teachings of McKenzie and Diard available before the effective filing date of the claimed invention, to modify the capability of intercepting commands and data sent to coprocessors as disclosed by McKenzie to include the capability of transferring graphics data to GPUs as taught by Diard, thereby providing a mechanism to improve system performance by improving the transfer of data to coprocessors or GPUs [Diard ¶ 0009].
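For illustration only, the arrangement recited in claim 1 (a replay log per physical coprocessor, interception of the command stream, capture of execution state, and resubmission to a second coprocessor) may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application, McKenzie, or Diard; the names Coprocessor, InterceptionLayer, submit and migrate, and the toy "last" state entry, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Coprocessor:
    """Hypothetical stand-in for a physical coprocessor such as a GPU."""
    name: str
    state: dict = field(default_factory=dict)
    executed: list = field(default_factory=list)

    def execute(self, command):
        # Record the command and update a toy piece of execution state.
        self.executed.append(command)
        self.state["last"] = command


class InterceptionLayer:
    """Assumed layer between the application and the coprocessor API."""

    def __init__(self, coprocessors):
        self.coprocessors = {c.name: c for c in coprocessors}
        # One replay log and one saved-state slot per physical coprocessor.
        self.replay_logs = {c.name: [] for c in coprocessors}
        self.saved_state = {c.name: {} for c in coprocessors}

    def submit(self, target, command):
        # Intercept the call, log it, forward it, then snapshot the state.
        self.replay_logs[target].append(command)
        self.coprocessors[target].execute(command)
        self.saved_state[target] = dict(self.coprocessors[target].state)

    def migrate(self, first, second):
        # Hand the saved state and the logged stream to the second device.
        dst = self.coprocessors[second]
        dst.state.update(self.saved_state[first])
        for command in self.replay_logs[first]:
            dst.execute(command)
        return dst


layer = InterceptionLayer([Coprocessor("gpu0"), Coprocessor("gpu1")])
layer.submit("gpu0", "alloc")
layer.submit("gpu0", "launch_kernel")
target = layer.migrate("gpu0", "gpu1")
print(target.executed)  # ['alloc', 'launch_kernel']
```

In this sketch the second device receives both the saved state and the logged commands, mirroring the "as well as the stored execution state information" language of the claim; a real system would need device-specific state capture, which is outside the scope of the sketch.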
9.	As per claim 2, McKenzie and Diard teach the method of claim 1.  Diard further teaches in which the portion of the intercepted data and command stream in the replay log associated with the first physical coprocessor corresponds to the portion of the stream from a most recent synchronization point [“If there is a specified affinity for executing rendering commands from the application 110 on the second GPU 475, the shim layer 125 intercepts the rendering commands sent by the runtime API 120 to the DDI on the primary adapter 130, calls the DDI on the unattached adapter to set the commands buffers for the second GPU 475, and routes them to the driver 465 for the second GPU 475.” ¶ 0034; suggests a synchronization or stopping point prior to sending of commands to the primary DDI].
          It would have been obvious to one of ordinary skill in the art, having the teachings of McKenzie and Diard available before the effective filing date of the claimed invention, to modify the capability of intercepting commands and data sent to coprocessors as disclosed by McKenzie to include the capability of transferring graphics data to GPUs as taught by Diard, thereby providing a mechanism to improve system performance by improving the transfer of data to coprocessors or GPUs [Diard ¶ 0009].
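For illustration only, the claim 2 limitation that the replay log holds the portion of the stream from the most recent synchronization point may be sketched as a log that is cleared at each synchronization. This is a hypothetical sketch; the ReplayLog class and its record and synchronize methods are assumptions and do not appear in the application or in the cited references.

```python
class ReplayLog:
    """Hypothetical log that keeps only commands since the last sync point."""

    def __init__(self):
        self.entries = []

    def record(self, command):
        self.entries.append(command)

    def synchronize(self):
        # At a synchronization point earlier commands are assumed complete,
        # so they no longer need to be replayed and can be discarded.
        self.entries.clear()


log = ReplayLog()
for cmd in ["copy_in", "kernel_a"]:
    log.record(cmd)
log.synchronize()
log.record("kernel_b")
print(log.entries)  # ['kernel_b']
```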
10.	As per claim 5, McKenzie and Diard teach the method of claim 1.  Diard further teaches pre-fetching part of the data and command stream and storing the pre-fetched part in the replay log associated with the first physical coprocessor [“If there is a specified affinity for executing rendering commands from the application 110 on the second GPU 475, the shim layer 125 intercepts the rendering commands sent by the runtime API 120 to the DDI on the primary adapter 130, calls the DDI on the unattached adapter to set the commands buffers for the second GPU 475, and routes them to the driver 465 for the second GPU 475.” ¶ 0034].
          It would have been obvious to one of ordinary skill in the art, having the teachings of McKenzie and Diard available before the effective filing date of the claimed invention, to modify the capability of intercepting commands and data sent to coprocessors as disclosed by McKenzie to include the capability of transferring graphics data to GPUs as taught by Diard, thereby providing a mechanism to improve system performance by improving the transfer of data to coprocessors or GPUs [Diard ¶ 0009].
11.	As per claim 11, McKenzie and Diard teach the method of claim 1.  McKenzie further teaches in which the first and second coprocessors are graphics processing units [“Physical devices 206 may include graphics hardware 389, such as a video card, graphics acceleration hardware, a frame buffer device, and a graphics processing unit (GPU).” ¶ 0094, fig. 3A].
12.	As per claim 15, McKenzie and Diard teach the method of claim 1.  Diard further teaches dividing the intercepted and stored portion of the data and command stream into a plurality of parts and submitting each part to a different one of the first and second physical coprocessors for simultaneous execution [“the shim layer 125 implements the DDI 135 for the second GPU 475. Accordingly, the shim layer 125 splits graphics command and redirects them to the two DDIs 130, 135,” ¶ 0034].
          It would have been obvious to one of ordinary skill in the art, having the teachings of McKenzie and Diard available before the effective filing date of the claimed invention, to modify the capability of intercepting commands and data sent to coprocessors as disclosed by McKenzie to include the capability of transferring graphics data to GPUs as taught by Diard, thereby providing a mechanism to improve system performance by improving the transfer of data to coprocessors or GPUs [Diard ¶ 0009].
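For illustration only, the claim 15 limitation of dividing the stored stream into parts and submitting each part to a different coprocessor for simultaneous execution may be sketched as follows. The sketch assumes the commands are independent of one another; split_round_robin and run_part are hypothetical names, and worker threads stand in for the coprocessors.

```python
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(commands, n_parts):
    """Divide a command list into `n_parts` interleaved parts."""
    return [commands[i::n_parts] for i in range(n_parts)]


def run_part(device_name, part):
    # Stand-in for submitting one part to one coprocessor.
    return [(device_name, command) for command in part]


commands = ["c0", "c1", "c2", "c3", "c4"]
devices = ["gpu0", "gpu1"]
parts = split_round_robin(commands, len(devices))
# Each part is handed to a separate worker so the parts run concurrently.
with ThreadPoolExecutor(max_workers=len(devices)) as pool:
    results = list(pool.map(run_part, devices, parts))
print(parts)  # [['c0', 'c2', 'c4'], ['c1', 'c3']]
```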
13.	As per claim 25, McKenzie and Diard teach the method of claim 16.  Diard further teaches intercepting and storing the portion of the stream at the same time as the application communicates the stream directly to the first physical coprocessor [“If there is a specified affinity for executing rendering commands from the application 110 on the second GPU 475, the shim layer 125 intercepts the rendering commands sent by the runtime API 120 to the DDI on the primary adapter 130, calls the DDI on the unattached adapter to set the commands buffers for the second GPU 475, and routes them to the driver 465 for the second GPU 475.” ¶ 0034].
          It would have been obvious to one of ordinary skill in the art, having the teachings of McKenzie and Diard available before the effective filing date of the claimed invention, to modify the capability of intercepting commands and data sent to coprocessors as disclosed by McKenzie to include the capability of transferring graphics data to GPUs as taught by Diard, thereby providing a mechanism to improve system performance by improving the transfer of data to coprocessors or GPUs [Diard ¶ 0009].
14.	Claims 3 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over McKenzie and Diard in further view of Lutter (U.S. Patent 7,178,049) (Lutter hereinafter) (Identified by Applicant in IDS).
15.	As per claim 3, McKenzie and Diard teach the method of claim 1.  McKenzie and Diard do not explicitly disclose but Lutter discloses in which the reading out from the replay log associated with the first physical coprocessor and submitting the stored portion to the second physical coprocessor is done upon sensing a redirection condition [“using the task manager for automatically identifying another processor in the multiprocessor system for running the identified vehicle application and redirecting the vehicle application associated with the detected failure to the other identified processor in the vehicle; using the configuration manager to redirect the data and state information to the other identified processor in the vehicle after detecting the failure; and initiating the identified application in the identified other processor,” cl. 29].
         It would have been obvious to one of ordinary skill in the art, having the teachings of McKenzie, Diard and Lutter available before the effective filing date of the claimed invention, to modify the capability of intercepting commands and data sent to coprocessors as disclosed by McKenzie and Diard to include the capability of redirecting an application to a co-processor upon failure detection as taught by Lutter, thereby providing a mechanism to improve system operability by dynamically selecting co-processors to distribute processing loads and ensuring continued operation of applications in light of processor failures.
16.	As per claim 4, McKenzie, Diard and Lutter teach the method of claim 3.  Lutter further teaches in which the redirection condition is failure of the first coprocessor [“using the task manager for automatically identifying another processor in the multiprocessor system for running the identified vehicle application and redirecting the vehicle application associated with the detected failure to the other identified processor in the vehicle; using the configuration manager to redirect the data and state information to the other identified processor in the vehicle after detecting the failure; and initiating the identified application in the identified other processor,” cl. 29].
         It would have been obvious to one of ordinary skill in the art, having the teachings of McKenzie, Diard and Lutter available before the effective filing date of the claimed invention, to modify the capability of intercepting commands and data sent to coprocessors as disclosed by McKenzie and Diard to include the capability of redirecting an application to a co-processor upon failure detection as taught by Lutter, thereby providing a mechanism to improve system operability by dynamically selecting co-processors to distribute processing loads and ensuring continued operation of applications in light of processor failures.
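For illustration only, the redirection upon failure addressed in claims 3 and 4 may be sketched as follows. This is a minimal sketch under stated assumptions, not a description of Lutter's task manager or configuration manager; DeviceFailure, run_with_failover and failing_device are hypothetical names, and the devices are modeled as plain callables.

```python
class DeviceFailure(Exception):
    """Hypothetical error raised when a coprocessor stops responding."""


def run_with_failover(commands, primary, backup):
    """Send commands to `primary`; on failure, replay the log on `backup`."""
    replay_log = []
    for index, command in enumerate(commands):
        replay_log.append(command)
        try:
            primary(command)
        except DeviceFailure:
            # Redirection condition sensed: replay the logged commands,
            # then finish the remaining stream on the backup device.
            for logged in replay_log:
                backup(logged)
            for remaining in commands[index + 1:]:
                backup(remaining)
            return "backup"
    return "primary"


def failing_device(command):
    # Toy device that fails on one specific command.
    if command == "kernel_b":
        raise DeviceFailure(command)


served = []
result = run_with_failover(["kernel_a", "kernel_b", "kernel_c"],
                           failing_device, served.append)
print(result, served)  # backup ['kernel_a', 'kernel_b', 'kernel_c']
```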
17.	Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over McKenzie and Diard in further view of Zhang et al. (U.S. Publication 2013/0151747) (Zhang hereinafter) (Identified by Applicant in IDS).
18.	As per claim 7, McKenzie and Diard teach the method of claim 1.  McKenzie and Diard do not explicitly disclose but Zhang discloses in which the second physical coprocessor is selected according to a utility policy [“the idle co-processor card may be a co-processor card currently having no co-processing task; and may also be a co-processor which is selected according to a load balancing policy and has a lighter load or is relatively idle,” ¶ 0060].
          It would have been obvious to one of ordinary skill in the art, having the teachings of McKenzie, Diard and Zhang available before the effective filing date of the claimed invention, to modify the capability of intercepting commands and data sent to coprocessors as disclosed by McKenzie and Diard to include the capability of co-processor acceleration as taught by Zhang, thereby providing a mechanism to improve system performance by improving the memory overheads and increasing co-processing speed [Zhang ¶ 0006].
19.	As per claim 8, McKenzie, Diard and Zhang teach the method of claim 7.  Zhang further teaches in which the utility policy is a function of the relative loads of the coprocessors [“the idle co-processor card may be a co-processor card currently having no co-processing task; and may also be a co-processor which is selected according to a load balancing policy and has a lighter load or is relatively idle,” ¶ 0060].
          It would have been obvious to one of ordinary skill in the art, having the teachings of McKenzie, Diard and Zhang available before the effective filing date of the claimed invention, to modify the capability of intercepting commands and data sent to coprocessors as disclosed by McKenzie and Diard to include the capability of co-processor acceleration as taught by Zhang, thereby providing a mechanism to improve system performance by improving the memory overheads and increasing co-processing speed [Zhang ¶ 0006].
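For illustration only, a utility policy that is a function of the relative loads of the coprocessors, as addressed in claims 7 and 8, may be sketched as a least-loaded selection. This is a hypothetical sketch and is not Zhang's load balancing policy; the function name select_second_coprocessor, the device names, and the load figures are assumptions.

```python
def select_second_coprocessor(loads, first):
    """Pick the least-loaded coprocessor other than `first`.

    `loads` maps a hypothetical device name to its current load (0.0 to 1.0).
    """
    candidates = {name: load for name, load in loads.items() if name != first}
    if not candidates:
        raise ValueError("no alternative coprocessor available")
    return min(candidates, key=candidates.get)


loads = {"gpu0": 0.10, "gpu1": 0.75, "gpu2": 0.30}
print(select_second_coprocessor(loads, first="gpu0"))  # gpu2
```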
20.	Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over McKenzie, Diard and Zhang in further view of Wolf (U.S. Publication 2011/0134132) (Wolf hereinafter) (Identified by Applicant in IDS).
21.	As per claim 9, McKenzie, Diard and Zhang teach the method of claim 7.  McKenzie, Diard and Zhang do not explicitly disclose but Wolf discloses in which the utility policy is a function of the relative speed of the coprocessors [“The GPU selection may be based on a plurality of heuristics, e.g., bottleneck, utilization of GPUs, GPU speed, type of graphics application, etc. It is appreciated that the GPU selection may be based on the first available GPU and it may occur in a round robin fashion.” ¶ 0014].
          It would have been obvious to one of ordinary skill in the art, having the teachings of McKenzie, Diard, Zhang and Wolf available before the effective filing date of the claimed invention, to modify the capability of intercepting commands and data sent to coprocessors as disclosed by McKenzie, Diard and Zhang to include the capability of directing graphics processing to GPUs in a multi-GPU system as taught by Wolf, thereby providing a mechanism to improve system operability and maintainability by improving the memory overheads and increasing co-processing speed without the need to rewrite existing graphics applications [Wolf ¶ 0007].
22.	Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over McKenzie, Diard and Zhang in further view of Bellows et al. (U.S. Publication 2011/0161943) (Bellows hereinafter) (Identified by Applicant in IDS).
23.	As per claim 10, McKenzie, Diard and Zhang teach the method of claim 7.  McKenzie, Diard and Zhang do not explicitly disclose but Bellows discloses in which the utility policy is a function of a degree of dissimilarity of the physical coprocessors relative to the first physical coprocessor [“the selectively allocating of the work elements to selected processor cores involves and/or is based on a scheduling criteria that takes into account workload allocation and work balancing across the system architecture, processing capabilities of the different types of processing units (e.g., CPU, GPU, SPU), and other factors,” ¶ 0060; criteria including processing unit type suggests consideration of dissimilarities].
          It would have been obvious to one of ordinary skill in the art, having the teachings of McKenzie, Diard, Zhang and Bellows available before the effective filing date of the claimed invention, to modify the capability of intercepting commands and data sent to coprocessors as disclosed by McKenzie, Diard and Zhang to include the capability of dynamic distribution as taught by Bellows, thereby providing a mechanism to improve system operability and maintainability by improving the memory overheads and increasing co-processing speed without the need to rewrite existing graphics applications [Wolf ¶ 0007].
24.	Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over McKenzie and Diard in further view of Diard (U.S. Publication 2010/0271375) (Diard’375 hereinafter) (Identified by Applicant in IDS).
25.	As per claim 21, McKenzie and Diard teach the method of claim 16.  McKenzie and Diard do not explicitly disclose but Diard’375 discloses the second physical coprocessor is selected randomly [“Prior to rendering images or writing any feedback data, the feedback array may be initialized, e.g., by randomly selecting either of the GPU identifiers for each entry or by filling alternating entries with different identifiers,” ¶ 0061].
          It would have been obvious to one of ordinary skill in the art, having the teachings of McKenzie, Diard and Diard’375 available before the effective filing date of the claimed invention, to modify the capability of intercepting commands and data sent to coprocessors as disclosed by McKenzie and Diard to include the capability of co-processor load balancing as taught by Diard’375, thereby providing a mechanism to improve system performance by dynamically selecting co-processors to distribute processing loads [Diard’375 ¶ 0009].
26.	Claims 16-20, 22-24 and 28 are variants of claims 1-5, 6-11 and 15 and are thus rejected on the same basis as described above.
27.	Claim 31 is rejected under 35 U.S.C. 103 as being unpatentable over Hendry et al. (U.S. Publication 2013/0038615) (Hendry hereinafter) in view of Diard.
28.	As per claim 31, Hendry teaches a data processing system comprising:
a central processing unit (CPU); a plurality of graphics processing units (GPUs), the GPUs including a first GPU [“the disclosed technique can generally work in any computer system comprising two or more GPUs, each of which may independently drive display 114. Moreover, GPUs in the same computer system may have different operating characteristics, such as power consumption levels. For example, the computer system may switch between a general-purpose processor 102 (e.g., central processing unit (CPU)) and a special-purpose GPU (e.g., discrete GPU 110) to drive display 114.” ¶ 0034];
at least one application comprising computer-executable code executable on the CPU [“The data structures and code described in this detailed description are typically stored on a computer-readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system,” ¶ 0024];
store at least a portion of the intercepted data and command stream in the replay log associated with the first GPU [“In addition, the GPU configuration state of the first GPU is saved in video memory of the first GPU (operation 506), the first GPU is placed into a low-power state (operation 508), and graphics calls to the first GPU are intercepted (operation 510). To place the first GPU into the low-power state, the first GPU and an interface with the first GPU are powered off, and power to video memory of the first GPU is maintained. The shim may then intercept graphics calls by acquiring a lock for the first graphics call to the GPU and queuing the first graphics call and subsequent graphics calls to the first GPU. (In some embodiments, the shim is inserted above the driver to reduce the amount of driver hardening that is required. In this way, the shim may acquire relevant locks to help avoid having the driver touch powered-down hardware. This makes it possible to prevent calls from reaching the driver to avoid having to harden drivers as much. Note that the drivers could alternatively be hardened to themselves to achieve the same effect.).” ¶ 0061];
select one of the GPUs other than the first GPU as a second GPU [“a switch from using the first GPU to using a second GPU to drive the display is made (operation 504),” ¶ 0060]; and
read out from the replay log associated with the first GPU the stored portion of the intercepted data and command stream and submit the stored portion, as well as the stored execution state information, to the second GPU for the second GPU to service the data and command stream issued by the application instead of the first GPU [“a switch from using the first GPU to using a second GPU to drive the display is made (operation 504). The second GPU may correspond to a low-power (e.g., embedded) GPU, while the first GPU may correspond to a high-power (e.g., discrete) GPU. To make the switch, pixel values may be copied from a first framebuffer for the first GPU to a second framebuffer for the second GPU, and a switch may be initiated from the first framebuffer to the second framebuffer as a signal source for driving the display,” ¶ 0060].
Hendry does not explicitly disclose but Diard discloses an interception layer logically located between the at least one application and the GPUs, and including a replay log associated with each of the GPUs, wherein the interception layer is executed on the CPU to: intercept a data and command stream issued by the at least one application to an application program interface (API) of the first GPU for execution of the stream on the first GPU, and acquire and store execution state information for the first GPU [“The shim layer 125 loads and configures the DDI 130 for the first GPU 210 on the primary adapter and the DDI 135 for the second GPU 475. If there is a specified affinity for executing rendering commands from the application 110 on the second GPU 475, the shim layer 125 intercepts the rendering commands sent by the runtime API 120 to the DDI on the primary adapter 130, calls the DDI on the unattached adapter to set the commands buffers for the second GPU 475, and routes them to the driver 465 for the second GPU 475.” ¶ 0034].
It would have been obvious to one of ordinary skill in the art, having the teachings of Hendry and Diard available before the effective filing date of the claimed invention, to modify the capability of managing GPU power states as disclosed by Hendry to include the capability of transferring graphics data to GPUs as taught by Diard, thereby providing a mechanism to improve system performance by improving the transfer of data to coprocessors or GPUs [Diard ¶ 0009].
Allowable Subject Matter
29.	Claims 26 and 27 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Response to Arguments
30.	Applicant’s arguments have been carefully considered but are not persuasive.
31.	Applicant argues on page 8 of the remarks dated 8/16/2021 that McKenzie does not teach a “second physical coprocessor.”  However, the term is not explicitly defined in the specification, nor are any limiting examples provided; the term is therefore interpreted broadly.  As noted above, the referenced GPU emulation program executes on a host processor and thus constitutes a physical coprocessor when considered as a whole.
Conclusion
32.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM C WOOD whose telephone number is (571)272-5285. The examiner can normally be reached Monday - Friday, 8:00 am - 4:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chat C Do can be reached on 571-272-3721. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/WILLIAM C WOOD/
Examiner, Art Unit 2193

/Chat C Do/
Supervisory Patent Examiner, Art Unit 2193