DETAILED ACTION
This office action is responsive to the amendment filed on October 8, 2020 in this application, Meixner et al., U.S. Patent Application No. 16/529,633 (Filed August 1, 2019) claiming priority to Meixner et al., U.S. Patent Application No. 15/628,480 (Filed June 20, 2017) claiming priority to U.S. Patent Application No. 15/389,113 (Filed December 28, 2016) claiming priority to U.S. Provisional Patent Application No. 62/300,684 (Filed February 26, 2016) (“Meixner”).  Claims 21 - 44 are pending.  Claims 21, 29, and 37 are amended.
Applicants' arguments have been carefully and respectfully considered and found not persuasive.  Accordingly, this action has been made FINAL.

Response to Arguments
	With respect to Applicant’s argument on pg. 8 – 9 of the Applicant’s Remarks (“Remarks”) stating that the prior art fails to teach causing a “first kernel to store data to memory external to the device,” the examiner respectfully disagrees.  See infra § Claim Rejections 35 U.S.C. §103, § Claims 21, 29, and 37.  While Wu teaches kernel fusion, it also teaches combining fusion with fission, which splits some kernels and requires them to communicate data between each other via the PCIe bus and main memory or disk.  Wu at ¶¶ 0015 & 0017 (fusion to reduce data latency); id. at ¶¶ 0006, 0007, 0057, 0063 (kernel fission when data is too large, with data latency ameliorated using concurrent overlapping scheduling).  Therefore, Wu teaches the amended claim limitations.

Information Disclosure Statement
The information disclosure statement (IDS) filed on 7/23/2020 is in compliance with the provisions of 37 CFR 1.97, 1.98 and MPEP § 609.  The references listed therein have been considered and placed in the application file.

Double Patenting
	The rejections made under the doctrine of Double Patenting are withdrawn in light of the Terminal Disclaimer approved on 10/8/2020.

Claim Rejections 35 U.S.C. §103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 21 – 44 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al., United States Patent Application Publication No. 2013/0091507 (Published April 11, 2013) (“Wu”), in view of Taylor et al., United States Patent Application Publication No. 2016/0210720 (Published July 21, 2016, filed March 20, 2015) (“Taylor”).

Claims 21, 29, and 37
With respect to claims 21, 29, and 37, Wu teaches the invention as claimed, including a method performed by one or more computers, the method comprising:
receiving [a pipeline] to be executed by a device having a plurality of processors, the original processing pipeline comprising a plurality of kernels to be executed in a particular order; {The logical processing flow between kernels executing on graphical processing units may be manipulated by a compiler to perform fusion and fission of the kernels by splitting the kernels based on processing timing, GPU processor availability, and GPU memory size being less than kernel size.  Wu at Abstract; id. at fig. 1 & ¶¶ 0006- 0008 (GPU execution, kernel fusions and fission/splitting); id. at ¶¶ 0015 & 0023 (GPU memory size and fission/fusion benefits); id. at ¶ 0063 & claim 4 (kernel count vs processor count and estimated execution time); id. at ¶ 0034 (kernels are assigned to processors).}
wherein the original processing pipeline includes instructions to feed, to a second kernel and from an internal line buffer, data generated by a first kernel, wherein the internal line buffer is internal to the device and is configured to buffer data between the first kernel and the second kernel in the pipeline; {Original program is comprised of various kernels that execute on a CPU using the main memory and pass data between each other using the main memory as line buffers, whereas the split program may execute on GPU cores and pass the data using external memory as line buffers.  Wu at ¶ 0061 (CPU and main memory); ¶¶ 0006, 0007, 0057 (Kernel fission and distribution to GPUs); id. at fig 1A (Kernels in original program).}
determining that the original processing pipeline satisfies one or more graph-splitting criteria; in response, generating multiple processing pipelines including: generating a first modified processing pipeline including modifying one or more store instructions of the first kernel in the original processing pipeline that cause the first kernel to store data to the internal line buffer to be one or more respective store instructions that cause the first kernel to store data to memory external to the device; and generating a second modified processing pipeline including modifying one or more load instructions of the second kernel in the original processing pipeline that cause the second kernel to load data from the internal line buffer to be load instructions that cause the second kernel to load the data from the memory external to the device. {The logical processing flow between kernels executing on graphical processing units may be manipulated by a compiler to perform fusion and fission of the kernels by splitting the kernels based on processing timing, GPU processor availability, and GPU memory size being less than kernel size.  Wu at Abstract; id. at fig. 1 & ¶¶ 0006- 0008 (GPU execution, kernel fusions and fission/splitting); id. at ¶¶ 0015 & 0023 (GPU memory size and fission/fusion benefits); id. at ¶ 0063 & claim 4 (kernel count vs processor count and estimated execution time); id. at ¶ 0034 (kernels are assigned to processors).  The kernels in the pipeline may need to be split to address the fact that the GPU memory is insufficient to hold all of the data which will be processed by the kernels and thus the data needs to be written to an external memory such as a disk.  Id. at ¶ 0005 – 0007.  Kernel pipeline is split into multiple kernels where a first kernel writes data to a disk based buffer and a subsequent kernel loads from that buffer.  Id. at ¶¶ 0015, 0017, 0063.}
However, Wu does not explicitly teach the limitation:
[receiving] instructions that define an original processing pipeline {Taylor does teach this limitation.  Taylor teaches that the image pipeline for processing, as taught by Wu, may be represented in the form of a directed acyclic graph that defines the processing pipeline by indicating each processing task, the logical connections between them, and how the outputs from one task are provided as inputs to another task, with the graph compiled and buffers inserted between tasks for inter-task communication.  Taylor at ¶ 0003; id. at ¶¶ 0039 & 0040 (graph may be compiled to functions which process the image data); id. at ¶¶ 0041 & 0042 (spatial and task decomposition); id. at ¶ 0045 (producer-consumer).
Wu and Taylor are analogous art because they are from the “same field of endeavor” and are both from the same “problem-solving area.”  Specifically, they are both from the field of image processing, and both are trying to solve the problem of how to optimize image processing performance.
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine the use of a method for image pipeline processing, as taught in Wu, with representing the pipeline in the form of a directed acyclic graph that defines the processing pipeline by indicating each processing task, the logical connections between them, and how the outputs from one task are provided as inputs to another task, with the graph compiled and buffers inserted between tasks for inter-task communication, as taught in Taylor.  Taylor teaches that a graph based implementation of the processing pipeline allows for “a useful level of abstraction” for processing optimizations on the pipeline.  Id. at ¶ 0003.  Therefore, one having ordinary skill in the art would have been motivated to combine the use of a method for image pipeline processing, as taught in Wu, with representing the pipeline in the form of a directed acyclic graph that defines the processing pipeline by indicating each processing task, the logical connections between them, and how the outputs from one task are provided as inputs to another task, as taught in Taylor, for the purpose of improving image processing performance.}
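
For illustration only, the store/load modification recited in claims 21, 29, and 37 can be sketched as follows.  This is a minimal sketch under stated assumptions and is not taken from Wu, Taylor, or the application: the names Kernel, split_pipeline, INTERNAL, and EXTERNAL are hypothetical, and the single graph-splitting criterion used here (a kernel's output exceeding an assumed internal capacity) is only one of the criteria the claims recite.

```python
from dataclasses import dataclass, replace
from typing import List

# Hypothetical labels for the two kinds of storage discussed in the claims.
INTERNAL = "internal_line_buffer"   # buffering inside the device
EXTERNAL = "external_memory"        # memory outside the device


@dataclass(frozen=True)
class Kernel:
    name: str
    load_from: str     # where the kernel reads its input
    store_to: str      # where the kernel writes its output
    output_bytes: int  # assumed size of the data the kernel produces


def split_pipeline(pipeline: List[Kernel], internal_capacity: int) -> List[List[Kernel]]:
    """Split at the first kernel whose output exceeds the internal capacity.

    The producing kernel's store is redirected to external memory and the
    consuming kernel's load is redirected to read from external memory.
    If no kernel exceeds the capacity, the pipeline is returned unsplit.
    """
    for i, kernel in enumerate(pipeline[:-1]):
        if kernel.output_bytes > internal_capacity:
            first = pipeline[:i] + [replace(kernel, store_to=EXTERNAL)]
            second = [replace(pipeline[i + 1], load_from=EXTERNAL)] + pipeline[i + 2:]
            return [first, second]
    return [pipeline]


# Usage: a two-kernel pipeline whose first kernel overflows a 1024-byte buffer.
original = [
    Kernel("k1", load_from="input", store_to=INTERNAL, output_bytes=4096),
    Kernel("k2", load_from=INTERNAL, store_to="output", output_bytes=128),
]
parts = split_pipeline(original, internal_capacity=1024)
```

In this sketch the original pipeline is left unmodified and two new pipelines are returned, which mirrors the claim language of generating a first and a second modified processing pipeline.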

Claims 22, 30, and 38
With respect to claims 22, 30, and 38, Wu and Taylor teach the invention as claimed, including:
wherein modifying the one or more load instructions comprises modifying the one or more load instructions to be instructions executed by the device after the first kernel has been executed.  First Named Inventor Albert Meixner Attorney Docket No.: 16113-8212002 Application No. 15/628,480 Filed June 20, 2017 Page 3 of 6 {A directed acyclic graph defines the processing pipeline by indicating each processing task, the logical connections between them, and how the outputs from one task are provided as inputs to another task, with the graph compiled and buffers inserted between tasks for inter-task communication.  Taylor at ¶ 0003; id. at ¶¶ 0039 & 0040 (graph may be compiled to functions which process the image data); id. at ¶¶ 0041 & 0042 (spatial and task decomposition); id. at ¶ 0045 (producer-consumer).}

Claims 23, 31, and 39
With respect to claims 23, 31, and 39, Wu and Taylor teach the invention as claimed, including:
wherein determining that the original processing pipeline satisfies one or more graph-splitting criteria comprises determining that an amount of data generated by the original processing pipeline exceeds an internal memory size of the device.  {The logical processing flow between kernels executing on graphical processing units may be manipulated by a compiler to perform fusion and fission of the kernels by splitting the kernels based on processing timing, GPU processor availability, and GPU memory size being less than kernel size.  Wu at Abstract; id. at fig. 1 & ¶¶ 0006 – 0008 (GPU execution, kernel fusions and fission/splitting); id. at ¶¶ 0015 & 0023 (GPU memory size and fission/fusion benefits); id. at ¶ 0063 & claim 4 (kernel count vs processor count and estimated execution time); id. at ¶ 0034 (kernels are assigned to processors).  The kernels in the pipeline may need to be split to address the fact that the GPU memory is insufficient to hold all of the data which will be processed by the kernels and thus the data needs to be written to an external memory such as a disk.  Id. at ¶¶ 0005 – 0007.  Kernel pipeline is split into multiple kernels where a first kernel writes data to a disk based buffer and a subsequent kernel loads from that buffer.  Id. at ¶¶ 0015, 0017, 0063.}

Claims 24, 32, and 40
With respect to claims 24, 32, and 40, Wu and Taylor teach the invention as claimed, including:
wherein determining that the amount of data generated by the original processing pipeline exceeds the internal memory size comprises determining that the amount of data generated exceeds a memory size of a line buffer of the device, a memory size of a sheet generator of the device, or an internal memory size of one of the plurality of processors of the device.  {The logical processing flow between kernels executing on graphical processing units may be manipulated by a compiler to perform fusion and fission of the kernels by splitting the kernels based on processing timing, GPU processor availability, and GPU memory size being less than kernel size.  Wu at Abstract; id. at fig. 1 & ¶¶ 0006 – 0008 (GPU execution, kernel fusions and fission/splitting); id. at ¶¶ 0015 & 0023 (GPU memory size and fission/fusion benefits); id. at ¶ 0063 & claim 4 (kernel count vs processor count and estimated execution time); id. at ¶ 0034 (kernels are assigned to processors).  The kernels in the pipeline may need to be split to address the fact that the GPU memory is insufficient to hold all of the data which will be processed by the kernels and thus the data needs to be written to an external memory such as a disk.  Id. at ¶¶ 0005 – 0007.  Kernel pipeline is split into multiple kernels where a first kernel writes data to a disk based buffer and a subsequent kernel loads from that buffer.  Id. at ¶¶ 0015, 0017, 0063.}
Claims 25, 33, and 41
With respect to claims 25, 33, and 41, Wu and Taylor teach the invention as claimed, including:
wherein determining that the original processing pipeline satisfies one or more graph-splitting criteria comprises determining that a kernel of the plurality of kernels exceeds a predetermined measure of computational complexity.  {The logical processing flow between kernels executing on graphical processing units may be manipulated by a compiler to perform fusion and fission of the kernels by splitting the kernels based on processing timing, GPU processor availability, and GPU memory size being less than kernel size.  Wu at Abstract; id. at fig. 1 & ¶¶ 0006 – 0008 (GPU execution, kernel fusions and fission/splitting); id. at ¶¶ 0015 & 0023 (GPU memory size and fission/fusion benefits); id. at ¶ 0063 & claim 4 (kernel count vs processor count and estimated execution time); id. at ¶ 0034 (kernels are assigned to processors).  The kernels in the pipeline may need to be split to address the fact that the GPU memory is insufficient to hold all of the data which will be processed by the kernels and thus the data needs to be written to an external memory such as a disk.  Id. at ¶¶ 0005 – 0007.  Kernel pipeline is split into multiple kernels where a first kernel writes data to a disk based buffer and a subsequent kernel loads from that buffer.  Id. at ¶¶ 0015, 0017, 0063.}

Claims 26, 34, and 42
With respect to claims 26, 34, and 42, Wu and Taylor teach the invention as claimed, including:
assigning a same processor of the device to execute a first kernel belonging to the first modified processing pipeline and to subsequently execute a second kernel belonging to the second modified processing pipeline.  {The logical processing flow between kernels executing on graphical processing units may be manipulated by a compiler to perform fusion and fission of the kernels by splitting the kernels based on processing timing, GPU processor availability, and GPU memory size being less than kernel size.  Wu at Abstract; id. at fig. 1 & ¶¶ 0006 – 0008 (GPU execution, kernel fusions and fission/splitting); id. at ¶¶ 0015 & 0023 (GPU memory size and fission/fusion benefits); id. at ¶ 0063 & claim 4 (kernel count vs processor count and estimated execution time); id. at ¶ 0034 (kernels are assigned to processors).  The kernels in the pipeline may need to be split to address the fact that the GPU memory is insufficient to hold all of the data which will be processed by the kernels and thus the data needs to be written to an external memory such as a disk.  Id. at ¶¶ 0005 – 0007.  Kernel pipeline is split into multiple kernels where a first kernel writes data to a disk based buffer and a subsequent kernel loads from that buffer.  Id. at ¶¶ 0015, 0017, 0063.}

Claims 27, 35, and 43
With respect to claims 27, 35, and 43, Wu and Taylor teach the invention as claimed, including:
wherein each of the plurality of processors is interconnected to a line buffer unit, and wherein modifying the one or more load instructions comprises modifying the one or more load instructions to be instructions for loading data from memory external to the device into the line buffer unit or loading the data from the line buffer unit into the memory external to the device.  {The logical processing flow between kernels executing on graphical processing units may be manipulated by a compiler to perform fusion and fission of the kernels by splitting the kernels based on processing timing, GPU processor availability, and GPU memory size being less than kernel size.  Wu at Abstract; id. at fig. 1 & ¶¶ 0006 – 0008 (GPU execution, kernel fusions and fission/splitting); id. at ¶¶ 0015 & 0023 (GPU memory size and fission/fusion benefits); id. at ¶ 0063 & claim 4 (kernel count vs processor count and estimated execution time); id. at ¶ 0034 (kernels are assigned to processors).  The kernels in the pipeline may need to be split to address the fact that the GPU memory is insufficient to hold all of the data which will be processed by the kernels and thus the data needs to be written to an external memory such as a disk.  Id. at ¶¶ 0005 – 0007.  Kernel pipeline is split into multiple kernels where a first kernel writes data to a disk based buffer and a subsequent kernel loads from that buffer.  Id. at ¶¶ 0015, 0017, 0063.}

Claims 28, 36, and 44
With respect to claims 28, 36, and 44, Wu and Taylor teach the invention as claimed, including:
wherein each of the plurality of processors is interconnected to a line buffer unit, and wherein modifying the one or more store instructions comprises modifying the one or more store instructions to be instructions to store data representing output from the first kernel in the line buffer unit.  {The logical processing flow between kernels executing on graphical processing units may be manipulated by a compiler to perform fusion and fission of the kernels by splitting the kernels based on processing timing, GPU processor availability, and GPU memory size being less than kernel size.  Wu at Abstract; id. at fig. 1 & ¶¶ 0006 – 0008 (GPU execution, kernel fusions and fission/splitting); id. at ¶¶ 0015 & 0023 (GPU memory size and fission/fusion benefits); id. at ¶ 0063 & claim 4 (kernel count vs processor count and estimated execution time); id. at ¶ 0034 (kernels are assigned to processors).  The kernels in the pipeline may need to be split to address the fact that the GPU memory is insufficient to hold all of the data which will be processed by the kernels and thus the data needs to be written to an external memory such as a disk.  Id. at ¶¶ 0005 – 0007.  Kernel pipeline is split into multiple kernels where a first kernel writes data to a disk based buffer and a subsequent kernel loads from that buffer.  Id. at ¶¶ 0015, 0017, 0063; id. at ¶ 0056 (line buffer).}

Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
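A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.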
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THEODORE E HEBERT whose telephone number is (571)270-1409.  The examiner can normally be reached on Monday to Friday 9:00 a.m. to 6:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lewis Bullock, can be reached at 571-272-3759.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


Examiner, Art Unit 2199

/LEWIS A BULLOCK JR/ Supervisory Patent Examiner, Art Unit 2199