DETAILED ACTION
This office action is responsive to the request for continued examination filed on October 20, 2022 in this application, Meixner et al., U.S. Patent Application No. 16/529,633 (Filed August 1, 2019), claiming priority to Meixner et al., U.S. Patent Application No. 15/628,480 (Filed June 20, 2017), claiming priority to U.S. Patent Application No. 15/389,113 (Filed December 28, 2016), claiming priority to U.S. Provisional Patent Application No. 62/300,684 (Filed February 26, 2016) (“Meixner”).  Claims 36 and 40 are cancelled.  Claims 21, 29, 37, and 45 are amended.  Claims 21 – 27, 29 – 35, 37 – 39, 41 – 43, and 45 – 48 are pending.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on October 20, 2022 has been entered.

Response to Arguments
	With respect to Applicant’s argument on pgs. 10 – 13 of the Applicant’s Remarks (“Remarks”) that prior art reference Wu fails to teach converting line buffers between kernels such that some kernels exchange data via on-chip memory while others use external memory, based on the size of the data exchanged in comparison to the size of the on-chip memory, the examiner respectfully neither agrees nor disagrees.  See infra § Claim Rejections - 35 U.S.C. § 103, § Claims 21, 29, and 37.  However, this argument is moot in part in light of newly added prior art reference Mattson, which teaches that converting kernels to exchange data using internal or external line buffers may include, when the kernels specify optionally using on-chip memory, the compiler determining whether sufficient on-chip space exists and, if not, adding and removing instructions in the kernels to convert the kernel code to use off-chip memory or cache instead.  Mattson at ¶¶ 0050, 0048 & 0005, 0128.  In addition, prior art reference Taylor teaches comparing the size of data to the size of the on-chip memory line buffer and, when the line buffers are too small to fit the data, converting some kernels to process using a local buffer and others to store the processed data into a destination memory 250.  Taylor at figs. 2A & 2C and ¶¶ 0040 – 0042; id. at figs. 1C & 2A and ¶¶ 0039 & 0040 (line buffers are on-chip memory).
Therefore, Mattson and Taylor teach converting line buffers between kernels such that some kernels exchange data via on-chip memory while others use external memory, based on the size of the data exchanged in comparison to the size of the on-chip memory.

Claim Rejections - 35 U.S.C. § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 21 – 27, 29 – 35, 37 – 39, 41 – 43, and 45 – 48 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al., United States Patent Application Publication No. 2013/0091507 (Published April 11, 2013, filed October 6, 2012) (“Wu”), in view of Taylor et al., United States Patent Application Publication No. 2016/0210720 (Published July 21, 2016, filed March 20, 2015) (“Taylor”), and further in view of Mattson et al., United States Patent Application Publication No. 2008/0301395 (Published December 4, 2008, filed August 15, 2008) (“Mattson”).

Claims 21, 29, and 37
With respect to claims 21, 29, and 37, Wu teaches the invention as claimed, including a computer-implemented method performed by a compiler for a device comprising a plurality of processors and one or more internal line buffers… wherein each of the one or more internal line buffers is internal to the device and wherein the device is configured to read from and write to an external memory that is external to the device, the method comprising: {A compiler performs kernel fusion/fission for kernels executing on graphical processing units (GPUs), where the GPUs have internal “on-chip” GPU memory for holding CUDA line buffers and also communicate with external main system memory.  Wu at Abstract; id. at fig. 1 & ¶¶ 0006 – 0008 (GPU execution, kernel fusion and fission/splitting); id. at ¶¶ 0015 & 0023 (GPU memory size and fission/fusion benefits); id. at ¶ 0061 & fig. 7 (internal GPU line buffer “on-chip” memory and external host main memory); id. at ¶¶ 0006, 0007, 0056, 0057 (kernel fission and distribution of CUDA lines to GPUs).}
receiving [a pipeline] to be executed by the device, the original processing pipeline comprising a plurality of kernels, wherein each kernel of the plurality of kernels is to be executed on a respective one of the plurality of processors in a particular order; {The logical processing flow between a plurality of sequential kernels executing on graphical processing units may be manipulated by a compiler to perform fusion and fission of the kernels by splitting the kernels based on processing timing, GPU processor availability, and GPU memory size being less than kernel size.  Wu at Abstract; id. at fig. 1 & ¶¶ 0006 – 0008 (GPU execution, kernel fusion and fission/splitting); id. at ¶¶ 0015 & 0023 (GPU memory size and fission/fusion benefits); id. at ¶ 0063 & claim 4 (kernel count vs. processor count and estimated execution time); id. at ¶ 0034 (kernels are assigned to processors).  The plurality of sequential kernels in the pipeline may need to be split to address the fact that the GPU memory is insufficient to hold all of the data which will be processed by the kernels, and thus the data needs to be written to an external memory such as a disk.  Id. at ¶¶ 0005 – 0007.  The kernel pipeline is split into multiple kernels, one for each GPU core, where each kernel writes data to a disk-based buffer and a subsequent kernel loads from that buffer.  Id. at ¶¶ 0015, 0017, 0063; id. at ¶ 0056 (CTA line buffer).}
However, Wu does not explicitly teach the limitation:
[line buffers] each comprising on-chip memory, …[receiving] the instructions wherein the instructions define an original processing pipeline…and an associated graph representation, wherein each link between kernels in the graph representation represents that the kernels share data using a line buffer of the one or more internal line buffers that are internal to the device, wherein: the original processing pipeline includes instructions to feed, to a second kernel and from a first internal line buffer, data generated by a first kernel, wherein the first internal line buffer is configured to buffer data between the first kernel and the second kernel in the pipeline; and the original processing pipeline includes instructions to feed, to a third kernel and from a second internal line buffer, data generated by the second kernel, wherein the second internal line buffer is configured to buffer data between the second kernel and the third kernel in the pipeline; determining that the original processing pipeline satisfies one or more graph-splitting criteria including determining that an amount of data generated by the original processing pipeline exceeds a size of the on-chip memory of the first internal line buffer; …thereby generating program code that causes the first kernel and the second kernel to communicate using the external memory and that causes the second kernel and the third kernel to communicate using the on-chip memory of the second internal line buffer.  {Taylor does teach this limitation.  Taylor teaches that the kernel pipeline, as taught by Wu, may be represented in the form of a directed acyclic graph that defines the processing pipeline by indicating each processing task, the logical connections between them, and how the outputs from one task are provided as inputs to another task, where the graph is compiled and buffers are inserted between tasks for inter-task communication.  Taylor at ¶ 0003; id. at ¶¶ 0039 & 0040 (graph may be compiled to functions which process the image data); id. at ¶¶ 0040 – 0042 (spatial and task decomposition); id. at ¶ 0045 (producer-consumer).  The line buffers may be memory that is internal to a processor, such as “a second-level cache of a processor.”  Id. at figs. 1C & 2A and ¶¶ 0039 & 0040 (line buffers may be used to store the line data that nodes [kernels] process, such as a filter or resize node).  The graph may be decomposed by splitting it into subgraphs when the line buffers are too small to fit the whole image, where the resulting split task graphs utilize multiple processor core local memory caches between processing steps and then output their individual results to the destination memory.  Id. at figs. 2A & 2C and ¶¶ 0040 – 0042 (the task graph [processing pipeline graph representation] may be split between processes [kernels] that process using a local buffer and those that store the processed data into a destination memory 250).
Wu and Taylor are analogous art because they are from the “same field of endeavor” and are both from the same “problem-solving area.”  Specifically, they are both from the field of image processing, and both are trying to solve the problem of how to optimize image processing performance.
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine the use of a method for image pipeline processing, as taught in Wu, with representing the pipeline in the form of a directed acyclic graph that defines the processing pipeline by indicating each processing task, the logical connections between them, and how the outputs from one task are provided as inputs to another task, with the graph compiled and buffers inserted between tasks for inter-task communication, as taught in Taylor.  Taylor teaches that a graph-based implementation of the processing pipeline allows for “a useful level of abstraction” for processing optimizations on the pipeline.  Taylor at ¶ 0003.  Therefore, one having ordinary skill in the art would have been motivated to combine the use of a method for image pipeline processing, as taught in Wu, with representing the pipeline in the form of a directed acyclic graph that defines the processing pipeline by indicating each processing task, the logical connections between them, and how the outputs from one task are provided as inputs to another task, with the graph compiled and buffers inserted between tasks for inter-task communication, as taught in Taylor, for the purpose of improving image processing performance.}
However, Wu and Taylor do not explicitly teach the limitation:
in response, adding memory access instructions to the first kernel and removing line buffer access instructions from the first kernel, wherein the memory access instructions cause the first kernel to store data in the external memory instead of in the first internal line buffer; and adding memory access instructions to the second kernel and removing line buffer access instructions from the second kernel, wherein the memory access instructions cause the second kernel to load data from the external memory instead of from the first internal line buffer.  {Mattson does teach this limitation.  Mattson teaches that converting kernels to communicate data using internal or external line buffers based on on-chip memory size, as taught by Wu and Taylor, may include, when the kernels specify optionally using on-chip memory, the compiler determining whether sufficient on-chip space exists and, if not, adding and removing instructions in the kernels by converting the kernel code to issue requests to access the off-chip memory or cache.  Mattson at ¶¶ 0050, 0048 & 0005, 0128.
Wu, Taylor, and Mattson are analogous art because they are from the “same field of endeavor” and are all from the same “problem-solving area.”  Specifically, they are all from the field of parallel processing, and all are trying to solve the problem of how to optimize processing performance.
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine the use of a method for converting kernels to communicate data using internal or external line buffers based on on-chip memory size, as taught by Wu and Taylor, with converting kernel code to use on-chip or external memory based on on-chip memory size, as taught in Mattson.  Taylor teaches that a graph-based implementation of the processing pipeline allows for “a useful level of abstraction” for processing optimizations on the pipeline.  Taylor at ¶ 0003.  Therefore, one having ordinary skill in the art would have been motivated to combine the use of a method for converting kernels to communicate data using internal or external line buffers based on on-chip memory size, as taught by Wu and Taylor, with converting kernel code to use on-chip or external memory based on on-chip memory size, as taught in Mattson, for the purpose of improving parallel processing performance.}
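For illustration only, the conversion discussed above (rerouting the first-to-second kernel link through external memory while the second-to-third kernel link keeps its on-chip line buffer) can be sketched as follows.  This is a minimal sketch under stated assumptions: kernels are modeled as lists of hypothetical (opcode, buffer) instructions, and the opcode names, the `Kernel` class, and `split_pipeline` are illustrative inventions, not code drawn from Wu, Taylor, Mattson, or the application.  Swapping each line-buffer instruction for its external-memory counterpart has the same effect as removing the one and adding the other.

```python
from dataclasses import dataclass, field

# Hypothetical opcodes for illustration only.
LB_STORE, LB_LOAD = "lb_store", "lb_load"      # access an on-chip line buffer
MEM_STORE, MEM_LOAD = "mem_store", "mem_load"  # access external memory


@dataclass
class Kernel:
    name: str
    # Each instruction is an (opcode, buffer_name) pair; other work is elided.
    instructions: list = field(default_factory=list)


def _swap(kernel: Kernel, old_op: str, new_op: str, buffer: str) -> None:
    """Remove every (old_op, buffer) instruction and add (new_op, buffer) in its place."""
    kernel.instructions = [
        (new_op, buf) if (op, buf) == (old_op, buffer) else (op, buf)
        for op, buf in kernel.instructions
    ]


def split_pipeline(kernels, links, capacity, data_size):
    """Reroute each link whose data exceeds its line buffer's on-chip capacity.

    links: (producer_index, consumer_index, buffer_name) triples.
    Links whose data fits keep using the on-chip line buffer unchanged.
    Returns the names of the buffers that were spilled to external memory.
    """
    spilled = []
    for producer, consumer, buf in links:
        if data_size[buf] > capacity[buf]:
            # Producer: line-buffer store becomes an external-memory store.
            _swap(kernels[producer], LB_STORE, MEM_STORE, buf)
            # Consumer: line-buffer load becomes an external-memory load.
            _swap(kernels[consumer], LB_LOAD, MEM_LOAD, buf)
            spilled.append(buf)
    return spilled


k1 = Kernel("k1", [(LB_STORE, "lb1")])
k2 = Kernel("k2", [(LB_LOAD, "lb1"), (LB_STORE, "lb2")])
k3 = Kernel("k3", [(LB_LOAD, "lb2")])
links = [(0, 1, "lb1"), (1, 2, "lb2")]
spilled = split_pipeline([k1, k2, k3], links,
                         capacity={"lb1": 4096, "lb2": 4096},
                         data_size={"lb1": 8192, "lb2": 1024})
print(spilled)          # ['lb1']
print(k2.instructions)  # [('mem_load', 'lb1'), ('lb_store', 'lb2')]
```

In this example only the first link exceeds its assumed 4096-byte capacity, so the first and second kernels communicate through external memory while the second and third kernels still share the second line buffer.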

Claims 22, 30, and 38
With respect to claims 22, 30, and 38, Wu, Taylor, and Mattson teach the invention as claimed, including:
wherein the memory access instructions added to the second kernel comprise instructions to be executed by the device after the first kernel has been executed.  First Named Inventor Albert Meixner Attorney Docket No.: 16113-8212002 Application No. 15/628,480Filed June 20, 2017 Page 3of6{A directed acyclic graph defines the processing pipeline by indicating each processing task, the logical connections between them, and how the outputs from one task are provided as inputs to another task, and the graph is compiled and buffers are inserted between tasks for inter-task communication.  Taylor at ¶ 0003; id. at ¶¶ 0039 & 0040 (graph may be compiled to functions which process the image data); id. at ¶¶ 0040 – 0042 (spatial and task decomposition); id. at ¶ 0045 (producer-consumer).}

Claims 23, 31, and 39
With respect to claims 23, 31, and 39, Wu, Taylor, and Mattson teach the invention as claimed, including:
wherein determining that the original processing pipeline satisfies one or more graph-splitting criteria comprises determining that an amount of data generated by the original processing pipeline exceeds an internal memory size of the device.  {The logical processing flow between kernels executing on graphical processing units may be manipulated by a compiler to perform fusion and fission of the kernels by splitting the kernels based on processing timing, GPU processor availability, and GPU memory size being less than kernel size.  Wu at Abstract; id. at fig. 1 & ¶¶ 0006 – 0008 (GPU execution, kernel fusion and fission/splitting); id. at ¶¶ 0015 & 0023 (GPU memory size and fission/fusion benefits); id. at ¶ 0063 & claim 4 (kernel count vs. processor count and estimated execution time); id. at ¶ 0034 (kernels are assigned to processors).  The kernels in the pipeline may need to be split to address the fact that the GPU memory is insufficient to hold all of the data which will be processed by the kernels, and thus the data needs to be written to an external memory such as a disk.  Id. at ¶¶ 0005 – 0007.  The kernel pipeline is split into multiple kernels where a first kernel writes data to a disk-based buffer and a subsequent kernel loads from that buffer.  Id. at ¶¶ 0015, 0017, 0063.}

Claims 24 and 32
With respect to claims 24 and 32, Wu, Taylor, and Mattson teach the invention as claimed, including:
wherein determining that the amount of data generated by the original processing pipeline exceeds the internal memory size comprises determining that the amount of data generated exceeds a memory size of a line buffer of the device, a memory size of a sheet generator of the device, or an internal memory size of one of the plurality of processors of the device.  {The logical processing flow between kernels executing on graphical processing units may be manipulated by a compiler to perform fusion and fission of the kernels by splitting the kernels based on processing timing, GPU processor availability, and GPU memory size being less than kernel size.  Wu at Abstract; id. at fig. 1 & ¶¶ 0006 – 0008 (GPU execution, kernel fusion and fission/splitting); id. at ¶¶ 0015 & 0023 (GPU memory size and fission/fusion benefits); id. at ¶ 0063 & claim 4 (kernel count vs. processor count and estimated execution time); id. at ¶ 0034 (kernels are assigned to processors).  The kernels in the pipeline may need to be split to address the fact that the GPU memory is insufficient to hold all of the data which will be processed by the kernels, and thus the data needs to be written to an external memory such as a disk.  Id. at ¶¶ 0005 – 0007.  The kernel pipeline is split into multiple kernels where a first kernel writes data to a disk-based buffer and a subsequent kernel loads from that buffer.  Id. at ¶¶ 0015, 0017, 0063.}
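As a hedged illustration of the memory-size criterion recited in claims 24 and 32, the comparison might be expressed as below.  The `DeviceMemory` fields, the `exceeded_memories` helper, and the byte capacities used in the example are assumptions chosen for illustration; they are not taken from the application or the cited references.

```python
from typing import NamedTuple


class DeviceMemory(NamedTuple):
    # Hypothetical internal memory capacities, in bytes.
    line_buffer: int
    sheet_generator: int
    processor_local: int


def exceeded_memories(data_bytes: int, mem: DeviceMemory) -> list:
    """Return the names of internal memories that data_bytes would overflow.

    A non-empty result means the memory-size graph-splitting criterion is met.
    """
    return [name for name, size in mem._asdict().items() if data_bytes > size]


mem = DeviceMemory(line_buffer=64 * 1024, sheet_generator=16 * 1024,
                   processor_local=128 * 1024)
print(exceeded_memories(32 * 1024, mem))  # ['sheet_generator']
print(exceeded_memories(1024, mem))       # []
```

Returning the list of overflowed memories, rather than a single boolean, mirrors the claim's alternative ("a line buffer..., a sheet generator..., or ... one of the plurality of processors"): any one entry is enough to satisfy the criterion.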
Claims 25, 33, and 41
With respect to claims 25, 33, and 41, Wu, Taylor, and Mattson teach the invention as claimed, including:
wherein determining that the original processing pipeline satisfies one or more graph-splitting criteria comprises determining that a kernel of the plurality of kernels exceeds a predetermined measure of computational complexity.  {The logical processing flow between kernels executing on graphical processing units may be manipulated by a compiler to perform fusion and fission of the kernels by splitting the kernels based on processing timing, GPU processor availability, and GPU memory size being less than kernel size.  Wu at Abstract; id. at fig. 1 & ¶¶ 0006 – 0008 (GPU execution, kernel fusion and fission/splitting); id. at ¶¶ 0015 & 0023 (GPU memory size and fission/fusion benefits); id. at ¶ 0063 & claim 4 (kernel count vs. processor count and estimated execution time); id. at ¶ 0034 (kernels are assigned to processors).  The kernels in the pipeline may need to be split to address the fact that the GPU memory is insufficient to hold all of the data which will be processed by the kernels, and thus the data needs to be written to an external memory such as a disk.  Id. at ¶¶ 0005 – 0007.  The kernel pipeline is split into multiple kernels where a first kernel writes data to a disk-based buffer and a subsequent kernel loads from that buffer.  Id. at ¶¶ 0015, 0017, 0063.}
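The computational-complexity criterion of claims 25, 33, and 41 can likewise be sketched, assuming a weighted instruction count serves as the "predetermined measure."  The opcode weights, the threshold, and the function names below are hypothetical; a real compiler would presumably rely on its own cost model for the target device.

```python
# Hypothetical per-opcode cost weights; unknown opcodes default to a cost of 1.
OP_COST = {"add": 1, "mul": 1, "div": 4, "load": 2, "store": 2}


def complexity(instructions) -> int:
    """Weighted instruction count used here as a stand-in complexity measure."""
    return sum(OP_COST.get(op, 1) for op in instructions)


def over_complexity(kernels: dict, threshold: int) -> list:
    """Names of kernels whose complexity exceeds the predetermined threshold."""
    return [name for name, ins in kernels.items() if complexity(ins) > threshold]


kernels = {
    "blur3x3": ["load"] * 9 + ["mul"] * 9 + ["add"] * 8 + ["store"],
    "copy": ["load", "store"],
}
print(complexity(kernels["blur3x3"]))          # 37
print(over_complexity(kernels, threshold=20))  # ['blur3x3']
```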

Claims 26, 34, and 42
With respect to claims 26, 34, and 42, Wu, Taylor, and Mattson teach the invention as claimed, including:
assigning a same processor of the device to execute the first kernel and to subsequently execute the second kernel.  {The logical processing flow between kernels executing on graphical processing units may be manipulated by a compiler to perform fusion and fission of the kernels by splitting the kernels based on processing timing, GPU processor availability, and GPU memory size being less than kernel size.  Wu at Abstract; id. at fig. 1 & ¶¶ 0006 – 0008 (GPU execution, kernel fusion and fission/splitting); id. at ¶¶ 0015 & 0023 (GPU memory size and fission/fusion benefits); id. at ¶ 0063 & claim 4 (kernel count vs. processor count and estimated execution time); id. at ¶ 0034 (kernels are assigned to processors).  The kernels in the pipeline may need to be split to address the fact that the GPU memory is insufficient to hold all of the data which will be processed by the kernels, and thus the data needs to be written to an external memory such as a disk.  Id. at ¶¶ 0005 – 0007.  The kernel pipeline is split into multiple kernels where a first kernel writes data to a disk-based buffer and a subsequent kernel loads from that buffer.  Id. at ¶¶ 0015, 0017, 0063.}

Claims 27, 35, and 43
With respect to claims 27, 35, and 43, Wu, Taylor, and Mattson teach the invention as claimed, including:
wherein each of the plurality of processors is interconnected to a line buffer unit, and wherein the memory access instructions added to the second kernel comprise instructions for loading data from memory external to the device into the line buffer unit or loading the data from the line buffer unit into the memory external to the device.  {The logical processing flow between kernels executing on graphical processing units may be manipulated by a compiler to perform fusion and fission of the kernels by splitting the kernels based on processing timing, GPU processor availability, and GPU memory size being less than kernel size.  Wu at Abstract; id. at fig. 1 & ¶¶ 0006 – 0008 (GPU execution, kernel fusion and fission/splitting); id. at ¶¶ 0015 & 0023 (GPU memory size and fission/fusion benefits); id. at ¶ 0063 & claim 4 (kernel count vs. processor count and estimated execution time); id. at ¶ 0034 (kernels are assigned to processors).  The kernels in the pipeline may need to be split to address the fact that the GPU memory is insufficient to hold all of the data which will be processed by the kernels, and thus the data needs to be written to an external memory such as a disk.  Id. at ¶¶ 0005 – 0007.  The kernel pipeline is split into multiple kernels where a first kernel writes data to a disk-based buffer and a subsequent kernel loads from that buffer.  Id. at ¶¶ 0015, 0017, 0063.}

Claim 45
With respect to claim 45, Wu, Taylor, and Mattson teach the invention as claimed, including:
wherein the first line buffer is configured to buffer data between the first kernel and the second kernel by receiving line groups of data generated by the first kernel and feeding the line groups to the second kernel.  {The logical processing flow between kernels executing on graphical processing units may be manipulated by a compiler to optimize storage instructions in the GPU line buffers that are used to send line data from one kernel to another.  Wu at Abstract; id. at fig. 1A & ¶¶ 0006 – 0008 (GPU execution, kernel fusion and fission/splitting); id. at ¶¶ 0015 & 0023 (GPU memory size and fission/fusion benefits); id. at ¶ 0056 (CUDA line buffer).}
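A toy model of a line buffer that receives line groups generated by a first kernel and feeds them to a second kernel, in the manner recited in claim 45, is sketched below.  The `LineBuffer` class, its capacity measured in lines, and the fixed two-line group size are assumptions for illustration only; practical stencil kernels often need overlapping line groups, which this sketch does not model.

```python
from collections import deque


class LineBuffer:
    """Toy bounded buffer holding image lines between a producer and a consumer kernel."""

    def __init__(self, capacity_lines: int):
        self.capacity = capacity_lines
        self.lines = deque()

    def write_group(self, group):
        """Accept a line group from the producer kernel, refusing to exceed capacity."""
        if len(self.lines) + len(group) > self.capacity:
            raise OverflowError("line buffer full")
        self.lines.extend(group)

    def read_group(self, n):
        """Feed the next n lines to the consumer, or None if not enough lines are buffered."""
        if len(self.lines) < n:
            return None
        return [self.lines.popleft() for _ in range(n)]


image = [[r * 10 + c for c in range(3)] for r in range(6)]  # 6 lines of 3 pixels
buf = LineBuffer(capacity_lines=4)
out = []
for i in range(0, len(image), 2):
    buf.write_group(image[i:i + 2])                 # first kernel emits a 2-line group
    group = buf.read_group(2)                       # second kernel receives the group
    out.append([sum(col) for col in zip(*group)])   # e.g., a vertical 2-line sum
print(out)  # [[10, 12, 14], [50, 52, 54], [90, 92, 94]]
```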

Claim 46
With respect to claim 46, Wu, Taylor, and Mattson teach the invention as claimed, including:
wherein the original processing pipeline includes instructions to feed, to a third kernel and from a second line buffer, data generated by the second kernel, the generating including: adding memory access instructions to the second kernel and removing line buffer access instructions for the second kernel wherein the memory access instructions cause the second kernel to store data in the external memory instead of in the second line buffer; and adding memory access instructions to the third kernel and removing line buffer access instructions from the third kernel, wherein the memory access instructions cause the third kernel to load data from the external memory instead of from the second line buffer.  {The logical processing flow between a plurality of sequential kernels executing on graphical processing units may be manipulated by a compiler to perform fusion and fission of the kernels by splitting the kernels based on processing timing, GPU processor availability, and GPU memory size being less than kernel size.  Wu at Abstract; id. at fig. 1 & ¶¶ 0006 – 0008 (GPU execution, kernel fusion and fission/splitting); id. at ¶¶ 0015 & 0023 (GPU memory size and fission/fusion benefits); id. at ¶ 0063 & claim 4 (kernel count vs. processor count and estimated execution time); id. at ¶ 0034 (kernels are assigned to processors).  The plurality of sequential kernels in the pipeline may need to be split to address the fact that the GPU memory is insufficient to hold all of the data which will be processed by the kernels, and thus the data needs to be written to an external memory such as a disk.  Id. at ¶¶ 0005 – 0007.  The kernel pipeline is split into multiple kernels, one for each GPU core, where each kernel writes data to a disk-based buffer and a subsequent kernel loads from that buffer.  Id. at ¶¶ 0015, 0017, 0063; id. at ¶ 0056 (line buffer).}

Claim 47
With respect to claim 47, Wu, Taylor, and Mattson teach the invention as claimed, including:
wherein the compiler is configured to translate virtual code to object code for execution by the plurality of processors.  {A virtual intermediate form of database code is optimized by a compiler and output as code that is capable of being executed by the various computer hardware processors (object code).  Wu at ¶¶ 0010 – 0013.}

Claim 48
With respect to claim 48, Wu, Taylor, and Mattson teach the invention as claimed, including:
wherein the compiler generates the multiple processing pipelines before the device processes any of the data.  {Optimized kernels are generated and then code is dispatched for execution of the data.  Wu at fig. 1B.}
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THEODORE E HEBERT whose telephone number is (571)270-1409.  The examiner can normally be reached on Monday to Friday 9:00 a.m. to 6:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lewis Bullock can be reached on 571-272-3759.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
//T.H./										November 19, 2022
Examiner, Art Unit 2199

/LEWIS A BULLOCK  JR/Supervisory Patent Examiner, Art Unit 2199