Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicants' Amendment filed on October 20, 2022, in which claims 1 and 25-27 are currently amended. Claims 35-37 are newly added.  Claims 1, 6, 13-17, and 25-37 are currently pending.

Response to Arguments
The rejection of claim 27 under 35 U.S.C. § 112(a) is hereby withdrawn, as necessitated by applicant's amendments and remarks made to the rejection.
Applicant’s arguments with respect to the rejection of claims 1, 6, 13-17, and 25-37 under 35 U.S.C. § 101 based on amendment have been considered and are deemed persuasive.  The rejections of claims 1, 6, 13-17, and 25-37 under 35 U.S.C. § 101 are hereby withdrawn, as necessitated by applicant's amendments and remarks made to the rejections.
Applicant’s arguments with respect to the rejection of claims 1, 6, 13-17, and 25-37 under 35 U.S.C. § 103 based on amendment have been considered.
With respect to dynamically compiled shader kernels, Paltashev teaches ([¶0044] "For example, the configuration and control block 302 can dynamically configure or reconfigure the virtual graphics pipelines in response to system events or user input indicating that a new virtual graphics pipeline is to be instantiated").  It is not clear how controlling the compilation of the shader from the application is synonymous with non-dynamic compilation; the cited paragraph is also directed towards a particular non-limiting embodiment.  Paragraph [¶0226] of the published instant specification seems to be in line with the particular embodiment cited from Paltashev. Paltashev further teaches, in particular embodiments, using reconfigurable virtual pipelines explicitly to overcome the limitations of a monolithic pipeline.  Further examples from Paltashev include dynamically compiling and reconfiguring the graphics pipeline (see [¶0044], quoted above).  Paltashev also teaches that the shader is generated by the CPU and executed on the GPU ([¶0027] "Reconfigurable GPUs with virtualized pipelines components, such as described herein, can support numerous different processing configurations"; [¶0057] "The graphics processing system 400 includes multiple CPU-type processor cores 401, 402, 403, 404 (collectively referred to herein as “the CPU cores 401-404”) that generate commands and data for execution by one or more graphics pipelines. Some embodiments of the CPU cores 401-404 are multithreaded processor cores that are implemented as part of a central processing unit (CPU)").  For these reasons, Examiner asserts that it is reasonable to maintain the rejection with respect to Paltashev.
See also ([¶0058] "One or more shader engines (SE) 420, 421, 422, 423, 424, 425 (collectively referred to herein as “the shader engines 420-425”) are implemented using shared hardware resources of the graphics processing system 400. Some embodiments of the shader engines 420-425 can be used to implement shaders in the graphics processing system 300 shown in FIG. 3. The shared hardware resources include asynchronous compute engines (ACE) 430, 431, 432, 433, 434 (collectively referred to herein as “the asynchronous compute engines 430-434”) for executing general compute metacommands, a graphics processing engine 435 configured to execute graphics metacommands, and a video processing engine 438 configured to execute video metacommands. The asynchronous compute engines 430-434 are distinct functional hardware blocks that are capable of executing general compute metacommands concurrently with other asynchronous compute engines 430-434."), which shows compute operations being executed as metacommands.
The remaining arguments are moot in view of a new ground of rejection set forth below. 

Claim Objections
Claims 26-32 and 37 are objected to because of the following informalities:  “heterogenous” should be spelled “heterogeneous”.  Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 
“a compilation unit (CU) configured to” in claims 1 and 26
“the GrPU is configured to” in claims 1 and 26
“the fetch stage is configured to” in claims 15 and 33
“the writeback stage is configured to” in claims 17 and 34
“heterogeneous processing system configured to” in claim 26
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.  Fetch stage and writeback stage are given structure in at least ([¶0200] "It is contemplated that compute mechanism 2010 or one or more of their components may be implemented as hardware, software, and/or firmware." [¶0238] "compute mechanism 2010 includes an architecture to perform activation of deep learning functions." [¶0239] "FIG. 27B illustrates one embodiment of architecture 2530, which includes fetch 2532, execute 2534 and writeback 2536 stages") of the published instant specification. 
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1, 6, 13-17, and 25-34 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

Claim limitations “a compilation unit” and “GrPU” invoke 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. For example, FIG. 21 shows the CU and GrPU as elements of the GPU, but these elements are shown merely as black boxes.  No explicit corresponding structure was found in the instant specification as published.  Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph. Similarly, the “heterogenous processing system” in claim 26 lacks support in the published instant specification.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

Regarding claim 26, “the heterogenous processing system” lacks antecedent basis.  “A heterogeneous processing system” is recommended.  It’s unclear from the claims whether the heterogeneous processing system is synonymous with a data processing system or a different system altogether. 

The remaining claims are rejected with respect to their dependence on the rejected claims. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 13, 14, 15, 16, 17, 25, 26, 31, 32, 33, and 34 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Caulfield (US20190180499A1), Rouhani (“Deep3: Leveraging Three Levels of Parallelism for Efficient Deep Learning”, 2017), and Paltashev (US20180114290A1).

	 Regarding claim 1, Caulfield teaches An apparatus to facilitate compute optimization, comprising: at least one processor to perform operations to implement a neural network; and([¶0085] "a system, such as the system shown in FIG. 2, may be additionally provided with one or more hardware accelerators to implement and/or utilize convolutional neural networks (CNNs)...Neural network classifiers can run either exclusively using the hardware (HW) convolutional neural network (CNN) accelerator 207 or in a combination of processors and HW CNN accelerator 207")
	compute logic including circuitry configured to accelerate neural network computations.([¶0085] "Neural network classifiers can run either exclusively using the hardware (HW) convolutional neural network (CNN) accelerator 207 or in a combination of processors and HW CNN accelerator 207")
	 the circuitry comprising: a local memory to store one or more graph representations ([¶0080] "The apparatus depicted in FIG. 2 may include a host system composed on host CPU 200 and associated host memory 201" [¶0145] "FIG. 37 is a diagram showing how 2D Path-Finding on a 2D 2×2 bitmap can be accelerated in accordance with some embodiments...This approach prunes branches from the graph search algorithm" [¶0148] "In some instances, the volumetric data structure 4008 may be pre-loaded onto the local memory" FIG. 37 shows a bitmap for performing pathfinding which is a graph traversal method.  Therefore the bitmap is interpreted as synonymous with a graph representation. Caulfield also teaches that the path-finding algorithm may be scaled into three dimensions, and explicitly teaches that volumetric graph representations may be stored in memory.)
	associated with a neural network;([¶0085] "Neural network classifiers can run either exclusively using the hardware (HW) convolutional neural network (CNN) accelerator 207 or in a combination of processors and HW CNN accelerator 207 to produce an output classification 237. The availability of a HW CNN accelerator 207 to do inference on volumetric representations may allow groups of voxels in the measured geometry voxels 227 to be labelled as belonging to a particular object class, among other example uses." [¶0180] "In one example, a convolutional neural network 5355 or other machine learning logic may be utilized to take all or a portion of volumetric data structure and perform inference to identify that various geometry described within the volumetric data structure")
	 and graph processing unit (GrPU) to accelerate computations on the one or more graph representations ([¶0145] "FIG. 37 is a diagram showing how 2D Path-Finding on a 2D 2×2 bitmap can be accelerated in accordance with some embodiments...This approach prunes branches from the graph search algorithm").
	However, Caulfield does not explicitly teach wherein the GrPU includes multiple hardware threads to concurrently traverse multiple graph representations and execute instructions associated with nodes of the multiple graph representations;
	and a compilation unit (CU) configured to dynamically compile shader kernels;
	and wherein the GrPU is configured to perform a compute operation implemented via a dynamically compiled shader
	and the dynamically compiled shader is dynamically compiled by the CU and executed by the GrPU in response to a detected condition.

	Rouhani, in the same field of endeavor, teaches wherein the GrPU includes multiple hardware threads to concurrently traverse multiple graph representations and execute instructions associated with nodes of the multiple graph representations; ([p. 3 §5.1] "the parameter coordinator initiates a pair of send- and receive-thread. The send-thread subsamples the neurons of the global DL model in accordance with the local DL topology" [p. 4 §5.1] "A local network might compute its gradients (updates) based on a set of parameters that are slightly out of date. This is because the other local networks have probably updated the global values in the parameter coordination unit in the meantime." The global DL model and local DL topology are interpreted as multiple graph representations.  Rouhani teaches that multiple local graph representations are traversed in parallel (concurrently).)

	Caulfield as well as Rouhani are directed towards hardware-centric neural network processing systems.  Therefore, Caulfield as well as Rouhani are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Caulfield with the teachings of Rouhani by concurrently traversing multiple graph representations of the neural networks.  Rouhani provides as additional motivation for combination ([Abstract] "This paper proposes Deep3, an automated platform-aware Deep Learning (DL) framework that brings orders of magnitude performance improvement to DL training and execution. Deep3 is the first to simultaneously leverage three levels of parallelism for performing DL: data, network, and hardware. It uses platform profiling to abstract physical characterizations of the target platform. The core of Deep3 is a new extensible methodology that enables incorporation of platform characteristics into the higher-level data and neural network transformation. We provide accompanying libraries to ensure automated customization and adaptation to different datasets and platforms. Proof-of-concept evaluations demonstrate 10-100 fold physical performance improvement compared to the state-of-the-art DL frameworks, e.g., TensorFlow.").  This motivation for combination also applies to the remaining claims which depend on this combination. 

	However, the combination of Caulfield and Rouhani does not explicitly teach and a compilation unit (CU) configured to dynamically compile shader kernels;
	and wherein the GrPU is configured to perform a compute operation implemented via a dynamically compiled shader
	and the dynamically compiled shader is dynamically compiled by the CU and executed by the GrPU in response to a detected condition.

	Paltashev, in the same field of endeavor, teaches and a compilation unit (CU) configured to dynamically compile shader kernels;([¶0041] "Implementing the graphics pipeline 201 as a monolithic pipeline object, allows a reduction in API overhead by enabling up-front shader optimization at compile time. Embodiments of the graphics pipeline 201 also make the CPU performance of the pipeline driver more predictable, since shader compilation is not kicked off by the driver at draw time outside of the application's control.")
	and wherein the GrPU is configured to perform a compute operation implemented via a dynamically compiled shader([¶0066] "To support implementations of a reconfigurable GPU, the graphics processing system 600 also includes shared fixed function hardware blocks 641, 642, 643, 644, 645...For another example, the tessellator 634 can transmit a request to the dedicated fixed function hardware block 642 to perform an operation and the results of the operation can be returned to the kernel domain shader 635")
	and the dynamically compiled shader is dynamically compiled by the CU and executed by the GrPU in response to a detected condition. ([¶0027] "Reconfigurable GPUs with virtualized pipelines components, such as described herein, can support numerous different processing configurations" [¶0041] "Implementing the graphics pipeline 201 as a monolithic pipeline object, allows a reduction in API overhead by enabling up-front shader optimization at compile time. Embodiments of the graphics pipeline 201 also make the CPU performance of the pipeline driver more predictable, since shader compilation is not kicked off by the driver at draw time outside of the application's control." [¶0044] "For example, the configuration and control block 302 can dynamically configure or reconfigure the virtual graphics pipelines in response to system events or user input indicating that a new virtual graphics pipeline is to be instantiated"  Since shader compilation is not kicked off by the driver outside of the application's control, it is interpreted as synonymous with being executed in response to a detected condition.).

	The combination of Caulfield and Rouhani as well as Paltashev are directed towards heterogeneous processor systems.  Therefore, the combination of Caulfield and Rouhani as well as Paltashev are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, that runtime shader compilation is inherent in graphics processing. The combination of Caulfield, Rouhani, and Paltashev supports the inherency of shader compilation for GPUs.  The combination would have been obvious because a person of ordinary skill in the art would be able to determine the advantages of the combination from Paltashev ([¶0068] “Some embodiments of the graphics processing system 600 have a number of advantages over conventional graphics pipelines. For example, the graphics processing system 600 utilizes fixed-function hardware within the compute domain and many compute shaders or virtual GPUs can be scheduled concurrently and load balanced by the asynchronous compute engines”).  This motivation for combination also applies to the remaining claims which depend on this combination.

	 Regarding claim 13, the combination of Caulfield, Rouhani, and Paltashev teaches  The apparatus of claim 1, wherein the GPU includes circuitry to accelerate application of an activation function for an operation associated with the neural network.(Caulfield [¶0085] "Neural network classifiers can run either exclusively using the hardware (HW) convolutional neural network (CNN) accelerator 207 or in a combination of processors and HW CNN accelerator 207" neural network classifier is interpreted as synonymous with deep learning function.).
	
	 Regarding claim 14, the combination of Caulfield, Rouhani, and Paltashev teaches The apparatus of claim 13, wherein the GPU includes: circuitry to provide a fetch stage to receive input values;(Caulfield [¶0082] "Continuing with the example of FIG. 2, in some implementations the synthetic voxel geometry 202 may be combined with measured geometry voxels 227 constructed using a simultaneous localization and mapping (SLAM) pipeline 217. The SLAM pipeline may use active sensors and/or passive image sensors 214 (e.g., 214.1 and 214.2)")
	circuitry to provide an execute stage to perform computation operations on the input values; (Caulfield [¶0082] "which are first processed using an image signal processing (ISP) pipeline 215")
	and circuitry to provide a writeback stage to pack and prepare results to be outputted. (Caulfield [¶0082] "to produce an output 225" [¶0135] "FIG. 27 illustrates logic to generate a 6-bit address triplet to control the multiplexers in accordance with some embodiments, which perform voxel insertion, deletion and retrieval...In this example the 16-bit x, y and z addresses of the voxel to be inserted, retrieved, tested for, etc. in a sparse voxel tree are presented to the address formatting logic 2705 as a packed 64-bit input value 2700").
	
	 Regarding claim 15, the combination of Caulfield, Rouhani, and Paltashev teaches The apparatus of claim 14, wherein the fetch stage is configured to analyze and identify a first operation to be implemented via first execute stage circuitry and a second operation to be implemented via second execution stage circuitry.(Caulfield [¶0102] "a neural network includes an initial convolutional processing layer 1100, followed by pooling processing 1110, and finally an activation function processing, such as rectified linear unit (ReLU) function 1120. The output of the ReLU unit 1120, which provides ReLU output vector 1131, may be connected to a following convolutional processing layer 1180 (e.g., possibly via delay 1132), which receives ReLU output vector 1131... a ReLU bitmap 1130 may also be generated in parallel with the connection of the ReLU unit 1120 to the following convolution unit 1180, the ReLU bitmap 1130 denoting which elements in the ReLU output vector 1131 are zeroes and which are non-zeroes." See also FIG. 11 1120 for identified first operation.).
	
	 Regarding claim 16, the combination of Caulfield, Rouhani, and Paltashev teaches The apparatus of claim 15, wherein the execute stage comprises: first execute stage circuitry to implement a first set of activation functions; and second execute stage circuitry to implement a second set of activation functions.(Caulfield [¶0102] "a neural network includes an initial convolutional processing layer 1100, followed by pooling processing 1110, and finally an activation function processing, such as rectified linear unit (ReLU) function 1120. The output of the ReLU unit 1120, which provides ReLU output vector 1131, may be connected to a following convolutional processing layer 1180 (e.g., possibly via delay 1132), which receives ReLU output vector 1131." See also FIG. 11.  Caulfield explicitly shows a first set of ReLU activation circuits for performing activation for a first stage (layer N-1) followed by activations for a second stage (layer N) of the neural network.).
	
	 Regarding claim 17, the combination of Caulfield, Rouhani, and Paltashev teaches The apparatus of claim 16, wherein the writeback stage is configured to: receive results from the first execute stage circuitry and the second execute stage circuitry; and output the results in a format associated with an output tensor.(Caulfield [¶0102] "The output of the ReLU unit 1120, which provides ReLU output vector 1131, may be connected to a following convolutional processing layer 1180 (e.g., possibly via delay 1132), which receives ReLU output vector 1131" ReLU output vector interpreted as synonymous with output tensor.).
	
	 Regarding claim 25, the combination of Caulfield, Rouhani, and Paltashev teaches The apparatus of claim 1, wherein the circuitry of the GPU is configured to: detect a condition associated with input data of a neural network computation; (Caulfield [¶0105] "CNN ReLU layers can produce high numbers of output zeroes corresponding to negative inputs." Detecting negative inputs associated with neural network input interpreted as synonymous with detecting a condition associated with input data of a neural network computation.)
	compile a modified shader that is configures the GPU to perform a modified neural network computation; (Paltashev [¶0041] "Implementing the graphics pipeline 201 as a monolithic pipeline object, allows a reduction in API overhead by enabling up-front shader optimization at compile time. Embodiments of the graphics pipeline 201 also make the CPU performance of the pipeline driver more predictable, since shader compilation is not kicked off by the driver at draw time outside of the application's control." [¶0043] "virtual graphics pipelines can be configured to support deep learning neural networks that are implemented on a GPU platform.")
	and perform the modified neural network computation via the compiled modified shader. (Paltashev [¶0066] "To support implementations of a reconfigurable GPU, the graphics processing system 600 also includes shared fixed function hardware blocks 641, 642, 643, 644, 645...For another example, the tessellator 634 can transmit a request to the dedicated fixed function hardware block 642 to perform an operation and the results of the operation can be returned to the kernel domain shader 635"  [¶0043] "virtual graphics pipelines can be configured to support deep learning neural networks that are implemented on a GPU platform.").
	
	 Regarding claim 26, claim 26 is substantially similar to claim 1.  Therefore, the rejection applied to claim 1 also applies to claim 26.
	
	 Regarding claim 31, the combination of Caulfield, Rouhani, and Paltashev teaches The data processing system of claim 26, wherein at least one core of the heterogenous processor includes circuitry to accelerate application of an activation function for an operation associated with the neural network.(Caulfield [¶0085] "Neural network classifiers can run either exclusively using the hardware (HW) convolutional neural network (CNN) accelerator 207 or in a combination of processors and HW CNN accelerator 207" [¶0102] "The hardware may include one or more processors, one or more microprocessors, one or more circuits, one or more computers, and the like. In this particular example, a neural network includes an initial convolutional processing layer 1100, followed by pooling processing 1110, and finally an activation function processing," neural network classifier is interpreted as synonymous with deep learning function.).
	
	 Regarding claim 32, the combination of Caulfield, Rouhani, and Paltashev teaches The data processing system of claim 31, wherein at least one core of the heterogenous processor includes: circuitry to provide a fetch stage to receive input values; (Caulfield [¶0082] "Continuing with the example of FIG. 2, in some implementations the synthetic voxel geometry 202 may be combined with measured geometry voxels 227 constructed using a simultaneous localization and mapping (SLAM) pipeline 217. The SLAM pipeline may use active sensors and/or passive image sensors 214 (e.g., 214.1 and 214.2)")
	circuitry to provide an execute stage to perform computation operations on the input values; (Caulfield [¶0082] "which are first processed using an image signal processing (ISP) pipeline 215")
	and circuitry to provide a writeback stage to pack and prepare results to be output.(Caulfield [¶0082] "to produce an output 225" [¶0135] "FIG. 27 illustrates logic to generate a 6-bit address triplet to control the multiplexers in accordance with some embodiments, which perform voxel insertion, deletion and retrieval...In this example the 16-bit x, y and z addresses of the voxel to be inserted, retrieved, tested for, etc. in a sparse voxel tree are presented to the address formatting logic 2705 as a packed 64-bit input value 2700").
	
	 Regarding claim 33, the combination of Caulfield, Rouhani, and Paltashev teaches The data processing system of claim 32, wherein the fetch stage is configured to analyze and identify a first operation to be implemented via first execute stage circuitry and a second operation to be implemented via second execution stage circuitry.(Caulfield [¶0102] "a neural network includes an initial convolutional processing layer 1100, followed by pooling processing 1110, and finally an activation function processing, such as rectified linear unit (ReLU) function 1120. The output of the ReLU unit 1120, which provides ReLU output vector 1131, may be connected to a following convolutional processing layer 1180 (e.g., possibly via delay 1132), which receives ReLU output vector 1131... a ReLU bitmap 1130 may also be generated in parallel with the connection of the ReLU unit 1120 to the following convolution unit 1180, the ReLU bitmap 1130 denoting which elements in the ReLU output vector 1131 are zeroes and which are non-zeroes." See also FIG. 11 1120 for identified first operation.  Activation operation at layer N interpreted as second operation.).
	
	 Regarding claim 34, the combination of Caulfield, Rouhani, and Paltashev teaches The data processing system of claim 33, wherein the execute stage comprises: first execute stage circuitry to implement a first set of activation functions; and second execute stage circuitry to implement a second set of activation functions, and wherein the writeback stage is configured to: receive results from the first execute stage circuitry and the second execute stage(Caulfield [¶0102] "The output of the ReLU unit 1120, which provides ReLU output vector 1131, may be connected to a following convolutional processing layer 1180 (e.g., possibly via delay 1132), which receives ReLU output vector 1131" ReLU output vector interpreted as synonymous with output tensor.).

	Claims 6 and 30 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Caulfield and Rouhani and Paltashev and Park (“Weighted-Entropy-based Quantization for Deep Neural Networks”, 2017).

	 Regarding claim 6, the combination of Caulfield, Rouhani, and Paltashev teaches The apparatus of claim 1.
	However, the combination of Caulfield, Rouhani, and Paltashev doesn't explicitly teach the GPU is configured to perform non-uniform quantization for the neural network.

	Park, in the same field of endeavor, teaches the GPU is configured to perform non-uniform quantization for the neural network.([Abstract] "Unlike recent work on binary-weight neural networks, our approach is multi-bit quantization, in which weights and activations can be quantized by any number of bits depending on the target accuracy" [p. 5459 Sec. 4.1] "Maximizing the weighted entropy optimizes the quantization result towards maximizing entropy while considering the importance of data. Thus, our method groups many near-zero values into a large cluster by considering their lower importance. Large, but infrequent values are also grouped into a cluster that covers a wide range of weight values." [p. 5461 §5.1] "For image classification tasks, we evaluate the proposed method by quantizing two widely used CNNs for ImageNet tasks [6]: AlexNet [15] GoogLeNet [21] (both from Caffe framework [13]) and ResNet3 [11]. In order to apply our quantization scheme into these networks, we perform fine tuning combined with our weight/activation quantization schemes under the batch size of 256 (for AlexNet), 64 (for GoogLeNet), or 16 (for ResNet-50/101). In the cases of GoogLeNet and ResNet, the batch size is limited due to insufficient GPU memory capacity" Park explicitly teaches that the quantization is performed on the GPU.).

	Caulfield, Rouhani, Paltashev, and Park are all directed towards heterogeneous processor systems.  Therefore, Caulfield, Rouhani, Paltashev, and Park are all analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the non-uniform quantization taught in Park with the processing system capable of processing graphics and neural networks taught in the combination of Caulfield, Rouhani, and Paltashev. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Park that the non-uniform quantization not only speeds up training and classification, but also outperforms other well-known neural network quantization methods by allowing greater control over the quantizing.  From Park ([p. 5456 Col. 2 Sec. 1] "Such aggressive quantization methods are promising in that they can achieve significant reductions in the execution time, energy consumption, and memory capacity requirements of neural networks during the inference by exploiting the benefits of dedicated hardware accelerators, e.g. NVIDIA P40 and P4 [2] which support 8-bit integer arithmetic or Stripes [14] which provides execution time and energy consumption proportional to the bitwidth").

	 Regarding claim 30, the combination of Caulfield, Rouhani, and Paltashev teaches The data processing system of claim 26.
	However, the combination of Caulfield, Rouhani, and Paltashev doesn't explicitly teach at least one core of the heterogenous processor is configured to perform non-uniform quantization for the neural network.

	Park, in the same field of endeavor, teaches at least one core of the heterogenous processor is configured to perform non-uniform quantization for the neural network.([Abstract] "Unlike recent work on binary-weight neural networks, our approach is multi-bit quantization, in which weights and activations can be quantized by any number of bits depending on the target accuracy" [p. 5459 Sec. 4.1] "Maximizing the weighted entropy optimizes the quantization result towards maximizing entropy while considering the importance of data. Thus, our method groups many near-zero values into a large cluster by considering their lower importance. Large, but infrequent values are also grouped into a cluster that covers a wide range of weight values.").

	Caulfield, Rouhani, Paltashev, and Park are all directed towards heterogeneous processor systems.  Therefore, Caulfield, Rouhani, Paltashev, and Park are all analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the non-uniform quantization taught in Park with the processing system capable of processing graphics and neural networks taught in the combination of Caulfield, Rouhani, and Paltashev. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Park that the non-uniform quantization not only speeds up training and classification, but also outperforms other well-known neural network quantization methods by allowing greater control over the quantizing.  From Park ([p. 5456 Col. 2 Sec. 1] "Such aggressive quantization methods are promising in that they can achieve significant reductions in the execution time, energy consumption, and memory capacity requirements of neural networks during the inference by exploiting the benefits of dedicated hardware accelerators, e.g. NVIDIA P40 and P4 [2] which support 8-bit integer arithmetic or Stripes [14] which provides execution time and energy consumption proportional to the bitwidth").

	Claim 27 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Caulfield and Rouhani and Paltashev and Desoli (US20180189229A1).

	 Regarding claim 27, the combination of Caulfield, Rouhani, and Paltashev teaches The data processing system of claim 26.
	However, the combination of Caulfield, Rouhani, and Paltashev does not explicitly teach the heterogenous processor is configured to process an input image via the neural network, 
	wherein to process the input image includes to: crop the image into a plurality of overlapping planar segments, the overlap of the planar segments configurable according to an overlap ratio;
	cluster the overlapping planar segments into two or more image batches; and process the two or more image batches via the heterogenous processor.

	Desoli, in the same field of endeavor, teaches The data processing system of claim 26, wherein the heterogenous processor is configured to process an input image via the neural network, ([Abstract] "Embodiments are directed towards a system on chip (SoC) that implements a deep convolutional network heterogeneous architecture... The configurable accelerator framework is an image and deep convolutional neural network (DCNN) co-processing system." [¶0048] "Considering the entire CNN, a two-dimensional image is input to the CNN and produces a set of votes at its output. The set of votes at the output are used to predict whether the input image either does or does not contain the object of interest that is characterized by the features.")
	wherein to process the input image includes to: crop the image into a plurality of overlapping planar segments, the overlap of the planar segments configurable according to an overlap ratio;([¶0040] "The pooling process introduces the concepts of “window size” and “stride.” The window size is the dimensions of a window such that a single, maximum value within the window will be selected in the pooling process. A window may be formed having dimensions of m-pixels by n-pixels wherein “m” and “n” are integers, but in most cases, “m” and “n” are equal. In the pooling operation shown in FIG. 1I, each window is formed as a 2-pixel-by-2-pixel window. In the pooling operation, a 4-pixel window is conceptually overlayed onto a selected portion of the kernel map, and within the window, the highest value is selected." Stride interpreted as synonymous with an overlap ratio for overlapping planar segments (filter windows).)
	cluster the overlapping planar segments into two or more image batches; and process the two or more image batches via the heterogenous processor.([¶0072] "Kernel sets may be partitioned in batches and processed sequentially, and intermediate results may be stored in on-chip memory. Various kernel sizes (e.g., up to 12×12), various batch sizes (e.g., up to 16), and parallel kernels (e.g., up to 4) can be handled by a single CA instance, and any size kernel can be accommodated with the accumulator input. The CA includes a line buffer to fetch a plurality (e.g., up to 12) of feature map data words in parallel with a single memory access... Configurable batch size and a variable number of parallel kernels provide a neural network designer with flexibility to trade-off the available input and output bandwidth sharing across different units and the available computing logic resources.").

	The combination of Caulfield, Rouhani, and Paltashev as well as Desoli are directed towards heterogeneous systems which process neural networks.  Therefore, the combination of Caulfield, Rouhani, and Paltashev as well as Desoli are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Caulfield, Rouhani, and Paltashev with the teachings of Desoli by providing an input image into a convolutional neural network.  While Caulfield, Rouhani, and Paltashev all teach using convolutional neural networks, they do not explicitly teach inputting images into said CNN.  Desoli provides as additional motivation for combination ([¶0054] "The performance of known object recognition techniques that use machine learning methods is improved by applying more powerful models to larger datasets, and implementing better techniques to prevent overfitting. Two known large datasets include LabelMe and ImageNet. LabelMe includes hundreds of thousands of fully segmented images, and more than 15 million high-resolution, labeled images in over 22,000 categories are included in ImageNet.").
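	For illustration only, the relationship between window size, stride, and overlap relied on in the interpretation above can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Desoli or from the instant specification. The function names `overlap_ratio` and `window_origins`, and the example sizes, are hypothetical. It assumes windows are placed at a fixed stride along one axis, so that adjacent windows share `window - stride` pixels.

```python
def overlap_ratio(window: int, stride: int) -> float:
    """Fraction of a window shared with its neighbor along one axis.

    Hypothetical helper: assumes adjacent windows are offset by `stride`.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    # A stride at or beyond the window size leaves no shared pixels.
    return max(0.0, (window - stride) / window)


def window_origins(length: int, window: int, stride: int) -> list[int]:
    """Start offsets of every full window that fits along one axis."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return list(range(0, length - window + 1, stride))


# A 2-pixel window moved 2 pixels at a time does not overlap its neighbor;
# moving the same window 1 pixel at a time overlaps by half.
print(overlap_ratio(2, 2))       # 0.0
print(overlap_ratio(2, 1))       # 0.5
print(window_origins(8, 4, 2))   # [0, 2, 4]
```

	Under these assumptions, fixing the window size and varying the stride is one way of configuring the amount of overlap between adjacent segments.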

	 Regarding claim 28, the combination of Caulfield, Rouhani, Paltashev, and Desoli teaches The data processing system of claim 27, wherein the heterogenous processor is configured to process the two or more image batches in parallel via two or more cores.(Desoli [¶0072] "Kernel sets may be partitioned in batches and processed sequentially, and intermediate results may be stored in on-chip memory. Various kernel sizes (e.g., up to 12×12), various batch sizes (e.g., up to 16), and parallel kernels (e.g., up to 4) can be handled by a single CA instance, and any size kernel can be accommodated with the accumulator input. The CA includes a line buffer to fetch a plurality (e.g., up to 12) of feature map data words in parallel with a single memory access... Configurable batch size and a variable number of parallel kernels provide a neural network designer with flexibility to trade-off the available input and output bandwidth sharing across different units and the available computing logic resources." [¶0152] "Kernel sets are partitioned in batches processed sequentially and intermediate results can be stored in the SoC global memory 126. Various kernel sizes (e.g., up to 12×12), various batch sizes (e.g., up to 16), and parallel kernels (e.g., up to 4) can be handled by a single CA 600 instance but any size kernel can be accommodated with the accumulator input" Forward processing element interpreted as synonymous with compute node.  Fraser explicitly teaches that images may be batched at the input of the neural network system and processed in parallel using the processing elements.).

	Claim 29 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Caulfield and Rouhani and Paltashev and Desoli and Fraser (US20190080223A1).

	 Regarding claim 29, the combination of Caulfield, Rouhani, Paltashev, and Desoli teaches The data processing system of claim 28.
	However, the combination of Caulfield, Rouhani, Paltashev, and Desoli doesn't explicitly teach wherein the heterogenous processor is configured to: process a first image batch via the CPU core; 
	process a second image batch via the GPU core; 
	and process a third image batch via the accelerator core.

	Fraser, in the same field of endeavor, teaches The data processing system of claim 28, wherein the heterogenous processor is configured to: process a first image batch via the CPU core; ([¶0002] "neural networks need to be trained on a sufficiently large dataset, and the training is performed on the basis of floating point arithmetic using general purpose graphics processing units (GPGPUs)." [¶0040] "a backpropagation learning method may be used to calculate the error contribution of each neuron after a batch of data (e.g., in image recognition, multiple images) is processed"  [¶0041] "a neural network typically needs to be trained on a sufficiently large training dataset. The training dataset may include a plurality of subsets (batches)")
	process a second image batch via the GPU core; ([¶0002] "neural networks need to be trained on a sufficiently large dataset, and the training is performed on the basis of floating point arithmetic using general purpose graphics processing units (GPGPUs)." [¶0040] "a backpropagation learning method may be used to calculate the error contribution of each neuron after a batch of data (e.g., in image recognition, multiple images) is processed"  [¶0041] "a neural network typically needs to be trained on a sufficiently large training dataset. The training dataset may include a plurality of subsets (batches)")
	and process a third image batch via the accelerator core.([¶0002] "neural networks need to be trained on a sufficiently large dataset, and the training is performed on the basis of floating point arithmetic using general purpose graphics processing units (GPGPUs)." [¶0040] "a backpropagation learning method may be used to calculate the error contribution of each neuron after a batch of data (e.g., in image recognition, multiple images) is processed"  [¶0041] "a neural network typically needs to be trained on a sufficiently large training dataset. The training dataset may include a plurality of subsets (batches)").

	The combination of Caulfield, Rouhani, Paltashev, and Desoli, as well as Fraser are all directed towards heterogeneous processor systems.  Therefore, the combination of Caulfield, Rouhani, Paltashev, and Desoli, and Fraser are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural networks of the combination of Caulfield, Rouhani, Paltashev, and Desoli with that of Fraser by implementing parallel batching. Fraser teaches that batching is a method of removing dependencies and allowing for greater parallelization ([¶0042] “It has been discovered that various techniques may be used to remove the dependencies in a training algorithm, which may prevent or reduce stalling and allow implementations using multiple accelerators...Such a delayed model adaptation allows the forward path and the backward path of neural network training to be implemented in parallel. This is achieved by introducing a delay between the output activations and the weight update and gradient calculations in each layer. This allows the calculations for each layer to be performed independently without different batches of inputs.”).

	Claims 35 and 36 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Rouhani and Paltashev.

	 Regarding claim 35, Rouhani teaches A method comprising: accelerating computations on graph representations associated with a neural network via a graph processing unit (GrPU) ([Abstract] "This paper proposes Deep3 , an automated platform-aware Deep Learning (DL) framework that brings orders of magnitude performance improvement to DL training and execution. Deep3 is the first to simultaneously leverage three levels of parallelism for performing DL: data, network, and hardware." [p. 2 §1] "Devising accompanying libraries to ensure Deep3 ease of adoption on three common classes of platforms including CPUs, GPUs, and FPGAs. Note that our proposed methodology is universal and directly applicable to other families of computing hardware")
	including multiple hardware threads to concurrently traverse multiple graph representations and execute instructions associated with nodes of the multiple graph representations; ([p. 3 §5.1] "the parameter coordinator initiates a pair of send- and receive-thread. The send-thread subsamples the neurons of the global DL model in accordance with the local DL topology" [p. 4 §5.1] "A local network might compute its gradients (updates) based on a set of parameters that are slightly out of date. This is because the other local networks have probably updated the global values in the parameter coordination unit in the meantime." global DL model and local DL topology interpreted as multiple graph representations.  Rouhani teaches that multiple local graph representations are traversed in parallel (concurrently).).
	However, Rouhani does not explicitly teach and in response to a directed condition: dynamically compiling a shader kernel to generate a dynamically compiled shader; and executing instructions of the dynamically compiled shader via the GrPU to perform a compute operation.

	Paltashev, in the same field of endeavor, teaches and in response to a directed condition: dynamically compiling a shader kernel to generate a dynamically compiled shader; and executing instructions of the dynamically compiled shader via the GrPU to perform a compute operation.([¶0041] "Implementing the graphics pipeline 201 as a monolithic pipeline object, allows a reduction in API overhead by enabling up-front shader optimization at compile time. Embodiments of the graphics pipeline 201 also make the CPU performance of the pipeline driver more predictable, since shader compilation is not kicked off by the driver at draw time outside of the application's control." Since shader compilation is not kicked off by the driver outside of the application's control, it is interpreted as synonymous with being executed in response to a directed condition.).

	Rouhani as well as Paltashev are directed towards heterogeneous processor systems.  Therefore, Rouhani as well as Paltashev are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, that runtime shader compilation is inherent in graphics processing. Rouhani and Paltashev support the inherency of shader compilation for GPUs.  The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Paltashev ([¶0068] “Some embodiments of the graphics processing system 600 have a number of advantages over conventional graphics pipelines. For example, the graphics processing system 600 utilizes fixed-function hardware within the compute domain and many compute shaders or virtual GPUs can be scheduled concurrently and load balanced by the asynchronous compute engines”).  This motivation for combination also applies to the remaining claims which depend on this combination.

	 Regarding claim 36, the combination of Rouhani and Paltashev teaches The method of claim 35, further comprising: detecting a condition associated with input data of a neural network computation; (Rouhani [p. 2 §2] "The forward and backward propagations are iteratively applied for multiple rounds of reprocessing the input data until the desired accuracy is achieved." Accuracy interpreted as synonymous with a condition associated with input data of a neural network computation.)
	compiling a modified shader that configures the GPU to perform a modified neural network computation;(Paltashev [¶0041] "Implementing the graphics pipeline 201 as a monolithic pipeline object, allows a reduction in API overhead by enabling up-front shader optimization at compile time. Embodiments of the graphics pipeline 201 also make the CPU performance of the pipeline driver more predictable, since shader compilation is not kicked off by the driver at draw time outside of the application's control." [¶0043] "virtual graphics pipelines can be configured to support deep learning neural networks that are implemented on a GPU platform.")
	and performing the modified neural network computation via the compiled modified shader.(Paltashev [¶0066] "To support implementations of a reconfigurable GPU, the graphics processing system 600 also includes shared fixed function hardware blocks 641, 642, 643, 644, 645...For another example, the tessellator 634 can transmit a request to the dedicated fixed function hardware block 642 to perform an operation and the results of the operation can be returned to the kernel domain shader 635"  [¶0043] "virtual graphics pipelines can be configured to support deep learning neural networks that are implemented on a GPU platform.").

	Claim 37 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Rouhani and Paltashev and Desoli.

	 Regarding claim 37, the combination of Rouhani and Paltashev teaches The method of claim 35.
	However, the combination of Rouhani and Paltashev doesn't explicitly teach cropping an input image received at a heterogenous processor including the GrPU, 
	the image cropped into a plurality of overlapping planar segments, the overlap of the planar segments configurable according to an overlap ratio;
	clustering the overlapping planar segments into two or more image batches; and processing the two or more image batches via the heterogenous processor, wherein the heterogenous processor is configured to process the two or more image batches in parallel via two or more cores.

	Desoli, in the same field of endeavor, teaches The method of claim 35, further comprising: cropping an input image received at a heterogenous processor including the GrPU, ([Abstract] "Embodiments are directed towards a system on chip (SoC) that implements a deep convolutional network heterogeneous architecture." [¶0178] "Image Cropper and Resizer Unit")
	the image cropped into a plurality of overlapping planar segments, the overlap of the planar segments configurable according to an overlap ratio;([¶0040] "The pooling process introduces the concepts of “window size” and “stride.” The window size is the dimensions of a window such that a single, maximum value within the window will be selected in the pooling process. A window may be formed having dimensions of m-pixels by n-pixels wherein “m” and “n” are integers, but in most cases, “m” and “n” are equal. In the pooling operation shown in FIG. 1I, each window is formed as a 2-pixel-by-2-pixel window. In the pooling operation, a 4-pixel window is conceptually overlayed onto a selected portion of the kernel map, and within the window, the highest value is selected." Stride interpreted as synonymous with an overlap ratio for overlapping planar segments (filter windows).)
	clustering the overlapping planar segments into two or more image batches; and processing the two or more image batches via the heterogenous processor, wherein the heterogenous processor is configured to process the two or more image batches in parallel via two or more cores.([¶0072] "Kernel sets may be partitioned in batches and processed sequentially, and intermediate results may be stored in on-chip memory. Various kernel sizes (e.g., up to 12×12), various batch sizes (e.g., up to 16), and parallel kernels (e.g., up to 4) can be handled by a single CA instance, and any size kernel can be accommodated with the accumulator input. The CA includes a line buffer to fetch a plurality (e.g., up to 12) of feature map data words in parallel with a single memory access... Configurable batch size and a variable number of parallel kernels provide a neural network designer with flexibility to trade-off the available input and output bandwidth sharing across different units and the available computing logic resources.").

	The combination of Rouhani and Paltashev as well as Desoli are directed towards heterogeneous systems which process neural networks.  Therefore, the combination of Rouhani and Paltashev as well as Desoli are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Rouhani and Paltashev with the teachings of Desoli by providing an input image into a convolutional neural network.  While Rouhani and Paltashev both teach using convolutional neural networks, they do not explicitly teach inputting images into said CNN.  Desoli provides as additional motivation for combination ([¶0054] "The performance of known object recognition techniques that use machine learning methods is improved by applying more powerful models to larger datasets, and implementing better techniques to prevent overfitting. Two known large datasets include LabelMe and ImageNet. LabelMe includes hundreds of thousands of fully segmented images, and more than 15 million high-resolution, labeled images in over 22,000 categories are included in ImageNet.").

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Chertok (US9418458B2) is directed towards a heterogeneous processor for graph processing of neural networks.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720.  The examiner can normally be reached on M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SB/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124