DETAILED ACTION
Claims 1-26 are pending.
The office acknowledges the following papers:
Claims and remarks filed on 7/13/2022.

	Withdrawn objections and rejections
The specification objection has been withdrawn.

New Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 25-26 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 25 recites “wherein the assignment strategy includes information for different clusters to be provided same batch information at different times” (emphasis added). Claim 26 recites a similar limitation. Paragraphs 15, 24, 61, and 81 describe generating a batch strategy. The term “batch information” isn’t found within the specification. The terms “batch” and “cluster” have been found together only in paragraphs 26, 61, 77, and 83. Paragraph 26 provides the context of generating output information in clusters by processing batches. Paragraph 61 provides the context of a parallelization strategy including a batch strategy. Paragraphs 77 and 83 provide the context of a cluster processing a mini-batch. The terms “batch” and “different” aren’t found together within any paragraph of the specification. A cursory search of the specification hasn’t found written description support for an assignment strategy including information for different clusters to be provided same batch information at different times. Thus, the amendment fails to convey to one skilled in the art that the inventor had possession of the claimed invention at the time of filing.

New Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-9, 11, and 22-26 are rejected under 35 U.S.C. 103 as being unpatentable over Seshadri et al. (U.S. 2019/0362227), in view of Xu et al. (U.S. 2020/0301739).
As per claim 1:
Seshadri and Xu disclosed a parallelization method comprising: 
generating a profiling result by performing profiling on a target neural network based on model information of the target neural network (Seshadri: Figures 1-2 elements 104-106 and 202-208, and paragraphs 25, 35, and 38-39)(The DNN profiler generates a result for each layer that includes total computation time, size of output activations, and size of weights (i.e. model information).) and architecture information of a manycore system (Xu: Figure 3 element 301, paragraph 39)(Seshadri: Figures 1-2 elements 104-106, 118, and 202, and paragraphs 25, 35, 38-39, and 43)(Seshadri disclosed profiling total computation times for each layer assigned for execution to a GPU. Xu disclosed a workload analyzer accessing a database storing accelerator attributes (i.e. architectural information), including number of cores, PEs, memory capability, etc. Xu selects a resource amount as part of calculating the execution time to complete a neural network layer. The combination allows for the DNN profiler of Seshadri to access GPU attributes and select resource amounts as part of calculating the profiled total computation time for each layer.); 
determining an assignment strategy to assign a plurality of cores of each of a plurality of clusters of the manycore system to a plurality of layers of the target neural network, based on the profiling result (Xu: Figures 3 and 5B-C element 302, paragraphs 41-45)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(Xu disclosed a resource evaluator assigning a number of cores for execution of a neural network layer. Seshadri disclosed an optimizer that assigns sets of layers of a neural network to worker GPUs based on the neural network profile. The combination allows for assigning a number of GPU cores on a GPU for sets of layers of the neural network.); and 
generating a parallelization strategy for parallel processing of the manycore system based on the assignment strategy (Seshadri: Figures 1 and 5-6 elements 116 and 602-608, paragraphs 46, 49-52, and 55)(Execution of the DNN training is performed using a scheduling policy that interleaves forward and backward processing of batches.).
The advantage of accessing accelerator attributes and selecting a subset of attributes for profiling neural network layers is that execution times of neural network layers can be more precise and overlapping use of GPUs by neural network layers can be more precisely determined. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the neural network layer profiling steps of Xu within Seshadri for the above advantages. 
As per claim 2:
Seshadri and Xu disclosed the method of claim 1, wherein the profiling result comprises any one or any combination of a time for a single core of the manycore system to execute a single layer of the target neural network (Xu: Figures 3 and 5B elements 301 and 430A, paragraphs 39 and 41-42)(Seshadri: Figures 1-2 elements 104-106, 118, and 202, and paragraphs 25, 35, 38-39, and 43)(The combination allows for the DNN profiler of Seshadri to access GPU attributes and select resource amounts as part of calculating the profiled total computation time for each layer. A single layer can be assigned a single core for processing.), a time for a single cluster of the manycore system to execute a single layer of the target neural network (Xu: Figure 3 element 301, paragraph 39)(Seshadri: Figures 1-2 and 4A elements 104-106, 118, 202, and 404A-C, and paragraphs 25, 35, 38-39, 43, and 47-48)(The combination allows for the DNN profiler of Seshadri to access GPU attributes and select resource amounts as part of calculating the profiled total computation time for each layer. The combined total computation time for a set of layers in a stage reads upon the time for a single cluster.), and a communication cost to transmit processing results between cores of the manycore system.
As per claim 3:
Seshadri and Xu disclosed the method of claim 1, wherein the generating of the profiling result comprises generating the profiling result by pre-executing the target neural network based on test data (Seshadri: Figures 1-2 elements 104-106 and 202-208, and paragraphs 25, 35, and 38-39)(The DNN profiler generates a result for each layer that includes total computation time using a subset of training data (i.e. test data).).
As per claim 4:
Seshadri and Xu disclosed the method of claim 1, wherein the determining of the assignment strategy comprises: 
partitioning the target neural network into a plurality of sub-networks and distributing the plurality of sub-networks to the plurality of clusters (Seshadri: Figure 4A elements 112A-C and 404A-C, paragraphs 47-48)(The neural network layers are grouped into stages (i.e. sub-networks) and sent to worker GPUs (i.e. plurality of clusters) for execution.); and 
assigning a plurality of cores of each of the plurality of clusters to one or more layers of a corresponding sub-network among the plurality of sub-networks (Xu: Figures 3 and 5B-C element 302, paragraphs 41-45)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(The combination allows for assigning a number of GPU cores on a GPU for sets of layers of the neural network.).
As per claim 5:
Seshadri and Xu disclosed the method of claim 4, wherein each of the plurality of sub-networks comprises either one of a single layer and a plurality of consecutive layers among the plurality of layers of the target neural network (Seshadri: Figure 4A elements 402A-G, paragraphs 47-48).
As per claim 6:
Seshadri and Xu disclosed the method of claim 4, wherein the partitioning of the target neural network comprises: 
partitioning the target neural network into the plurality of sub-networks based on a time for a single cluster of the manycore system to execute a single layer of the target neural network (Seshadri: Figures 1 and 4A element 110, paragraphs 35, 40, 43, and 47-48)(The neural network is optimized and partitioned into stages (i.e. sub-networks) based on the profiler result.); and 
distributing the plurality of sub-networks to the plurality of clusters (Seshadri: Figures 1 and 4A element 110, paragraphs 35, 40, 43, and 47-48)(The neural network is optimized and partitioned into stages (i.e. sub-networks) based on the profiler result. The stages are allocated GPU resources for executing the neural network layers.).
As per claim 7:
Seshadri and Xu disclosed the method of claim 4, wherein the assigning of the plurality of cores to the one or more layers comprises assigning the plurality of cores to the one or more layers based on a time for a single core of the manycore system to execute a single layer of the target neural network (Xu: Figures 3 and 5B-C element 302, paragraphs 41-45)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(The combination allows for assigning a number of GPU cores on a GPU for sets of layers of the neural network based on the profile result.).
As per claim 8:
Seshadri and Xu disclosed the method of claim 4, wherein the assigning of the plurality of cores to the one or more layers comprises assigning the plurality of cores to the one or more layers based on a characteristic of each layer of the corresponding sub-network (Xu: Figures 3 and 5B-C element 302, paragraphs 41-45)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(The combination allows for assigning a number of GPU cores on a GPU for sets of layers of the neural network based on the profile result, which includes total compute time, output activations, and weight size (i.e. characteristics).).
As per claim 9:
Seshadri and Xu disclosed the method of claim 8, wherein the characteristic of each layer comprises any one or any combination of an amount of computational operation for processing of each layer and an amount of communication traffic for transmitting a processing result of each layer (Seshadri: Figures 1-2 elements 104-106 and 204, paragraph 38)(The number of output activations indicates an amount of data communication traffic between layers.).
As per claim 11:
Seshadri and Xu disclosed the method of claim 1, further comprising: 
generating a batch strategy comprising a number of micro-batches based on assignment states of the plurality of cores according to the assignment strategy (Xu: Figures 3 and 5B-C element 302, paragraphs 41-45)(Seshadri: Figures 1 and 5-6 elements 116 and 602-608, paragraphs 46, 49-52, and 55)(The combination allows for assigning a number of GPU cores on a GPU for sets of layers of the neural network based on the profile result. Execution of the DNN training is performed using a scheduling policy that interleaves forward and backward processing of minibatches.).
As per claim 22:
Seshadri and Xu disclosed a parallelization method comprising: 
determining, for each cluster of a manycore system, a sub-network including one or more layers of a target neural network to be executed by the cluster, based on execution times of the one or more layers (Xu: Figures 3 and 5B-C element 302, paragraphs 41-45)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(Xu disclosed a resource evaluator assigning a number of cores for execution of a neural network layer. Seshadri disclosed an optimizer that assigns sets of layers of a neural network to worker GPUs based on the neural network profile. The combination allows for assigning a number of GPU cores on a GPU for sets of layers (i.e. sub-network) of the neural network.) and an optimal execution time of the cluster (Xu: Figures 5D-E, paragraphs 46-49)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(Xu disclosed a resource evaluator assigning multiple neural networks to sets of cores for optimized execution time on the cores. The combination allows for such scheduling in Seshadri, which optimizes execution time of multiple neural networks.);
determining, for each core of each cluster, a layer of the determined sub-network to be processed by the core (Xu: Figures 3 and 5B-C element 302, paragraphs 41-45)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(Xu disclosed a resource evaluator assigning a number of cores for execution of a neural network layer. Seshadri disclosed an optimizer that assigns sets of layers of a neural network to worker GPUs based on the neural network profile. The combination allows for assigning a number of GPU cores on a GPU (i.e. cluster) for sets of layers of the neural network.); and 
generating output information by processing, in each cluster, one or more batches based on the determined sub-network and the determined layers (Xu: Figures 3 and 5B-C element 302, paragraphs 41-45)(Seshadri: Figures 1 and 5-6 elements 116 and 602-608, paragraphs 46, 49-52, and 55)(The combination allows for assigning a number of GPU cores on a GPU for sets of layers (i.e. sub-network) of the neural network based on the profile result. Execution of the DNN training is performed using a scheduling policy that interleaves forward and backward processing of minibatches, which generates execution results output to the next layer.).
The advantage of accessing accelerator attributes and selecting a subset of attributes for profiling neural network layers is that execution times of neural network layers can be more precise and overlapping use of GPUs by neural network layers can be more precisely determined. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the neural network layer profiling steps of Xu within Seshadri for the above advantages. 
As per claim 23:
Seshadri and Xu disclosed the method of claim 22, wherein the determining of the sub-network for each cluster comprises determining, for each cluster, the sub-network to include a maximum number of consecutive layers of the target neural network having a sum of execution times less than or equal to the optimal execution time of the cluster (Xu: Figures 5D-E, paragraphs 46-49)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(Xu disclosed a resource evaluator assigning multiple neural networks to sets of cores for optimized execution time on the cores. The combination allows for such scheduling in Seshadri, which optimizes execution time of multiple neural networks. The sum of execution times of consecutive layers is less than the optimal execution time when the cores aren’t fully utilized during layer execution.).
As per claim 24:
Seshadri and Xu disclosed the method of claim 23, wherein the determining of the sub-network for each cluster comprises, in response to one or more layers of the target neural network not being included in the sub-networks, redetermining the sub-networks based on residual computational capabilities of the clusters (Xu: Figures 3 and 5F element 303, paragraphs 44 and 51)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(Xu disclosed re-sorting neural network layers so that multiple neural networks can be optimized for concurrent execution. The combination allows for such re-sorting of layers in Seshadri.).
As per claim 25:
Seshadri and Xu disclosed the method of claim 1, wherein the assignment strategy includes information for different clusters to be provided same batch information at different times (Xu: Figures 3 and 5B-C element 302, paragraphs 41-45)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(Xu disclosed a resource evaluator assigning a number of cores for execution of a neural network layer. Seshadri disclosed an optimizer that assigns sets of layers of a neural network to worker GPUs based on the neural network profile. The combination allows for assigning a number of GPU cores on a GPU for sets of layers of the neural network. It would have been obvious to one of ordinary skill in the art that multiple GPUs can be allocated the same number of cores (i.e. batch information) for executing a neural network layer at different times during processing.).
As per claim 26:
The additional limitation(s) of claim 26 basically recite the additional limitation(s) of claim 25. Therefore, claim 26 is rejected for the same reason(s) as claim 25.

Claims 10 and 12-21 are rejected under 35 U.S.C. 103 as being unpatentable over Seshadri et al. (U.S. 2019/0362227), in view of Xu et al. (U.S. 2020/0301739), in view of Official Notice.
As per claim 10:
Seshadri and Xu disclosed the method of claim 9, wherein, for the assigning of the plurality of cores to the one or more layers, a higher priority is assigned to the amount of computational operation than to the amount of communication traffic (Xu: Figures 3 and 5B-C element 302, paragraphs 41-45)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(The combination allows for assigning a number of GPU cores on a GPU for sets of layers of the neural network based on the profile result, which includes total compute time, output activations, and weight size (i.e. characteristics). Official notice is given that resource allocation can give higher priority to performance for the advantage of increased system performance. Thus, it would have been obvious to one of ordinary skill in the art to implement assigning GPU cores for layers based on total compute time having the highest priority.).
As per claim 12:
Claim 12 essentially recites the same limitations as claim 1. Claim 12 additionally recites the following limitations:
A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, configure the processor to perform the method of claim 1 (Seshadri: Figure 1, paragraph 33)(Official notice is given that hardware logic can be implemented as software code for the advantages of cost efficient design, testing, and implementation. Thus, it would have been obvious to one of ordinary skill in the art to implement the DNN profiler, optimizer, and scheduler of Seshadri as software.).
As per claim 13:
Claim 13 essentially recites the same limitations as claim 1. Claim 13 additionally recites the following limitations:
A processor generating a profile result, determining an assignment strategy, and generating a parallelization strategy (Seshadri: Figure 1, paragraph 33)(Official notice is given that software code can be implemented as hardware logic for the advantage of increased performance. Thus, it would have been obvious to one of ordinary skill in the art to implement the DNN profiler, optimizer, and scheduler of Seshadri as processor hardware.).
As per claim 14:
The additional limitation(s) of claim 14 basically recite the additional limitation(s) of claim 2. Therefore, claim 14 is rejected for the same reason(s) as claim 2.
As per claim 15:
The additional limitation(s) of claim 15 basically recite the additional limitation(s) of claim 3. Therefore, claim 15 is rejected for the same reason(s) as claim 3.
As per claim 16:
The additional limitation(s) of claim 16 basically recite the additional limitation(s) of claim 4. Therefore, claim 16 is rejected for the same reason(s) as claim 4.
As per claim 17:
The additional limitation(s) of claim 17 basically recite the additional limitation(s) of claim 6. Therefore, claim 17 is rejected for the same reason(s) as claim 6.
As per claim 18:
The additional limitation(s) of claim 18 basically recite the additional limitation(s) of claim 7. Therefore, claim 18 is rejected for the same reason(s) as claim 7.
As per claim 19:
The additional limitation(s) of claim 19 basically recite the additional limitation(s) of claim 8. Therefore, claim 19 is rejected for the same reason(s) as claim 8.
As per claim 20:
The additional limitation(s) of claim 20 basically recite the additional limitation(s) of claim 11. Therefore, claim 20 is rejected for the same reason(s) as claim 11.
As per claim 21:
The additional limitation(s) of claim 21 basically recite the additional limitation(s) of claim 12. Therefore, claim 21 is rejected for the same reason(s) as claim 12.

Response to Arguments
The arguments presented by Applicant in the response, received on 7/13/2022, are not considered persuasive.
Applicant argues for claims 1 and 22:
“Again, the purported solution being proposed by Seshadri is to determine how to divide up a DNN model into separate stages that will be separately assigned to different GPUs, for example. Because Seshadri further suggests an approach where each GPU perform their respective forward and backward (back-propagation) passes, which reduces communication between each GPU and a parameter server, for example, where previously gradients from different GPUs may be collected and a GPU would require communication with the example parameter server to obtain such gradient information before the GPU could perform it's back-propagation or before another server or GPU would perform the back-propagation. 
Here, the Office Action acknowledges that Seshadri does not disclose or suggest all the limitations of independent claim 22, and refers to Xu for remedying the deficiencies of Seshadri.
In particular, the Office suggests modifying Seshadri to perform an assignment of different layers of a DNN to different clusters within a single GPU, which is the focus of Xu. 
Applicant respectfully disagrees. In particular, as demonstrated above, Seshadri relates to a neural network training system using pipelining aspects of the training process by assigning different portions of a DNN to two different computing devices, with no apparent reason or suggestion, in the context of the invention of Seshadri, to now differently assign the different portions or any of the different portions to different clusters within any particular one device, i.e., within any particular one GPU.
Further, the Office sets forth "the combination allows for the DNN profiler of Seshadri to access GPU attributes and selecting resource amounts as part of calculating the profiled total computation time for each layer." 
However, again, there is no need or desire in Seshadri to know the internal scheduling and assignments of resources within any particular device. Seshadri determines the total computation time of the particular device for the assigned portion of the DNN. 
There would be no reason why Seshadri would need to know the breakdown of computing times within the particular device for different cores or clusters of the particular device (GPU). Again, Seshadri only needs to know the total computation time of that device for the assigned portion of the DNN assigned to that device.”  

This argument is not found to be persuasive for the following reason. The applicant is correct that Seshadri disclosed dividing a DNN model into separate stages that will be separately assigned to different GPUs. This can be seen in figure 4A, where the DNN model is divided into optimized layer assignments, where each stage would be assigned to a different GPU. Seshadri alone allows for determining total compute time using an assigned GPU, regardless of how many resources on the GPU are actually needed to compute a given stage comprising NN layer(s). 
Xu disclosed a workload analyzer accessing a database storing accelerator attributes (i.e. architectural information), including number of cores, PEs, memory capability, etc. The combination allows for the DNN profiler of Seshadri to access GPU attributes and select resource amounts as part of calculating the profiled total computation time for each layer. 
The advantage of Xu calculating computation time using an allocated set of accelerator resources is that GPU resources can be allocated in a more fine-grained manner. This allows for knowing available resources in GPUs for parallel allocation. This would especially be the case if a subset of GPU resources is already allocated to other processing. The calculation of Seshadri using the entire resource set of the GPU would result in an inaccurate profile of total computation time. Thus, the combination has proper motivation for using architecture information of the GPUs of Seshadri in performing the profiling steps.
Applicant argues for claims 1 and 22:
“Further, the Office suggests that Seshadri would desire or would be benefited by "assigning a number of GPU cores on a GPU for sets of layers of the neural network." However, again, Seshadri is focused on assigning a layer or such a 'set of layers' to a particular device (i.e., GPU) and assigning another layer or another 'set of layers' to a different particular device. Seshadri does not suggest or need to assign different layers (or different sets of layers) to different clusters within any particular one device. 
Again, the inventive focus of Seshadri is for the different particular devices (GPUs) perform different portions of a DNN.”  

This argument is not found to be persuasive for the following reason. The applicant is correct that Seshadri focuses on assigning various sets of layers to different GPUs (i.e. clusters). The combination of Seshadri and Xu allows for the assignment of a set of layers to a given GPU to include assigning a plurality of cores of the GPU, as opposed to all of the cores of a GPU (e.g. tensor, CUDA, ray tracing, stream processors, compute units, texture units, etc.). Even Seshadri alone would simply result in the assignment of all of the GPU cores for a set of layers that existed on the allocated GPU. Thus, the combination reads upon the claimed limitation.
Applicant argues for claims 1 and 22:
“Further, with respect to the claimed "generating a parallelization strategy for parallel processing of the manycore system based on the assignment strategy", the Office relies upon "Seshadri: Figures 1 and 5-6 elements 116 and 602-608, paragraphs 46, 49-52, and 55)(Execution of the DNN training is performed using a scheduling policy that interleaves forward and backward processing of batches.)." 
However, it is unclear how the "[e]xecution of the DNN training is performed using a scheduling policy that interleaves forward and backward processing of batches" is related to the claimed parallel processing. 
The forward processing of a device (e.g., GPU), for an assigned portion of the DNN, is not performed in parallel with the backward (i.e., the back-propagation) processing of the device, they are sequential operations. For a backward processing with respect to a batch processed in the corresponding forward pass must occur after the forward pass. 
Different devices (different GPUs) train different portions of the DNN in parallel, i.e., for a same batch input, and/or some different devices may train the same portions of the DNN using different batches in parallel. 
Thus, any interpretable generating of a parallelization strategy for parallel processing in Seshadri is only relating to parallel operations of different devices (different GPUs), based on the determined assignment strategy of the different portions of the DNN to different devices (or the alternative assignment of same portions of the DNN to different devices for processing of different batches of training data).”  

This argument is not found to be persuasive for the following reason. Figure 5 of Seshadri clearly shows parallel execution by multiple worker computing devices (i.e. GPUs) executing different layers with different input data in parallel. The parallel processing of stages on GPUs reads upon the parallel processing of the manycore system (i.e. GPUs having multiple cores). In addition, it is well known to one of ordinary skill in the art that neural network stage executions involve large amounts of parallel processing. This is typically done using large input features and weights for large sets of parallel MAC calculations. This is the reason that neural network calculations are sent to GPUs for parallel execution and are not executed on generic CPUs for sequential processing. Thus, the combination reads on the claimed limitation.
Applicant argues for claims 1 and 22:
“However, there is no disclosure or suggestion in Seshadri that there would ever need or ever desire "that execution times of neural network layers can be more precise and overlapping use of GPUs by neural network layers can be more precisely determined." 
As explained below, this is only the desire of Xu because Xu involves assigning different portions of a neural network to different clusters of cores within a GPU or other accelerator. 
Seshadri does not need "execution times of neural networks" to be "more precise" with respect to a manycore system. Seshadri is only interested in execution times of different devices (GPUs), which does not need or desire the knowledge of how that device assigns resources within that same device.”  

This argument is not found to be persuasive for the following reason. In response to applicant’s argument that there is no teaching, suggestion, or motivation to combine the references, the examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art.  See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988), In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992), and KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007).  In this case, the reason to combine is found in the knowledge generally available to one of ordinary skill in the art.
Applicant argues for claims 1 and 22:
“In addition, it is respectfully submitted that the Office has failed to present a prima facie case of why one skilled in the art would have modified Seshardi in view of Xu when, as explained below, Xu is significantly different from the purported invention of Seshardi. 
In more detail, significantly different from Seshardi, Xu relates to an approach for assigning cores to different layers of a DNN within an accelerator (e.g., within a single GPU), in the context of the invention of Xu where the assignment of cores is performed in view of sharing clusters of cores between two different DNNs.”  

This argument is not found to be persuasive for the following reason. Seshadri disclosed allocating a set of neural network layers to a GPU. Xu disclosed allocating single neural network layers to cores of a GPU. The examiner disagrees that there are significant differences between the references.
Applicant argues for claims 1 and 22:
“As yet another example, there cannot be any suggestion the addition of the teachings of the accelerator of Xu would provide Seshadri with a more precisely determined use of the neural network layers, when Seshadri already has an optimizer that optimize layer assignments. 
Further, as noted above, there cannot be any suggestion the addition of the teachings of the accelerator of Xu would provide Seshadri with a more precisely determined use of the neural network layers with respect to a manycore system, when Seshadri provides no disclosure or suggestion to further perform optimization of how each individual device optimizes the layer assignments assigned thereto.”  

This argument is not found to be persuasive for the following reason. The applicant is correct that both references perform layer optimization. However, Seshadri disclosed allocating the entirety of a GPU to process a given stage, regardless of whether all GPU resources are needed. The combination allows for allocating a given number of cores of an allocated GPU for better resource utilization.
Applicant argues for claims 1 and 22:
“In addition, it is respectfully submitted that the Office has failed to explain how combination as proposed by the Office would be operative, for example, if the teachings with respect to the accelerator of Xu were forced into the optimizer of Seshadri.” 

This argument is not found to be persuasive for the following reason. Allocating subsets of processing resources to different workflows (e.g. threads) is well-known to one of ordinary skill in the art. For example, execution of multiple WARPs/SIMTs on a processing resource is well-known and leads to the well-known outcome of increased processing throughput from parallel execution of multiple workflows on a single computing device (e.g. GPU).
Applicant argues for claim 22:
“It is respectfully submitted that neither Seshardi nor Xu disclose at least the claimed "determining, for each cluster of a manycore system, a sub-network including one or more layers of a target neural network to be executed by the cluster, based on execution times of the one or more layers and an optimal execution time of the cluster." 
There is no such feature in either of Seshardi or Xu of "an optimal execution time of the cluster", or any determining of a sub-network based on execution times of the one or more layers and the "optimal execution time of the cluster."”  

This argument is not found to be persuasive for the following reason. Xu disclosed a resource evaluator that assigns multiple neural networks to sets of cores, which optimizes overall available execution time of the cores in a GPU (i.e. cluster). The combination allows for this type of scheduling in Seshadri, which optimizes execution time of multiple neural networks. Thus, the combination reads upon the claimed limitations.
Applicant argues regarding official notice:
“In addition, the Office Action relies on Official Notice as the "principal evidence" upon which the rejections of claims 10 and 12-21 are based. Official Notice cannot be used in this manner. As Section 2144.03(A) of the MPEP expressly warns, it is never appropriate to rely solely on Official Notice as the principal evidence upon which a rejection was based. Instead, Official Notice is only appropriate for facts and that serve to "fill in the gaps" in a rejection. (MPEP § 2144.03(A)). This is why official notice is to be judicially applied. (MPEP § 2144.03). It is unreasonable to conclude that the Office has used Official Notice to "fill in" a gap in these rejections. 
Still further, the Office attempts to take Official Notice of matter that is not "capable of instant and unquestionable demonstration", as expressly required by section 2144.03(A) of the MPEP. ("For example, even assuming arguendo that the equivalence of the subject printer and plotter is a fact, this fact would be neither of notorious character nor instantly and unquestionably demonstrable.") Moreover, courts have long rejected the notion that official notice can be taken on the state of the art. (See Memorandum to Patent Examining Corps from the Deputy Commissioner for Patent Examining Policy regarding Procedures for Relying on Facts Which are Not of Record as Common Sense or for Taking Official Notice, n.6, citing In re Eynde, 480 F.2d 1364, 1370, 178 USPQ 470, 474 (CCPA 1973)). Thus, the Office's attempt to officially notice the level of ordinary skill in the art is improper as a matter of law.”  

This argument is not found to be persuasive for the following reason. MPEP 2144.03(C) states “To adequately traverse such a finding, an applicant must specifically point out the supposed errors in the examiner’s action, which would include stating why the noticed fact is not considered to be common knowledge or well-known in the art … A general allegation that the claims define a patentable invention without any reference to the examiner’s assertion of official notice would be inadequate.” Applicant’s response has not stated why the noticed facts are not considered to be common knowledge or well-known in the art. Thus, the official notices taken are maintained.

	Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
The following is text cited from 37 CFR 1.111(c): In amending in reply to a rejection of claims in an application or patent under reexamination, the applicant or patent owner must clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. The applicant or patent owner must also show how the amendments avoid such references or objections.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACOB A. PETRANEK whose telephone number is (571)272-5988.  The examiner can normally be reached on M-F 8:00-4:30.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li can be reached on (571) 272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JACOB PETRANEK/Primary Examiner, Art Unit 2183