DETAILED ACTION
Claims 1-24 are pending.
The office acknowledges the following papers:
Patent application filed on 4/7/2021.

Priority
The effective filing date for the subject matter defined in the pending claims in this application is 10/28/2020.

Drawings
The Examiner contends that the drawings submitted on 4/7/2021 are acceptable for examination proceedings. 

Specification
The disclosure is objected to because of the following informalities:
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. The Applicant’s cooperation is requested in correcting any errors of which the Applicant may become aware.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-9, 11, and 22-24 are rejected under 35 U.S.C. 103 as being unpatentable over Seshadri et al. (U.S. 2019/0362227), in view of Xu et al. (U.S. 2020/0301739).
As per claim 1:
Seshadri and Xu disclosed a parallelization method comprising: 
generating a profiling result by performing profiling on a target neural network based on model information of the target neural network (Seshadri: Figures 1-2 elements 104-106 and 202-208, and paragraphs 25, 35, and 38-39)(The DNN profiler generates a result for each layer that includes total computation time, size of output activations, and size of weights (i.e. model information).) and architecture information of a manycore system (Xu: Figure 3 element 301, paragraph 39)(Seshadri: Figures 1-2 elements 104-106, 118, and 202, and paragraphs 25, 35, 38-39, and 43)(Seshadri disclosed profiling total computation times for each layer assigned for execution to a GPU. Xu disclosed a workload analyzer accessing a database storing accelerator attributes (i.e. architectural information), including number of cores, PEs, memory capability, etc. Xu selects a resource amount as part of calculating the execution time to complete a neural network layer. The combination allows for the DNN profiler of Seshadri to access GPU attributes and select resource amounts as part of calculating the profiled total computation time for each layer.); 
determining an assignment strategy to assign a plurality of cores of each of a plurality of clusters of the manycore system to a plurality of layers of the target neural network, based on the profiling result (Xu: Figures 3 and 5B-C element 302, paragraphs 41-45)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(Xu disclosed a resource evaluator assigning a number of cores for execution of a neural network layer. Seshadri disclosed an optimizer that assigns sets of layers of a neural network to worker GPUs based on the neural network profile. The combination allows for assigning a number of GPU cores on a GPU for sets of layers of the neural network.); and 
generating a parallelization strategy for parallel processing of the manycore system based on the assignment strategy (Seshadri: Figures 1 and 5-6 elements 116 and 602-608, paragraphs 46, 49-52, and 55)(Execution of the DNN training is performed using a scheduling policy that interleaves forward and backward processing of batches.).
The advantage of accessing accelerator attributes and selecting a subset of attributes for profiling neural network layers is that execution times of neural network layers can be more precise and overlapping use of GPUs by neural network layers can be more precisely determined. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the neural network layer profiling steps of Xu within Seshadri for the above advantages. 
As per claim 2:
Seshadri and Xu disclosed the method of claim 1, wherein the profiling result comprises any one or any combination of a time for a single core of the manycore system to execute a single layer of the target neural network (Xu: Figures 3 and 5B elements 301 and 430A, paragraphs 39 and 41-42)(Seshadri: Figures 1-2 elements 104-106, 118, and 202, and paragraphs 25, 35, 38-39, and 43)(The combination allows for the DNN profiler of Seshadri to access GPU attributes and select resource amounts as part of calculating the profiled total computation time for each layer. A single layer can be assigned a single core for processing.), a time for a single cluster of the manycore system to execute a single layer of the target neural network (Xu: Figure 3 element 301, paragraph 39)(Seshadri: Figures 1-2 and 4A elements 104-106, 118, 202, and 404A-C, and paragraphs 25, 35, 38-39, 43, and 47-48)(The combination allows for the DNN profiler of Seshadri to access GPU attributes and select resource amounts as part of calculating the profiled total computation time for each layer. The combined total computation time for a set of layers in a stage reads upon the time for a single cluster.), and a communication cost to transmit processing results between cores of the manycore system.
As per claim 3:
Seshadri and Xu disclosed the method of claim 1, wherein the generating of the profiling result comprises generating the profiling result by pre-executing the target neural network based on test data (Seshadri: Figures 1-2 elements 104-106 and 202-208, and paragraphs 25, 35, and 38-39)(The DNN profiler generates a result for each layer that includes total computation time using a subset of training data (i.e. test data).).
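For illustration only, the kind of per-layer profiling relied upon above (total computation time, size of output activations, and size of weights, obtained by pre-executing the network on a small amount of data) can be sketched as follows. This is a minimal sketch and is not code from Seshadri, Xu, or the application. The names Layer, LayerProfile, and profile_network, and the two toy layer functions, are hypothetical and were chosen only for this example.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Layer:
    # Hypothetical layer: a name, a weight list, and a forward function.
    name: str
    weights: List[float]
    forward: Callable[[Sequence[float], Sequence[float]], List[float]]


@dataclass
class LayerProfile:
    name: str
    compute_time_s: float  # wall-clock time to run the layer once on the test data
    output_size: int       # number of output activations produced by the layer
    weight_size: int       # number of weights held by the layer


def profile_network(layers: List[Layer], test_input: Sequence[float]) -> List[LayerProfile]:
    """Pre-execute each layer on test data and record one profile entry per layer."""
    profiles = []
    activations = list(test_input)
    for layer in layers:
        start = time.perf_counter()
        activations = layer.forward(activations, layer.weights)
        elapsed = time.perf_counter() - start
        profiles.append(
            LayerProfile(layer.name, elapsed, len(activations), len(layer.weights))
        )
    return profiles


def scale(x: Sequence[float], w: Sequence[float]) -> List[float]:
    # Toy layer: multiply every activation by the first weight.
    return [v * w[0] for v in x]


def pair_sum(x: Sequence[float], w: Sequence[float]) -> List[float]:
    # Toy layer: add adjacent pairs of activations plus a bias, halving the output size.
    return [x[i] + x[i + 1] + w[0] for i in range(0, len(x) - 1, 2)]


layers = [Layer("scale", [2.0], scale), Layer("pair_sum", [0.5, 0.0], pair_sum)]
profiles = profile_network(layers, [1.0, 2.0, 3.0, 4.0])
```

In this sketch each entry of profiles carries the three quantities named in the citation. The measured times depend on the machine and are therefore not stated here.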
As per claim 4:
Seshadri and Xu disclosed the method of claim 1, wherein the determining of the assignment strategy comprises: 
partitioning the target neural network into a plurality of sub-networks and distributing the plurality of sub-networks to the plurality of clusters (Seshadri: Figure 4A elements 112A-C and 404A-C, paragraphs 47-48)(The neural network layers are grouped into stages (i.e. sub-networks) and sent to worker GPUs (i.e. plurality of clusters) for execution.); and 
assigning a plurality of cores of each of the plurality of clusters to one or more layers of a corresponding sub-network among the plurality of sub-networks (Xu: Figures 3 and 5B-C element 302, paragraphs 41-45)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(The combination allows for assigning a number of GPU cores on a GPU for sets of layers of the neural network.).
As per claim 5:
Seshadri and Xu disclosed the method of claim 4, wherein each of the plurality of sub-networks comprises either one of a single layer and a plurality of consecutive layers among the plurality of layers of the target neural network (Seshadri: Figure 4A elements 402A-G, paragraphs 47-48).
As per claim 6:
Seshadri and Xu disclosed the method of claim 4, wherein the partitioning of the target neural network comprises: 
partitioning the target neural network into the plurality of sub-networks based on a time for a single cluster of the manycore system to execute a single layer of the target neural network (Seshadri: Figures 1 and 4A element 110, paragraphs 35, 40, 43, and 47-48)(The neural network is optimized and partitioned into stages (i.e. sub-networks) based on the profiler result.); and 
distributing the plurality of sub-networks to the plurality of clusters (Seshadri: Figures 1 and 4A element 110, paragraphs 35, 40, 43, and 47-48)(The neural network is optimized and partitioned into stages (i.e. sub-networks) based on the profiler result. The stages are allocated GPU resources for executing the neural network layers.).
As per claim 7:
Seshadri and Xu disclosed the method of claim 4, wherein the assigning of the plurality of cores to the one or more layers comprises assigning the plurality of cores to the one or more layers based on a time for a single core of the manycore system to execute a single layer of the target neural network (Xu: Figures 3 and 5B-C element 302, paragraphs 41-45)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(The combination allows for assigning a number of GPU cores on a GPU for sets of layers of the neural network based on the profile result.).
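For illustration only, one way of assigning cores to layers based on a profiled single-core execution time is sketched below. It is not the method of Seshadri, Xu, or the application. The function name assign_cores is hypothetical, and the sketch assumes ideal linear speedup (a layer's time on k cores is its single-core time divided by k), which neither reference is relied upon to teach.

```python
from typing import List


def assign_cores(single_core_times: List[float], total_cores: int) -> List[int]:
    """Give each layer one core, then hand out the remaining cores one at a time
    to whichever layer currently has the largest estimated time per core.

    Assumption for this sketch: time on k cores = single-core time / k.
    """
    n = len(single_core_times)
    if total_cores < n:
        raise ValueError("need at least one core per layer")
    cores = [1] * n
    for _ in range(total_cores - n):
        slowest = max(range(n), key=lambda i: single_core_times[i] / cores[i])
        cores[slowest] += 1
    return cores


# Three layers with profiled single-core times of 8, 2, and 2 time units, six cores.
assignment = assign_cores([8.0, 2.0, 2.0], 6)
print(assignment)  # [4, 1, 1]
```

With the example inputs, the slowest layer receives four cores, so that every layer has an estimated time of 2 units under the linear-speedup assumption.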
As per claim 8:
Seshadri and Xu disclosed the method of claim 4, wherein the assigning of the plurality of cores to the one or more layers comprises assigning the plurality of cores to the one or more layers based on a characteristic of each layer of the corresponding sub-network (Xu: Figures 3 and 5B-C element 302, paragraphs 41-45)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(The combination allows for assigning a number of GPU cores on a GPU for sets of layers of the neural network based on the profile result, which includes total compute time, output activations, and weight size (i.e. characteristics).).
As per claim 9:
Seshadri and Xu disclosed the method of claim 8, wherein the characteristic of each layer comprises any one or any combination of an amount of computational operation for processing of each layer and an amount of communication traffic for transmitting a processing result of each layer (Seshadri: Figures 1-2 elements 104-106 and 204, paragraph 38)(The number of output activations indicates an amount of data communication traffic between layers.).
As per claim 11:
Seshadri and Xu disclosed the method of claim 1, further comprising: 
generating a batch strategy comprising a number of micro-batches based on assignment states of the plurality of cores according to the assignment strategy (Xu: Figures 3 and 5B-C element 302, paragraphs 41-45)(Seshadri: Figures 1 and 5-6 elements 116 and 602-608, paragraphs 46, 49-52, and 55)(The combination allows for assigning a number of GPU cores on a GPU for sets of layers of the neural network based on the profile result. Execution of the DNN training is performed using a scheduling policy that interleaves forward and backward processing of minibatches.).
As per claim 22:
Seshadri and Xu disclosed a parallelization method comprising: 
determining, for each cluster of a manycore system, a sub-network including one or more layers of a target neural network to be executed by the cluster, based on execution times of the one or more layers (Xu: Figures 3 and 5B-C element 302, paragraphs 41-45)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(Xu disclosed a resource evaluator assigning a number of cores for execution of a neural network layer. Seshadri disclosed an optimizer that assigns sets of layers of a neural network to worker GPUs based on the neural network profile. The combination allows for assigning a number of GPU cores on a GPU for sets of layers (i.e. sub-network) of the neural network.) and an optimal execution time of the cluster (Xu: Figures 5D-E, paragraphs 46-49)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(Xu disclosed a resource evaluator assigning multiple neural networks to sets of cores for optimized execution time on the cores. The combination allows for such scheduling in Seshadri, which optimizes execution time of multiple neural networks.);
determining, for each core of each cluster, a layer of the determined sub-network to be processed by the core (Xu: Figures 3 and 5B-C element 302, paragraphs 41-45)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(Xu disclosed a resource evaluator assigning a number of cores for execution of a neural network layer. Seshadri disclosed an optimizer that assigns sets of layers of a neural network to worker GPUs based on the neural network profile. The combination allows for assigning a number of GPU cores on a GPU (i.e. cluster) for sets of layers of the neural network.); and 
generating output information by processing, in each cluster, one or more batches based on the determined sub-network and the determined layers (Xu: Figures 3 and 5B-C element 302, paragraphs 41-45)(Seshadri: Figures 1 and 5-6 elements 116 and 602-608, paragraphs 46, 49-52, and 55)(The combination allows for assigning a number of GPU cores on a GPU for sets of layers (i.e. sub-network) of the neural network based on the profile result. Execution of the DNN training is performed using a scheduling policy that interleaves forward and backward processing of minibatches, which generates execution results output to the next layer.).
The advantage of accessing accelerator attributes and selecting a subset of attributes for profiling neural network layers is that execution times of neural network layers can be more precise and overlapping use of GPUs by neural network layers can be more precisely determined. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the neural network layer profiling steps of Xu within Seshadri for the above advantages. 
As per claim 23:
Seshadri and Xu disclosed the method of claim 22, wherein the determining of the sub-network for each cluster comprises determining, for each cluster, the sub-network to include a maximum number of consecutive layers of the target neural network having a sum of execution times less than or equal to the optimal execution time of the cluster (Xu: Figures 5D-E, paragraphs 46-49)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(Xu disclosed a resource evaluator assigning multiple neural networks to sets of cores for optimized execution time on the cores. The combination allows for such scheduling in Seshadri, which optimizes execution time of multiple neural networks. The sum of execution times of consecutive layers is less than the optimal execution time when the cores are not fully utilized during layer execution.).
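For illustration only, the claim 23 limitation (each sub-network includes the maximum number of consecutive layers whose summed execution time does not exceed the optimal execution time of the cluster) can be read as a greedy packing, sketched below. This sketch is not code from Seshadri, Xu, or the application. The function name partition_layers is hypothetical, and the sketch assumes the optimal execution time is supplied as an input and is the same for every cluster.

```python
from typing import List, Tuple


def partition_layers(
    layer_times: List[float], optimal_time: float, num_clusters: int
) -> Tuple[List[List[int]], List[int]]:
    """Greedily pack consecutive layers into one sub-network per cluster.

    Each cluster takes as many consecutive layers as fit within optimal_time.
    Returns the sub-networks (as lists of layer indices) and any layers left over.
    """
    sub_networks = []
    next_layer = 0
    for _ in range(num_clusters):
        members = []
        total = 0.0
        while (
            next_layer < len(layer_times)
            and total + layer_times[next_layer] <= optimal_time
        ):
            total += layer_times[next_layer]
            members.append(next_layer)
            next_layer += 1
        sub_networks.append(members)
    leftover = list(range(next_layer, len(layer_times)))
    return sub_networks, leftover


# Six layers, three clusters, and an assumed optimal execution time of 6 units.
sub_networks, leftover = partition_layers([3.0, 2.0, 4.0, 1.0, 5.0, 2.0], 6.0, 3)
print(sub_networks)  # [[0, 1], [2, 3], [4]]
print(leftover)      # [5]
```

In the example the last layer does not fit in any sub-network. A layer left over in this way corresponds to the situation that claim 24 addresses by redetermining the sub-networks.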
As per claim 24:
Seshadri and Xu disclosed the method of claim 23, wherein the determining of the sub-network for each cluster comprises, in response to one or more layers of the target neural network not being included in the sub-networks, redetermining the sub-networks based on residual computational capabilities of the clusters (Xu: Figures 3 and 5F element 303, paragraphs 44 and 51)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(Xu disclosed re-sorting neural network layers so that multiple neural networks can be optimized for concurrent execution. The combination allows for such re-sorting of layers in Seshadri.).

Claims 10 and 12-21 are rejected under 35 U.S.C. 103 as being unpatentable over Seshadri et al. (U.S. 2019/0362227), in view of Xu et al. (U.S. 2020/0301739), in view of Official Notice.
As per claim 10:
Seshadri and Xu disclosed the method of claim 9, wherein, for the assigning of the plurality of cores to the one or more layers, a higher priority is assigned to the amount of computational operation than to the amount of communication traffic (Xu: Figures 3 and 5B-C element 302, paragraphs 41-45)(Seshadri: Figures 1 and 4A elements 108, 110, 112A-C, 118, and 402A-G, paragraphs 28, 35, 40, and 43)(The combination allows for assigning a number of GPU cores on a GPU for sets of layers of the neural network based on the profile result, which includes total compute time, output activations, and weight size (i.e. characteristics). Official notice is given that resource allocation can give higher priority to performance for the advantage of increased system performance. Thus, it would have been obvious to one of ordinary skill in the art to implement assigning GPU cores for layers based on total compute time having the highest priority.).
As per claim 12:
Claim 12 essentially recites the same limitations of claim 1. Claim 12 additionally recites the following limitations:
A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, configure the processor to perform the method of claim 1 (Seshadri: Figure 1, paragraph 33)(Official notice is given that hardware logic can be implemented as software code for the advantages of cost-efficient design, testing, and implementation. Thus, it would have been obvious to one of ordinary skill in the art to implement the DNN profiler, optimizer, and scheduler of Seshadri as software.).
As per claim 13:
Claim 13 essentially recites the same limitations of claim 1. Claim 13 additionally recites the following limitations:
A processor generating a profile result, determining an assignment strategy, and generating a parallelization strategy (Seshadri: Figure 1, paragraph 33)(Official notice is given that software code can be implemented as hardware logic for the advantage of increased performance. Thus, it would have been obvious to one of ordinary skill in the art to implement the DNN profiler, optimizer, and scheduler of Seshadri as processor hardware.).
As per claim 14:
The additional limitation(s) of claim 14 basically recite the additional limitation(s) of claim 2. Therefore, claim 14 is rejected for the same reason(s) as claim 2.
As per claim 15:
The additional limitation(s) of claim 15 basically recite the additional limitation(s) of claim 3. Therefore, claim 15 is rejected for the same reason(s) as claim 3.
As per claim 16:
The additional limitation(s) of claim 16 basically recite the additional limitation(s) of claim 4. Therefore, claim 16 is rejected for the same reason(s) as claim 4.
As per claim 17:
The additional limitation(s) of claim 17 basically recite the additional limitation(s) of claim 6. Therefore, claim 17 is rejected for the same reason(s) as claim 6.
As per claim 18:
The additional limitation(s) of claim 18 basically recite the additional limitation(s) of claim 7. Therefore, claim 18 is rejected for the same reason(s) as claim 7.
As per claim 19:
The additional limitation(s) of claim 19 basically recite the additional limitation(s) of claim 8. Therefore, claim 19 is rejected for the same reason(s) as claim 8.
As per claim 20:
The additional limitation(s) of claim 20 basically recite the additional limitation(s) of claim 11. Therefore, claim 20 is rejected for the same reason(s) as claim 11.
As per claim 21:
The additional limitation(s) of claim 21 basically recite the additional limitation(s) of claim 12. Therefore, claim 21 is rejected for the same reason(s) as claim 12.

Conclusion
The following is text cited from 37 CFR 1.111(c): In amending in reply to a rejection of claims in an application or patent under reexamination, the applicant or patent owner must clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. The applicant or patent owner must also show how the amendments avoid such references or objections.
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.  
Chilimbi et al. (U.S. 2017/0193361) taught neural network training.
Che et al. (U.S. 2020/0319919) taught scheduling neural networks with varying batch sizes.
Venkataramani et al. (U.S. 2021/0110247) taught neural network layer scheduling.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACOB A. PETRANEK whose telephone number is (571)272-5988.  The examiner can normally be reached on M-F 8:00-4:30.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached on (571) 272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JACOB PETRANEK/Primary Examiner, Art Unit 2183