DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
A request for the present application to participate in the Patent Prosecution Highway (PPH) program was filed on 4/10/2022, and the request was granted on 6/27/2022.
This action is in response to the application and preliminary amendment filed on 3/11/2022. Claims 1-11 are pending and have been examined.

Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The examiner has noted applicant’s claim for foreign priority based on Chinese Application No. CN201911035345.X, filed on 10/29/2019. The examiner acknowledges that a certified copy of Chinese Application No. CN201911035345.X (in Chinese) was received on 3/11/2022, as required by 37 CFR 1.55. The instant application is a national stage entry (under 35 U.S.C. § 371) of international application no. PCT/CN2020/123878, filed on 10/27/2020.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 3/11/2022 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement has been considered by the examiner.

Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(3) because Figures 3 and 6-13 include letters which do not measure at least .32 cm. (1/8 inch) in height (see, e.g., many of the lowercase characters in FIGs. 3 and 6-13). See MPEP 507 (A) and 37 CFR 1.84(p)(3): Numbers, letters, and reference characters must measure at least .32 cm. (1/8 inch) in height.
 Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification
The disclosure is objected to because of the following informalities:
In paragraph 1, the recitation of “an application claiming the priority to the Chinese Patent Application No. 201911035345.X” is grammatically incorrect and should read “an application claiming [[the]] priority to [[the]] Chinese Patent Application No. 201911035345.X”. Appropriate correction is required.
The specification includes recitations of “Phases” and “one Phase” (see, e.g., paragraphs 5-6, 13, 27-31, 33, 41-44, 46-47, 52, 55-60, 79-81 and 84-86 and the Abstract). The “Phases” and “Phase” recited in the specification are not proper nouns or acronyms. As such, occurrences of phases and phase should not be capitalized. Appropriate correction is required.
In the first sentence of paragraph 73, the recitation of “perform iteration of the two stages of balancing” is grammatically incorrect and appears to either be missing the letter “s” at the end of “iteration” (i.e., to read “perform iterations”), or is missing the word “an” between “perform” and “iteration” (i.e., to read “perform an iteration”). Appropriate correction is required.

Claim Objections
Claims 1-11 are objected to because of the following informalities: 
Independent claims 1 and 10, and dependent claims 2, 4, 6-8 each include recitations of “Phases” (see, e.g., lines 6 and 8 of claim 1, lines 7 and 13-14 of claim 4, lines 2-3 of claim 8, and lines 5, 7 and 10 of claim 10) and “Phase” (see, e.g., lines 6 and 10 of claim 4 and line 4 of claim 8). The “Phases” and “Phase” recited in these claims are not proper nouns or acronyms. As such, recitations of phases and phase should not be capitalized. Appropriate correction is required.
In lines 5-6 of claim 9, the recitation of “performing iteration of the first stage of balancing and the second stage of balancing” is grammatically incorrect and appears to be missing the word “an” between “performing” and “iteration”. Appropriate correction is required.
	Also, claims 2-9 and 11, which each depend directly or indirectly from claim 1, are objected to based on their respective dependencies from claim 1.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f):
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f), is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f), is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f), except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f), except as otherwise indicated in an Office action.
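For illustration only, the three-prong analysis and the two rebuttable presumptions described above can be sketched as a simple decision procedure. The sketch below is a simplified, non-authoritative model. The class name, field names and function name are hypothetical and are not drawn from the MPEP, and an actual determination depends on the claim as a whole read in light of the specification.

```python
from dataclasses import dataclass


@dataclass
class Limitation:
    """Hypothetical summary of one claim limitation (illustrative fields only)."""
    uses_means_or_step: bool            # literally recites "means" or "step"
    uses_generic_placeholder: bool      # e.g., a nonce term such as "module"
    has_functional_language: bool       # e.g., "configured to ..." or "for ..."
    recites_sufficient_structure: bool  # structure, material, or acts for the function


def interpreted_under_112f(lim: Limitation) -> bool:
    """Simplified model of the three-prong test of MPEP 2181, subsection I."""
    prong_a = lim.uses_means_or_step or lim.uses_generic_placeholder
    prong_b = lim.has_functional_language
    prong_c = not lim.recites_sufficient_structure
    return prong_a and prong_b and prong_c


# A "module configured to ..." limitation with no recited structure.
module_limitation = Limitation(
    uses_means_or_step=False,
    uses_generic_placeholder=True,
    has_functional_language=True,
    recites_sufficient_structure=False,
)
print(interpreted_under_112f(module_limitation))  # True
```

Under this simplified model, a limitation that recites “means” together with sufficient structure fails prong (C), which corresponds to the presumption in favor of 35 U.S.C. 112(f) being rebutted. A limitation that uses neither “means” nor a generic placeholder fails prong (A).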
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f), because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: 
a mapping module configured to map a calculation task for a preset feature map of each network layer in a plurality of network layers … and
a balancing module configured to acquire the number of Phases needed by a plurality of processing elements in the chip for completing the calculation tasks in claim 10.
Regarding claim 10 and the above-noted three-prong test, the recited mapping module is a generic placeholder, configured to map a calculation task for a preset feature map of each network layer in a plurality of network layers is functional language, and there is no recitation in the claim of sufficient structure to perform the mapping. With regard to claim 10 and the above-noted three-prong test, the recited balancing module is a generic placeholder, configured to acquire the number of Phases needed by a plurality of processing elements in the chip for completing the calculation tasks is functional language, and there is no recitation in the claim of sufficient structure to perform the acquiring.
A review of the specification shows that the corresponding structure is not described in the specification for the following 35 U.S.C. 112(f) limitations:
Regarding the above-noted mapping module configured to map a calculation task for a preset feature map of each network layer in a plurality of network layers and balancing module configured to acquire the number of Phases needed by a plurality of processing elements in the chip for completing the calculation tasks recited in claim 10:
Paragraphs 6 and 80 merely repeat the claim language in stating “a mapping module configured to map a calculation task for a preset feature map of each network layer in a plurality of network layers in a convolutional neural network to at least one processing element of the chip; and a balancing module configured to acquire the number of Phases needed by the plurality of processing elements in the chip for completing the calculation tasks, and perform a first stage of balancing on the number of Phases of the plurality of processing elements” and “a mapping module 1210 configured to map a calculation task for a preset feature map of each network layer in a plurality of network layers in a convolutional neural network to at least one PE of the chip; and a first balancing module 1220 configured to acquire the number of Phases needed by the plurality of PEs in the chip for completing the calculation tasks, and perform a first stage of balancing on the number of Phases of the plurality of PEs.”
With reference to the mapping module 1210 and balancing modules 1220 and 1230 shown in the high level block diagrams of FIGs. 12 and 13, paragraphs 80 and 85 merely recite “a mapping module 1210 configured to map a calculation task for a preset feature map of each network layer in a plurality of network layers in a convolutional neural network to at least one PE of the chip; and a first balancing module 1220 configured to acquire the number of Phases needed by the plurality of PEs in the chip for completing the calculation tasks.” and “a second balancing module 1230 configured to acquire the number of MACs in the plurality of Phases, and perform a second stage of balancing based on the number of MACs in the plurality of Phases.” 
With continued reference to the balancing module 1220 shown in the high level block diagrams of FIGs. 12 and 13, paragraphs 81-85 recite “the first balancing module 1220 is configured to perform the first stage of balancing on the number of Phases of the plurality of PEs”, “the first balancing module 1220 is configured to determine whether the descent direction of the first stage of balancing exists”, “the first balancing module 1220 is configured to calculate the first-stage balancing reference” and “the first balancing module 1220 is configured to perform the first stage of balancing according to the first-stage balancing reference” by performing additional determining, acquiring and extracting steps without disclosing any structure performing the disclosed steps or claimed functions.
Paragraph 91 of applicant’s specification generally states “that the modules in the devices in the embodiments may be adaptively changed and arranged in one or more devices different from those disclosed in the embodiments. The modules or units or components in the embodiments may be combined into one module or unit or component, and may also be divided into a plurality of sub-modules or sub-units or sub-components.” However, the specification does not disclose any specific structure performing the functions of the above-noted modules beyond the above-noted mentions in paragraphs 6 and 80-85 with reference to the black-boxes shown in the high level block diagrams of FIGs. 12 and 13.
The drawings merely show black-boxes designed to perform the entire claimed function (see, e.g., FIGs. 12 and 13). 
As such, the specification describes the claimed mapping module and balancing module by their respective functions without disclosing any specific structure performing the claimed functions. 
Accordingly, for these claim limitations, the written description fails to disclose both an algorithm(s) and special-purpose computer hardware to perform the algorithm(s). For more information, see MPEP § 2181.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

Claim 10 is rejected under 35 U.S.C. 112(a) as failing to comply with the written description requirement. 
Claim 10 contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, at the time the application was filed, had possession of the claimed invention. 
In particular, and as previously noted, the claim limitations “a mapping module configured to map a calculation task for a preset feature map of each network layer in a plurality of network layers” and “a balancing module configured to acquire the number of Phases needed by a plurality of processing elements in the chip for completing the calculation tasks” in claim 10 invoke 35 U.S.C. 112(f). 
However, as noted above, the written description of the current application fails to disclose the corresponding structure, material, or acts for performing each of the above-identified claimed functions and to clearly link the structure, material, or acts to the function. In particular, for each of the claimed functions, the written description fails to disclose both an algorithm(s) and special-purpose computer hardware to perform the algorithm. For more information, see MPEP § 2181.
Accordingly, claim 10 is rejected under 35 U.S.C. 112(a) as failing to comply with the written description requirement.
The following is a quotation of 35 U.S.C. 112(b):
 (b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 1-11 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Both of independent claims 1 and 10 recite “the number of Phases” and “the calculation tasks” (see, e.g., lines 6-7 of claim 1 and lines 5-6 of claim 10). There is insufficient antecedent basis for these limitations in these claims. Applicant did not previously introduce any “Phases”, phases, “number of Phases” or “calculation tasks” in these claims. Applicant previously introduced a singular “a calculation task” in line 3 of claim 1 and line 2 of claim 10. However, it is unclear whether “the calculation tasks” refers to a plurality of the previously-introduced “a calculation task”, or to another, additional set of “calculation tasks”. For the purposes of determining patent eligibility and comparison with the prior art, recitations of “the calculation tasks” in claims 1 and 10 are being interpreted as one or more calculation tasks, which may include the previously-introduced “a calculation task”. Also for examination purposes, recitations of “the number of Phases” in claims 1 and 10 are being interpreted as any number of phases. Appropriate correction is required.
Dependent claims 4 and 7 recite “the number of Multiply Accumulate operations (MACs)” and “the number of MACs” (see, e.g., line 5 of claim 4 and line 6 of claim 7). There is insufficient antecedent basis for these limitations in these claims. Applicant did not previously introduce any “number of Multiply Accumulate operations (MACs)” or “number of MACs” in these claims, their intervening claims (claim 3 in the case of claim 4), or their base claim, claim 1. For examination purposes, recitations of “the number of Multiply Accumulate operations (MACs)” and “the number of MACs” in claims 4 and 7 are being interpreted as any number of Multiply Accumulate operations or MACs. Appropriate correction is required.
Claims 2-9 and 11, which each depend directly or indirectly from claim 1, are rejected under 35 U.S.C. 112(b) as being indefinite under the same rationale as claim 1.
Also, claim 6, which depends directly from claim 4, is rejected under 35 U.S.C. 112(b) as being indefinite under the same rationale as claim 4.
Additionally, claims 8 and 9, which both depend directly from claim 7, are rejected under 35 U.S.C. 112(b) as being indefinite under the same rationale as claim 7.
As discussed above, the claim limitations “a mapping module configured to map a calculation task for a preset feature map of each network layer in a plurality of network layers” and “a balancing module configured to acquire the number of Phases needed by a plurality of processing elements in the chip for completing the calculation tasks” in claim 10 invoke 35 U.S.C. 112(f). 
However, as also discussed above with regard to the rejection of claim 10 under 35 U.S.C. 112(a), the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. In particular, the specification fails to clearly link the structure, material, or acts to the function for the limitations “a mapping module configured to map a calculation task for a preset feature map of each network layer in a plurality of network layers” and “a balancing module configured to acquire the number of Phases needed by a plurality of processing elements in the chip for completing the calculation tasks” in claim 10. Moreover, as noted above, there is insufficient disclosure in the specification of algorithms and specific computer hardware for implementing the claimed mapping module and balancing module. As such, the above-noted limitations recited in claim 10 are indefinite. 
For instance, as noted above, paragraphs 6 and 80 merely repeat the claim language in stating “a mapping module configured to map a calculation task for a preset feature map of each network layer in a plurality of network layers in a convolutional neural network to at least one processing element of the chip; and a balancing module configured to acquire the number of Phases needed by the plurality of processing elements in the chip for completing the calculation tasks, and perform a first stage of balancing on the number of Phases of the plurality of processing elements” and “a mapping module 1210 configured to map a calculation task for a preset feature map of each network layer in a plurality of network layers in a convolutional neural network to at least one PE of the chip; and a first balancing module 1220 configured to acquire the number of Phases needed by the plurality of PEs in the chip for completing the calculation tasks, and perform a first stage of balancing on the number of Phases of the plurality of PEs.” Further, for example, as also noted above, paragraph 91 of applicant’s specification generally states “that the modules in the devices in the embodiments may be adaptively changed and arranged in one or more devices different from those disclosed in the embodiments. The modules or units or components in the embodiments may be combined into one module or unit or component, and may also be divided into a plurality of sub-modules or sub-units or sub-components.” However, the specification does not disclose any specific structure performing the functions of the above-noted modules beyond the above-noted mentions in paragraphs 6 and 80-85 with reference to the black-boxes shown in the high level block diagrams of FIGs. 12 and 13. As also noted above, the drawings merely show black-boxes designed to perform the entire claimed function (see, e.g., FIGs. 12 and 13). 
Thus, the specification describes the claimed mapping module and balancing module by their respective functions without disclosing any specific structure performing the claimed functions. 
Therefore, claim 10 is indefinite and is rejected under 35 U.S.C. 112(b) for at least this reason. For the purposes of determining patent eligibility and comparison with the prior art, the examiner is interpreting the claimed mapping module and balancing module as any hardware or software modules, units, components or elements capable of performing the claimed functions.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 10 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Seshadri et al. (U.S. Patent Application Pub. No. 2019/0362227 A1, hereinafter “Seshadri”) in view of non-patent literature Zhan et al. ("Pipe-torch: Pipeline-based distributed deep learning in a gpu cluster with heterogeneous networking." 2019 Seventh International Conference on Advanced Cloud and Big Data (CBD). IEEE, September 2019: 55-60, hereinafter “Zhan”).
With respect to claim 1, Seshadri discloses the invention as claimed including a neural network mapping method applied to a chip comprising a plurality of processing elements (see, e.g., Abstract, paragraphs 43 and 77, “Layers of a deep neural network (DNN) are partitioned into stages … The stages are assigned to the worker computing devices”, “Once the DNN model has been partitioned into stages, the stages are individually assigned to GPUs 118 in the worker computing devices 112”, “A computer-implemented method for parallelizing training of a DNN” [i.e., a DNN/neural network mapping/assigning technique/method applied to a chip/worker computing device including a plurality of GPUs/processing elements]), comprising:
mapping a … task for a preset feature map of each network layer in a plurality of network layers in a convolutional neural network to at least one processing element of the chip (Paragraphs 5, 40 and 44 of applicant’s specification merely repeat the claim language without defining what is meant by “a preset feature map” and paragraph 28 of applicant’s specification states “The network needs 16 Phases to calculate an input feature map”. Therefore, a task or “a calculation task for a preset feature map of each network layer”, under the BRI, in light of the specification, is any calculation, computation or work associated with input data/features for a neural network layer) (see, e.g., paragraphs 2-3, 25 and 40-43, “DNNs can be utilized to solve complex classification problems such as … feature extraction”, “performance of DNNs stems from their ability to extract high-level features from input data” [i.e., input data/features for DNN/neural network layers], “A DNN model generally consists of a sequence of layers of different types (e.g. convolutional,” [i.e., DNN is a convolutional neural network], “generate optimized layer assignments … the DNN optimizer 108 partitions the DNN model 100 into stages. Each of the stages includes one or more layers of the DNN model” [i.e., assigning/mapping stages/phases to each DNN/network layer in a plurality of layers], “The partitioning of the DNN model 100 might … configure the computing devices 112 used to train the DNN model 100 to perform approximately the same amount of processing during training.”, “the stages can be configured for model parallel processing … multiple worker computing devices 112 can be assigned to a given stage, each processing different minibatches during execution.”, “Once the DNN model has been partitioned into stages, the stages are individually assigned to GPUs 118 in the worker computing devices 112 that will train the DNN model … The stage that contains the input layer might be referred to herein as the input stage” [i.e., assigning/mapping a training/processing task for an input layer/input feature map to at least one processing element/GPU of the chip/worker computing device]);
acquiring the number of Phases needed by the plurality of processing elements in the chip for completing the … tasks (as indicated above, “the number of Phases” has been interpreted as any number of phases, and “the calculation tasks” has been interpreted as one or more calculation tasks, which may include the previously-introduced “a calculation task”. Paragraphs 5-6 of applicant’s specification merely repeat the claim language and paragraphs 29 and 33 of applicant’s specification state “Both the Phase and the MAC in the present disclosure may be taken as a load unit, that is, a unit for describing workload, which is represented as working duration for a chip.” and “Phase lengths of the chip (a Phase length is the number of clock cycles in the Phase)”. Therefore, “Phases”, under the broadest reasonable interpretation (BRI), in light of the specification, are any units, stages, durations, batches, steps or phases of work) (see, e.g., paragraph 40, “In order to generate the optimized layer assignments 110, the DNN optimizer 108 partitions the DNN model 100 into stages. Each of the stages includes one or more layers of the DNN model” [i.e., acquire a number of stages/phases needed by the processing elements/worker computing devices to complete calculations for the training task]), and 
performing a first stage of balancing on the number of Phases of the plurality of processing elements (see, e.g., paragraphs 33 and 42-43, “pipeline parallel DNN training … uses data parallelism for selected subsets of layers to balance computation load among worker devices”, “stages … for model parallel processing. Some or all of the stages can also be configured for data parallel processing. When data parallel processing is used, multiple worker computing devices 112 can be assigned to a given stage, each processing different minibatches during execution”, “Each stage is mapped to a separate GPU 118 that performs both the forward and backward pass for all the layers of that stage. The stage that contains the input layer might be referred to herein as the input stage” [i.e., perform a first stage of balancing on the input layer stage/phase of the GPUs/processing elements]); and
mapping, based on the number of the Phases of the plurality of processing elements obtained after the first stage of balancing, the … task for the preset feature map of each network layer in the plurality of network layers in the convolutional neural network to at least one processing element of the chip (as indicated above, “Phases”, under the BRI, in light of the specification, are any units, stages, durations, batches, steps or phases of work. Paragraphs 5, 42 and 60 of applicant’s specification repeat the claim language and paragraph 28 of applicant’s specification states “The network needs 16 Phases to calculate an input feature map”. Therefore, mapping the task or “the calculation task for the preset feature map of each network layer”, under the BRI, in light of the specification, is assigning or mapping any calculation, computation or work associated with input data/features for a neural network layer) (see, e.g., paragraphs 43 and 53-54, “Each stage is mapped to a separate GPU 118 that performs both the forward and backward pass for all the layers of that stage.”, “On completing the forward pass for a minibatch, each stage asynchronously sends its output activations to the next stage, while simultaneously starting to perform work for another minibatch.”, “deterministic round-robin load balancing can be utilized to distribute work from the previous stages across the GPUs. This deterministic loadbalancing ensures that backward work for a minibatch passes through the same stages it passed through in its forward work phase.” [i.e., mapping/assigning, based on the number of stages/phases of the GPUs/processing elements obtained after the previous/first stage of balancing, the work/task for the input feature of each DNN/network layer in the DNN/network layers to at least one GPU/processing element]).
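For illustration only, the deterministic round-robin load balancing quoted above from Seshadri can be sketched as follows. This is a minimal sketch and is not Seshadri’s implementation. The function name, the parameter names and the modulo rule used to select a replica are assumptions made for the example.

```python
def assign_replica(minibatch_id: int, num_replicas: int) -> int:
    """Pick the replica (e.g., a GPU) of a stage that handles a minibatch.

    Assumption: the replica is chosen as minibatch_id modulo num_replicas.
    """
    if num_replicas < 1:
        raise ValueError("a stage needs at least one replica")
    return minibatch_id % num_replicas


# A stage replicated on three GPUs processes six minibatches.
forward = [assign_replica(m, 3) for m in range(6)]

# Recomputing the assignment for the backward pass yields the same replicas,
# so backward work for a minibatch returns to the replica used in its forward pass.
backward = [assign_replica(m, 3) for m in range(6)]

print(forward)              # [0, 1, 2, 0, 1, 2]
print(forward == backward)  # True
```

Because the assignment depends only on the minibatch identifier and the number of replicas, it is deterministic, which is the property the quoted passage relies on for routing backward work through the same stages as forward work.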
Although Seshadri substantially discloses the claimed invention, Seshadri is not relied on for explicitly disclosing mapping a calculation task for … each network layer in a plurality of network layers in a convolutional neural network to at least one processing element; 
acquiring the number of Phases needed by the plurality of processing elements … for completing the calculation tasks … ; and
mapping, based on the number of the Phases of the plurality of processing elements …, the calculation task for the preset feature map of each network layer in the plurality of network layers in the convolutional neural network to at least one processing element.
In the same field, analogous art Zhan teaches mapping a calculation task for … each network layer in a plurality of network layers in a convolutional neural network to at least one processing element (as indicated above, “a calculation task for a preset feature map of each network layer”, under the BRI, in light of the specification, is any calculation, computation or work associated with input data/features for a neural network layer) (see, e.g. pages 56 and 58, “DNN training with pipeline-hybrid parallelism, and design an algorithm to divide, model, and map to the GPU cluster … partitions the model into multiple stages; each stage includes a consecutive set of layers in the model, then each stage is mapped to n(n ≥ 1) GPUs … that performs both the forward and backward pass for every layer in that stage. The intermediate results calculated at each stage are transmitted downstream. … each stage starts calculating only when the intermediate result arrives at this stage along the connected network between two adjacent stages … to spread them to GPUs for calculation”, “the neural network hierarchy contains convolutional layer for extracting features in the early stage and … when performing feature extraction in the early stage’s convolutional layer, a large activation size is needed for transmission, and more GPUs are required for data parallelism (to handle steep calculations)” [i.e., mapping a calculation task for each network layer in a convolutional neural network to at least one GPU/processing element]);
acquiring the number of Phases needed by the plurality of processing elements … for completing the calculation tasks (as indicated above, “Phases”, under BRI, in light of the specification, are any units, stages, durations, batches, steps or phases of work) (see, e.g. pages 56 and 59, “partitions the model into multiple stages; each stage includes a consecutive set of layers in the model … dividing the model into three stages, each stage is mapped to one GPU … each stage starts calculating only when the intermediate result arrives at this stage along the connected network between two adjacent stages. … When intermediate results from previous stages arrive, we adopt a round-robin strategy (minibatchNum % stageReplicaNum) to spread them to GPUs for calculation.”, “The algorithm divides the DNN model into k stages, with some stages replicated on GPUs. Then it performs task-placement operations based on the partition algorithm, to map tasks to the corresponding GPU” [i.e., acquire the number/k of stages/phases needed by the GPUs/processing elements for completing the calculation tasks]) … ; and
mapping, based on the number of the Phases of the plurality of processing elements …, the calculation task for the preset feature map of each network layer in the plurality of network layers in the convolutional neural network to at least one processing element (as indicated above, “Phases”, under BRI, in light of the specification, are any units, stages, durations, batches, steps or phases of work, and mapping “the calculation task for the preset feature map of each network layer”, under the BRI, in light of the specification, is assigning or mapping any calculation, computation or work associated with input data/features for a neural network layer) (see, e.g. pages 56 and 59, “DNN training with pipeline-hybrid parallelism, and design an algorithm to divide, model, and map to the GPU cluster … dividing the model into three stages, each stage is mapped to one GPU … When intermediate results from previous stages arrive, we adopt a round-robin strategy (minibatchNum % stageReplicaNum) to spread them to GPUs for calculation.”, “the input of the system is a model architecture, hyperparameter of the model and training dataset” [i.e., mapping, based on the number of stages/phases of GPUs/processing elements, the calculation task for the input data/model features of each DNN layer in the DNN/neural network to at least one GPU/processing element for calculation]).
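For illustration only, the round-robin strategy quoted from Zhan above ("minibatchNum % stageReplicaNum") can be sketched as follows. The function and variable names are hypothetical and do not appear in Seshadri or Zhan; the sketch assumes that the replicas (GPUs) of a stage are indexed from zero.

```python
def replica_for_minibatch(minibatch_num: int, stage_replica_num: int) -> int:
    """Return the index of the stage replica (GPU) that receives a minibatch.

    Hypothetical helper: it only illustrates the modulo rule quoted from Zhan.
    """
    if stage_replica_num < 1:
        raise ValueError("a stage needs at least one replica")
    return minibatch_num % stage_replica_num


# A stage replicated on 3 GPUs receives minibatches 0 through 6 in turn.
assignments = [replica_for_minibatch(m, 3) for m in range(7)]
```

Because the rule is deterministic, a given minibatch is always routed to the same replica, which is consistent with the passage quoted from Seshadri stating that backward work for a minibatch passes through the same stages as its forward work.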
Seshadri and Zhan are analogous art because they are both directed to systems and methods for load balancing stages/phases of tasks for neural networks by mapping/assigning the tasks to GPUs/processing elements of worker/computing devices (See, e.g., Seshadri, Abstract and paragraphs 33, 43 and 54, and Zhan, Abstract and page 56, section II).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified “system disclosed herein for pipeline parallel DNN training” of Seshadri where Seshadri’s “disclosed system also uses data parallelism for selected subsets of layers to balance computation load among worker devices” and Seshadri’s system configures “stages … for model parallel processing. Some or all of the stages can also be configured for data parallel processing. When data parallel processing is used, multiple worker computing devices 112 can be assigned to a given stage, each processing different minibatches during execution” and “the stages are individually assigned to GPUs 118 in the worker computing devices” (See, Seshadri, paragraphs 33 and 42-43) to incorporate the teachings of Zhan to provide a method, “Pipe-torch, a pipeline-hybrid parallelism approach” that “formulate[s] DNN training with pipeline-hybrid parallelism, and design[s] an algorithm to divide, model, and map to the GPU cluster” and “partitions the model into multiple stages; each stage includes a consecutive set of layers in the model, then each stage is mapped to n(n ≥ 1) GPUs for data parallel that performs both the forward and backward pass for every layer in that stage.” (See, e.g., Zhan, page 56, sections I-II A). Doing so would have allowed Seshadri to use Zhan’s “Pipe-torch [approach that] first adopts pipeline model parallelism to improve GPU usage, while simultaneously harnessing data parallel strategies for specific layers (such as the convolution layer) because of layer characteristics. Second, for load balance, Pipe-torch formulates pipeline-hybrid parallelism and employs an algorithm for guaranteed load balance”, as suggested by Zhan (See, e.g., Zhan, page 56, section II). This is an example of “use of known technique to improve similar devices (methods, or products) in the same way.” See MPEP 2143.

With respect to independent claim 10, claim 10 is substantially similar to claim 1 and therefore is rejected on the same ground as claim 1, discussed above. In particular, claim 10 is an apparatus claim that corresponds to the method of claim 1. 
In addition, Seshadri further discloses a neural network mapping apparatus applied to a chip (see, e.g., Abstract and paragraphs 43 and 85, “Layers of a deep neural network (DNN) are partitioned into stages … The stages are assigned to the worker computing devices”, “the above-described subject matter can be implemented as a computer-controlled apparatus”, “Once the DNN model has been partitioned into stages, the stages are individually assigned to GPUs 118 in the worker computing devices 112”, “A computing device, comprising: one or more processors … to: partition the layers of a DNN model into a plurality of stages, wherein each of the plurality of stages comprises one or more of the layers of the DNN model, and wherein the partitioning is optimized to minimize a time to train the DNN model; and assign at least one of the plurality of stages to each of a plurality of worker computing devices, the computing devices configured to process batches of DNN training data to train the DNN” [i.e., a DNN/neural network mapping/assigning apparatus/device applied to a chip/worker computing device]), comprising:
a mapping module configured to map a calculation task for a preset feature map of each network layer in a plurality of network layers … and a balancing module configured to acquire the number of Phases needed by a plurality of processing elements in the chip for completing the calculation tasks (as indicated above, the “mapping module configured to map a calculation task for a preset feature map of each network layer in a plurality of network layers” and the “balancing module configured to acquire the number of Phases needed by a plurality of processing elements in the chip for completing the calculation tasks” have been interpreted as any hardware or software modules, units, components or elements capable of performing the claimed functions) (see, e.g., paragraphs 36-37 and 60, “the logical operations described herein … can be implemented (1) as a sequence of computer implemented acts or program modules running on a computing device and/or (2) as interconnected machine logic circuits or circuit modules within a computing device”, “the logical operations described herein are referred to variously as … devices, acts, or modules. These … devices, acts and modules can be implemented in hardware, software, firmware, in special-purpose digital logic, and any combination thereof.”, “While the subject matter described herein is presented in the general context of server computers performing parallelized training of a DNN model, those skilled in the art will recognize that other implementations can be performed in combination with other types of computing systems and modules.” [i.e., modules configured to/capable of performing the assigning/mapping and acquiring/determining steps for DNN/network layers, which are substantially similar to claim 1, see above citations to Seshadri and Zhan regarding these substantially similar limitations of claim 1]).

Examiner’s Note: claim 11, as drafted, depends from claim 1. If applicant intended for claim 11 to be an independent claim, the examiner suggests that one way to do so is to amend the last portion of claim 11 to explicitly recite the limitations of claim 1 instead of the current recitation of “wherein the program is executed by a processor to perform the neural network mapping method of claim 1”.
Seshadri further discloses a non-transitory computer-readable storage medium, storing a computer program, wherein the program is executed by a processor to perform (see, e.g., paragraphs 61 and 64, “The computer 700 further includes a mass storage device 712 for storing an operating system 722, application programs, and other types of programs. The mass storage device 712 can also be configured to store other types of programs and data, such as the DNN definition 102, the DNN profiler 104, the DNN profile 106, the DNN optimizer 108, and the optimized layer assignments 110”, “computer storage media can include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules … or any other medium that can be used to store the desired information and which can be accessed by the computer 700”) the neural network mapping method of claim 1 (as discussed above, Seshadri in view of Zhan teaches the method of claim 1, see above citations to Seshadri and Zhan regarding the limitations of claim 1).

Claims 7 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Seshadri in view of Zhan as applied to claim 1, and further in view of Baum et al. (U.S. Patent Application Pub. No. 2018/0285725 A1, hereinafter “Baum”). 
Regarding claim 7, as discussed above, Seshadri in view of Zhan teaches the method of claim 1.
Seshadri further discloses wherein after mapping, based on the number of the Phases of the plurality of processing elements obtained after the first stage of balancing, the … task for the preset feature map of each network layer in the plurality of network layers in the convolutional neural network to at least one processing element of the chip (as indicated above, the “Phases”, under the BRI, in light of the specification, are any units, stages, durations, batches, steps or phases of work, and mapping the task or “the calculation task for the preset feature map of each network layer”, under the BRI, in light of the specification, is assigning or mapping any calculation, computation or work associated with input data/features for a neural network layer) (see, e.g., paragraphs 43 and 53-54, “Each stage is mapped to a separate GPU 118 that performs both the forward and backward pass for all the layers of that stage.”, “On completing the forward pass for a minibatch, each stage asynchronously sends its output activations to the next stage, while simultaneously starting to perform work for another minibatch.”, “deterministic round-robin load balancing can be utilized to distribute work from the previous stages across the GPUs. This deterministic loadbalancing ensures that backward work for a minibatch passes through the same stages it passed through in its forward work phase.” [i.e., after mapping/assigning based on the number of stages/phases of the GPUs/processing elements obtained after the previous/first stage of balancing the work/task for the input feature of each DNN/network layer the DNN/network layers to at least one GPU/processing element]), the method further comprises: …
mapping the … tasks for the preset feature map of the plurality of network layers in the convolutional neural network to at least one processing element of the chip subjected to the second stage of balancing (as indicated above, the “calculation tasks for the preset feature map of the plurality of network layers”, under the BRI, in light of the specification, is any calculation, computation or work associated with input data/features for neural network layers) (see, e.g., paragraphs 2-3, 25 and 40-43, “DNNs can be utilized to solve complex classification problems such as … feature extraction”, “performance of DNNs stems from their ability to extract high-level features from input data” [i.e., input data/features for the plurality of DNN/neural network layers], “A DNN model generally consists of a sequence of layers of different types (e.g. convolutional,” [i.e., the DNN is a convolutional neural network], “generate optimized layer assignments … the DNN optimizer 108 partitions the DNN model 100 into stages. Each of the stages includes one or more layers of the DNN model” [i.e., assigning/mapping stages/phases – including a second stage, to the plurality of DNN layers/network layers], “The partitioning of the DNN model 100 might … configure the computing devices 112 used to train the DNN model 100 to perform approximately the same amount of processing during training.”, “the stages can be configured for model parallel processing … multiple worker computing devices 112 can be assigned to a given stage, each processing different minibatches during execution.”, “Once the DNN model has been partitioned into stages, the stages are individually assigned to GPUs 118 in the worker computing devices 112 that will train the DNN model … The stage that contains the input layer might be referred to herein as the input stage and the stage that contains the output layer might be referred to herein as the output stage.” [i.e., assigning/mapping a training/processing task for an input layer/input feature map to at least one processing element/GPU of the chip/worker computing device subjected to the second/output stage of balancing after the input stage]).
Although Seshadri substantially discloses the claimed invention, Seshadri is not relied on for explicitly disclosing after mapping … the calculation task for the preset feature map of each network layer in the plurality of network layers in the convolutional neural network to at least one processing element … the method further comprises: …
mapping the calculation tasks for the preset feature map of the plurality of network layers in the convolutional neural network to at least one processing element.
In the same field, analogous art Zhan teaches after mapping … the calculation task for the preset feature map of each network layer in the plurality of network layers in the convolutional neural network to at least one processing element (as indicated above, the “calculation task for the preset feature map of each network layer”, under the BRI, in light of the specification, is any calculation, computation or work associated with input data/features for a neural network layer) (see, e.g. pages 56 and 58, “DNN training with pipeline-hybrid parallelism, and design an algorithm to divide, model, and map to the GPU cluster … partitions the model into multiple stages; each stage includes a consecutive set of layers in the model, then each stage is mapped to n(n ≥ 1) GPUs … that performs both the forward and backward pass for every layer in that stage. The intermediate results calculated at each stage are transmitted downstream. … each stage starts calculating only when the intermediate result arrives at this stage along the connected network between two adjacent stages … to spread them to GPUs for calculation”, “the neural network hierarchy contains convolutional layer for extracting features in the early stage and … when performing feature extraction in the early stage’s convolutional layer, a large activation size is needed for transmission, and more GPUs are required for data parallelism (to handle steep calculations)” [i.e., after mapping a calculation task for each network layer in a convolutional neural network to at least one GPU/processing element]) … the method further comprises: …
mapping the calculation tasks for the preset feature map of the plurality of network layers in the convolutional neural network to at least one processing element (as indicated above, mapping “the calculation tasks for the preset feature map of the plurality of network layers”, under the BRI, in light of the specification, is assigning or mapping any calculations, computations or work associated with input data/features for a neural network layer) (see, e.g. pages 56 and 59, “DNN training with pipeline-hybrid parallelism, and design an algorithm to divide, model, and map to the GPU cluster … dividing the model into three stages, each stage is mapped to one GPU … When intermediate results from previous stages arrive, we adopt a round-robin strategy (minibatchNum % stageReplicaNum) to spread them to GPUs for calculation.”, “the input of the system is a model architecture, hyperparameter of the model and training dataset” [i.e., mapping the calculation tasks for the input data/model features of the plurality of DNN/neural network layers to at least one GPU/processing element for calculation]).
Seshadri and Zhan are analogous art because they are both directed to systems and methods for load balancing stages/phases of tasks for neural networks by mapping/assigning the tasks to GPUs/processing elements of worker/computing devices (See, e.g., Seshadri, Abstract and paragraphs 33, 43 and 54, and Zhan, Abstract and page 56, section II).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified “system disclosed herein for pipeline parallel DNN training” of Seshadri where Seshadri’s “disclosed system also uses data parallelism for selected subsets of layers to balance computation load among worker devices” and Seshadri’s system configures “stages … for model parallel processing. Some or all of the stages can also be configured for data parallel processing. When data parallel processing is used, multiple worker computing devices 112 can be assigned to a given stage, each processing different minibatches during execution” and “the stages are individually assigned to GPUs 118 in the worker computing devices” (See, Seshadri, paragraphs 33 and 42-43) to incorporate the teachings of Zhan to provide a method, “Pipe-torch, a pipeline-hybrid parallelism approach” that “formulate[s] DNN training with pipeline-hybrid parallelism, and design[s] an algorithm to divide, model, and map to the GPU cluster” and “partitions the model into multiple stages; each stage includes a consecutive set of layers in the model, then each stage is mapped to n(n ≥ 1) GPUs for data parallel that performs both the forward and backward pass for every layer in that stage.” (See, e.g., Zhan, page 56, sections I-II A). Doing so would have allowed Seshadri to use Zhan’s “Pipe-torch [approach that] first adopts pipeline model parallelism to improve GPU usage, while simultaneously harnessing data parallel strategies for specific layers (such as the convolution layer) because of layer characteristics. Second, for load balance, Pipe-torch formulates pipeline-hybrid parallelism and employs an algorithm for guaranteed load balance”, as suggested by Zhan (See, e.g., Zhan, page 56, section II). This is an example of “use of known technique to improve similar devices (methods, or products) in the same way.” See MPEP 2143.
Although Seshadri in view of Zhan substantially teaches the claimed invention, Seshadri in view of Zhan is not relied on to teach acquiring the number of MACs in the plurality of Phases, and performing a second stage of balancing based on the number of MACs in the plurality of Phases.
In the same field, analogous art Baum teaches acquiring the number of MACs in the plurality of Phases, and performing a second stage of balancing based on the number of MACs in the plurality of Phases (as indicated above, “the number of MACs” has been interpreted as any number of Multiply Accumulate operations or MACs, and “Phases”, under the BRI, in light of the specification, are any units, stages, durations, batches, steps or phases of work) (see, e.g., paragraphs 106-107, 110, 115, 119 and 124, “The NN processor of the present invention uses several design principles in its implementation including … leveraging both the time-domain and the space-domain to optimize utilization and efficiency; and (4) balanced load over available system resources”, “ANNs are implemented in three stages: modeling, training, and inference, all three of which are addressed to some extent by the NN processor” [i.e., balancing based on resources in time-domains/durations and stages/phases], “the compute fabric (or compute capability) provided by the computation units that are organized into various aggregation levels or hierarchical levels, such as PEs, subclusters, clusters, NN cores … The compute fabric comprises the basic compute elements that are configured to address the special nature of the computational needs of ANNs [artificial neural networks]. Several features of the compute fabric include: … a large number of multiply and accumulate operations”, “The subcluster, in turn, comprises the most basic units, namely the processing elements (PEs) 76 which are composed of a multiply and accumulate (MAC) circuit and local memory. It is the PE hierarchical level that contains a set of neuron entities found in a typical neural network”, “the NN processor is configured to be substantially balanced in terms of compute and memory resources to ensure the system achieves maximal utilization” [i.e., balancing based on compute resources – including MAC circuit resources], “Layer 1 then performs all the required multiply and accumulate (MAC) operations … and finally signals to layer 2, which in turn repeats the same steps. When layer 2 is finished, it signals to the output layer to send the results outside the NN core.” [i.e., acquiring a required number of MAC operations in the layer steps/stages/phases and performing the steps of a second layer of load balancing based on the number of MACs]).
Seshadri, Zhan and Baum are analogous art because they are each related to systems and methods for load balancing stages/phases of tasks for neural networks by mapping/assigning the tasks to GPUs/processing elements/system resources of worker/computing devices (See, e.g., Seshadri, Abstract and paragraphs 33, 43 and 54, and Zhan, Abstract and page 56, section II, and Baum, paragraphs 17, 102, 106, 119 and 140).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Seshadri in view of Zhan to incorporate the teachings of Baum to provide a “novel and useful neural network (NN) processing core adapted to implement artificial neural networks (ANNs) and incorporating configurable and programmable sliding window based memory access” where “The NN processor … uses several design principles in its implementation including … leveraging both the time-domain and the space-domain to optimize utilization and efficiency; and (4) balanced load over available system resources.” [i.e., load balancing] where the “NN processor comprises … processing elements (PEs) 76 which are composed of a multiply and accumulate (MAC) circuit and local memory.” (See, e.g., Baum, Abstract and paragraphs 106 and 115). Doing so would have allowed Seshadri in view of Zhan to use Baum’s neural network (NN) processing core and NN processor “such that … the NN processor is configured to be substantially balanced in terms of compute and memory resources to ensure the system achieves maximal utilization”, as suggested by Baum (see, e.g., Baum, paragraph 119).
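For illustration only, the following sketch shows one way a number of MACs could be acquired for units of work and then used as a balancing criterion. It is not taken from Baum, Seshadri, Zhan or applicant's specification. The per-layer count uses the standard multiply-accumulate count for a dense convolution; the greedy largest-first assignment is an assumed stand-in technique, and all names and layer sizes are hypothetical.

```python
def conv_macs(out_h, out_w, out_channels, in_channels, kernel_h, kernel_w):
    """Standard MAC count for a dense convolution layer (no grouping assumed)."""
    return out_h * out_w * out_channels * in_channels * kernel_h * kernel_w


def greedy_balance(macs_per_task, num_pes):
    """Assign tasks to processing elements, largest MAC count first,
    always placing the next task on the currently least-loaded element.

    Assumed illustrative heuristic, not the claimed second stage of balancing.
    """
    loads = [0] * num_pes
    assignment = {}
    ordered = sorted(macs_per_task.items(), key=lambda kv: kv[1], reverse=True)
    for task, macs in ordered:
        pe = loads.index(min(loads))  # first least-loaded processing element
        assignment[task] = pe
        loads[pe] += macs
    return assignment, loads


# Hypothetical layer shapes, chosen only to exercise the functions.
tasks = {
    "conv1": conv_macs(32, 32, 16, 3, 3, 3),
    "conv2": conv_macs(16, 16, 32, 16, 3, 3),
    "conv3": conv_macs(8, 8, 64, 32, 3, 3),
}
assignment, loads = greedy_balance(tasks, num_pes=2)
```

The sketch separates the two steps recited in the limitation: a MAC count is first obtained for each unit of work, and only then is the assignment to processing elements adjusted on the basis of those counts.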

Regarding claim 9, as discussed above, Seshadri in view of Zhan and Baum teaches the method of claim 7.
Seshadri further discloses after mapping the … tasks for the preset feature map of the plurality of network layers in the convolutional neural network to at least one processing element of the chip subjected to the second stage of balancing (as indicated above, mapping “the calculation tasks for the preset feature map of the plurality of network layers”, under the BRI, in light of the specification, is assigning or mapping any calculations, computations or work associated with input data/features for a neural network layer) (see, e.g., paragraphs 2-3, 25 and 40-43, “DNNs can be utilized to solve complex classification problems such as … feature extraction”, “performance of DNNs stems from their ability to extract high-level features from input data” [i.e., input data/features for the plurality of DNN/neural network layers], “A DNN model generally consists of a sequence of layers of different types (e.g. convolutional,” [i.e., the DNN is a convolutional neural network], “generate optimized layer assignments … the DNN optimizer 108 partitions the DNN model 100 into stages. Each of the stages includes one or more layers of the DNN model” [i.e., assigning/mapping stages/phases – including a second stage, to the plurality of DNN layers/network layers], “The partitioning of the DNN model 100 might … configure the computing devices 112 used to train the DNN model 100 to perform approximately the same amount of processing during training.”, “the stages can be configured for model parallel processing … multiple worker computing devices 112 can be assigned to a given stage, each processing different minibatches during execution.”, “Once the DNN model has been partitioned into stages, the stages are individually assigned to GPUs 118 in the worker computing devices 112 that will train the DNN model … The stage that contains the input layer might be referred to herein as the input stage and the stage that contains the output layer might be referred to herein as the output stage.” [i.e., assigning/mapping a training/processing task for an input layer/input feature map to at least one processing element/GPU of the chip/worker computing device subjected to the second/output stage of balancing after the input stage]), the method further comprises:
performing iteration of the first stage of balancing and the second stage of balancing (see, e.g., paragraph 52, “In a startup state, the input stage admits a sufficient number of minibatches of training data … These minibatches propagate their way to the output stage. As soon as the output stage completes the forward pass for the first minibatch, it performs the backward pass for the same mini batch, and then starts alternating between performing forward and backward passes for subsequent minibatches.” [i.e., alternating/performing an iteration of the first/input stage and second/output stage of balancing for subsequent minibatches after completing the output/second stage of balancing]).
Although Seshadri substantially discloses the claimed invention, Seshadri is not relied on for explicitly disclosing after mapping the calculation tasks for the preset feature map of the plurality of network layers in the convolutional neural network to at least one processing element of the chip subjected to the second stage of balancing.
In the same field, analogous art Zhan teaches after mapping the calculation tasks for the preset feature map of the plurality of network layers in the convolutional neural network to at least one processing element of the chip subjected to the second stage of balancing (as indicated above, mapping “the calculation tasks for the preset feature map of the plurality of network layers”, under the BRI, in light of the specification, is assigning or mapping any calculations, computations or work associated with input data/features for a neural network layer) (see, e.g. pages 56 and 58, “DNN training with pipeline-hybrid parallelism, and design an algorithm to divide, model, and map to the GPU cluster … partitions the model into multiple stages; each stage includes a consecutive set of layers in the model, then each stage is mapped to n(n ≥ 1) GPUs … that performs both the forward and backward pass for every layer in that stage. The intermediate results calculated at each stage are transmitted downstream. … each stage starts calculating only when the intermediate result arrives at this stage along the connected network between two adjacent stages … to spread them to GPUs for calculation”, “the neural network hierarchy contains convolutional layer for extracting features in the early stage and … when performing feature extraction in the early stage’s convolutional layer, a large activation size is needed for transmission, and more GPUs are required for data parallelism (to handle steep calculations)” [i.e., after mapping a calculation task for each network layer in a convolutional neural network to at least one GPU/processing element]).
Alternatively, Zhan also teaches performing iteration of the first stage of balancing and the second stage of balancing (see, e.g., pages 56-57, “dividing the model into three stages, each stage is mapped to one GPU … a horizontal axis indicates the time slice. … each stage starts calculating only when the intermediate result arrives at this stage along the connected network between two adjacent stages. … In the initial state, we inject a minibatch into the system at every time slice until the first minibatch completes the forward pass in the last stage. Then the first minibatch performs a backward pass in the opposite path of the forward pass. When the first minibatch completes an iteration, we inject a new minibatch into the system for training”, “Because different parts of the model execute concurrently at each stage in a time slice, to ensure that the pipeline executes in a stable state (that is, to reduce the GPU’s idle state), an algorithm must be designed that guarantees load balancing at each stage. … The goal of model partitioning and task placement is to minimize the total time of one iteration of training.” [i.e., performing an iteration of the first and second stages of load balancing]).
The motivation to combine Seshadri and Zhan is the same as discussed above with respect to claim 7.

Allowable Subject Matter
Upon overcoming all of the objections and rejections as discussed above in items 8 and 13-23, claims 2-6 and 8 are objected to as being dependent upon a rejected base claim (i.e., claim 1), but would be allowable if amended to address the rejections under 35 U.S.C. §§ 112(b) and 103 and rewritten in independent form including all of the limitations of the base claim and any intervening claims (i.e., intervening claim 2 in the case of claims 3 and 5, intervening claims 2 and 3 in the case of claim 4, intervening claims 2, 3 and 4 in the case of claim 6, and intervening claim 7 in the case of claim 8).
As discussed above, Seshadri in view of Zhan teaches the method of claim 1.
However, with regard to dependent claim 2, the prior art of record does not anticipate, nor does it render obvious in any reasonable combination to one of ordinary skill in the art at the time of Applicants' invention, the combination of recited limitations of claim 2 (i.e., “wherein performing the first stage of balancing on the number of Phases of the plurality of processing elements comprises:
determining whether a descent direction of the first stage of balancing exists; 
in a case where it is determined that performing the first stage of balancing causes a value of 1 minus a global utilization rate of the plurality of processing elements to decrease, determining that the descent direction of the first stage of balancing exists; and 
calculating, in response to a determination result that the descent direction of the first stage of balancing exists, a first-stage balancing reference, and performing the first stage of balancing according to the first-stage balancing reference”), and its base claim, independent claim 1.
Regarding the limitation “determining whether a descent direction of the first stage of balancing exists” recited in claim 2, paragraph 46 of applicant’s specification states “The descent direction is a limit of a balancing technique which may be applied in the stage, and the balancing may be performed under the limit of the direction to ensure that GlobalUse in an algorithm iteration process does not descend as much as possible. The descent direction refers to that few computing resources are wasted”. Therefore, “determining whether a descent direction of the first stage of balancing exists”, under the BRI, in light of the specification, is determining or identifying that a decreasing or fewer number of computing resources or devices are wasted, idle, or have high bandwidth/low usage or utilization as a result of a stage of balancing.
With reference to the limitation “a global utilization rate of the plurality of processing elements” further recited in claim 2, paragraphs 36-37 of applicant’s specification state “a global utilization rate (also referred to as an average utilization rate) of n PEs is:
[equation image: media_image1.png]
” and “i and j have the same meaning and are used for representing an ith PE and a jth PE, respectively, and both i and j are natural numbers greater than 1.” Therefore, “a global utilization rate of the plurality of processing elements”, under the BRI, in light of the specification, is an overall, global mean utilization, usage or bandwidth parameter or metric of the processing elements.
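For illustration only, the interpretation above can be sketched in Python. The formula in paragraphs 36-37 of the specification is available here only as an image, so this sketch assumes a simple form: each processing element's utilization is its number of Phases divided by the largest number of Phases on any processing element, and the global utilization rate is the arithmetic mean of those values. The function names and the load model are hypothetical and are not taken from applicant's specification or from the cited references.

```python
from typing import Sequence


def global_utilization(phases_per_pe: Sequence[int]) -> float:
    """Mean utilization over all processing elements (assumed form).

    Assumption: the busiest PE sets the overall run time, so a PE with
    fewer Phases is idle for the remainder of that time.
    """
    if not phases_per_pe:
        raise ValueError("at least one processing element is required")
    longest = max(phases_per_pe)
    if longest <= 0:
        raise ValueError("at least one PE must have a positive Phase count")
    return sum(p / longest for p in phases_per_pe) / len(phases_per_pe)


def descent_direction_exists(before: Sequence[int], after: Sequence[int]) -> bool:
    """True if the candidate balancing step lowers 1 - global utilization."""
    waste_before = 1.0 - global_utilization(before)
    waste_after = 1.0 - global_utilization(after)
    return waste_after < waste_before


if __name__ == "__main__":
    unbalanced = [8, 2, 2]   # Phases per PE before balancing (made-up values)
    balanced = [4, 4, 4]     # Phases per PE after a candidate balancing step
    print(global_utilization(unbalanced))
    print(global_utilization(balanced))
    print(descent_direction_exists(unbalanced, balanced))
```

Under these assumptions, a balancing step is accepted only when it reduces the wasted fraction, which corresponds to the claim language "causes a value of 1 minus a global utilization rate of the plurality of processing elements to decrease".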
Claims 3 and 5 are objected to as being dependent upon a rejected base claim (i.e., claim 1), but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims (i.e., claim 2). For example, the prior art of record does not anticipate or render obvious the limitations recited in dependent claims 3 and 5, in combination with the limitations of their base claim, independent claim 1, and their intervening claim, claim 2.

With regard to dependent claim 4, the prior art of record does not anticipate, nor does it render obvious in any reasonable combination to one of ordinary skill in the art at the time of Applicants' invention, the combination of recited limitations of claim 4 (i.e., “wherein a formula of the preset balancing vector is:
p = (pr, px, pf, py, pw)
wherein p is the preset balancing vector;
pr is configured to adjust the number of Multiply Accumulate operations (MACs) of a processing element in one Phase in a reduction loop;
px is configured to adjust the number of Phases of a processing element in a horizontal direction;
pf is configured to adjust the number of MACs of a processing element in one Phase in an output feature loop;
py is configured to adjust the number of Phases of a processing element in a vertical direction; and
pw is configured to adjust the number of Phases and the number of MACs in the Phases”), and its base claim, claim 1, and its intervening claims, claims 2 and 3.
As indicated in the section 112(b) rejection of claim 4 above, the recitation of “the number of Multiply Accumulate operations” has been interpreted as any number of Multiply Accumulate operations.

Regarding dependent claim 6, the prior art of record does not anticipate, nor does it render obvious in any reasonable combination to one of ordinary skill in the art at the time of Applicants' invention, the combination of recited limitations of claim 6 (i.e., “wherein performing the first stage of balancing according to the first-stage balancing reference comprises:
extracting a first balancing vector from the preset balancing vector, and
performing the first stage of balancing by using the first balancing vector according to the first-stage balancing reference;
wherein p1 = (px, py, pw), and p1 is the first balancing vector,
px is configured to adjust the number of Phases of a processing element in the horizontal direction, py is configured to adjust the number of Phases of a processing element in the vertical direction, and pw is configured to adjust the number of Phases and the number of MACs in the Phases”), and its base claim, claim 1, and its intervening claims, claims 2, 3 and 4.
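As a reading aid, the relationship between the preset balancing vector p = (pr, px, pf, py, pw) of claim 4 and the first balancing vector p1 = (px, py, pw) of claim 6 can be sketched in Python. The component order follows the claim language; the class names, the function name, and the use of integer components are assumptions made for illustration, since the claims do not state the type or range of each component.

```python
from typing import NamedTuple


class BalancingVector(NamedTuple):
    """Preset balancing vector p = (pr, px, pf, py, pw) as recited in claim 4."""
    pr: int  # adjusts MACs of a PE in one Phase in a reduction loop
    px: int  # adjusts Phases of a PE in the horizontal direction
    pf: int  # adjusts MACs of a PE in one Phase in an output feature loop
    py: int  # adjusts Phases of a PE in the vertical direction
    pw: int  # adjusts both the number of Phases and the MACs in the Phases


class FirstBalancingVector(NamedTuple):
    """First balancing vector p1 = (px, py, pw) as recited in claim 6."""
    px: int
    py: int
    pw: int


def extract_first_balancing_vector(p: BalancingVector) -> FirstBalancingVector:
    """Keep only the components that adjust the number of Phases."""
    return FirstBalancingVector(px=p.px, py=p.py, pw=p.pw)


if __name__ == "__main__":
    p = BalancingVector(pr=1, px=2, pf=3, py=4, pw=5)  # made-up values
    print(extract_first_balancing_vector(p))
```

The extraction drops pr and pf, the two components that adjust only the number of MACs within a Phase, which is consistent with the first stage of balancing operating on the number of Phases.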

As discussed above, Seshadri in view of Zhan and Baum teaches the method of claim 7.
However, with regard to dependent claim 8, the prior art of record does not anticipate, nor does it render obvious in any reasonable combination to one of ordinary skill in the art at the time of Applicants' invention, the combination of recited limitations of claim 8 (i.e., “wherein acquiring the number of MACs in the plurality of Phases, and performing the second stage of balancing based on the number of MACs in the plurality of Phases comprises:
acquiring the number of MACs in each Phase, and determining whether a descent direction of the second stage of balancing exists; in a case where it is determined that performing the second stage of balancing causes the value of 1 minus the global utilization rate of the plurality of processing elements to decrease, 
determining that the descent direction of the second stage of balancing exists; and
calculating, in response to a determination result that the descent direction of the second stage of balancing exists, a second-stage balancing reference, and performing the second stage of balancing according to the second-stage balancing reference”), and its base claim, claim 1, and its intervening claim, claim 7.
As indicated above regarding similar recitations in claim 2, “determining whether a descent direction of the second stage of balancing exists”, under the BRI, in light of the specification, is determining or identifying that a decreasing or fewer number of computing resources or devices are wasted, idle, or have high bandwidth/low usage or utilization as a result of a stage of balancing, and “a global utilization rate of the plurality of processing elements”, under the BRI, in light of the specification, is an overall, global mean utilization, usage or bandwidth parameter or metric of the processing elements.

Conclusion
The prior art made of record, listed on form PTO-892, and not relied upon, is considered pertinent to applicant's disclosure. 
For example, Seide et al. (U.S. Patent Application Pub. No. 2014/0142929 A1, hereinafter “Seide”) discloses “use of a pipelined algorithm that performs parallelized computations to train deep neural networks (DNNs) … techniques for training may include … distributing a layer of the DNNs to multiple processors for processing”, “techniques may include the use of a pipelined algorithm to parallelize the training of the DNNs across multiple multi-core processors, such as multiple general-purpose graphics processing units (GPGPUs).” and “Once the DNN model has been partitioned into stages, the stages are individually assigned to GPUs 118 in the worker computing devices 112 that will train the DNN model 100. Each stage is mapped to a separate GPU”, “A computer-implemented method, comprising: providing a pipelined algorithm to train deep neural networks (DNNs)” [i.e., a DNN/neural network mapping technique/method applied to a GPU/chip including a plurality of cores/processing elements/processors] (see, e.g., Abstract, paragraphs 12 and 43 and claim 10).
The examiner requests, in response to this office action, support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line no(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.
When responding to this office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the reference cited or the objections made. He or she must also show how the amendments avoid such references or objections. See 37 CFR 1.111(c).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RANDY K BALDWIN whose telephone number is (571)270-5222. The examiner can normally be reached on Mon - Fri 9:00-6:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at 571-272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/R.K.B./Examiner, Art Unit 2125


/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125