Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This is the initial office action that has been issued in response to patent application 16/215,033 filed on 12/10/2018. Claims 1-14, as originally filed, are currently pending and have been considered below. Claims 1 and 10 are independent claims.

Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Applicant cannot rely upon the certified copy of the foreign priority application to overcome this rejection because a translation of said application has not been made of record in accordance with 37 CFR 1.55. See MPEP §§ 215 and 216.
In particular, Applicant is reminded of the requirements set forth in 37 C.F.R. 1.55(g)(3)-(4), Claim for foreign priority:
“(3) An English language translation of a non-English language foreign application is not required except:
(i) When the application is involved in an interference (see § 41.202 of this chapter) or derivation (see part 42 of this chapter) proceeding;
(ii) When necessary to overcome the date of a reference relied upon by the examiner; or
(iii) When specifically required by the examiner.
(4) If an English language translation of a non-English language foreign application is required, it must be filed together with a statement that the translation of the certified copy is accurate” (emphasis added).
	Since an English language translation of Application No. CN201810646003.0A has not been made of record, the Examiner notes that prior art references with filing date or publication date prior to the instant Application’s filing date of 12/10/2018 are considered applicable prior art references.

Claim Objections
Claims 2-9 and 11-14 are objected to because of the following informalities:
In claim 2, lines 13-14, “the entire deep neural network to be trained” should read “the deep neural network to be trained”
In claim 11, lines 13-14, “the entire deep neural network to be trained” should read “the deep neural network to be trained”
Claims 3-9 depend on claim 2 and do not cure the deficiencies of claim 2; therefore, claims 3-9 are objected to based on the same rationale.
Claims 12-14 depend on claim 11 and do not cure the deficiencies of claim 11; therefore, claims 12-14 are objected to based on the same rationale.

Appropriate correction is required.

Claim Interpretation
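The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph: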
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
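(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.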

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
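This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: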
Claim 10: 
a first dividing module, configured for dividing a deep neural network to be trained into multiple sub-networks (Specification [0136] reiterates the function, but does not provide description of the structure)
a second dividing module, configured for dividing a pre-acquired set of training samples into multiple subsets of samples (Specification [0137] reiterates the function, but does not provide description of the structure)
a network training module, configured for performing the distributed training of the deep neural network to be trained with the multiple subsets of samples based on a distributed cluster architecture and a preset scheduling method (Specification [0138] reiterates the function, but does not provide description of the structure)
Claim 11: 
a network training sub-module, configured for scheduling the multiple tasks to the multiple cloud resource nodes according to equation (Specification [0142] reiterates the function, but does not provide description of the structure)
Claim 12: 
a time computing sub-module, configured for calculating the sum of remaining run-time and data transmission time of the application of numeral p (Specification [0145] reiterates the function, but does not provide description of the structure)
Claim 13:
a model mapping unit, configured for mapping the scheduling of the multiple tasks into a directed graph model (Specification [0148] reiterates the function, but does not provide description of the structure)
a model transforming unit, configured for transforming the directed graph model into a residual graph (Specification [0149] reiterates the function, but does not provide description of the structure)
a scheduling unit, configured for scheduling the multiple tasks to the multiple cloud resource nodes based on the preset scheduling method and the residual graph (Specification [0150] reiterates the function, but does not provide description of the structure)
Claim 14:
a model mapping unit, configured for mapping the scheduling of the multiple tasks into a directed graph model (Specification [0148] reiterates the function, but does not provide description of the structure)
a model transforming unit, configured for transforming the directed graph model into a residual graph (Specification [0149] reiterates the function, but does not provide description of the structure)
a scheduling unit, configured for scheduling the multiple tasks to the multiple cloud resource nodes based on the preset scheduling method and the residual graph (Specification [0150] reiterates the function, but does not provide description of the structure)


If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

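Claims 10-14 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.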

Each of the limitations in claims 10-14 that contains the following generic placeholders:
first dividing module
second dividing module
network training module
network training sub-module
time computing sub-module
model mapping unit
model transforming unit
scheduling unit

invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. In particular, the corresponding description found in the Specification (see Section 7 of the Office Action) of each of the generic placeholders listed above substantially reiterates the claim language and does not provide description of the structure that performs the corresponding functions. Therefore, claims 10-14 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, for lack of written description.

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.



Claims 1-14 are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 1 recites the limitation “the effect of network delay through data localization” in lines 10-11. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “an effect of network delay through data localization”.
Claim 2 recites the limitation “the remaining time required to fulfill the current distributed training” in lines 15-16. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a remaining time required to fulfill the current distributed training”.
Claim 2 recites the limitation “the number of” in line 17. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a number of”.
Claim 3 recites the limitation “the calculation of” in line 1. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a calculation of”.
Claim 3 recites the limitation “the number of” in line 8. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a number of”.
Claim 3 recites the limitation “the elapsed run-time of” in line 9. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “an elapsed run-time of”.
Claim 3 recites the limitation “the running progress of” in line 10. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a running progress of”.
Claim 3 recites the limitation “the estimated minimum data transmission time of a task of numeral” in line 11. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “an estimated minimum data transmission time of a task of numeral”.
Claim 3 recites the limitation “the waiting time till the resource of a cloud resource node of numeral n becomes idle” in lines 12-13. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a waiting time till the resource of a cloud resource node of numeral n becomes idle”.
Claim 3 recites the limitation “the data transmission time of the task” in lines 13-14. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a data transmission time of the task”.
Claim 6 recites the limitation “the number of” in line 15. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a number of”.
Claim 6 recites the limitation “the number of” in line 17. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a number of”.
Claim 6 recites the limitation “the maximum number of assignable tasks of the node object” in lines 18-19. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a maximum number of assignable tasks of the node object”.
Claim 6 recites the limitation “the number of excessive tasks” in line 19. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a number of excessive tasks”.
Claim 6 recites the limitation “the absolute value of” in line 19. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “an absolute value of”.
Claim 6 recites the limitation “the waiting time till” in lines 27-28. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a waiting time till”.
Claim 7 recites the limitation “the number of” in line 15. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a number of”.
Claim 7 recites the limitation “the number of” in line 17. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a number of”.
Claim 7 recites the limitation “the maximum number of assignable tasks of the node object” in lines 18-19. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a maximum number of assignable tasks of the node object”.
Claim 7 recites the limitation “the number of excessive tasks” in line 19. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a number of excessive tasks”.
Claim 7 recites the limitation “the absolute value” in line 19. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “an absolute value”.
Claim 7 recites the limitation “the waiting time till” in lines 27-28. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a waiting time till”.
Claim 8 recites the limitation “the flow” in line 6. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a flow”.
Claim 8 recites the limitation “the parameters of” in line 29. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “parameters of”.
Claim 9 recites the limitation “the flow” in line 6. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a flow”.
Claim 9 recites the limitation “the parameters of” in line 29. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “parameters of”.
Claim 10 recites the limitation “the effect of network delay through data localization” in lines 10-11. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “an effect of network delay through data localization”.
Claim 11 recites the limitation “the remaining time required to fulfill the current distributed training” in lines 15-16. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a remaining time required to fulfill the current distributed training”.
Claim 11 recites the limitation “the number of” in line 17. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a number of”.
Claim 12 recites the limitation “the calculation of” in line 1. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a calculation of”.
Claim 12 recites the limitation “the number of” in line 8. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a number of”.
Claim 12 recites the limitation “the elapsed run-time of” in line 9. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “an elapsed run-time of”.
Claim 12 recites the limitation “the running progress of” in line 10. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a running progress of”.
Claim 12 recites the limitation “the estimated minimum data transmission time of a task of numeral” in line 11. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “an estimated minimum data transmission time of a task of numeral”.
Claim 12 recites the limitation “the waiting time till the resource of a cloud resource node of numeral n becomes idle” in lines 12-13. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a waiting time till the resource of a cloud resource node of numeral n becomes idle”.
Claim 12 recites the limitation “the data transmission time of the task” in lines 13-14. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a data transmission time of the task”.

Each of the limitations in claims 10-14 that contains the following generic placeholders:
first dividing module
second dividing module
network training module
network training sub-module
time computing sub-module
model mapping unit
model transforming unit
scheduling unit

invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. In particular, the corresponding description found in the Specification (see Section 7 of the Office Action) of each of the generic placeholders listed above substantially reiterates the claim language and does not provide description of the structure that performs the corresponding functions. Therefore, claims 10-14 are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant may:
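(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or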

(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

In addition to the grounds stated above, each dependent claim is rejected based on the same rationale as the claim from which it depends.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Vanhoucke et al. (US 9460711 B1) in view of Zhang et al. (“Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters”) and further in view of Yang et al. (“Research and design of distributed training algorithm for neural networks”).
Regarding Claim 1,
Vanhoucke et al. teaches a method for accelerating distributed training of a deep neural network, comprising (Vanhoucke et al., FIG. 5 Col. 8 Lines 48-53, “FIG. 5 is a conceptual illustration 500 of an example distributed framework. In one instance, computations performed for each node of a DNN may be distributed across several computing machines 502A-D so that responsibility for computation for different nodes is assigned to different computing machines” teaches distributed training of a deep neural network).
dividing a deep neural network to be trained into multiple sub-networks (Vanhoucke et al., FIG. 5 Col. 8 Lines 48-53, “FIG. 5 is a conceptual illustration 500 of an example distributed framework. In one instance, computations performed for each node of a DNN may be distributed across several computing machines 502A-D so that responsibility for computation for different nodes is assigned to different computing machines” teaches multiple model instances (corresponds to the multiple sub-networks) of a deep neural network).
dividing a pre-acquired set of training samples into multiple subsets of samples (Vanhoucke et al., FIG. 6 Col. 8 Lines 63-67, “In one example, to apply SGD based on training data, the training data may be divided into a number of subsets of training data 602A-C. Each subset of training data 602A-C may then be processed on a respective model instance, or copy, of a DNN model” teaches the training data being divided into multiple subsets of training data).   
… wherein, the multiple sub-networks are simultaneously trained, and training progresses of parallel sub-networks are synchronized to accelerate the distributed training of the deep neural network (Vanhoucke et al., FIG. 5-6 and Col. 8 Lines 49-63, “In one instance, computations performed for each node of a DNN may be distributed across several computing machines 502A-D so that responsibility for computation for different nodes is assigned to different computing machines. For instance, computation for node 504A may be performed by machine 502A while computation for node 504B may be performed by machine 502B. For connections between nodes that cross partition boundaries, values computed at the nodes may be transmitted between the computing machines 502A-D. In some examples, to further decrease processing time, computation may be distributed across multiple model instances of a DNN. FIG. 6 is a conceptual illustration 600 of distributed processing of multiple model instances of an acoustic model” teaches the distributed training across multiple model instances (corresponds to the sub-network) of a deep neural network that further decreases processing time (corresponds to accelerating the training). FIG. 2 and Col. 5 Lines 25-32, “Method 200 may include one or more operations, functions, or actions as illustrated by one or more of blocks 202-206. Although the blocks are illustrated in a sequential order, these blocks may also be performed in parallel, and/or in a different order than those described herein. Also, the various blocks may be combined into fewer blocks, divided into additional blocks, and/or removed based upon the desired implementation” teaches the acoustic model having the capability to be processed (corresponds to being trained) in parallel).
Vanhoucke et al. does not appear to explicitly teach performing the distributed training of the deep neural network to be trained with the multiple subsets of samples based on a distributed cluster architecture and a preset scheduling method.
However, Zhang et al., teaches performing the distributed training of the deep neural network to be trained with the multiple subsets of samples based on a distributed cluster architecture and a preset scheduling method (Zhang et al., Section 1 Pg. 182, “With this motivation, we design Poseidon, an efficient communication architecture for data-parallel DL on distributed GPUs. Poseidon exploits the sequential layer-by-layer structure in DL programs, finding independent GPU computation operations and network communication operations in the training algorithm, so that they can be scheduled together to reduce bursty network communication” teaches distributed training of a deep neural network based on a preset schedule. Section 2 Pg. 182, “In this section, we formulate the DL training as an iterative-convergent algorithm, and describe parameter server (PS) and sufficient factor broadcasting (SFB) for parallelizing such computation on clusters” teaches distributed training based on clusters. Section 2.1 Pg. 183, “we usually feed a batch of training samples D(t) (D(t) ⊂ D) at each training iteration t” teaches each training iteration allows training using one batch (subset), therefore multiple iterations would use multiple batches (subsets). Fig. 1, teaches a convolutional neural network (corresponds to a deep neural network. A CNN is a type of DNN)).
Vanhoucke et al. and Zhang et al. are analogous art because they are from the same field of endeavor and are from the same problem-solving area. Namely, they pertain to the field of “distributed machine learning”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Vanhoucke et al. with Zhang et al., with motivation of performing the distributed training of the deep neural network to be trained with the multiple subsets of samples based on a distributed cluster architecture and a preset scheduling method. “We present Poseidon, a scalable and efficient communication architecture for large-scale DL on distributed GPUs. Poseidon’s design is orthogonal to TensorFlow, Caffe or other DL frameworks – the techniques present in Poseidon could be used to produce a better distributed version of them. We empirically show that Poseidon constantly delivers linear speedups using up to 32 nodes and limited bandwidth on a variety of neural network, datasets and computation engines, and
Vanhoucke et al. in view of Zhang et al. does not appear to explicitly teach wherein, the training of each sub-network is accelerated by reducing the effect of network delay through data localization and wherein, the data localization means that a task is performed at a preset cloud resource node to minimize data transmission time.
However, Yang et al., teaches wherein, the training of each sub-network is accelerated by reducing the effect of network delay through data localization (Yang et al., Section 3.1 Pg. 4045-4046, “In this paper, a new distributed DNN model based on multi-agent from another point of distributed is proposed to solve the memory bottleneck of large sample set. It improves the convergent speed through making use of the current network resources. The model is build based on a Hybrid Model combined the Master-Slave Model [2] with the Node-Only Model [2], every agent in this model is peer to peer that can offer its computation service and get help from other agents when training its sample set. So many distributed agents in different location can process the large sample. Those free agents form a Node-Only Model, just like Figure 2, when there is no computation mission. And they will change to be a Master-Slave Model, which is shown in Figure 3, when one of them informs a computation mission. Those resources have been engaged in computation will be released at the end of the mission, and return to be a free agent that can reform a new Node-Only Model waiting for the next mission. Therefore, the agent’s execute model in the DNN model is defined as Figure 4” 
… wherein, the data localization means that a task is performed at a preset cloud resource node to minimize data transmission time (Yang et al., Figure 4 and Section 3.1 Pg. 4045-4046, “In this paper, a new distributed DNN model based on multi-agent from another point of distributed is proposed to solve the memory bottleneck of large sample set. It improves the convergent speed through making use of the current network resources. The model is build based on a Hybrid Model combined the Master-Slave Model [2] with the Node-Only Model [2], every agent in this model is peer to peer that can offer its computation service and get help from other agents when training its sample set. So many distributed agents in different location can process the large sample. Those free agents form a Node-Only Model, just like Figure 2, when there is no computation mission. And they will change to be a Master-Slave Model, which is shown in Figure 3, when one of them informs a computation mission. Those resources have been engaged in computation will be released at the end of the mission, and return to be a free agent that can reform a new Node-Only Model waiting for the next mission. Therefore, the agent’s execute model in the DNN model is defined as Figure 4” teaches a computation mission (corresponds to the task) being performed at a preset node to improve convergent speed (corresponds to minimizing data transmission time)).
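For illustration only, the limitations of claim 1 as mapped above (dividing the training samples into subsets, training sub-networks in parallel, synchronizing their progress, and performing each task at a preset node) can be sketched roughly as follows. This is a minimal sketch under stated assumptions and is not taken from the claims or from any cited reference: the function names `split_samples`, `train_subnetwork`, and `distributed_round`, the node identifiers, and the use of a subset mean as a stand-in for actual training are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_samples(samples, k):
    """Divide a list of samples into k contiguous, nearly equal subsets."""
    base, extra = divmod(len(samples), k)
    subsets, start = [], 0
    for i in range(k):
        # The first `extra` subsets receive one additional sample.
        size = base + (1 if i < extra else 0)
        subsets.append(samples[start:start + size])
        start += size
    return subsets


def train_subnetwork(node_id, subset):
    """Hypothetical stand-in for training one sub-network on one node.

    The "trained parameter" is simply the mean of the local subset.
    """
    return node_id, sum(subset) / len(subset)


def distributed_round(samples, node_ids):
    """Run one synchronized round across the given (hypothetical) nodes."""
    subsets = split_samples(samples, len(node_ids))
    # Assumed form of data localization: subset i is pinned to node_ids[i],
    # so no subset is moved between nodes during the round.
    with ThreadPoolExecutor(max_workers=len(node_ids)) as pool:
        results = list(pool.map(train_subnetwork, node_ids, subsets))
    # Synchronization point: every worker has finished before averaging.
    params = [param for _, param in results]
    return sum(params) / len(params)


if __name__ == "__main__":
    print(split_samples(list(range(10)), 3))
    print(distributed_round([1.0, 2.0, 3.0, 4.0], ["node-0", "node-1"]))
```

The thread pool merely simulates separate machines; in an actual cluster each call to `train_subnetwork` would run on the node that already stores the corresponding subset.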

Regarding Claim 10,
Vanhoucke et al. teaches an apparatus for accelerating distributed training of a deep neural network, comprising (Vanhoucke et al., FIG. 5 Col. 8 Lines 48-53, “FIG. 5 is a conceptual illustration 500 of an example distributed framework. In one instance, computations performed for each node of a DNN may be distributed across several computing machines 502A-D so that responsibility for computation for different nodes is assigned to different computing machines” teaches distributed training of a deep neural network).
Vanhoucke et al., FIG. 5 Col. 8 Lines 48-53, “FIG. 5 is a conceptual illustration 500 of an example distributed framework. In one instance, computations performed for each node of a DNN may be distributed across several computing machines 502A-D so that responsibility for computation for different nodes is assigned to different computing machines” teaches multiple model instances (corresponds to the multiple sub-networks) of a deep neural network).  
a second dividing module, configured for dividing a pre-acquired set of training samples into multiple subsets of samples (Vanhoucke et al., FIG. 6 Col. 8 Lines 63-67, “In one example, to apply SGD based on training data, the training data may be divided into a number of subsets of training data 602A-C. Each subset of training data 602A-C may then be processed on a respective model instance, or copy, of a DNN model” teaches the training data being divided into multiple subsets of training data).
… wherein, the multiple sub-networks are simultaneously trained, and training progresses of parallel sub-networks are synchronized to accelerate the distributed training of the deep neural network (Vanhoucke et al., FIG. 5-6 and Col. 8 Lines 49-63, “In one instance, computations performed for each node of a DNN may be distributed across several computing machines 502A-D so that responsibility for computation for different nodes is assigned to different computing machines. For instance, computation for node 504A may be performed by machine 502A while computation for node 504B may be performed by machine 502B. For connections between nodes that cross partition boundaries, values computed at the nodes may be transmitted between the computing machines 502A-D. In some examples, to further decrease processing time, computation may be distributed across multiple model instances of a DNN. FIG. 6 is a conceptual illustration 600 of distributed processing of multiple model instances of an acoustic model” teaches the distributed training across multiple model instances (corresponds to the sub-network) of a deep neural network that further decreases processing time (corresponds to accelerating the training). FIG. 2 and Col. 5 Lines 25-32, “Method 200 may include one or more operations, functions, or actions as illustrated by one or more of blocks 202-206. Although the blocks are illustrated in a sequential order, these blocks may also be performed in parallel, and/or in a different order than those described herein. Also, the various blocks may be combined into fewer blocks, divided into additional blocks, and/or removed based upon the desired implementation” teaches the acoustic model having the capability to be processed (corresponds to being trained) in parallel).
Vanhoucke et al. does not appear to explicitly teach a network training module, configured for performing the distributed training of the deep neural network to be trained with the multiple subsets of samples based on a distributed cluster architecture and a preset scheduling method.
However, Zhang et al. teaches a network training module, configured for performing the distributed training of the deep neural network to be trained with the multiple subsets of samples based on a distributed cluster architecture and a preset scheduling method (Zhang et al., Section 1 Pg. 182, “With this motivation, we design Poseidon, an efficient communication architecture for data-parallel DL on distributed GPUs. Poseidon exploits the sequential layer-by-layer structure in DL programs, finding independent GPU computation operations and network communication operations in the training algorithm, so that they can be scheduled together to reduce bursty network communication” teaches distributed training of a deep neural network based on a preset schedule. Section 2 Pg. 182, “In this section, we formulate the DL training as an iterative-convergent algorithm, and describe parameter server (PS) and sufficient factor broadcasting (SFB) for parallelizing such computation on clusters” teaches distributed training based on clusters. Section 2.1 Pg. 183, “we usually feed a batch of training samples D(t) (D(t) ⊂ D) at each training iteration t” teaches that each training iteration trains on one batch (subset), and therefore multiple iterations would use multiple batches (subsets). Fig. 1 teaches a convolutional neural network (corresponds to a deep neural network, since a CNN is a type of DNN)).
Vanhoucke et al. and Zhang et al. are analogous art because they are from the same field of endeavor and the same problem-solving area; namely, they pertain to the field of “distributed machine learning”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Vanhoucke et al. with Zhang et al., with the motivation of providing a network training module, configured for performing the distributed training of the deep neural network to be trained with the multiple subsets of samples based on a distributed cluster architecture and a preset scheduling method. “We present Poseidon, a scalable and efficient communication architecture for large-scale DL on distributed
Vanhoucke et al. in view of Zhang et al. does not appear to explicitly teach wherein, the training of each sub-network is accelerated by reducing the effect of network delay through data localization and wherein, the data localization means that a task is performed at a preset cloud resource node to minimize data transmission time.
However, Yang et al., teaches wherein, the training of each sub-network is accelerated by reducing the effect of network delay through data localization (Yang et al., Section 3.1 Pg. 4045-4046, “In this paper, a new distributed DNN model based on multi-agent from another point of distributed is proposed to solve the memory bottleneck of large sample set. It improves the convergent speed through making use of the current network resources. The model is build based on a Hybrid Model combined the Master-Slave Model [2] with the Node-Only Model [2], every agent in this model is peer to peer that can offer its computation service and get help from other agents when training its sample set. So many distributed agents in different location can process the large sample. Those free agents form a Node-Only Model, just like Figure 2, when there is no computation mission. And they will change to be a Master-Slave Model, which is shown in Figure 3, when one of them informs a computation mission. Those resources have been engaged in computation will be released at the end of the mission, and return to be a free agent that can reform a new Node-Only Model waiting for the next mission. Therefore, the agent’s execute model in the DNN model is defined as Figure 4” teaches training of a distributed neural network based on multi agent (corresponds to sub-network) that improves the convergent speed (reducing the effect of network delay) by utilizing a Hybrid model where a node accesses a peer node through network latency (corresponds to data localization)).
… wherein, the data localization means that a task is performed at a preset cloud resource node to minimize data transmission time (Yang et al., Figure 4 and Section 3.1 Pg. 4045-4046, “In this paper, a new distributed DNN model based on multi-agent from another point of distributed is proposed to solve the memory bottleneck of large sample set. It improves the convergent speed through making use of the current network resources. The model is build based on a Hybrid Model combined the Master-Slave Model [2] with the Node-Only Model [2], every agent in this model is peer to peer that can offer its computation service and get help from other agents when training its sample set. So many distributed agents in different location can process the large sample. Those free agents form a Node-Only Model, just like Figure 2, when there is no computation mission. And they will change to be a Master-Slave Model, which is shown in Figure 3, when one of them informs a computation mission. Those resources have been engaged in computation will be released at the end of the mission, and return to be a free agent that can reform a new Node-Only Model waiting for the next mission. Therefore, the agent’s execute model in the DNN model is defined as Figure 4” teaches a computation mission (corresponds to the task) being performed at a preset node to improve convergent speed (corresponds to minimizing data transmission time)).
Vanhoucke et al., Zhang et al., and Yang et al. are analogous art because they are from the same field of endeavor and the same problem-solving area; namely, they pertain to the field of “distributed machine learning”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Vanhoucke et al. and Zhang et al. with Yang et al., with the motivation that the training of each sub-network is accelerated by reducing the effect of network delay through data localization, wherein the data localization means that a task is performed at a preset cloud resource node to minimize data transmission time. “The experimental results demonstrate that this method can effectively improve the convergent speed, has good expansibility, and can be applied to the prediction of protein secondary structure of middle and large size of amino-acid sequence” (Yang et al., Abstract). The proposed teaching is beneficial in that it helps to improve convergent speed, has good expansibility, and can be applied to the prediction of protein secondary structure of middle and large size of amino-acid sequence.

Allowable Subject Matter
Claims 2-9 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Henry T Nguyen whose telephone number is (571) 272-8860. The examiner can normally be reached Monday-Friday 8:00am-4:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/KAMRAN AFSHAR/ Supervisory Patent Examiner, Art Unit 2125