DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after Dec 13, 2017, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are presented for examination. 
Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 12/13/2017, 01/18/2019, and 07/08/2020 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Oath/Declaration
For the record, the examiner acknowledges that the Oath/Declaration submitted on 01/05/2018 has been received.
Drawings
The drawings filed on 12/13/2017 have been accepted.


Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 



The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with 
Claim 8:
	processing elements configured to train an artificial neural network that comprises first subnetworks to implement known functions and second subnetworks to implement unknown functions…….and an input/output engine configured to store the first and second parameter values in a storage component.
Claim 9:
	processing elements is configured to concurrently provide input values of the corresponding known training data sets to the first subnetworks, concurrently generate error values for the first subnetworks, and concurrently modify the first parameter values.	
Claim 10:
	processing elements is configured to concurrently provide the input values of the corresponding known training data sets to first subnetworks, concurrently generate the error values for the first subnetworks, and concurrently modify the first parameter values that define the first subnetworks iteratively until convergence criteria for the first parameter values are satisfied.
Claim 11:
	processing elements is configured to: train a second subnetwork based on a cutout training set formed of a subset of known training datasets corresponding to a first subset of the first subnetworks, wherein the first subset encompasses the second subnetwork.
Claim 12:
processing elements is configured to: provide the input values of the network training data set to an instance of the artificial neural network that is defined by the modified first and second parameter values of the first and second subnetworks… wherein the input/output engine is configured to store the modified parameter values that define the first and second subnetworks in the storage component
Claim 13:
	processing elements is configured to iteratively provide the input values of the network training data set to the artificial neural network, generate the error values, and modify the first and second parameter values that define the first and second subnetworks until a convergence criterion for the first parameter values is satisfied.
Claim 14:
	the input/output engine is configured to read parameter values for a subset of the first subnetworks of the artificial neural network from the storage component; and at least one of the plurality of processing elements is configured to define parameter values of a different artificial neural network using the parameter values for the subset of the first subnetworks of the artificial neural network.

The present application's disclosure provides the following description regarding the above generic placeholders:
Para [0013]:
“These drawbacks in the conventional sequential training process are addressed by parallelizing the training of an artificial neural network (such as a CNN or a DNN) that includes first 

Para [0018]:
“An input/output (I/O) engine 140 handles input or output operations associated with the display 120, as well as other elements of the processing system 100 such as keyboards, mice, printers, external disks, and the like. The I/O engine 140 is coupled to the bus 110 so that the I/O engine 140 is able to communicate with the memory 105, the GPU 115, or the CPU 130. In the illustrated embodiment, the I/O engine 140 is configured to read information stored on an external storage component 145, which is implemented using a non-transitory computer 

Para [0019]:
“Artificial neural networks, such as a CNN or DNN, are represented as program code that is configured using a corresponding set of parameters. The artificial neural network can therefore be executed on the GPU 115 or the CPU 130, or other processing units including field programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), processing in memory (PIM), and the like. If the artificial neural network implements a known function that can be trained using a corresponding known dataset, the artificial neural network is trained (i.e., the values of the parameters that define the artificial neural network are established) by providing input values of the known training data set to the artificial neural network executing on the GPU 115 or the CPU 130 and then comparing the output values of the artificial neural network to labeled output values in the known training data set. Error values are determined based on the comparison and back propagated to modify the values of the parameters that define the artificial neural network. This process is iterated until the values of the parameters satisfy a convergence criterion.”

Para [0020]:
“However, as discussed herein, artificial neural networks are often composed of subnetworks that perform known (or explicit) functions and subnetworks that perform unknown (or implicit) 

Para [0021]:
“Once the first subnetworks have been trained, the artificial neural network is trained on a network training data set. The parameters of the first subnetworks are held constant at this stage of the training because the parameters are expected to be accurately defined by training the first subnetworks on the basis of the known datasets. Input values from the network training datasets are provided to the artificial neural network, which is executing on one, some or all of the processing elements 116-118, 131-133. Error values are generated by comparing 
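For illustration only, the training procedure quoted from para [0019] above (provide input values, compare the outputs to labeled values, use the resulting error values to modify the parameters, and iterate until a convergence criterion is satisfied) can be sketched as follows. This sketch is not taken from the specification and is not an interpretation of the claims. A two-parameter linear model stands in for the artificial neural network, so back propagation reduces to a single gradient computation; the function name, the learning rate, the tolerance, and the sample data are all assumptions made for the example.

```python
def train_until_converged(samples, lr=0.1, tol=1e-9, max_iters=10_000):
    """Fit y = w*x + b to labeled (x, y) samples by gradient descent.

    Hypothetical stand-in for the loop described in para [0019]; the
    parameter names and default values are assumptions for illustration.
    """
    w, b = 0.0, 0.0
    n = len(samples)
    for step in range(1, max_iters + 1):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in samples:
            # Error value: model output compared with the labeled output.
            err = (w * x + b) - y
            # Gradient of the mean squared error with respect to w and b.
            grad_w += 2.0 * err * x / n
            grad_b += 2.0 * err / n
        # Modify the parameter values using the error-derived gradients.
        new_w = w - lr * grad_w
        new_b = b - lr * grad_b
        # Convergence criterion (assumed): parameter updates below tol.
        converged = abs(new_w - w) < tol and abs(new_b - b) < tol
        w, b = new_w, new_b
        if converged:
            return w, b, step
    return w, b, max_iters


# Labeled data generated from y = 2x + 1 (assumed example data).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b, steps = train_until_converged(data)
print(f"w={w:.4f}, b={b:.4f}, iterations={steps}")
```

The sketch shows only the generic iterate-until-convergence pattern. It does not address how known and unknown subnetworks are partitioned or trained concurrently.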

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.

Regarding “processing elements” in claim 8, para [0020] provides that “artificial neural networks are often composed of subnetworks that perform known (or explicit) functions and subnetworks that perform unknown (or implicit) functions. Sequentially training the artificial neural network that includes subnetworks to implement known and unknown functions on a 
Elements as GPU or CPU (para. 0020) programmed to train first subnetworks that implement known functions and second subnetworks that implement unknown functions (paras. 0020-0021, 0013).
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
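The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.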


The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.


Claims 8-14 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
	The following limitations in claims 8-14 invoke 35 U.S.C. 112(f), and the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function.

Claim 8:
	processing elements configured to train an artificial neural network that comprises first subnetworks to implement known functions and second subnetworks to implement unknown functions…….and an input/output engine configured to store the first and second parameter values in a storage component.

Claim 9:
	processing elements is configured to concurrently provide input values of the corresponding known training data sets to the first subnetworks, concurrently generate error values for the first subnetworks, and concurrently modify the first parameter values.
Claim 10:
	processing elements is configured to concurrently provide the input values of the corresponding known training data sets to first subnetworks, concurrently generate the error values for the first subnetworks, and concurrently modify the first parameter values that define the first subnetworks iteratively until convergence criteria for the first parameter values are satisfied.
Claim 11:
	processing elements is configured to: train a second subnetwork based on a cutout training set formed of a subset of known training datasets corresponding to a first subset of the first subnetworks, wherein the first subset encompasses the second subnetwork.
Claim 12:
	processing elements is configured to: provide the input values of the network training data set to an instance of the artificial neural network that is defined by the modified first and second parameter values of the first and second subnetworks… wherein the input/output engine is configured to store the modified parameter values that define the first and second subnetworks in the storage component
Claim 13:
processing elements is configured to iteratively provide the input values of the network training data set to the artificial neural network, generate the error values, and modify the first and second parameter values that define the first and second subnetworks until a convergence criterion for the first parameter values is satisfied.
Claim 14:
	the input/output engine is configured to read parameter values for a subset of the first subnetworks of the artificial neural network from the storage component; and at least one of the plurality of processing elements is configured to define parameter values of a different artificial neural network using the parameter values for the subset of the first subnetworks of the artificial neural network.

	However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function.

	Regarding claim 8, para [0018] provides that “An input/output (I/O) engine 140 handles input or output operations associated with the display 120, as well as other elements of the processing system 100 such as keyboards, mice, printers, external disks, and the like. The I/O engine 140 is coupled to the bus 110 so that the I/O engine 140 is able to communicate with the memory 105, the GPU 115, or the CPU 130. In the illustrated embodiment, the I/O engine 140 is configured to read information stored on an external storage component 145, which is implemented using a non-transitory computer readable medium such as a compact disk 


Regarding claim 9, para [0019] provides that “If the artificial neural network implements a known function that can be trained using a corresponding known dataset, the artificial neural network is trained (i.e., the values of the parameters that define the artificial neural network are established) by providing input values of the known training data set to the artificial neural network executing on the GPU 115 or the CPU 130 and then comparing the output values of the artificial neural network to labeled output values in the known training data set. Error values are determined based on the comparison and back propagated to modify the values of the parameters that define the artificial neural network. This process is iterated until the values of the parameters satisfy a convergence criterion.” This description is insufficient because it is merely referring to hardware broadly and does not describe the specific algorithm used for the 

Regarding claim 10, para [0019] provides that “If the artificial neural network implements a known function that can be trained using a corresponding known dataset, the artificial neural network is trained (i.e., the values of the parameters that define the artificial neural network are established) by providing input values of the known training data set to the artificial neural network executing on the GPU 115 or the CPU 130 and then comparing the output values of the artificial neural network to labeled output values in the known training data set. Error values are determined based on the comparison and back propagated to modify the values of the parameters that define the artificial neural network. This process is iterated until the values of the parameters satisfy a convergence criterion.” This description is insufficient because it is merely referring to hardware broadly and does not describe the specific algorithm used for the processing elements to provide input values of the corresponding known training data sets to the first subnetworks, concurrently generate error 

Regarding claim 11, para [0033] provides that “FIG. 5 is a block diagram illustrating training of subnetworks that implement unknown functions within a cutout portion 500 of an artificial neural network that also includes subnetworks that implement known functions according to some embodiments. The cutout portion 500 is executed on processing elements such as the processing elements 116-118, 131-133 shown in FIG. 1.” This description is insufficient because it is merely referring to hardware broadly and does not describe the specific algorithm used for the processing elements to train a second subnetwork based on a cutout training set formed of a subset of known training datasets corresponding to a first subset of the first subnetworks, wherein the first subset encompasses the second subnetwork.

See MPEP 2181, subsection II ("To claim a means for performing a specific computer-implemented function and then to disclose only a general purpose computer as the structure 

Regarding claim 12, para [0014] provides that “During the quality assurance step, input values of the training data set are provided to an instance of the artificial neural network that is defined by the modified parameters of the first and second subnetworks. Error values generated by the artificial neural network are back propagated to modify the parameters that define the first and second subnetworks in the artificial neural network.” This description is insufficient because it is merely referring to hardware broadly and does not describe the specific algorithm used for the processing elements to provide the input values of the network training data set to an instance of the artificial neural network that is defined by the modified first and second parameter values of the first and second subnetworks; generate error values by comparing output values from the artificial neural network to labeled values in the network training data set; and use the error values to modify the first and second parameter values that define the first and second subnetworks. See also para [0018], which provides that “An input/output (I/O) engine 140 handles input or output operations associated with the display 120, as well as other elements of the processing system 100 such as keyboards, mice, printers, external disks, and the like. The I/O engine 140 is coupled to the bus 110 so that the I/O engine 140 is able to communicate with the memory 105, the GPU 115, or the CPU 130. In the illustrated 

Regarding claim 13, para [0014] provides that “A quality assurance step is performed to train the parameters of the artificial neural network given the parameter values determined for the trained first and second subnetworks. During the quality assurance step, input values of the training data set are provided to an instance of the artificial neural network that is defined by the modified parameters of the first and second subnetworks. Error values generated by the artificial neural network are back propagated to modify the parameters that define the first and second subnetworks in the artificial neural network and the process is iterated until a 

Regarding claim 14, para [0018] provides that “An input/output (I/O) engine 140 handles input or output operations associated with the display 120, as well as other elements of the processing system 100 such as keyboards, mice, printers, external disks, and the like. The I/O engine 140 is coupled to the bus 110 so that the I/O engine 140 is able to communicate with the memory 105, the GPU 115, or the CPU 130. In the illustrated embodiment, the I/O engine 140 is configured to read information stored on an external storage component 145, which is implemented using a non-transitory computer readable medium such as a compact disk (CD), a digital video disc (DVD), and the like. The I/O engine 140 can also write information to the external storage component 145, such as the results of processing by the GPU 115 or the 
Therefore, claims 8-14 are rejected under 35 U.S.C. 112(a) for lack of written description. See MPEP 2181, subsection II ("When a claim containing a computer-implemented 35 U.S.C. 112(f) claim limitation is found to be indefinite under 35 U.S.C. 112(b) for failure to disclose sufficient corresponding structure (e.g., the computer and the algorithm) in the specification that performs the entire claimed function, it will also lack written description under 35 U.S.C. 112(a). See MPEP § 2163.03, subsection VI.").







The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 6 and 8-14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 6, it recites “modifying the first parameter values that define the first and second subnetworks”. Claim 6 depends on claim 1; however, in claim 1 the first parameter values define the first subnetworks and the second parameter values define the second subnetworks. For purposes of examination, the examiner will read this limitation as “modifying the first parameter values that define the first subnetworks”.
The following limitations in claims 8-14 invoke 35 U.S.C. 112(f), and the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function.



Claim 8:
	processing elements configured to train an artificial neural network that comprises first subnetworks to implement known functions and second subnetworks to implement unknown functions…….and an input/output engine configured to store the first and second parameter values in a storage component.
Claim 9:
	processing elements is configured to concurrently provide input values of the corresponding known training data sets to the first subnetworks, concurrently generate error values for the first subnetworks, and concurrently modify the first parameter values.	
Claim 10:
	processing elements is configured to concurrently provide the input values of the corresponding known training data sets to first subnetworks, concurrently generate the error values for the first subnetworks, and concurrently modify the first parameter values that define the first subnetworks iteratively until convergence criteria for the first parameter values are satisfied.
Claim 11:
	processing elements is configured to: train a second subnetwork based on a cutout training set formed of a subset of known training datasets corresponding to a first subset of the first subnetworks, wherein the first subset encompasses the second subnetwork.
Claim 12:
	processing elements is configured to: provide the input values of the network training data set to an instance of the artificial neural network that is defined by the modified first and second parameter values of the first and second subnetworks… wherein the input/output engine is configured to store the modified parameter values that define the first and second subnetworks in the storage component
Claim 13:
	processing elements is configured to iteratively provide the input values of the network training data set to the artificial neural network, generate the error values, and modify the first and second parameter values that define the first and second subnetworks until a convergence criterion for the first parameter values is satisfied.
Claim 14:
	the input/output engine is configured to read parameter values for a subset of the first subnetworks of the artificial neural network from the storage component; and at least one of the plurality of processing elements is configured to define parameter values of a different artificial neural network using the parameter values for the subset of the first subnetworks of the artificial neural network.

	However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function.

	Regarding claim 8, para [0018] provides that “An input/output (I/O) engine 140 handles input or output operations associated with the display 120, as well as other elements of the processing system 100 such as keyboards, mice, printers, external disks, and the like. The I/O engine 140 is coupled to the bus 110 so that the I/O engine 140 is able to communicate with the memory 105, the GPU 115, or the CPU 130. In the illustrated embodiment, the I/O engine 140 is configured to read information stored on an external storage component 145, which is implemented using a non-transitory computer readable medium such as a compact disk (CD), a digital video disc (DVD), and the like. The I/O engine 140 can also write information to the external storage component 145, such as the results of processing by the GPU 115 or the CPU 130.” This description is insufficient because it is merely referring to hardware broadly and does not describe the specific algorithm used for the input/output engine to store the first and second parameter values in a storage component. See MPEP 2181, subsection II ("To claim a means for performing a specific computer-implemented function and then to disclose only a general purpose computer as the structure designed to perform that function amounts to pure functional claiming. Aristocrat, 521 F.3d 1328 at 1333, 86 USPQ2d at 1239. In this instance, the structure corresponding to a 35 U.S.C. 112(f) claim limitation for a computer-implemented function must include the algorithm needed to transform the general purpose computer or microprocessor disclosed in the specification").

Regarding claim 9, para [0019] provides that “If the artificial neural network implements a known function that can be trained using a corresponding known dataset, the artificial neural network is trained (i.e., the values of the parameters that define the artificial neural network are established) by providing input values of the known training data set to the artificial neural network executing on the GPU 115 or the CPU 130 and then comparing the output values of the artificial neural network to labeled output values in the known training data set. Error values 

Regarding claim 10, para [0019] provides that “If the artificial neural network implements a known function that can be trained using a corresponding known dataset, the artificial neural network is trained (i.e., the values of the parameters that define the artificial neural network are established) by providing input values of the known training data set to the artificial neural network executing on the GPU 115 or the CPU 130 and then comparing the output values of the artificial neural network to labeled output values in the known training data set. Error values are determined based on the comparison and back propagated to modify the values of the parameters that define the artificial neural network. This process is iterated 

Regarding claim 11, para [0033] provides that “FIG. 5 is a block diagram illustrating training of subnetworks that implement unknown functions within a cutout portion 500 of an artificial neural network that also includes subnetworks that implement known functions according to some embodiments. The cutout portion 500 is executed on processing elements such as the processing elements 116-118, 131-133 shown in FIG. 1.” This description is insufficient because it is merely referring to hardware broadly and does not describe the specific algorithm used for the processing elements to train a second subnetwork based on a 
See MPEP 2181, subsection II ("To claim a means for performing a specific computer-implemented function and then to disclose only a general purpose computer as the structure designed to perform that function amounts to pure functional claiming. Aristocrat, 521 F.3d 1328 at 1333, 86 USPQ2d at 1239. In this instance, the structure corresponding to a 35 U.S.C. 112(f) claim limitation for a computer-implemented function must include the algorithm needed to transform the general purpose computer or microprocessor disclosed in the specification").

Regarding claim 12, para [0014] provides that “During the quality assurance step, input values of the training data set are provided to an instance of the artificial neural network that is defined by the modified parameters of the first and second subnetworks. Error values generated by the artificial neural network are back propagated to modify the parameters that define the first and second subnetworks in the artificial neural network.” This description is insufficient because it is merely referring to hardware broadly and does not describe the specific algorithm used for the processing elements to provide the input values of the network training data set to an instance of the artificial neural network that is defined by the modified first and second parameter values of the first and second subnetworks; generate error values by comparing output values from the artificial neural network to labeled values in the network training data set; and use the error values to modify the first and second parameter values that define the first and second subnetworks. See also para [0018], which provides that “An input/output 

Regarding claim 13, para [0014] provides that “A quality assurance step is performed to train the parameters of the artificial neural network given the parameter values determined for the trained first and second subnetworks. During the quality assurance step, input values of the 

Regarding claim 14, para [0018] provides that “An input/output (I/O) engine 140 handles input or output operations associated with the display 120, as well as other elements of the processing system 100 such as keyboards, mice, printers, external disks, and the like. The I/O engine 140 is coupled to the bus 110 so that the I/O engine 140 is able to communicate with the memory 105, the GPU 115, or the CPU 130. In the illustrated embodiment, the I/O 
Therefore, claims 8-14 are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant may:
(a) Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;
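(b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).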

If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either:
(a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(b) Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over US 2015/0294219 A1 by Krizhevsky et al. (hereinafter, “Ref. Krizhevsky”), in view of US 2016/0110642 A1 by Matsuda et al. (hereinafter, “Ref. Matsuda”).
As per claim 1, Krizhevsky teaches 
training the first subnetworks separately and in parallel on corresponding known training datasets to determine first parameter values that define the first subnetworks; (Ref. Krizhevsky fig 1 teaches that the worker 1 100a trains the first combination of layers of the CNN and the worker 2 100b trains the second combination of layers of the CNN. Here, the first combination of layers comprises the elements 116a, 114a, 112a, 110a, 108a and 106a. The second combination of layers comprises the elements 116b, 114b, 112b, 110b, 108b and 106b. The first and second combinations of layers (i.e. first subnetworks) are trained separately and in parallel using training example batch data 104a and 104b (i.e. corresponding known training datasets). Para [0035] also reasonably teaches that the worker updates (i.e. determines) the weight values (i.e. first parameter values) for the first subnetworks.)

providing input values from a network training data set to the artificial neural network including the trained first subnetworks; (Ref. Krizhevsky fig 1 teaches providing the training data 102 (i.e. a network training data set) to the entire neural network, which includes the first subnetworks.)
generating error values by comparing output values produced by the artificial neural network to labeled output values of the network training data set; (Ref. Krizhevsky para [0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that the worker can generate error values by comparing the output values of the neural network to the known output (i.e. labeled output) for the training dataset.)
using the error values to modify second parameter values that define the second subnetworks without modifying the first parameter values; (Ref. Krizhevsky fig 1 teaches that the worker 3 100c trains the third combination of layers (i.e. second subnetworks), which comprises the elements 116c, 114c, 112c, 110c, 108c and 106c. Para [0030] and [0035] teach that the worker can generate error values by comparing the output values of the neural network to the known output (i.e. labeled output) for the training dataset. The worker then computes the gradient of an objective function for the training example using the error. The worker updates weight values for the convolutional layer replicas and the fully-connected layer partitions maintained by the worker using the corresponding gradients for each replica and partition (step 218). This reasonably teaches that the worker 3 100c updates (i.e. modifies) the weight values (i.e. second parameter values) for the third combination of layers (i.e. second subnetworks) and does not modify the weights for the first or the second combination of layers (i.e. first subnetworks). The worker 1 and the worker 2 update the weights of the first and the second combinations of layers individually.)
Ref. Krizhevsky fails to explicitly teach
A method of training an artificial neural network that comprises first subnetworks to implement known functions and second subnetworks to implement unknown functions, the method comprising
and storing the first and second parameter values.


However Ref. Matsuda teaches 
A method of training an artificial neural network that comprises first subnetworks to implement known functions and second subnetworks to implement unknown functions, the method comprising: (Ref. Matsuda abstract – “The method includes the steps of training a language-independent sub-network 120 and language-dependent sub-networks 122 and 124 with training data of Japanese and English” and para [0008] – “When speech recognition of a new language becomes necessary, a new DNN is prepared and learning is done anew” teach that the language-independent sub-network (i.e. first subnetworks) can be trained to provide speech recognition for any trained known languages (i.e. implementing known functions). They also teach that when there is a new unknown language (i.e. unknown function), language-dependent sub-networks (i.e. second subnetworks) can be trained for that new unknown language.)
and storing the first and second parameter values. (Ref. Matsuda para [0015] - “the present invention provides a storage medium storing DNN sub-network parameters learned through any of the methods described above” teaches that sub-network parameters (i.e. first and second parameter values) can be stored in a storage medium. )
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Krizhevsky (“Parallelizing the Training of Convolutional Neural Networks”) with those of Ref. Matsuda (“Deep Neural Network Learning Method and Apparatus”), with a motivation to “provide a method and an apparatus of DNN learning that can shorten the time necessary for DNN learning using training data of which objects belong to certain categories, as well as to provide an apparatus for recognizing an object using such a DNN.” (Ref. Matsuda para [0010])
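For illustration only, the sequence of steps recited in claim 1, as interpreted above, can be sketched as follows. The sketch is not taken from Ref. Krizhevsky, Ref. Matsuda, or the specification. The scalar "subnetworks", the function names train_first and train_second, the learning rates, and the toy data sets are all hypothetical assumptions chosen to keep the example small.

```python
import json

# Hypothetical toy model: network output = w2 * (w1 * x).
# w1 stands in for a first subnetwork (known function),
# w2 stands in for a second subnetwork (unknown function).

def train_first(known_data, lr=0.05, epochs=500):
    """Fit w1 in y = w1 * x on a known-function training data set."""
    w1 = 0.0
    for _ in range(epochs):
        for x, y in known_data:
            error = w1 * x - y          # output compared to the labeled value
            w1 -= lr * error * x        # gradient step on w1 only
    return w1

def train_second(w1, network_data, lr=0.01, epochs=2000):
    """Fit w2 in y = w2 * (w1 * x) while w1 is held fixed."""
    w2 = 0.0
    for _ in range(epochs):
        for x, y in network_data:
            hidden = w1 * x             # output of the frozen first subnetwork
            error = w2 * hidden - y     # error of the whole network
            w2 -= lr * error * hidden   # only the second parameter is modified
    return w2

# Step 1: train the first subnetwork on its own known data set (y = 2x).
w1 = train_first([(1.0, 2.0), (2.0, 4.0)])
w1_before = w1

# Steps 2-4: feed network training data (y = 6x) and update only w2.
w2 = train_second(w1, [(1.0, 6.0), (2.0, 12.0)])

# Step 5: store the first and second parameter values.
stored = json.dumps({"first": w1, "second": w2})
```

In this sketch the first parameter converges near 2 and the second near 3, and the first parameter is never written to during the second stage, which corresponds to the "without modifying the first parameter values" limitation.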
As per claim 2, the combination of Ref. Krizhevsky and Ref. Matsuda as shown above teaches the method of Claim 1.
Ref Krizhevsky further teaches 
wherein training the first subnetworks comprises concurrently providing input values of the corresponding known training data sets to the first subnetworks, (Ref Krizhevsky fig 1 teaches that the system comprises concurrently providing the training example batch 1 104a to the first combination layers of the CNN and providing the training example batch 2 104b to the second combination of layers of the CNN. Here the first and the second combination of layers of the CNN (i.e. first subnetworks) are trained concurrently.)

concurrently generating error values for the first subnetworks, and concurrently modifying the first parameter values. (Ref. Krizhevsky para [0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that each worker can concurrently generate error values by comparing the output values of the neural network to the known output for the training dataset. Para [0035] and [0037] also reasonably teach that each worker updates the weight values for its designated subnetwork. For example, the worker 1 100a updates weights (i.e. first parameter values) for the first combination of layers (i.e. first subnetwork) and the worker 2 100b updates weights (i.e. first parameter values) for the second combination of layers (i.e. first subnetwork) concurrently.)
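For illustration only, the concurrent training of several first subnetworks on their corresponding known data sets, as recited in claim 2, can be sketched as follows. The thread pool stands in for the separate workers; the function fit_slope, the data set names, and the toy data are hypothetical and do not come from either reference.

```python
from concurrent.futures import ThreadPoolExecutor

def fit_slope(data, lr=0.05, epochs=500):
    """Fit w in y = w * x by gradient descent on squared error."""
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            error = w * x - y       # error value for this subnetwork
            w -= lr * error * x     # modify this subnetwork's parameter
    return w

# Hypothetical known training data sets, one per first subnetwork.
known_datasets = {
    "double": [(1.0, 2.0), (2.0, 4.0)],     # known function y = 2x
    "negate": [(1.0, -1.0), (2.0, -2.0)],   # known function y = -x
}

# Each subnetwork is submitted to its own worker, so the data sets are
# provided, errors generated, and parameters modified side by side.
with ThreadPoolExecutor(max_workers=len(known_datasets)) as pool:
    futures = {name: pool.submit(fit_slope, data)
               for name, data in known_datasets.items()}
    first_params = {name: future.result() for name, future in futures.items()}
```

Threads are used here only because they are in the standard library; the claim language and the cited workers do not depend on this particular mechanism.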
As per claim 3, the combination of Ref. Krizhevsky and Ref. Matsuda as shown above teaches the method of Claim 2.
Ref Krizhevsky further teaches 
wherein concurrently providing the input values of the corresponding known training data sets to first subnetworks, (Ref Krizhevsky fig 1 teaches that system comprises concurrently providing the training example batch 1 104a to first combination layers of the CNN and the training example batch 2 104b to second combination of layers of the CNN. Here first and second combination of layers (i.e. first subnetworks) of the CNN are trained concurrently.)
concurrently generating the error values for the first subnetworks, and concurrently modifying the first parameter values that define the first subnetworks are performed iteratively until convergence criteria for the first parameter values are satisfied. (Ref. Krizhevsky para [0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that each worker can concurrently generate error values by comparing the output values of the neural network to the known output for the training dataset. Para [0035] and [0037] also reasonably teach that each worker updates the weight values for its designated subnetworks. For example, the worker 1 100a updates weights (i.e. first parameter values) for the first combination of layers (i.e. first subnetwork) and the worker 2 100b updates weights (i.e. first parameter values) for the second combination of layers (i.e. first subnetwork) concurrently. See also para [0039] – “Once each worker has completed the process 200, each worker can be assigned a new batch of training examples and can repeat the process 200 for the new batch. The workers can continue to repeat the process 200, e.g., until convergence criteria for the training of the CNN have been satisfied”, which teaches that the training process is performed until convergence criteria for the first parameter values are satisfied.)
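For illustration only, iterating the provide/error/modify steps until a convergence criterion is satisfied can be sketched as follows. The particular criterion used here (change in the parameter over one pass falls below a tolerance) is an assumption; neither the claim nor the cited paragraph [0039] specifies one. The function name and toy data are hypothetical.

```python
def train_until_converged(data, lr=0.05, tol=1e-9, max_epochs=10_000):
    """Repeat training passes until the parameter stops changing."""
    w = 0.0
    for epoch in range(1, max_epochs + 1):
        previous = w
        for x, y in data:
            error = w * x - y       # generate the error value
            w -= lr * error * x     # modify the parameter value
        # Assumed convergence criterion: per-pass change below tolerance.
        if abs(w - previous) < tol:
            return w, epoch
    return w, max_epochs

# Hypothetical known data set for the function y = 3x.
w, epochs_used = train_until_converged([(1.0, 3.0), (2.0, 6.0)])
```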

As per claim 4, the combination of Ref. Krizhevsky and Ref. Matsuda as shown above teaches the method of Claim 3.
Ref Krizhevsky further teaches
training a second subnetwork based on a cutout training set formed of a subset of known training datasets corresponding to a first subset of the first subnetworks, wherein the first subset encompasses the second subnetwork (Ref Krizhevsky fig 1 teaches that the worker 1 100a trains the first combination of layers and the worker 2 100b trains second combination of layers. Both first and second combination of layers are part of the first subnetworks. Here either first combination of layers (trained by worker 1) or second combination of layers (trained by worker 2) can be read as “a second subnetwork” which can be read as a first subset of the first subnetworks. Here, the first subset encompasses the second subnetwork. Also, either training example batch 1 104a or training example batch 2 104b can be read as a cutout training set, since 104a or 104b are a subset of the training example batches (i.e. known training datasets))

As per claim 5, the combination of Ref. Krizhevsky and Ref. Matsuda as shown above teaches the method of Claim 1.
	Ref Krizhevsky further teaches
providing the input values of the network training data set to an instance of the artificial neural network that is defined by the modified parameter values of the first and second subnetworks; (Ref. Krizhevsky para [0024] – “Once each worker has performed the training technique on the worker's assigned batch, each worker can be assigned a new batch of training examples and can perform additional iterations of the training technique to train the CNN on the new batch,” teaches that after finishing the first iteration of training the CNN, the system can provide the training data set (i.e. the input values of the network training data set) to the neural network. During the first iteration of the training, each worker updates the weights (i.e. modified parameter values) of its combination of layers. For instance (see fig 1), after the first iteration of training, the worker 1 100a updates the weights (i.e. parameters) for the first combination of layers (i.e. first subnetworks) and the worker 3 100c updates the weights for the third combination of layers (i.e. second subnetworks).)
generating error values by comparing output values from the artificial neural network to labeled values in the network training data set; (Ref Krizhevsky para [0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that the worker can generate error values comparing the output values by the neural network and to the known output (i.e. labeled output) for the training dataset. )
using error values to modify the first and second parameter values that define the first and second subnetworks; (Para [0030] and [0035] teach that the worker can generate error values comparing the output values by the neural network and to the known output (i.e. labeled output) for the training dataset. The worker then computes the gradient of an objective function for the training example using the error. Each worker updates weight values for the convolutional layer replicas and the fully-connected layer partitions maintained by the worker using the corresponding gradients for each replica and partition (step 218). For instance, (see fig 1) the worker 1 100a updates the weights (i.e. first parameter values) for first combination of layers (i.e. first subnetwork) and worker 3 100c updates the weights (i.e. second parameter values) for the third combination of layers (i.e. second subnetworks)).

Ref. Krizhevsky fails to explicitly teach
and storing the modified parameter values that define the first and second subnetworks. 
However Ref. Matsuda teaches 
and storing the modified parameter values that define the first and second subnetworks. (Ref. Matsuda para [0015] - “the present invention provides a storage medium storing DNN sub-network parameters learned through any of the methods described above” teaches that sub-network modified parameters (i.e. first and second modified parameter values) can be stored in a storage medium. )
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Krizhevsky (“Parallelizing the Training of Convolutional Neural Networks”) with those of Ref. Matsuda (“Deep Neural Network Learning Method and Apparatus”), with a motivation to “provide a method and an apparatus of DNN learning that can shorten the time necessary for DNN learning using training data of which objects belong to certain categories, as well as to provide an apparatus for recognizing an object using such a DNN.” (Ref. Matsuda para [0010])
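For illustration only, the additional step of claim 5, in which error values from the whole network are used to modify both the first and the second parameter values before they are stored, can be sketched as follows. The two-scalar network, the function name fine_tune, the starting values, and the data are hypothetical assumptions and are not drawn from either reference.

```python
def fine_tune(w1, w2, network_data, lr=0.01, epochs=500):
    """Jointly update both parameters of output = w2 * (w1 * x)."""
    for _ in range(epochs):
        for x, y in network_data:
            error = w2 * w1 * x - y      # network output vs. labeled value
            grad_w1 = error * w2 * x     # error propagated to first subnetwork
            grad_w2 = error * w1 * x     # error propagated to second subnetwork
            w1 -= lr * grad_w1
            w2 -= lr * grad_w2
    return w1, w2

# Hypothetical previously trained values and network data for y = 6x.
w1, w2 = fine_tune(2.0, 2.5, [(1.0, 6.0), (2.0, 12.0)])

# Store the modified parameter values of both subnetworks.
stored_parameters = {"first": w1, "second": w2}
```

Unlike the second stage of claim 1, both parameters move in this step; only their product is constrained by the toy data.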

As per claim 6, the combination of Ref. Krizhevsky and Ref. Matsuda as shown above teaches the method of Claim 5.
	Ref Krizhevsky further teaches
wherein providing the input values of the network training data set to the artificial neural network, (Ref Krizhevsky para [0024] – “Once each worker has performed the training technique on the worker's assigned batch, each worker can be assigned a new batch of training examples and can perform additional iterations of the training technique to train the CNN on the new batch,” teaches that after finishing the first iteration of training the CNN, the system can provide training data set (i.e. the input values of the network training data set) to neural network.)
generating the error values, (Ref Krizhevsky para [0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that the worker can generate error values comparing the output values by the neural network and the known output for the training dataset.)
and modifying the first parameter values that define the first subnetworks are performed iteratively until a convergence criterion for the first parameter values is satisfied. (Ref. Krizhevsky para [0035] and [0037] teach that each worker updates the weight values (i.e. first parameter values) for the first subnetworks. For example, the worker 1 100a updates weights for the first combination of layers and the worker 2 100b updates weights for the second combination of layers concurrently. See also para [0039] – “Once each worker has completed the process 200, each worker can be assigned a new batch of training examples and can repeat the process 200 for the new batch. The workers can continue to repeat the process 200, e.g., until convergence criteria for the training of the CNN have been satisfied”, which teaches that the training process is performed until convergence criteria for the first parameter values are satisfied.)
As per claim 7, the combination of Ref. Krizhevsky and Ref. Matsuda as shown above teaches the method of Claim 1.
	Ref Krizhevsky further teaches
	reading stored parameter values for a subset of the first subnetworks of the artificial neural network; (Ref Krizhevsky fig 1 and fig 5 teach that workers send convolutional activation data and gradient data (i.e. parameter values) to each other. This reasonably teaches that worker 3 100c can read activation data and gradient data from the worker 1 100a that trains the first combination of layers (i.e. subset of the first subnetworks of the artificial neural network).)
and defining parameter values of a different artificial neural network using the stored parameter values for the subset of the first subnetworks of the artificial neural network. (Ref Krizhevsky fig 1 and fig 5 teach that workers send convolutional activation data and gradient data (i.e. parameter values) to each other. This reasonably teaches that worker 3 100c can read activation data and gradient data from the worker 1 100a that trains the first combination of layers (i.e. subset of the first subnetworks of the artificial neural network) and uses these data to update the weights (i.e. defining parameter values of a different artificial neural network) by backpropagation. )
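For illustration only, reading stored parameter values for a subset of the first subnetworks and using them to define a different network, as recited in claim 7, can be sketched as follows. The JSON layout, the subnetwork names, and the helper init_from_stored are hypothetical; they are not described in either reference.

```python
import json

# Hypothetical stored parameters of a previously trained network.
stored = json.dumps({
    "first": {"edge_detector": [0.5, -0.5], "blur": [0.25, 0.25]},
    "second": {"head": [1.0]},
})

def init_from_stored(blob, reuse):
    """Read stored first-subnetwork parameters for the named subset."""
    saved_first = json.loads(blob)["first"]
    return {name: list(saved_first[name]) for name in reuse}

# A different network reuses only the "blur" subnetwork and starts its
# own second subnetwork from untrained values.
new_network = {
    "first": init_from_stored(stored, ["blur"]),
    "second": {"new_head": [0.0]},
}
```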

As per claim 8, Krizhevsky teaches 
training the first subnetworks separately and in parallel on corresponding known training datasets to determine first parameter values that define the first subnetworks; (Ref. Krizhevsky fig 1 teaches that the worker 1 100a trains the first combination of layers of the CNN and the worker 2 100b trains the second combination of layers of the CNN. Here, the first combination of layers comprises the elements 116a, 114a, 112a, 110a, 108a and 106a. The second combination of layers comprises the elements 116b, 114b, 112b, 110b, 108b and 106b. The first and second combinations of layers (i.e. first subnetworks) are trained separately and in parallel using training example batch data 104a and 104b (i.e. corresponding known training datasets). Para [0035] also reasonably teaches that the worker updates (i.e. determines) the weight values (i.e. first parameter values) for the first subnetworks.)

providing input values from a network training data set to the artificial neural network including the trained first subnetworks; (Ref. Krizhevsky fig 1 teaches providing the training data 102 (i.e. a network training data set) to the entire neural network, which includes the first subnetworks.)

(Ref Krizhevsky para [0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that the worker can generate error values comparing the output values by the neural network  to the known output (i.e. labeled output) for the training dataset. )


training the second subnetworks by using the error values to modify second parameter values that define the second subnetworks without modifying the first parameter values; (Ref. Krizhevsky fig 1 teaches that the worker 3 100c trains the third combination of layers (i.e. second subnetworks), which comprises the elements 116c, 114c, 112c, 110c, 108c and 106c. Para [0030] and [0035] teach that the worker can generate error values by comparing the output values of the neural network to the known output (i.e. labeled output) for the training dataset. The worker then computes the gradient of an objective function for the training example using the error. The worker updates weight values for the convolutional layer replicas and the fully-connected layer partitions maintained by the worker using the corresponding gradients for each replica and partition (step 218). This reasonably teaches that the worker 3 100c updates (i.e. modifies) the weight values (i.e. second parameter values) for the third combination of layers (i.e. second subnetworks) and does not update the weights for the first or second combination of layers (i.e. first subnetworks). The worker 1 and the worker 2 update the weights of the first and second combinations of layers individually.)

Ref. Krizhevsky fails to explicitly teach
a plurality of processing elements configured to train an artificial neural network that comprises first subnetworks to implement known functions and second subnetworks to implement unknown functions 
an input/output engine configured to store the first and second parameter values in a storage component.
However Ref. Matsuda teaches 
a plurality of processing elements configured to train an artificial neural network that comprises first subnetworks to implement known functions and second subnetworks to implement unknown functions (Ref. Matsuda fig 12 teaches CPU (356) (i.e. processing elements). See also abstract – “The method includes the steps of training a language-independent sub-network 120 and language-dependent sub-networks 122 and 124 with training data of Japanese and English” and para [0008] – “When speech recognition of a new language becomes necessary, a new DNN is prepared and learning is done anew”, which teach that the language-independent sub-network (i.e. first subnetworks) can be trained to provide speech recognition for any trained known languages (i.e. implementing known functions). They also teach that when there is a new unknown language (i.e. unknown function), language-dependent sub-networks (i.e. second subnetworks) can be trained for that new unknown language.)

an input/output engine configured to store the first and second parameter values in a storage component. (Ref. Matsuda fig 11 teaches a computer (i.e. input/output engine). See also para [0015] - “the present invention provides a storage medium storing DNN sub-network parameters learned through any of the methods described above”, which teaches that sub-network parameters (i.e. first and second parameter values) can be stored in a storage medium.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Krizhevsky (“Parallelizing the Training of Convolutional Neural Networks”) with those of Ref. Matsuda (“Deep Neural Network Learning Method and Apparatus”), with a motivation to “provide a method and an apparatus of DNN learning that can shorten the time necessary for DNN learning using training data of which objects belong to certain categories, as well as to provide an apparatus for recognizing an object using such a DNN.” (Ref. Matsuda para [0010])
As per claim 9, the combination of Ref. Krizhevsky and Ref. Matsuda as shown above teaches the system of Claim 8.
	Ref Krizhevsky further teaches
wherein the plurality of processing elements is configured to concurrently provide input values of the corresponding known training data sets to the first subnetworks (Ref Krizhevsky fig 1 teaches that system comprises concurrently providing the training example batch 1 104a to the first combination layers of the CNN and providing the training example batch 2 104b to the second combination of layers of the CNN. Here the first and the second combination of layers of the CNN (i.e. first subnetworks) are trained concurrently.)
(Ref. Krizhevsky para [0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that each worker can concurrently generate error values by comparing the output values of the neural network to the known output for the training dataset. Para [0035] and [0037] also reasonably teach that each worker updates the weight values (i.e. first parameter values) for the first subnetworks. For example, the worker 1 100a updates weights for the first combination of layers and the worker 2 100b updates weights for the second combination of layers concurrently.)
As per claim 10, the combination of Ref. Krizhevsky and Ref. Matsuda as shown above teaches the system of Claim 9.
	Ref Krizhevsky further teaches
wherein the plurality of processing elements is configured to concurrently provide the input values of the corresponding known training data sets to first subnetworks, concurrently generate the error values for the first subnetworks, and concurrently modify the first parameter values that define the first subnetworks iteratively until convergence criteria for the first parameter values are satisfied. (Ref Krizhevsky fig 1 teaches that system comprises concurrently providing the training example batch 1 104a to first combination layers of the CNN and the training example batch 2 104b to second combination of layers of the CNN. Here first and second combination of layers of the CNN (i.e. first subnetworks) are trained concurrently.)
(Ref. Krizhevsky para [0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that each worker can concurrently generate error values by comparing the output values of the neural network to the known output for the training dataset. Para [0035] and [0037] also reasonably teach that each worker updates the weight values for the first subnetworks. For example, the worker 1 100a updates weights (i.e. first parameter values) for the first combination of layers (i.e. first subnetworks) and the worker 2 100b updates weights (i.e. first parameter values) for the second combination of layers (i.e. first subnetworks) concurrently. See also para [0039] – “Once each worker has completed the process 200, each worker can be assigned a new batch of training examples and can repeat the process 200 for the new batch. The workers can continue to repeat the process 200, e.g., until convergence criteria for the training of the CNN have been satisfied”, which teaches that the training process is performed until convergence criteria for the first parameter values are satisfied.)

As per claim 11, the combination of Ref. Krizhevsky and Ref. Matsuda as shown above teaches the system of Claim 10. 
	Ref Krizhevsky further teaches
wherein the plurality of processing elements is configured to: train a second subnetwork based on a cutout training set formed of a subset of known training datasets corresponding to a first subset of the first subnetworks, wherein the first subset encompasses the second subnetwork (Ref. Krizhevsky fig 1 teaches that the worker 1 100a trains the first combination of layers and the worker 2 100b trains the second combination of layers. Both the first and second combinations of layers are part of the first subnetworks. Here, either the first combination of layers (trained by worker 1) or the second combination of layers (trained by worker 2) can be read as “a second subnetwork”, which can be read as a first subset of the first subnetworks. Here, the first subset encompasses the second subnetwork. Also, either training example batch 1 104a or training example batch 2 104b can be read as a cutout training set, since 104a and 104b are each a subset of the training example batches (i.e. known training datasets)).

As per claim 12, the combination of Ref. Krizhevsky and Ref. Matsuda as shown above teaches the system of Claim 8.
Ref Krizhevsky further teaches
provide the input values of the network training data set to an instance of the artificial neural network that is defined by the modified first and second parameter values of the first and second subnetworks; (Ref Krizhevsky para [0024] – “Once each worker has performed the training technique on the worker's assigned batch, each worker can be assigned a new batch of training examples and can perform additional iterations of the training technique to train the CNN on the new batch,” teaches that after finishing the first iteration of training the CNN, the system can provide training data set (i.e. the input values of the network training data set) to neural network. During the first iteration of the training each worker updates the weights (i.e. modified parameter values) of the following combination of layers. For instance, (see fig 1) after the first iteration of training the worker 1 100a updates the weights (i.e. parameters) for first combination of layers (i.e. first subnetworks) and worker 3 100c updates the weights for third combination of layers (i.e. second subnetworks). )
generate error values by comparing output values from the artificial neural network to labeled values in the network training data set; (Ref Krizhevsky para [0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that the worker can generate error values comparing the output values by the neural network and the known output (i.e. labeled output) for the training dataset. )
and use the error values to modify the first and second parameter values that define the first and second subnetworks, (Para [0030] and [0035] teach that the worker can generate error values comparing the output values by the neural network and to the known output (i.e. labeled output) for the training dataset. The worker then computes the gradient of an objective function for the training example using the error. Each worker updates weight values for the convolutional layer replicas and the fully-connected layer partitions maintained by the worker using the corresponding gradients for each replica and partition (step 218). For instance, (see fig 1) the worker 1 100a updates the weights (i.e. first parameter values) for first combination of layers (i.e. first subnetwork) and worker 3 100c updates the weights (i.e. second parameter values) for the third combination of layers (i.e. second subnetworks)).
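Ref. Krizhevsky fails to explicitly teach
and wherein the input/output engine is configured to store the modified parameter values that define the first and second subnetworks in the storage component.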
However Ref. Matsuda teaches 
and wherein the input/output engine is configured to store the modified parameter values that define the first and second subnetworks in the storage component.
 (Ref. Matsuda para [0015] - “the present invention provides a storage medium storing DNN sub-network parameters learned through any of the methods described above” teaches that sub-network modified parameters (i.e. first and second modified parameter values) can be stored in a storage medium. )
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Krizhevsky (“Parallelizing the Training of Convolutional Neural Networks”) with those of Ref. Matsuda (“Deep Neural Network Learning Method and Apparatus”), with a motivation to “provide a method and an apparatus of DNN learning that can shorten the time necessary for DNN learning using training data of which objects belong to certain categories, as well as to provide an apparatus for recognizing an object using such a DNN.” (Ref. Matsuda para [0010])
As per claim 13, the combination of Ref. Krizhevsky and Ref. Matsuda as shown above teaches the system of Claim 12.
Ref Krizhevsky further teaches
wherein the plurality of processing elements is configured to iteratively provide the input values of the network training data set to the artificial neural network, (Ref Krizhevsky para [0024] – “Once each worker has performed the training technique on the worker's assigned batch, each worker can be assigned a new batch of training examples and can perform additional iterations of the training technique to train the CNN on the new batch,” teaches that after finishing the first iteration of training the CNN, the system can provide training data set (i.e. the input values of the network training data set) to neural network.)

 generate the error values, (Ref Krizhevsky para [0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that the worker can generate error values comparing the output values by the neural network and to the known output for the training dataset.)

and modify the first and second parameter values that define the first and second subnetworks until a convergence criterion for the first parameter values is satisfied. (Ref Krizhevsky para [0035] and [0037] teach that each worker updates the weight values (i.e. first parameter values) for the first subnetworks. For example, the worker 1 100a updates weights for the first combination of layers and the worker 2 100b updates weights for the second combination of layers concurrently. See also para [0039] – “Once each worker has completed the process 200, each worker can be assigned a new batch of training examples and can repeat the process 200 for the new batch. The workers can continue to repeat the process 200, e.g., until convergence criteria for the training of the CNN have been satisfied” teaches that the training process is performed until convergence criteria for the first and second parameter values are satisfied)
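For illustration only, the iterative cycle mapped above (provide input values, generate error values, modify parameter values, repeat until a convergence criterion is satisfied) can be sketched as follows. This is a minimal toy sketch and not a characterization of the code of either reference or of the claimed system: the two-weight model, the function name `train_until_converged`, the learning rate, and the loss-change stopping rule are all assumptions introduced for this example.

```python
# Hypothetical toy model: output = w2 * (w1 * x). Here w1 stands in for the
# "first parameter values" and w2 for the "second parameter values".

def train_until_converged(data, w1, w2, lr=0.01, tol=1e-10, max_iters=100_000):
    """Repeat forward pass, error computation, and weight update until the
    mean squared error changes by less than `tol` between iterations."""
    prev_loss = float("inf")
    for step in range(max_iters):
        grad_w1 = grad_w2 = loss = 0.0
        for x, target in data:
            hidden = w1 * x            # first subnetwork
            output = w2 * hidden       # second subnetwork
            error = output - target    # compare with the labeled output
            loss += error * error
            grad_w1 += 2 * error * w2 * x     # d(error^2)/d(w1)
            grad_w2 += 2 * error * hidden     # d(error^2)/d(w2)
        n = len(data)
        loss /= n
        w1 -= lr * grad_w1 / n         # modify the first parameter value
        w2 -= lr * grad_w2 / n         # modify the second parameter value
        if abs(prev_loss - loss) < tol:  # assumed convergence criterion
            break
        prev_loss = loss
    return w1, w2, loss, step + 1


# Labeled training set for the (assumed) target function y = 6 * x.
data = [(x, 6.0 * x) for x in (1.0, 2.0, 3.0)]
w1, w2, loss, steps = train_until_converged(data, w1=1.0, w2=1.0)
```

In this sketch the stopping rule is placed on the change in the loss; a criterion placed on the change in one set of parameter values would follow the same loop structure.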
As per claim 14, the combination of Ref. Krizhevsky and Ref. Matsuda as shown above teaches the system of Claim 8.
Ref Krizhevsky further teaches
the input/output engine is configured to read parameter values for a subset of the first subnetworks of the artificial neural network from the storage component; (Ref Krizhevsky fig 1 and fig 5 teach that workers send convolutional activation data and gradient data (i.e. parameter values) to each other. This reasonably teaches that worker 3 100c can read activation data and gradient data from the worker 1 100a that trains the first combination of layers (i.e. subset of the first subnetworks of the artificial neural network).)
and at least one of the plurality of processing elements is configured to define parameter values of a different artificial neural network using the parameter values for the subset of the first subnetworks of the artificial neural network. (Ref Krizhevsky fig 1 and fig 5 teach that workers send convolutional activation data and gradient data (i.e. parameter values) to each other. This reasonably teaches that worker 3 100c can read activation data and gradient data from the worker 1 100a that trains the first combination of layers (i.e. subset of the first subnetworks of the artificial neural network) and uses these data to update the weights (i.e. defining parameter values of a different artificial neural network) by backpropagation.)
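For illustration only, storing subnetwork parameter values and then reading back a subset of them to define the parameter values of a different network can be sketched as follows. This is a minimal sketch under stated assumptions, not the mechanism of either reference or of the claimed system: the JSON file format, the helper names `store_parameters` and `load_subset`, and the parameter values themselves are hypothetical.

```python
import json
import os
import tempfile


def store_parameters(path, params):
    """Write a mapping of subnetwork name -> parameter values to storage."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f)


def load_subset(path, names):
    """Read back only the named subnetworks' parameter values."""
    with open(path, encoding="utf-8") as f:
        stored = json.load(f)
    return {name: stored[name] for name in names}


# Hypothetical parameter values after training.
trained = {
    "first_subnetwork": [0.5, -1.2, 0.3],
    "second_subnetwork": [2.0, 0.1],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "params.json")
    store_parameters(path, trained)
    reused = load_subset(path, ["first_subnetwork"])

# A different network reuses the stored first-subnetwork values and adds a
# new, not-yet-trained subnetwork of its own (assumed initial values).
different_network = {
    "first_subnetwork": reused["first_subnetwork"],
    "new_subnetwork": [0.0, 0.0],
}
```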

As per claim 15, Ref. Krizhevsky teaches
wherein the first subnetworks have been trained on corresponding known training datasets; (Ref Krizhevsky fig 1 teaches that the worker 1 100a trains the first combination of layers of the CNN and the worker 2 100b trains the second combination of the layers of the CNN. Here, the first combination of layers comprises the elements 116a, 114a, 112a, 110a, 108a and 106a. The second combination of layers comprises the elements 116b, 114b, 112b, 110b, 108b and 106b. The first and second combinations of layers (i.e. first subnetworks) are trained separately and in parallel using training example batch data 104a and 104b (i.e. corresponding known training datasets))

generating, at the processing system, an artificial neural network by combining the first and second subnetworks; (Ref Krizhevsky fig 1 teaches that the worker 1 100a trains the first combination of layers of the CNN, the worker 2 100b trains the second combination of the layers of the CNN, and worker 3 100c trains the third combination of layers of the CNN. Here, the first combination of layers comprises the elements 116a, 114a, 112a, 110a, 108a and 106a. The second combination of layers comprises the elements 116b, 114b, 112b, 110b, 108b and 106b. The third combination of layers (i.e. second subnetworks) comprises the elements 116c, 114c, 112c, 110c, 108c and 106c. The first and second combinations of layers (i.e. first subnetworks) and the third combination of layers jointly generate the CNN (i.e. artificial neural network).)

providing input values of a network training data set to the artificial neural network; (Ref Krizhevsky fig 1 teaches providing the training data 102 (i.e. a network training data set) to the entire neural network, which includes the first subnetworks.)
generating, at the processing system, error values for the artificial neural network by comparing output values from the artificial neural network to labeled output values in the network training data set; (Ref Krizhevsky para [0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that the worker can generate error values comparing the output values by the neural network and to the known output (i.e. labeled output) for the training dataset.)
and modifying, at the processing system, the second parameter values that define the second subnetworks based on the error values. (Ref Krizhevsky fig 1 teaches that the worker 3 100c trains the third combination of layers (i.e. second subnetworks) which comprise the elements 116c, 114c, 112c, 110c, 108c and 106c. Para [0030] and [0035] teach that the worker can generate error values comparing the output values by the neural network and to the known output (i.e. labeled output) for the training dataset. The worker then computes the gradient of an objective function for the training example using the error. The worker updates weight values for the convolutional layer replicas and the fully-connected layer partitions maintained by the worker using the corresponding gradients for each replica and partition (step 218). This reasonably teaches that the worker 3 100c updates (i.e. modifies) the weight values (i.e. second parameter values) for the third combination of layers (i.e. second subnetworks).)

Ref. Krizhevsky fails to explicitly teach
using an input/output engine of a processing system, first parameter values that define first subnetworks to implement known functions and second parameter values that define second subnetworks to implement unknown functions, 
However Ref. Matsuda teaches 
(Ref. Matsuda fig 11 teaches a computer (i.e. input/output engine). See also abstract – “The method includes the steps of training a language-independent sub-network 120 and language-dependent sub-networks 122 and 124 with training data of Japanese and English” and para. [0008] – “When speech recognition of a new language becomes necessary, a new DNN is prepared and learning is done anew”, which teach that the language-independent subnetwork (i.e. first subnetworks) can be trained to provide speech recognition for any trained known languages (i.e. implementing known function). They also teach that when there is a new unknown language (i.e. unknown function), language-dependent subnetworks (i.e. second subnetworks) can be trained for that new unknown language. See also para [0048], which teaches that the independent subnetwork has its own parameters (i.e. first parameter values) and the dependent subnetwork has its own parameters (i.e. second parameter values).)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Krizhevsky Parallelizing the Training of Convolutional Neural Networks into Ref. Matsuda Deep Neural Network Learning Method And Apparatus, with a motivation to “provide a method and an apparatus of DNN learning that can shorten the time necessary for DNN learning using training data of which objects belong to certain categories, as well as to provide an apparatus for recognizing an object using such a DNN.” (Ref. Matsuda para [0010])
As per claim 16, the combination of Ref. Krizhevsky and Ref. Matsuda as shown above teaches the system of Claim 15.
Ref Krizhevsky further teaches
wherein providing the input values of the network training data set to the artificial neural network, (Ref Krizhevsky fig 1 teaches providing the training data 102 (i.e. a network training data set) to the entire neural network, which includes the first subnetworks.)
generating the error values for the artificial neural network, and modifying the second parameter values that define the second subnetworks are performed iteratively until a convergence criterion for the second parameter values that define the second subnetworks is satisfied. (Ref Krizhevsky para [0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that each worker can concurrently generate error values comparing the output values by the neural network and to the known output for the training dataset. Para [0035] and [0037] also reasonably teach that each worker updates the weight values for the subnetworks. For example, worker 3 100c updates weights (i.e. second parameter values) for the third combination of layers (i.e. second subnetwork). See also para [0039] – “Once each worker has completed the process 200, each worker can be assigned a new batch of training examples and can repeat the process 200 for the new batch. The workers can continue to repeat the process 200, e.g., until convergence criteria for the training of the CNN have been satisfied”, which teaches that the training process is performed until convergence criteria for the second parameter values are satisfied).

As per claim 17, the combination of Ref. Krizhevsky and Ref. Matsuda as shown above teaches the system of Claim 16.
Ref Krizhevsky fails to teach explicitly 
storing the first and second parameter values in a storage component.
However Ref. Matsuda teaches 
storing the first and second parameter values in a storage component. (Ref. Matsuda para [0015] - “the present invention provides a storage medium storing DNN sub-network parameters learned through any of the methods described above” teaches that sub-network parameters (i.e. first and second parameter values) can be stored in a storage medium. )
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Krizhevsky Parallelizing the Training of Convolutional Neural Networks into Ref. Matsuda Deep Neural Network Learning Method And Apparatus, with a motivation to “provide a method and an apparatus of DNN learning that can shorten the time necessary for DNN learning using training data of which objects belong to certain categories, as well as to provide an apparatus for recognizing an object using such a DNN.” (Ref. Matsuda para [0010])
As per claim 18, the combination of Ref. Krizhevsky and Ref. Matsuda as shown above teaches the system of Claim 17.
Ref Krizhevsky further teaches
providing the input values of the network training data set to an instance of the artificial neural network that is defined by the modified parameter values of the second subnetwork; (Ref Krizhevsky para [0024] – “Once each worker has performed the training technique on the worker's assigned batch, each worker can be assigned a new batch of training examples and can perform additional iterations of the training technique to train the CNN on the new batch,” teaches that after finishing the first iteration of training the CNN, the system can provide training data set (i.e. the input values of the network training data set) to neural network. During the first iteration of the training each worker updates the weights (i.e. modified parameter values) of the following combination of layers. For instance, (see fig 1) after the first iteration of training the worker 1 100a updates the weights (i.e. parameters) for first combination of layers (i.e. first subnetworks) and worker 3 100c updates the weights for third combination of layers (i.e. second subnetworks). )
generating error values by comparing output values from the artificial neural network to labeled values in the network training data set; (Ref Krizhevsky para [0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that the worker can generate error values comparing the output values by the neural network and to the known output (i.e. labeled output) for the training dataset.)
and using the error values to modify the first and second parameter values that define the first and second subnetworks. (Para [0030] and [0035] teach that the worker can generate error values comparing the output values by the neural network and the known output (i.e. labeled output) for the training dataset. The worker then computes the gradient of an objective function for the training example using the error. Each worker updates weight values for the convolutional layer replicas and the fully-connected layer partitions maintained by the worker using the corresponding gradients for each replica and partition (step 218). For instance, (see fig 1) the worker 1 100a updates the weights (i.e. first parameter values) for first combination of layers (i.e. first subnetwork) and worker 3 100c updates the weights (i.e. second parameter values) for the third combination of layers (i.e. second subnetworks)).
As per claim 19, the combination of Ref. Krizhevsky and Ref. Matsuda as shown above teaches the system of Claim 18.
Ref Krizhevsky further teaches
wherein providing the input values of the network training data set to the artificial neural network, (Ref Krizhevsky para [0024] – “Once each worker has performed the training technique on the worker's assigned batch, each worker can be assigned a new batch of training examples and can perform additional iterations of the training technique to train the CNN on the new batch,” teaches that after finishing the first iteration of training the CNN, the system can provide training data set (i.e. the input values of the network training data set) to neural network. During the first iteration of the training each worker updates the weights (i.e. modified parameter values) of the following combination of layers. For instance, (see fig 1) after the first iteration of training the worker 1 100a updates the weights (i.e. parameters) for first combination of layers (i.e. first subnetworks) and worker 3 100c updates the weights for third combination of layers (i.e. second subnetworks). )
generating the error values, (Ref Krizhevsky para [0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that the worker can generate error values comparing the output values by the neural network and to the known output for the training dataset.)
(Ref Krizhevsky para [0035] and [0037] teach that each worker updates the weight values for the subnetworks. For example, the worker 1 100a updates weights (i.e. first parameter values) for the first combination of layers (i.e. first subnetworks), the worker 2 100b updates weights (i.e. first parameter values) for the second combination of layers (i.e. first subnetworks), and worker 3 100c updates weights (i.e. second parameter values) for the third combination of layers (i.e. second subnetwork). See also para [0039] – “Once each worker has completed the process 200, each worker can be assigned a new batch of training examples and can repeat the process 200 for the new batch. The workers can continue to repeat the process 200, e.g., until convergence criteria for the training of the CNN have been satisfied”, which teaches that the training process is performed until convergence criteria for the first and second parameter values are satisfied.)
As per claim 20, the combination of Ref. Krizhevsky and Ref. Matsuda as shown above teaches the system of Claim 19.
Ref Krizhevsky further teaches
defining parameter values of a different artificial neural network using a subset of the stored parameter values (Ref Krizhevsky fig 1 and fig 5 teach that workers send convolutional activation data and gradient data (i.e. parameter values) to each other. This reasonably teaches that worker 3 100c can read activation data and gradient data from the worker 1 100a that trains the first combination of layers (i.e. subset of the first subnetworks of the artificial neural network) and uses these data to update the weights (i.e. defining parameter values of a different artificial neural network) by backpropagation.)
Ref Krizhevsky fails to teach explicitly 
storing the modified parameter values that define the first and second subnetworks in the storage component; 
However Ref. Matsuda teaches 
storing the modified parameter values that define the first and second subnetworks in the storage component; (Ref. Matsuda para [0015] - “the present invention provides a storage medium storing DNN sub-network parameters learned through any of the methods described above” teaches that sub-network parameters (i.e. first and second parameter values) can be stored in a storage medium. )
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Krizhevsky Parallelizing the Training of Convolutional Neural Networks into Ref. Matsuda Deep Neural Network Learning Method And Apparatus, with a motivation to “provide a method and an apparatus of DNN learning that can shorten the time necessary for DNN learning using training data of which objects belong to certain categories, as well as to provide an apparatus for recognizing an object using such a DNN.” (Ref. Matsuda para [0010])

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RAIA N M AZAD, whose telephone number is (571)272-8232.  The examiner can normally be reached from 8:30 to 5:30 (Mon-Thurs and the 2nd Fri of the pay week).

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on 571-272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/RAIA N M AZAD/
Examiner, Art Unit 2125  

/KAMRAN AFSHAR/
Supervisory Patent Examiner, Art Unit 2125