DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after Dec 13, 2017, is being examined under the first inventor to file provisions of the AIA . In the event the determination of the status of the application as subject to AIA  35 U.S.C. §102 and §103 (or as subject to pre-AIA  35 U.S.C. §102 and §103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 12/13/2017, 01/18/2019 and 07/08/2020 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Examiner’s Note and Broadest Reasonable Interpretation
The claims and disclosure refer to “functions” being “known” or “unknown” in a confusing manner. 
For example, claim 1 recites “first subnetworks to implement known functions and second subnetworks to implement unknown functions”, and then proceeds “training the first subnetworks separately and in parallel corresponding to known training datasets” and then describes using a network training data set to train the parameters of the second subnetworks by providing the network training data set as input, generating error values between the network output and the known labels of the network training data set, and modifying the second parameter values of the second subnetwork based on the generated error values. Therefore, there is known labeled training data for training the second subnetworks. 
The as-filed specification states in [0011] that “The functions of a deep neural network (DNN) are represented by different sets of parameters for the different layers.” In other words, each “function” performed by a neuron, subnetwork, or network is a mathematical function carried out by the parameters and hyper-parameters/structure. One of ordinary skill in the art before the earliest effective filing date of the invention would have recognized this as the conventional use of the term “function” in this context.
However, it is not clear based on the claims or specification if these mathematical functions are what are “known” or “unknown” for the various subnetworks. Instead, the claims and specification appear to use the term “known functions” for a subnetwork to designate that there is known labeled training data specific to the corresponding subnetwork where the corresponding labels represent the desired outputs from that corresponding subnetwork given that corresponding training data as input to that corresponding subnetwork (see independent claims 1, 8, 15 and as-filed specification ¶[0019]). This is not necessarily the same as knowing the mathematical function that the subnetwork(s) are intended to be trained to implement. Rather, it only necessarily means that some labeled training data are known for the subnetwork. Examiner has not found anywhere in the claims or specification where a known mathematical function for the subnetwork is used to generate the labeled training data for the subnetwork. 
As mentioned above for claim 1, the claimed invention does have a labeled training data set used to train the parameters of the second subnetworks which are recited as being “to implement unknown functions”. This usage is potentially confusing, because a subnetwork described as implementing an “unknown” function is nonetheless trained using known labeled data. Looking to the as-filed specification for guidance, applicant distinguishes in ¶[0001]: “The functions implemented by the layers in a DNN are explicit (i.e., known or predetermined) or hidden (i.e., unknown).” Further, in ¶[0020]: “To reduce the time and resources consumed by training an artificial network, the artificial neural network is subdivided into first subnetworks that perform known functions (which have corresponding known training datasets) and second subnetworks that perform unknown functions and therefore do not have known training datasets.” Then in ¶[0028]-[0030]:
[0028] FIG. 3 is a block diagram illustrating training of subnetworks that implement unknown functions within an instance 300 of an artificial neural network that also includes subnetworks that implement known functions according to some embodiments. The instance 300 is executed on processing elements such as the processing elements 116-118, 131-133 shown in FIG. 1. The DNN is implemented using interconnected subnetworks 310, 311, 312, 313, 314, 315, which are collectively referred to herein as "the subnetworks 310-315." The subnetworks 310-315 implement different functions that are defined by values of parameters that characterize the subnetworks 310-315. In the illustrated embodiment, the subnetwork 310 implements an unknown function and consequently does not have a known training dataset. Although a single subnetwork 310 implementing an unknown function is shown in FIG. 3, some embodiments of the artificial neural network include multiple subnetworks that implement one or more unknown functions. The subnetworks 311-315 implement known functions that have corresponding known training datasets. The subnetworks 311-315 have therefore been trained separately and in parallel on the basis of the corresponding known training datasets. 
[0029] The instance 300 of the DNN is trained using a network training dataset that includes the input values 320 and 325 and the labeled output values 330. The instance 300 of the DNN can receive the input values 320, 325 and generate output values 335. Error values are then determined for the instance 300 of the DNN by comparing the output values 335 to the labeled output values 330. The subnetwork 310 is identified as a training subnetwork, as indicated by the solid lines, which means that the parameters that define the subnetworks 310 are modified based on back propagated error values. The subnetworks 311-315 are identified as non-training subnetworks, as indicated by the dashed lines, which means that the parameters that define the subnetworks 311-315 are not modified based on the back propagated error values.
[0030] The training subnetwork 310 is then trained by assuming that the error values produced by the instance 300 of the DNN are produced by inaccurate values of the parameters that define the training subnetwork 310. The values of the parameters are therefore modified based on the error values produced during a current iteration to reduce the error values produced during a subsequent iteration. The values of the parameters that define the other (non-training) subnetworks 311-315 are held constant during the training process. For example, the values of the parameters that define the subnetwork 310 in the instance 300 of the DNN are iteratively modified to reduce the error values produced by the instance 300 of the DNN, while holding the values of the parameters that define the subnetworks 311-315 constant. 

Also see ¶¶[0033]-[0034], [0038], [0040]-[0043]. Therefore, in light of the disclosure in the specification, the “second subnetworks to implement unknown functions” are interpreted to encompass subnetworks that can be trained as part of a larger neural network using known training data. 
	The claims and only the claims form the metes and bounds of the invention. Office personnel are to give claims their "broadest reasonable interpretation" in light of the supporting disclosure. In re Morris, 127 F.3d 1048, 1054-55, 44 USPQ2d 1023, 1027-28 (Fed. Cir. 1997). Limitations appearing in the specification but not recited in the claim are not read into the claim. In re Prater, 415 F.2d 1393, 1404-05, 162 USPQ 541, 550-551 (CCPA 1969).  See also In re Zletz, 893 F.2d 319, 321-22, 13 USPQ2d 1320, 1322 (Fed. Cir. 1989) ("During patent examination the pending claims must be interpreted as broadly as their terms reasonably allow .... The reason is simply that during patent prosecution when claims can be amended, ambiguities should be recognized, scope and breadth of language explored, and clarification imposed .... An essential purpose of patent examination is to fashion claims that are precise, clear, correct, and unambiguous. Only in this way can uncertainties of claim scope be removed, as much as possible, during the administrative process."). See MPEP § 2106. The Examiner has full latitude to interpret each claim in the broadest reasonable sense. 
	Consistent with the well-established axiom in patent law that a patentee or applicant is free to be his or her own lexicographer, a patentee or applicant may use terms in a manner contrary to or inconsistent with one or more of their ordinary meanings if the written description clearly redefines the terms. See MPEP §2173.05(a)(III).
	Therefore, the Broadest Reasonable Interpretation (BRI) of the instant claims in light of applicant’s manner of using the terms “known”, “unknown”, and “function” is as follows:
The BRI of subnetwork(s) to implement “known function(s)” includes merely having associated labeled training data, even if the mathematical function governing that training data is unknown.
The BRI of subnetwork(s) to implement “unknown function(s)” includes subnetwork(s) that can be trained by applying known labeled training data to a larger network containing the subnetwork(s). 
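For illustration only, the two interpretations above can be sketched in a few lines of Python. Everything in the sketch is a hypothetical assumption chosen for brevity and is drawn from neither the claims nor the specification: the function names train_first and train_second, the one-weight "subnetworks", the data values, and the learning rates. A "known function" subnetwork is fitted on labeled data of its own, while an "unknown function" subnetwork has no dataset of its own and is fitted only through the error of the larger network, with the first subnetwork's parameter held constant.

```python
# Illustrative sketch only; all names and data are hypothetical assumptions.

def train_first(data, lr=0.05, steps=500):
    """Fit w1 in h = w1 * x using the subnetwork's own labeled dataset."""
    w1 = 0.0
    for _ in range(steps):
        for x, h_label in data:
            err = w1 * x - h_label      # error against the subnetwork's own label
            w1 -= lr * err * x          # gradient step on w1
    return w1


def train_second(network_data, w1, lr=0.01, steps=2000):
    """Fit (w2, b2) in y = w2 * (w1 * x) + b2 from network-level error.

    w1 is only read here, never updated, so the first subnetwork stays fixed.
    """
    w2, b2 = 0.0, 0.0
    for _ in range(steps):
        for x, y_label in network_data:
            h = w1 * x                      # output of the frozen first subnetwork
            err = (w2 * h + b2) - y_label   # error of the whole network
            w2 -= lr * err * h              # update second subnetwork only
            b2 -= lr * err
    return w2, b2


# Labeled data for the first subnetwork alone (here h = 2x).
first_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
# Labeled data for the whole network (here y = 3 * (2x) + 1).
network_data = [(0.0, 1.0), (1.0, 7.0), (2.0, 13.0)]

w1 = train_first(first_data)
w1_before = w1
w2, b2 = train_second(network_data, w1)
print(w1, w2, b2)
```

In this sketch the second subnetwork ends up trained with known labeled data even though no dataset exists for it in isolation, which is the sense in which the BRI above reads "unknown function(s)".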

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are:
Claim 8: processing elements configured to train an artificial neural network that comprises first subnetworks to implement known functions and second subnetworks to implement unknown functions…….and an input/output engine configured to store the first and second parameter values in a storage component.
Claim 9: processing elements is configured to concurrently provide input values of the corresponding known training data sets to the first subnetworks, concurrently generate error values for the first subnetworks, and concurrently modify the first parameter values.	
Claim 10: processing elements is configured to concurrently provide the input values of the corresponding known training data sets to first subnetworks, concurrently generate the error values for the first subnetworks, and concurrently modify the first parameter values that define the first subnetworks iteratively until convergence criteria for the first parameter values are satisfied.
Claim 11: processing elements is configured to: train a second subnetwork based on a cutout training set formed of a subset of known training datasets corresponding to a first subset of the first subnetworks, wherein the first subset encompasses the second subnetwork.
Claim 12: processing elements is configured to: provide the input values of the network training data set to an instance of the artificial neural network that is defined by the modified first and second parameter values of the first and second subnetworks… wherein the input/output engine is configured to
Claim 13: processing elements is configured to iteratively provide the input values of the network training data set to the artificial neural network, generate the error values, and modify the first and second parameter values that define the first and second subnetworks until a convergence criterion for the first parameter values is satisfied.
Claim 14: the input/output engine is configured to read parameter values for a subset of the first subnetworks of the artificial neural network from the storage component; and at least one of the plurality of processing elements is configured to define parameter values of a different artificial neural network using the parameter values for the subset of the first subnetworks of the artificial neural network.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.

The present application's disclosure provides the following description regarding the above generic placeholders:
processing elements configured to train/provide are interpreted as being implemented by the hardware of the GPU or CPU of the system (see instant as-filed specification ¶¶16-17, ¶¶20-23, ¶28, ¶¶31-33, ¶37, ¶48);
input/output engine configured to store/read is interpreted as being implemented by hardware coupled to the bus communicating with the memory, GPU, or CPU (see instant as-filed specification ¶18).


If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Response to Arguments
Applicant's arguments filed 7/27/2021 have been fully considered but they are not persuasive. In Re pages 7-16, applicant argues that the claimed “processing elements” and “input/output engine” should not be interpreted under §112(f) because they have sufficiently definite meaning. 


Claim Rejections - 35 USC § 112(a)
Response to Arguments
Applicant’s arguments, see pages 9-12, filed 7/27/2021, with respect to the rejection under §112(a) have been fully considered and are persuasive. The rejection of claims 8-14 under §112(a) has been withdrawn. 

Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim(s) 6, 13, and 19 is/are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 6: It isn’t clear if the recitation of “providing the input values of the network training data set to the artificial neural network” in claim 6 lines 1-2 is referring for antecedent basis back to claim 1 lines 6-7 or to claim 5 lines 2-3. It isn’t clear if the recitation of “generating the error values” in claim 6 line 2 refers for antecedent basis back to claim 1 line 8 or to claim 5 line 5. It isn’t clear if the recitation of “modifying the first and second parameter values that define the first and second subnetworks” in claim 6 line 3 refers for antecedent basis back to claim 1 lines 4,10-11 or to claim 5 lines 7-8. For purposes of examination, any of the listed possibilities for antecedent basis support are within the Broadest Reasonable Interpretation (BRI) of the claim language. 
Claim 13: It isn’t clear if the recitation of “provide the input values of the network training data set to the artificial neural network” in claim 13 lines 2-3 is referring for antecedent basis back to claim 8 lines 8-9 or to claim 12 lines 3-4. It isn’t clear if the recitation of “generate the error values” in claim 13 line 3 refers for antecedent basis back 
Claim 19: It isn’t clear if the recitation of “providing the input values of the network training data set to the artificial neural network” in claim 19 lines 1-2 is referring for antecedent basis back to claim 15 line 8 or to claim 18 lines 2-3. It isn’t clear if the recitation of “generating the error values” in claim 19 line 2 refers for antecedent basis back to claim 15 line 9 or to claim 18 line 5. It isn’t clear if the recitation of “modifying the first and second parameter values that define the first and second subnetworks” in claim 19 line 2-3 refers for antecedent basis back to claim 15 lines 5,12-13 or to claim 18 lines 7-8.  For purposes of examination, any of the listed possibilities for antecedent basis support are within the Broadest Reasonable Interpretation (BRI) of the claim language. 
Response to Arguments
Applicant's arguments filed 7/27/2021 have been fully considered but they are not persuasive. In Re pages 12-13, applicant argues that claim 6 has been amended to overcome the rejection under §112(b). 
Examiner disagrees. As detailed in the rejection above, claim 6 has antecedent basis issues that need to be remedied, and therefore is still properly rejected under §112(b). 
Applicant’s arguments, see pages 12-13, filed 7/27/2021, with respect to the rejection of claims 8-14 under §112(b) have been fully considered and are persuasive. The previously-applied rejection of claims 8-14 under §112(b) has been withdrawn. 
	However, examiner has determined that a rejection under §112(b) is necessary for claims 13 and 19 due to antecedent basis issues similar to those in claim 6, as detailed above. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
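The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.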

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over 
Krizhevsky (US 2015/0294219) in view of
Ranzato (US 9,224,068).

Claim 1 (Independent)
Krizhevsky discloses: A method of training an artificial neural network that comprises first subnetworks to implement known functions (Krizhevsky fig 1 teaches that the worker 1 100a trains the first combination of layers of the CNN and the worker 2 100b trains the second combination of the layers of the CNN. Here, first combination of layers comprise the elements - 116a, 114a, 112a, 110a, 108a and 106a. Second combination of layers comprise the elements - 116b, 114b, 112b, 110b, 108b and 106b. First and second combination of the layers (i.e. first subnetworks) are trained separately and in parallel using training example batch data 104a and 104b (i.e. corresponding known training datasets). See also ¶[0035] reasonably teaches that the worker updates (i.e. determine) the weight values (i.e. first parameter values) for the first subnetworks):
training the first subnetworks separately and in parallel on corresponding known training datasets to determine first parameter values that define the first subnetworks (Krizhevsky fig 1 teaches that the worker 1 100a trains the first combination of layers of the CNN and the worker 2 100b trains the second combination of the layers of the CNN. Here, first combination of layers comprise the elements - 116a, 114a, 112a, 110a, 108a and 106a. Second combination of layers comprise the elements - 116b, 114b, 112b, 110b, 108b and 106b. First and second combination of the layers (i.e. first subnetworks) are trained separately and in parallel using training example batch data 104a and 104b (i.e. corresponding known training datasets). See also ¶[0035] reasonably teaches that the worker updates (i.e. determine) the weight values (i.e. first parameter values) for the first subnetworks); 
providing input values from a network training data set to the artificial neural network including the trained first subnetworks (Krizhevsky fig 1 teaches that providing the training data 102 (i.e. a network training data set) to the entire neural network which includes the first subnetworks); 
generating error values by comparing output values produced by the artificial neural network to labeled output values of the network training data set (Krizhevsky ¶[0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that the worker can generate error values comparing the output values by the neural network  to the known output (i.e. labeled output) for the training dataset).
Krizhevsky fails to explicitly recite:
second subnetworks to implement unknown functions; using the error values to modify second parameter values that define the second subnetworks without modifying the first parameter values; and storing the first and second parameter values.
Ranzato discloses: A method of training an artificial neural network that comprises second subnetworks to implement unknown functions (C3L25-55: performing an iteration of a stochastic gradient descent training procedure on the loss function while fixing the values of the initial patch locator neural network to generate updated values of the parameters of the first neural network and the second neural network while maintaining the adjusted values of the parameters of initial patch locator neural network; Also see C5L45-60 or C12L40-60 or C13L25-50), the method comprising: 
using the error values to modify second parameter values that define the second subnetworks without modifying the first parameter values (C3L25-55: performing an iteration of a stochastic gradient descent training procedure on the loss function while fixing the values of the initial patch locator neural network to generate updated values of the parameters of the first neural network and the second neural network while maintaining the adjusted values of the parameters of initial patch locator neural network; Also see C5L45-60 or C12L40-60 or C13L25-50); and 
storing the first and second parameter values (C14L1–C15L40: the described stored program operating on hardware and processors with memory and storage executing the described process necessarily stores the parameter values in at least the cache of the processor, and the reference further describes data servers and other components that make this storage necessity abundantly clear).
Rationale:
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Krizhevsky to incorporate training second subnetworks while holding the parameter values of an already trained subnetwork fixed, as taught by Ranzato, for the benefit of keeping the predicted location fixed while only the values of the remaining components are adjusted (Ranzato, especially e.g. C12L40-60). 

Claim 2
The combination of Krizhevsky and Ranzato as shown above discloses the method of Claim 1.
Krizhevsky further discloses: 
wherein training the first subnetworks comprises concurrently providing input values of the corresponding known training data sets to the first subnetworks (Krizhevsky fig 1 teaches that the system comprises concurrently providing the training example batch 1 104a to the first combination layers of the CNN and providing the training example batch 2 104b to the second combination of layers of the CNN. Here the first and the second combination of layers of the CNN (i.e. first subnetworks) are trained concurrently),
concurrently generating error values for the first subnetworks, and concurrently modifying the first parameter values (Krizhevsky ¶[0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that each worker can concurrently generate error values comparing the output values by the neural network and to the known output for the training dataset. See also ¶[0035] and ¶[0037] reasonably teach that each worker updates the weight values for its designated subnetwork. For example, the worker 1 100a updates weights (i.e. first parameter values) for the first combination of layers (i.e. first subnetwork) and the worker 2 100b updates weights (i.e. first parameter values) for the second combination of layers (i.e. first subnetwork) concurrently).

Claim 3
The combination of Krizhevsky and Ranzato as shown above discloses the method of Claim 2.
Krizhevsky further discloses:
wherein concurrently providing the input values of the corresponding known training data sets to first subnetworks (Krizhevsky fig 1 teaches that system comprises concurrently providing the training example batch 1 104a to first combination layers of the CNN and the training example batch 2 104b to second combination of ,
concurrently generating the error values for the first subnetworks, and concurrently modifying the first parameter values that define the first subnetworks are performed iteratively until convergence criteria for the first parameter values are satisfied (Krizhevsky ¶[0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that each worker can concurrently generate error values by comparing the output values of the neural network to the known output for the training dataset. See also ¶[0035] and ¶[0037], which reasonably teach that each worker updates the weight values for its designated subnetworks. For example, the worker 1 100a updates weights (i.e. first parameter values) for the first combination of layers (i.e. first subnetwork) and the worker 2 100b updates weights (i.e. first parameter values) for the second combination of layers (i.e. first subnetwork) concurrently. See also ¶[0039] – “Once each worker has completed the process 200, each worker can be assigned a new batch of training examples and can repeat the process 200 for the new batch. The workers can continue to repeat the process 200, e.g., until convergence criteria for the training of the CNN have been satisfied”, which teaches that the training process is performed until convergence criteria for the first parameter values are satisfied).
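For illustration only, the limitations mapped in claims 2 and 3 (concurrently providing inputs, generating error values, and modifying parameter values, repeated until a convergence criterion is met) can be sketched as below. The sketch is a hypothetical simplification and does not reproduce the method of Krizhevsky or of the application: the function train_until_converged, the dataset names subnet_a and subnet_b, the one-weight models, and the tolerance are all assumptions, and Python threads merely stand in for separate workers or processing elements.

```python
# Illustrative sketch only; names, data, and the stopping rule are assumptions.
from concurrent.futures import ThreadPoolExecutor


def train_until_converged(data, lr=0.05, tol=1e-9, max_iters=10_000):
    """Gradient descent on one weight of y = w * x.

    Stops when the parameter update falls below tol (the assumed
    convergence criterion) or after max_iters iterations.
    """
    w = 0.0
    for it in range(1, max_iters + 1):
        # Mean gradient of the squared error over this subnetwork's dataset.
        grad = sum((w * x - y) * x for x, y in data) / len(data)
        new_w = w - lr * grad
        if abs(new_w - w) < tol:
            return new_w, it
        w = new_w
    return w, max_iters


# One labeled dataset per subnetwork; each is trained independently.
datasets = {
    "subnet_a": [(1.0, 2.0), (2.0, 4.0)],    # labels follow y = 2x
    "subnet_b": [(1.0, -1.0), (3.0, -3.0)],  # labels follow y = -x
}

with ThreadPoolExecutor(max_workers=len(datasets)) as pool:
    futures = {name: pool.submit(train_until_converged, d)
               for name, d in datasets.items()}
    results = {name: f.result() for name, f in futures.items()}

for name, (w, iters) in results.items():
    print(name, w, iters)
```

Each task reads only its own dataset and writes only its own weight, so the tasks share no state and can run separately and in parallel in the sense discussed above.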

Claim 4
The combination of Krizhevsky and Ranzato as shown above discloses the method of Claim 3.
Krizhevsky further discloses:
training a second subnetwork based on a cutout training set formed of a subset of known training datasets corresponding to a first subset of the first subnetworks, wherein the first subset encompasses the second subnetwork (Krizhevsky fig 1 teaches that the worker 1 100a trains the first combination of layers and the worker 2 100b trains second combination of layers. Both first and second combination of layers are part of the first subnetworks. Here either first combination of layers (trained by worker 1) or second combination of layers (trained by worker 2) can be read as “a second subnetwork” which can be read as a first subset of the first subnetworks. Here, the first subset encompasses the second subnetwork. Also, either training example batch 1 .

Claim 5
The combination of Krizhevsky and Ranzato as shown above discloses the method of Claim 1.
Krizhevsky further discloses:
providing the input values of the network training data set to an instance of the artificial neural network that is defined by the modified parameter values of the first and second subnetworks (Krizhevsky ¶ [0024] – “Once each worker has performed the training technique on the worker's assigned batch, each worker can be assigned a new batch of training examples and can perform additional iterations of the training technique to train the CNN on the new batch,” teaches that after finishing the first iteration of training the CNN, the system can provide training data set (i.e. the input values of the network training data set) to neural network. During the first iteration of the training each worker updates the weights (i.e. modified parameter values) of the following combination of layers. For instance, (see fig 1) after the first iteration of training the worker 1 100a updates the weights (i.e. parameters) for first combination of layers (i.e. first subnetworks) and worker 3 100c updates the weights for third combination of layers (i.e. second subnetworks)); 
generating error values by comparing output values from the artificial neural network to labeled values in the network training data set (Krizhevsky ¶[0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that the worker can generate error values comparing the output values by the neural network and to the known output (i.e. labeled output) for the training dataset);
using error values to modify the first and second parameter values that define the first and second subnetworks (Krizhevsky ¶[0030] and ¶[0035] teach that the worker can generate error values by comparing the output values of the neural network to the known output (i.e. labeled output) for the training dataset. The worker then computes the gradient of an objective function for the training example using the error. Each worker updates weight values for the convolutional layer replicas and the fully-connected layer partitions maintained by the worker using the corresponding gradients for each replica and partition (step 218). For instance, (see fig 1) the worker 1 100a updates the weights (i.e. first parameter values) for first combination of layers (i.e. first subnetwork) and .
Krizhevsky fails to explicitly recite:
storing the modified parameter values that define the first and second subnetworks. 
Ranzato discloses: 
storing the modified parameter values that define the first and second subnetwork (C14L1-30: the described stored program operating on hardware and processors with memory and storage executing the described process necessarily stores the parameter values in at least the cache of the processor).
Rationale:
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Krizhevsky to incorporate storing parameters as taught by Ranzato because storing parameter values is a necessary part of computer processing (Ranzato, especially, e.g., C14L1–C15L40).

Claim 6
The combination of Krizhevsky and Ranzato as shown above discloses the method of Claim 5.
Krizhevsky further discloses:
wherein providing the input values of the network training data set to the artificial neural network (Krizhevsky ¶[0024] – “Once each worker has performed the training technique on the worker's assigned batch, each worker can be assigned a new batch of training examples and can perform additional iterations of the training technique to train the CNN on the new batch,” teaches that after finishing the first iteration of training the CNN, the system can provide training data set (i.e. the input values of the network training data set) to neural network),
generating the error values (Krizhevsky ¶[0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that the worker can generate error values comparing the output values by the neural network and the known output for the training dataset), and
modifying the first parameter values that define the first subnetworks are performed iteratively until a convergence criterion for the first parameter values is satisfied (Krizhevsky ¶[0035] and ¶[0037] teach that each worker updates the weight values (i.e. first parameter values) for the first subnetworks. For example, the worker 1 100a updates weights for the first combination of layers and the worker 2 100b updates weights for the second combination of layers concurrently. See also ¶[0039] – “Once each worker has completed the process 200, each worker can be assigned a new batch of training examples and can repeat the process 200 for the new batch. The workers can continue to repeat the process 200, e.g., until convergence criteria for the training of the CNN have been satisfied” teaches that the training process is performed until convergence criteria for the first parameter values are satisfied).
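For illustration only, the iterate-until-convergence loop mapped above (provide inputs, generate error values, modify parameter values, repeat until a convergence criterion is satisfied) can be sketched as follows. This is a minimal sketch under stated assumptions and is not a characterization of either reference's implementation: the one-parameter model, the squared-error objective, the step-size threshold `tol`, and the name `train_until_converged` are all hypothetical.

```python
def train_until_converged(w, data, lr=0.05, tol=1e-8, max_iters=1000):
    """Fit y = w * x by gradient descent; stop when the update is tiny.

    Hypothetical toy model: a single scalar weight stands in for the
    "first parameter values" of a subnetwork.
    """
    for iteration in range(1, max_iters + 1):
        # Error values: network output (w * x) compared to the labeled value t.
        grad = sum(2.0 * (w * x - t) * x for x, t in data) / len(data)
        step = lr * grad
        w -= step  # modify the parameter value using the error
        # Assumed convergence criterion: the parameter update falls below tol.
        if abs(step) < tol:
            return w, iteration
    return w, max_iters


# Labeled training data generated by the target function t = 2 * x.
w_final, iters_used = train_until_converged(
    0.0, [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
)
```

Other convergence criteria (for example, a threshold on the loss itself) would fit the same loop; the update-size test is chosen here only for brevity.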

Claim 7
The combination of Krizhevsky and Ranzato as shown above discloses the method of Claim 1.
Krizhevsky further discloses:
reading stored parameter values for a subset of the first subnetworks of the artificial neural network (Krizhevsky fig 1 and fig 5 teach that workers send convolutional activation data and gradient data (i.e. parameter values) to each other. This reasonably teaches that worker 3 100c can read activation data and gradient data from the worker 1 100a that trains the first combination of layers (i.e. subset of the first subnetworks of the artificial neural network)); and 
defining parameter values of a different artificial neural network using the stored parameter values for the subset of the first subnetworks of the artificial neural network (Krizhevsky fig 1 and fig 5 teach that workers send convolutional activation data and gradient data (i.e. parameter values) to each other. This reasonably teaches that worker 3 100c can read activation data and gradient data from the worker 1 100a that trains the first combination of layers (i.e. subset of the first subnetworks of the artificial neural network) and uses these data to update the weights (i.e. defining parameter values of a different artificial neural network) by backpropagation).
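For illustration only, the claimed steps of reading stored parameter values for a subset of subnetworks and using them to define parameter values of a different network can be sketched as follows. This is a hedged sketch, not the method of either reference: the JSON string standing in for a storage component, the subnetwork names, and the helpers `store_parameters` and `define_new_network` are all hypothetical.

```python
import json


def store_parameters(params):
    """Serialize parameter values; a JSON string stands in for storage."""
    return json.dumps(params)


def define_new_network(stored_blob, reuse, fresh):
    """Build a different network from a subset of stored subnetworks.

    reuse: names of stored subnetworks whose parameters are copied as-is.
    fresh: newly initialized parameters for subnetworks not read from storage.
    """
    stored = json.loads(stored_blob)
    new_params = {name: stored[name] for name in reuse}
    new_params.update(fresh)
    return new_params


# Hypothetical trained parameter values for three subnetworks.
trained = {
    "edge_detector": [0.5, -0.5],
    "color_filter": [1.0, 0.0, 0.25],
    "combiner": [0.1],
}
blob = store_parameters(trained)

# The different network reuses only the stored edge_detector parameters.
new_network = define_new_network(
    blob, reuse=["edge_detector"], fresh={"classifier": [0.0, 0.0]}
)
```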

Claim 8 (Independent)
Krizhevsky discloses: A processing system comprising: a plurality of processing elements configured to train an artificial neural network that comprises first subnetworks to implement known functions (Krizhevsky fig 1 teaches by:
training the first subnetworks separately and in parallel on corresponding known training datasets to determine first parameter values that define the first subnetworks (Krizhevsky fig 1 teaches that the worker 1 100a trains the first combination of layers of the CNN and the worker 2 100b trains the second combination of the layers of the CNN. Here, first combination of layers comprise the elements - 116a, 114a, 112a, 110a, 108a and 106a. Second combination of layers comprise the elements - 116b, 114b, 112b, 110b, 108b and 106b. First and second combination of the layers (i.e. first subnetworks) are trained separately and in parallel using training example batch data 104a and 104b (i.e. corresponding known training datasets). See also ¶[0035] reasonably teaches that the worker updates (i.e. determine) the weight values (i.e. first parameter values) for the first subnetworks);
providing input values from a network training data set to the artificial neural network including the trained first subnetworks (Krizhevsky fig 1 teaches that providing the training data 102 (i.e. a network training data set) to the entire neural network which includes the first subnetworks);
generating error values by comparing output values produced by the artificial neural network to labeled output values of the network training data set (Krizhevsky ¶[0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that the worker can generate error values comparing the output values by the neural network  to the known output (i.e. labeled output) for the training dataset);
Krizhevsky fails to explicitly recite:
using the error values to modify second parameter values that define the second subnetworks without modifying the first parameter values; and storing the first and second parameter values.
Ranzato discloses: A processing system comprising: a plurality of processing elements configured to train an artificial neural network that comprises second subnetworks to implement unknown functions (C3L25-55: performing an iteration of a stochastic gradient descent training procedure on the loss function while fixing the values of  by: 
training the second subnetworks by using the error values to modify second parameter values that define the second subnetworks without modifying the first parameter values (C3L25-55: performing an iteration of a stochastic gradient descent training procedure on the loss function while fixing the values of the initial patch locator neural network to generate updated values of the parameters of the first neural network and the second neural network while maintaining the adjusted values of the parameters of initial patch locator neural network; Also see C5L45-60 or C12L40-60 or C13L25-50); and
an input/output engine configured to store the first and second parameter values in a storage component (C14L1–C15L40: the described stored program operating on hardware and processors with memory and storage executing the described process necessarily stores the parameter values in at least the cache of the processor, and the reference further describes data servers and other components that make this storage necessity abundantly clear C13L29–C15L40 or “input” and “output” throughout). 
Rationale:
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Krizhevsky to incorporate training second subnetworks while already trained subnetwork parameters are fixed as taught by Ranzato for the benefit of fixing predicted location while only values of the remaining components are adjusted (Ranzato especially e.g. C12L40-60). 
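For illustration only, the limitation of using error values to modify the second parameter values without modifying the first parameter values can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce the training procedure of Krizhevsky or Ranzato: the two-stage scalar network `y = w2 * (w1 * x)`, the squared-error loss, and the name `train_second_only` are hypothetical.

```python
def train_second_only(w1, w2, data, lr=0.01, steps=200):
    """Update only w2 (second subnetwork); w1 (first subnetwork) stays fixed."""
    for _ in range(steps):
        grad_w2 = 0.0
        for x, t in data:
            h = w1 * x          # output of the already-trained first subnetwork
            y = w2 * h          # output of the whole network
            # Gradient of (y - t)^2 with respect to w2 only.
            grad_w2 += 2.0 * (y - t) * h
        w2 -= lr * grad_w2 / len(data)
        # No update is applied to w1: the first parameter values are frozen.
    return w1, w2


# Network training data for the overall function t = 6 * x; with w1 fixed
# at 2.0, the second parameter should approach 3.0.
w1_out, w2_out = train_second_only(
    2.0, 0.0, [(1.0, 6.0), (2.0, 12.0), (3.0, 18.0)]
)
```

The error is computed at the output of the combined network, yet only the second parameter receives an update, which is the distinction the rejection relies on Ranzato to supply.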

Claim 9
The combination of Krizhevsky and Ranzato as shown above discloses the system of Claim 8.
Krizhevsky further discloses:
wherein the plurality of processing elements is configured to concurrently provide input values of the corresponding known training data sets to the first subnetworks (Krizhevsky fig 1 teaches that system comprises concurrently providing the training example batch 1 104a to the first combination layers of the CNN and providing the training example batch 2 104b to the second combination of layers of the CNN. Here the first and the second combination of layers of the CNN (i.e. first subnetworks) are trained concurrently); 
concurrently generate error values for the first subnetworks, and concurrently modify the first parameter values (Krizhevsky ¶[0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that each worker can concurrently generate error values comparing the output values by the neural network and to the known output for the training dataset. See also ¶[0035] and ¶[0037] reasonably teach that each worker updates the weight values (i.e. first parameter values) for the first subnetworks. For example, the worker 1 100a updates weights for the first combination of layers and the worker 2 100b updates weights for the second combination of layers concurrently).
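For illustration only, training several first subnetworks separately and concurrently, each on its own known training dataset, can be sketched as follows. This is a hedged sketch and not the worker architecture of Krizhevsky fig 1: the thread pool, the scalar subnetworks, the dataset names, and the helper `fit_scalar` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_scalar(data, lr=0.05, steps=200):
    """Train one subnetwork y = w * x on its own labeled dataset."""
    w = 0.0
    for _ in range(steps):
        # Error values for this subnetwork only, then a parameter update.
        grad = sum(2.0 * (w * x - t) * x for x, t in data) / len(data)
        w -= lr * grad
    return w


# One known training dataset per first subnetwork (hypothetical names).
datasets = {
    "subnetwork_a": [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],     # t = 2 * x
    "subnetwork_b": [(1.0, -1.0), (2.0, -2.0), (3.0, -3.0)],  # t = -x
}

# Each subnetwork is submitted as an independent task; none reads or
# writes another subnetwork's parameters.
with ThreadPoolExecutor(max_workers=len(datasets)) as pool:
    futures = {
        name: pool.submit(fit_scalar, data) for name, data in datasets.items()
    }
    first_params = {name: fut.result() for name, fut in futures.items()}
```

Threads are used here only to show the independent task structure; in CPython, CPU-bound work of this kind would typically need separate processes or devices to run truly in parallel.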

Claim 10
The combination of Krizhevsky and Ranzato as shown above discloses the system of Claim 9.
Krizhevsky further discloses:
wherein the plurality of processing elements is configured to concurrently provide the input values of the corresponding known training data sets to first subnetworks, concurrently generate the error values for the first subnetworks, and concurrently modify the first parameter values that define the first subnetworks iteratively until convergence criteria for the first parameter values are satisfied (Krizhevsky fig 1 teaches that system comprises concurrently providing the training example batch 1 104a to first combination layers of the CNN and the training example batch 2 104b to second combination of layers of the CNN. Here first and second combination of layers of the CNN (i.e. first subnetworks) are trained concurrently);
concurrently generate the error values for the first subnetworks, and concurrently modify the first parameter values that define the first subnetworks iteratively until convergence criteria for the first parameter values are satisfied (Krizhevsky ¶[0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that each worker can concurrently generate error values comparing the output values by the neural network and to the known output for the training dataset. See also ¶[0035] and ¶[0037] reasonably teach that each worker updates the weight values for the first subnetworks. For example, the worker 1 100a updates weights (i.e. first parameter values)  for the first combination of layers (i.e. first subnetworks) and the worker 2 100b updates weights (i.e. first parameter values)  for the second combination of layers (i.e. first subnetworks) concurrently. See also ¶[0039] – “Once each worker has completed the process 200, each worker .

Claim 11
The combination of Krizhevsky and Ranzato as shown above discloses the system of Claim 10. 
Krizhevsky further discloses:
wherein the plurality of processing elements is configured to: train a second subnetwork based on a cutout training set formed of a subset of known training datasets corresponding to a first subset of the first subnetworks, wherein the first subset encompasses the second subnetwork (Krizhevsky fig 1 teaches that the worker 1 100a trains the first combination of layers and the worker 2 100b trains second combination of layers. Both first and second combination of layers are part of the first subnetworks. Here either first combination of layers (trained by worker 1) or second combination of layers (trained by worker 2) can be read as “a second subnetwork” which can be read as a first subset of the first subnetworks. Here, the first subset encompasses the second subnetwork. Also, either training example batch 1 104a or training example batch 2 104b can be read as a cutout training set, since 104a or 104b are a subset of the training example batches (i.e. known training datasets)).

Claim 12
The combination of Krizhevsky and Ranzato as shown above discloses the system of Claim 8.
Krizhevsky further discloses:
provide the input values of the network training data set to an instance of the artificial neural network that is defined by the modified first and second parameter values of the first and second subnetworks (Krizhevsky ¶[0024] – “Once each worker has performed the training technique on the worker's assigned batch, each worker can be assigned a new batch of training examples and can perform additional iterations of the training technique to train the CNN on the new batch,” teaches that after finishing the first iteration of training the CNN, the system can provide training data set (i.e. the input values of the network training data set) to neural ;
generate error values by comparing output values from the artificial neural network to labeled values in the network training data set (Krizhevsky ¶[0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that the worker can generate error values comparing the output values by the neural network and the known output (i.e. labeled output) for the training dataset); and 
use the error values to modify the first and second parameter values that define the first and second subnetworks (Krizhevsky ¶[0030] and ¶[0035] teach that the worker can generate error values comparing the output values by the neural network and to the known output (i.e. labeled output) for the training dataset. The worker then computes the gradient of an objective function for the training example using the error. Each worker updates weight values for the convolutional layer replicas and the fully-connected layer partitions maintained by the worker using the corresponding gradients for each replica and partition (step 218). For instance, (see fig 1) the worker 1 100a updates the weights (i.e. first parameter values) for first combination of layers (i.e. first subnetwork) and worker 3 100c updates the weights (i.e. second parameter values) for the third combination of layers (i.e. second subnetworks)).
Krizhevsky fails to explicitly recite:
wherein the input/output engine is configured to store the modified parameter values that define the first and second subnetworks in the storage component.
Ranzato discloses: 
storing the modified parameter values that define the first and second subnetwork (C14L1-30: the described stored program operating on hardware and processors with memory and storage executing the described process necessarily stores the parameter values in at least the cache of the processor).
Rationale:
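It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Krizhevsky to incorporate storing parameters as taught by Ranzato because storing parameter values is a necessary part of computer processing (Ranzato, especially, e.g., C14L1–C15L40).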

Claim 13
The combination of Krizhevsky and Ranzato as shown above teaches the system of Claim 12.
Krizhevsky further teaches:
wherein the plurality of processing elements is configured to iteratively provide the input values of the network training data set to the artificial neural network (Krizhevsky ¶[0024] – “Once each worker has performed the training technique on the worker's assigned batch, each worker can be assigned a new batch of training examples and can perform additional iterations of the training technique to train the CNN on the new batch,” teaches that after finishing the first iteration of training the CNN, the system can provide training data set (i.e. the input values of the network training data set) to neural network),
generate the error values (Krizhevsky ¶[0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that the worker can generate error values comparing the output values by the neural network and to the known output for the training dataset), and
modify the first and second parameter values that define the first and second subnetworks until a convergence criterion for the first parameter values is satisfied (Krizhevsky ¶[0035] and ¶[0037] teach that each worker updates the weight values (i.e. first parameter values) for the first subnetworks. For example, the worker 1 100a updates weights for the first combination of layers and the worker 2 100b updates weights for the second combination of layers concurrently. See also ¶[0039] – “Once each worker has completed the process 200, each worker can be assigned a new batch of training examples and can repeat the process 200 for the new batch. The workers can continue to repeat the process 200, e.g., until convergence criteria for the training of the CNN have been satisfied” teaches that the training process is performed until convergence criteria for the first and second parameter values are satisfied).

Claim 14
The combination of Krizhevsky and Ranzato as shown above teaches the system of Claim 8.
Krizhevsky further teaches:
the input/output engine is configured to read parameter values for a subset of the first subnetworks of the artificial neural network from the storage component (Krizhevsky fig 1 and fig 5 teach that workers send convolutional activation data and gradient data (i.e. parameter values) to each other. This reasonably teaches that worker 3 100c can read activation data and gradient data from the worker 1 100a that trains the first combination of layers (i.e. subset of the first subnetworks of the artificial neural network)); and
at least one of the plurality of processing elements is configured to define parameter values of a different artificial neural network using the parameter values for the subset of the first subnetworks of the artificial neural network (Krizhevsky fig 1 and fig 5 teach that workers send convolutional activation data and gradient data (i.e. parameter values) to each other. This reasonably teaches that worker 3 100c can read activation data and gradient data from the worker 1 100a that trains the first combination of layers (i.e. subset of the first subnetworks of the artificial neural network) and uses these data to update the weights (i.e. defining parameter values of a different artificial neural network) by backpropagation).

Claim 15 (Independent)
Krizhevsky discloses: 
reading, using an input/output engine of a processing system, first parameter values that define first subnetworks to implement known functions, wherein the first subnetworks have been trained on corresponding known training datasets (Krizhevsky fig 1 teaches that the worker 1 100a trains the first combination of layers of the CNN and the worker 2 100b trains the second combination of the layers of the CNN. Here, the first combination of layers comprises the elements 116a, 114a, 112a, 110a, 108a and 106a. The second combination of layers comprises the elements 116b, 114b, 112b, 110b, 108b and 106b. The first and second combinations of layers (i.e. first subnetworks) are trained separately and in parallel using training example batch data 104a and 104b (i.e. corresponding known training datasets));
generating, at the processing system, an artificial neural network by combining the first and second subnetworks (Krizhevsky fig 1 teaches that the worker 1 100a trains the first combination of layers of the CNN and the worker 2 100b trains the second combination of the layers of the CNN and worker 3 100c trains the third ;
providing input values of a network training data set to the artificial neural network (Krizhevsky fig 1 teaches that providing the training data 102 (i.e. a network training data set) to the entire neural network which includes the first subnetworks);
generating, at the processing system, error values for the artificial neural network by comparing output values from the artificial neural network to labeled output values in the network training data set (Krizhevsky ¶[0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that the worker can generate error values comparing the output values by the neural network and to the known output (i.e. labeled output) for the training dataset); and 
modifying, at the processing system, the second parameter values that define the second subnetworks based on the error values (Krizhevsky fig 1 teaches that the worker 3 100c trains the third combination of layers (i.e. second subnetworks) which comprise the elements 116c, 114c, 112c, 110c, 108c and 106c. ¶[0030] and ¶[0035] teach that the worker can generate error values comparing the output values by the neural network and to the known output (i.e. labeled output) for the training dataset. The worker then computes the gradient of an objective function for the training example using the error. The worker updates weight values for the convolutional layer replicas and the fully-connected layer partitions maintained by the worker using the corresponding gradients for each replica and partition (step 218). This reasonably teaches the worker 3 100c updates (i.e. modify) the weight values (i.e. second parameter values) for the third combination of layers (i.e. second subnetworks)).
Krizhevsky fails to explicitly recite:
second parameter values that define second subnetworks to implement unknown functions; … modify second parameter values that define the second subnetworks without modifying the first parameter values; and storing the first and second parameter values.
Ranzato discloses: 
reading, using an input/output engine of a processing system, second parameter values that define second subnetworks to implement unknown functions (C13L29–C15L40 or “input” and “output” throughout or C3L25-55: performing an iteration of a stochastic gradient descent training procedure on the loss function while fixing the values of the initial patch locator neural network to generate updated values of the parameters of the first neural network and the second neural network while maintaining the adjusted values of the parameters of initial patch locator neural network; Also see C5L45-60 or C12L40-60 or C13L25-50); 
modifying, at the processing system, the second parameter values that define the second subnetworks based on the error values (C3L25-55: performing an iteration of a stochastic gradient descent training procedure on the loss function while fixing the values of the initial patch locator neural network to generate updated values of the parameters of the first neural network and the second neural network while maintaining the adjusted values of the parameters of initial patch locator neural network; Also see C5L45-60 or C12L40-60 or C13L25-50).
Rationale:
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Krizhevsky to incorporate training second subnetworks while already trained subnetwork parameters are fixed as taught by Ranzato for the benefit of fixing predicted location while only values of the remaining components are adjusted (Ranzato especially e.g. C12L40-60). 

Claim 16
The combination of Krizhevsky and Ranzato as shown above discloses the system of Claim 15.
Krizhevsky further discloses:
wherein providing the input values of the network training data set to the artificial neural network (Krizhevsky fig 1 teaches that providing the training data 102 (i.e. a network training data set) to the entire neural network which includes the first subnetworks), 
generating the error values for the artificial neural network, and modifying the second parameter values that define the second subnetworks are performed iteratively until a convergence criterion for the second parameter values that define the second subnetworks is satisfied (Krizhevsky ¶[0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the .

Claim 17
The combination of Krizhevsky and Ranzato as shown above teaches the system of Claim 16.
Krizhevsky fails to explicitly recite:
storing the first and second parameter values in a storage component.
Ranzato discloses: 
storing the modified parameter values that define the first and second subnetwork (C14L1-30: the described stored program operating on hardware and processors with memory and storage executing the described process necessarily stores the parameter values in at least the cache of the processor).
Rationale:
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Krizhevsky to incorporate storing parameters as taught by Ranzato because storing parameter values is a necessary part of computer processing (Ranzato, especially, e.g., C14L1–C15L40).

Claim 18
The combination of Krizhevsky and Ranzato as shown above teaches the system of Claim 17.
Krizhevsky further teaches:
providing the input values of the network training data set to an instance of the artificial neural network that is defined by the modified parameter values of the second subnetwork (Krizhevsky ¶[0024] – “Once each worker has performed the training technique on the worker's assigned batch, each worker can be assigned a new batch of training examples and can perform additional iterations of the training technique to train the CNN on the new batch,” teaches that after finishing the first iteration of training the CNN, the system can provide training data set (i.e. the input values of the network training data set) to neural network. During the first iteration of the training each worker updates the weights (i.e. modified parameter values) of the following combination of layers. For instance, (see fig 1) after the first iteration of training the worker 1 100a updates the weights (i.e. parameters) for first combination of layers (i.e. first subnetworks) and worker 3 100c updates the weights for third combination of layers (i.e. second subnetworks));
generating error values by comparing output values from the artificial neural network to labeled values in the network training data set (Krizhevsky ¶[0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that the worker can generate error values comparing the output values by the neural network and to the known output (i.e. labeled output) for the training dataset); and 
using the error values to modify the first and second parameter values that define the first and second subnetworks (¶[0030] and ¶[0035] teach that the worker can generate error values comparing the output values by the neural network and the known output (i.e. labeled output) for the training dataset. The worker then computes the gradient of an objective function for the training example using the error. Each worker updates weight values for the convolutional layer replicas and the fully-connected layer partitions maintained by the worker using the corresponding gradients for each replica and partition (step 218). For instance, (see fig 1) the worker 1 100a updates the weights (i.e. first parameter values) for first combination of layers (i.e. first subnetwork) and worker 3 100c updates the weights (i.e. second parameter values) for the third combination of layers (i.e. second subnetworks)).

Claim 19
The combination of Krizhevsky and Ranzato as shown above discloses the system of Claim 18.
Krizhevsky further discloses:
wherein providing the input values of the network training data set to the artificial neural network (Krizhevsky ¶[0024] – “Once each worker has performed the training technique on the worker's assigned batch, each worker can be assigned a new batch of training examples and can perform additional iterations of the training technique to train the CNN on the new batch,” teaches that after finishing the first iteration of training the CNN, the system can provide training data set (i.e. the input values of the network training data set) to neural network. During the first iteration of the training each worker updates the weights (i.e. modified parameter values) of the following combination of layers. For instance, (see fig 1) after the first iteration of training the worker 1 100a updates the weights (i.e. parameters) for first combination of layers (i.e. first subnetworks) and worker 3 100c updates the weights for third combination of layers (i.e. second subnetworks)),
generating the error values (Krizhevsky ¶[0030] - “for each training example, the worker determines the error between the output portion computed by the worker and the corresponding portion of the known output for the training example” teaches that the worker can generate error values comparing the output values by the neural network and to the known output for the training dataset), and 
modifying the first and second parameter values that define the first and second subnetworks are performed iteratively until a convergence criterion for the first and second parameter values is satisfied (Krizhevsky ¶[0035] and ¶[0037] teach that each worker updates the weight values for the subnetworks. For example, the worker 1 100a updates weights (i.e. first parameter values) for the first combination of layers (i.e. first subnetworks) and the worker 2 100b updates weights (i.e. first parameter values)  for the second combination of layers (i.e. first subnetworks) and worker 3 100c updates weights (i.e. second parameter values) for the third combination of networks (i.e. second subnetwork). See also ¶[0039] – “Once each worker has completed the process 200, each worker can be assigned a new batch of training examples and can repeat the process 200 for the new batch. The workers can continue to repeat the process 200, e.g., until convergence criteria for the training of the CNN have been satisfied” teaches that the training process is performed until convergence criteria for the first and second parameter values are satisfied).

Claim 20
The combination of Krizhevsky and Ranzato as shown above discloses the system of Claim 19.
Krizhevsky further discloses:
defining parameter values of a different artificial neural network using a subset of the stored parameter values (Krizhevsky fig 1 and fig 5 teach that workers send convolutional activation data and gradient data (i.e. parameter values) to each other. This reasonably teaches that worker 3 100c can read activation data and gradient data from the worker 1 100a that trains the first combination of layers (i.e. subset of the first subnetworks of the artificial neural network) and uses these data to update the weights (i.e. defining parameter values of a different artificial neural network) by backpropagation).

Krizhevsky fails to explicitly recite:
storing the modified parameter values that define the first and second subnetworks in the storage component.
Ranzato discloses: 
storing the modified parameter values that define the first and second subnetwork (C14L1-30: the described stored program operating on hardware and processors with memory and storage executing the described process necessarily stores the parameter values in at least the cache of the processor).
Rationale:
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Krizhevsky to incorporate storing parameters as taught by Ranzato, because such storage is a necessary part of computer processing (Ranzato, especially, e.g., C14L1–C15L40).

Examiner’s Note
The Examiner respectfully requests of the Applicant in preparing responses, to fully consider the entirety of the reference(s) as potentially teaching all or part of the claimed invention.  It is noted, REFERENCES ARE RELEVANT AS PRIOR ART FOR ALL THEY CONTAIN.  “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned.  They are part of the literature of the art, relevant for all they contain.”  In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)).  A reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including non-preferred embodiments (see MPEP 2123).  The Examiner has cited particular locations in the reference(s) as applied to the claim(s) above for the convenience of the Applicant.  Although the specified citations are representative of the teachings of the art and are applied to the specific limitations within the individual claim(s), typically other passages and figures will apply as well.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because Ranzato teaches the disputed limitations in the new ground of rejection.

Conclusion
Any prior art made of record on the attached PTO-892 and not relied upon is considered pertinent to applicant's disclosure.
Applicant is reminded that in amending in response to a rejection of claims, the patentable novelty must be clearly shown in view of the state of the art disclosed by the references cited and the objections made.  Applicant must also show how the amendments avoid such references and objections.  See 37 CFR §1.111(c).  Additionally when amending, in their remarks Applicant should particularly cite to the supporting paragraphs in the original disclosure for the amendments.

Correspondence Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BENJAMIN J BUSS whose telephone number is (571)272-5831.  The examiner can normally be reached on M-F 9A-5P ET.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar can be reached on 571-270-3169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
As detailed in MPEP 502.03, communications via Internet e-mail are at the discretion of the applicant.  Without a written authorization by applicant in place, the USPTO will not respond via Internet e-mail to any Internet correspondence which contains information subject to the confidentiality requirement as set forth in 35 U.S.C. 122. A paper copy of such correspondence will be placed in the appropriate patent application. Examiner suggests filing PTO/SB/439 if applicant desires the examiner to be able to communicate by email.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 


/B. B./
Examiner, Art Unit 2125




/ABDULLAH AL KAWSAR/
Supervisory Patent Examiner, Art Unit 2127