Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claims 1-10 and 14 are objected to because of the following informalities:  
In claim 1, second-to-last paragraph, “the first position and a second position” should be “the first position and the second position”.
In claims 2-10, the preamble recitation of “A method” should be “The method”.
In claim 14, the beginning of claim should be “A data processing apparatus”.
Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  
In claim 14, the term “one or more processing elements to implement the ANN of claim 13” invokes § 112(f) because “element” is a generic placeholder in the manner of “means” and “to implement…” is functional language modifying the generic placeholder.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) 

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
In claim 1, line 4, “the preceding layer” lacks antecedent basis. For purposes of examination, “the” has been regarded as “an.”
Claim 13 recites “An Artificial neural network (ANN) generated by the method of claim 1.” This term is indefinite because claim 1 recites two ANNs: a “base ANN” and a “modified ANN.” Therefore, it is unclear as to which ANN is being referred to by claim 13. For purposes of examination, claim 13 has been interpreted to refer to the “modified ANN.”
Claims that depend from one or more of the above-discussed claims are also rejected for the same reasons, since these dependent claims incorporate the indefinite recitations of their parent claims without curing the deficiencies thereof.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA  as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA . A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA /25, or 
Claims 1-2, 4, 6-7, and 9-14 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 9, 11-13, 15-16, 18, and 20 of copending Application No. 16/280,065 in view of Yam et al., “A new method in determining initial weights of feedforward neural networks for training enhancement,” Neurocomputing 16 (1997) 23-32 (“Yam”).
Although the claims at issue are not identical, they are not patentably distinct from each other. Corresponding features are shown in the table below.
Claims of the current application
Copending application 16/280,065 (claims filed on 1/21/2022)
	1. A computer-implemented method of generating a modified artificial neural network (ANN) from a base ANN having an ordered series of two or more successive layers of neurons, each layer passing data signals to the next layer in the ordered series, the neurons of each layer processing the data signals received from the preceding layer according to an activation function and weights for that layer,
	the method comprising:
	detecting the data signals for a first position and a second position in the ordered series of layers of neurons;
	generating the modified ANN from the base ANN by providing an introduced layer of neurons to provide processing between the first position and the second position with respect to the ordered series of layers of neurons of the base ANN;
	deriving  at least a set of weights for the introduced layer 
	processing training data using the modified ANN to train the modified ANN .
a derived artificial neural network (ANN) from a base ANN, the method comprising: 
	initialising a set of parameters of the derived ANN in dependence upon parameters of the base ANN; 
	inferring a set of output data from a set of input data using the base ANN; 
	quantising the set of output data; and 	training the derived ANN using training data comprising the set of input data and the quantised set of output data, 
	wherein the derived ANN has a different network structure to the base ANN, the base ANN having an ordered series of two or more successive layers of neurons, the two or more successive layers or the ordered series being fully connected layers, each layer passing data signals to the next layer in the ordered series, the neurons of each layer processing the data signals received from the preceding layer according to an activation function and weights for that layer, wherein the method for processing the data signals received from the preceding layer according to an activation function and weights for that layer includes 	detecting the data signals for a first position and a second position in the ordered series of layers of neurons, 	generating the derived ANN from the base ANN by providing an insertion layer of neurons to provide processing between the first position and the second position with respect to the ordered series of layers of neurons of the base ANN, 	initialising at least a set of weights for the insertion layer using a least squares approximation from the data signals detected for the first position and a second position; and 	in response to at least one of the two or more successive layers in the derived ANN being a convolutional layer, reformulate the convolutional layer as a fully connected layer.

	9. The method according to claim 1, wherein the two or more successive layers are fully connected layers in which each neuron in a fully connected layer is connected to receive data signals from each neuron in a preceding layer and to pass data signals to each neuron in a following layer.
	4. A method according to claim 1, in which the training data comprises a set of data having set of known input data and corresponding output data, and in which the processing step comprises varying at least the weighting of at least the introduced layer to so that, for an instances of known input data, the output data of the modified ANN is closer to the corresponding known output data.
	11. The method of according to claim 1, in which the training step comprises varying at least the weighting of at least the insertion layer to so that, for an instances of known input data, the output data of the derived ANN is closer to the quantised set of output data.	See also claim 1 (“…training data comprising the set of input data and the quantised set of output data…	“)
	6. A method according to claim 1, in which the generating step comprises providing the introduced layer to replace one or more layers of the base ANN.
	12. The method according to claim 1, in which the generating step comprises providing the insertion layer to replace one or more layers of the base ANN.

	13. The method according to claim 12, in which the insertion layer has a different layer size to that of the one or more layers it replaces.
	9. A method according to claim 1, comprising adding a further weighting to the least squares approximation of the weights to simulate the addition of dropout noise in the ANN.
	15. The method  according to claim 1, comprising adding a further weighting to the least squares approximation of the set of weights to simulate the addition of dropout noise in the ANN.
	10. A method according to claim 1, in which the neurons of each layer of the ANN process the data signals received from the preceding layer according to a bias function for that layer, the method comprising deriving an initial approximation of at least a bias function for the introduced layer using a least squares approximation from the data signals detected for the first position and a second position
	16. The method according to claim 1, in which the neurons of each layer of the base ANN process the data signals received from the preceding layer according to a bias function for that layer, the method comprising deriving an initial approximation of at least a bias function for the insertion layer using a least squares approximation from the data signals detected for the first position and a second position
	11. Computer software which, when executed by a computer, causes the computer to implement the method of claim 1.
	18. A non-transitory machine readable computer-readable storage medium storing computer readable instructions thereon which, when executed by a computer, cause the computer to perform a method, the method comprising [steps that are analogous to those of claim 1]
	12. A non-transitory machine-readable medium which stores computer software according to claim 11
	18. A non-transitory computer-readable storage medium storing computer readable instructions thereon which, when executed by a computer, cause the computer to perform a method, the method comprising [operations  that are analogous to those of claim 1]
	13. An Artificial neural network (ANN) generated by the method of claim 1.
	See claim 1, which recites the generation of a derived ANN.
	14. Data processing apparatus comprising one or more processing elements to implement the ANN of claim 13.
	20. A data processing apparatus, comprising circuitry configured to [perform operations  that are analogous to those of claim 1]


	It is noted that in the copending application claims, the elements of “derived ANN” and “insertion layer” correspond to the elements of “modified ANN” and “introduced layer” since the derived ANN is a modified ANN. 
an initial approximation of” the at least the set of weights; and (2) the processing of the training data includes “training the weights of the introduced layer from their initial approximation,” as recited in claim 1 of the instant application. 
	However, these differences are obvious because Yam et al. teaches “an initial approximation of [at least a set of weights]” and “training the weights of the introduced layer from their initial approximation.” [Abstract: “The optimal initial weights are evaluated by using a least squares method at each layer.” The purpose of weight initialization is so that “the set of weights obtained becomes a new starting point for further training” (page 24, first full paragraph, last sentence). Such training is taught in § 1, last paragraph: “To achieve an even smaller learning error, a training session can be continued with the conventional backpropagation algorithm (BP).” See also parts of Yam cited in the § 103 rejection below.] 
It would have been obvious to one of ordinary skill in the art to have modified the co-pending claims to include the elements that the initialization derives “an initial approximation of” at least the set of weights; and the processing of the training data includes “training the weights of the introduced layer from their initial approximation.” The motivation would have been to obtain optimal initial weights and to perform further training with those weights as a starting point, so that the speed of training is increased, as suggested by Yam, § 1, last paragraph (“By starting with a set of optimal initial weights, the speed of training a neural network can be substantially increased.”).
	As noted above, a timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome a provisional rejection based on nonstatutory double patenting provided the reference application either is shown to be commonly owned with the 

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 11 and 13 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  
The claim(s) does/do not fall within at least one of the four categories of patent eligible subject matter because the “computer software” of claim 11 and the “artificial neural network” of claim 13 are considered to be software per se. As stated in MPEP § 2106.03, “products that do not have a physical or tangible form, such as…a computer program per se (often referred to as ‘software per se’) when claimed as a product without any structural recitations” are not within any of the statutory categories.
This rejection can be overcome by canceling claims 11 and 13.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

1.	Claims 1-4, 6-8, and 10-14 are rejected under 35 U.S.C. 103 as being unpatentable over Lee (US 2018/0204115 A1) in view of Yam et al., “A new method in determining initial weights of feedforward neural networks for training enhancement,” Neurocomputing 16 (1997) 23-32 (“Yam”).
As to claim 1, Lee teaches a computer-implemented method of generating a modified artificial neural network (ANN) from a base ANN [[0004]: “at least one of inserting the layer between the first layer and the second layer of the neural network and replacing one of the first layer and the second layer with the layer to reduce a number of connections in the neural network.” That is, “inserting” or “replacing” constitutes modifying a base neural network.] having an ordered series of two or more successive layers of neurons, each layer passing data signals to the next layer in the ordered series, [[0029]: “FIG. 1 depicts two adjacent layers of a neural network 100. More specifically, network 100 includes a layer 102, which includes neurons 106, and a layer 104, which includes neurons 108. Further, synaptic connections 110 (also referred to herein as “synapses”) connect each neuron of a layer with each neuron of an adjacent layer.”] the neurons of each layer processing the data signals received from the preceding layer according to an activation function and weights for that layer, [[0018]: “each connection may have a numeric weight for tuning the importance of communication between the neurons.” [0032]: “each segment may have one or more variable neuron activation methods (e.g., defining an output of a neuron given an input or set of inputs).” [0018]: “neuron activation states.”] 
the method comprising:
detecting the data signals for a first position and a second position in the ordered series of layers of neurons; [[0018]: “Neurons of a neural network may communicate with each other through synaptic connections between them… The weight tuning and therefore the data flow for the connections synapses may be performed in parallel.” The operation of “detecting” is met by the use (execution) of the neural network, such as the weight tuning described above in [0018] and also described in [0017]: “cognitive tasks such as computer vision and speech recognition.” Moreover, execution of the neural network is taught in [0016]. With respect to “first position” and “second position,” these positions may be any positions before and after the inserted layer (e.g., an “insertion layer” as noted below). For example, the output of layer 102 may be regarded as a “first position” and the input of layer 104 may be regarded as a “second position.” Furthermore, noting that dependent claim 8 states that the two positions may be “the same,” the output of layer 102 (which is used as the input of layer 104) may also be regarded as the signals of both the first and second positions.]
generating the modified ANN from the base ANN by providing an introduced layer of neurons to provide processing between the first position and the second position with respect to the ordered series of layers of neurons of the base ANN; [[0022]: “an insertion layer having a number of neurons P may be inserted and connected between the two layers.” [0039]: “an insertion layer may be added to a neural network, and method 400 may proceed to block 404. For example, the insertion layer (e.g., insertion layer 202; see FIG. 2) may be positioned between and connected to layers 102 and 104 of neural network 200.”] 
“deriving an initial approximation…” and “processing training data…” [Lee teaches “weight initialization” for the inserted layer in general (see [0026] and [0032]), but does not teach the particular weight initialization recited in the instant claim.]
Yam, in an analogous art, teaches the further limitations. Yam teaches “determining initial weights of feedforward neural networks for training enhancement” (title). Therefore, Yam is in the same field of endeavor as the claimed invention, namely machine learning.  
In particular, Yam teaches: 
deriving an initial approximation of at least a set of weights for the introduced layer using a least squares approximation from the data signals detected for the first position and a second position; [Abstract: “The optimal initial weights are evaluated by using a least squares method at each layer.” The algorithm is described in § 2. Specifically, page 27, which teaches the least squares minimization problem represented by expression (13), i.e., “$\text{minimize } \lVert A^{l} W^{l} - S^{l} \rVert_{2}$” and teaches that “This linear least squares problem can be solved by QR factorisation using Householder reflections” (text below expression (12)). Note that the notation of $\lVert x \rVert_{2}$ denotes the L2-norm, which includes squared terms. Here, $W^{l}$ is the weight of a layer (see § 2, paragraph 1), the inputs used to compute $A^{l}$, including $o_{i,j}^{L-1}$ as described in expression (5) correspond to the data signals of the “first position,” and the targets t (e.g., $t^{l+1}$) that are incorporated into $S^{l}$ in accordance with expression (11) correspond to data signals of the “second position.” Note that since the method of Yam is applicable to each layer, a particular layer in the method of Yam corresponds to an “introduced layer.”] and
processing training data using the modified ANN to train the modified ANN including training the weights of the introduced layer from their initial approximation. [The purpose of weight initialization is so that “the set of weights obtained becomes a new starting point for further training” (page 24, first full paragraph, last sentence). Such training is taught in § 1, last paragraph: “To achieve an even smaller learning error, a training session can be continued with the conventional backpropagation algorithm (BP). By starting with a set of optimal initial weights, the speed of training a neural network can be substantially increased.” Training data is described in § 2, paragraph 1: “For a training set with P patterns…”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Lee with the teachings of Yam by performing the further operations of “deriving an initial approximation of at least a set of weights for the introduced layer using a least squares approximation from the data signals detected for the first position and a second position;” and “processing training data using the modified ANN to train the modified ANN including training the weights of the introduced layer from their initial approximation.” The motivation would have been to obtain optimal initial weights and to perform further training with those weights as a starting point, so that the speed of training is increased, as suggested by Yam, § 1, last paragraph: “By starting with a set of optimal initial weights, the speed of training a neural network can be substantially increased.”
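
For illustration only, the combination discussed above (a least-squares initial approximation of the introduced layer's weights, followed by further training from that starting point) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of Yam, of Lee, or of the instant application: it assumes a purely linear introduced layer with no activation function, it uses NumPy's general least-squares solver rather than the QR factorisation with Householder reflections mentioned in Yam, and every function and variable name is hypothetical.

```python
import numpy as np

def init_introduced_layer(first_signals, second_signals):
    """Least-squares initial weights and bias for a linear introduced layer.

    first_signals:  (n_samples, n_in) data signals detected at the first position
    second_signals: (n_samples, n_out) data signals detected at the second position
    """
    n_samples = first_signals.shape[0]
    # Append a column of ones so the bias is solved jointly with the weights.
    design = np.hstack([first_signals, np.ones((n_samples, 1))])
    solution, *_ = np.linalg.lstsq(design, second_signals, rcond=None)
    return solution[:-1], solution[-1]

def train_step(weights, bias, inputs, targets, lr=0.01):
    """One gradient-descent step on half the mean squared error (further training)."""
    error = inputs @ weights + bias - targets
    grad_w = inputs.T @ error / len(inputs)
    grad_b = error.mean(axis=0)
    return weights - lr * grad_w, bias - lr * grad_b

# Synthetic signals standing in for the detected data (assumed, not from any reference).
rng = np.random.default_rng(0)
x_first = rng.normal(size=(200, 4))
true_w = rng.normal(size=(4, 3))
x_second = x_first @ true_w + 0.5

w0, b0 = init_introduced_layer(x_first, x_second)
init_loss = np.mean((x_first @ w0 + b0 - x_second) ** 2)
w1, b1 = train_step(w0, b0, x_first, x_second)
```

In this sketch the initial approximation already fits the synthetic signals, so the subsequent training step changes the weights very little; with real detected signals and a nonlinear activation, the least-squares solution would serve only as the starting point for training.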

As to claim 2, the combination of Lee and Yam teaches a method according to claim 1, in which [(at least one of the)] the two or more successive layers are fully connected layers in which each neuron in a fully connected layer is connected to receive data signals from each neuron in a preceding layer and to pass data signals to each neuron in a following layer. [Lee, [0020]: “if two adjacent layers of a neural network have M and N numbers of neurons, the 
Yam further teaches the limitation that (each of) “the two or more successive layers are fully connected layers…” [§ 2, paragraph 1: “a multilayer neural network with L fully interconnected layers.”]. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated, into the thus-far combination of references, the above-described further teachings of Yam by modifying the neural network of Lee such that (each of) the two or more successive layers are fully connected layers in which each neuron in a fully connected layer is connected to receive data signals from each neuron in a preceding layer and to pass data signals to each neuron in a following layer. The motivation for doing so would have been to implement a common type of neural network structure (Yam, § 1, paragraph 1: “Multilayer feedforward neural networks are probably the most widely used neural networks.”) suitable for tasks such as character recognition (Yam, § 3, text below expression (16): “In the character recognition problem”). Furthermore, since fully connected layers are taught in Lee in general, the results of multiple fully connected layers would also have been predictable. 

As to claim 3, the combination of Lee and Yam teaches a method according to claim 1, in which at least one of the two or more successive layers is a convolutional layer, the method comprising deriving a fully connected layer from the convolutional layer. [Lee, [0021]: “layer may be inserted between two adjacent layers of a neural network. For example, a layer (also referred to herein as an “insertion layer”) may be inserted between one layer (e.g., the convolutional layer) and another layer (e.g., the first FC (fully-connected) layer) in a conventional convolutional neural network (CNN).” With respect to the limitation of “deriving,” this limitation is taught because the fully-connected layer follows the convolutional layer, and is thus trained based on the convolutional layer (note that training is described in [0018]: “The weight tuning…”)]

As to claim 4, the combination of Lee and Yam teaches a method according to claim 1, in which the training data comprises a set of data having set of known input data and corresponding output data, [Yam, § 2, paragraph 1: “For a training set with P patterns, all given inputs can be represented… Similarly, the targets can be represented by.” Note that the “targets” correspond to the target output for training purposes, i.e., “known output data” recited in the below part of the claim. Regarding “output data,” the output of the layers is represented by Ol, as taught with respect to expression (1).] and in which the processing step comprises varying at least the weighting of at least the introduced layer to so that, for an instances of known input data, the output data of the modified ANN is closer to the corresponding known output data. [Yam, § 1, last paragraph: “To achieve an even smaller learning error, a training session can be continued with the conventional backpropagation algorithm (BP). By starting with a set of optimal initial weights, the speed of training a neural network can be substantially increased.” It is noted that the training of a neural network by backpropagation implies, to one of ordinary skill in the art, the act of “varying at least the weighting of at least the introduced layer to so that, for an instances of known input data, the output data of the modified ANN is closer to the corresponding known output data,” because the function of backpropagation is to optimize the weights to minimize loss (i.e., to be closer to the 

As to claim 6, the combination of Lee and Yam teaches a method according to claim 1, in which the generating step comprises providing the introduced layer to replace one or more layers of the base ANN. [Lee, [0023]: “a layer of a neural network may be replaced by another layer (a “replacement layer”). For example, a layer (e.g., the first FC layer) in a neural network (e.g., a CNN) may be replaced by a replacement layer, wherein the replacement layer has a number of neurons less than a number of neurons of the layer being replaced.” [0034]: “In one example, layer 258 includes X neurons, wherein X<N.” In the context of a replacement layer, the “first position” and the “second position” may be regarded as the input and output of the layer that is being replaced.]

As to claim 7, the combination of Lee and Yam teaches a method according to claim 6, in which the introduced layer has a different layer size to that of the one or more layers it replaces. [Lee, [0034]: “layer 104 of neural network 100 (see FIG. 1) has been replaced by layer 258, wherein a number of neurons in layer 258 is less than a number of neurons in layer 104.” Note that “number of neurons” here corresponds to the size of the layer.]

As to claim 8, the combination of Lee and Yam teaches a method according to claim 1, in which the first position and the second position are the same, and the generating step comprises providing the introduced layer in addition to the layers of the base ANN. [Lee, [0039]: “insertion layer (e.g., insertion layer 202; see FIG. 2) may be positioned between and connected to layers 102 and 104 of neural network 200.” Here, the position between layers 102 and 104 can be considered to be two positions that are the same.]

As to claim 10, the combination of Lee and Yam teaches a method according to claim 1, as set forth in the rejection above.
Yam further teaches in which the neurons of each layer of the ANN process the data signals received from the preceding layer according to a bias function for that layer, [Yam, § 2, paragraph 1: “The layer l consists of n_l + 1 neurons (l = 1, . . . , L - 1) in which the last neuron is a bias node with a constant output of 1.0.”] the method comprising deriving an initial approximation of at least a bias function for the introduced layer using a least squares approximation from the data signals detected for the first position and a second position. [As noted in the rejection of claim 1, Yam teaches estimating the weights of a layer. Since the layer includes a neuron that serves as a bias, the estimating of the initial weights of the bias neuron satisfies the instant limitation.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated, into the thus-far combination of references, the above further teachings of Yam by modifying the method of Lee (as modified by Yam thus far) to include the further feature that “the neurons of each layer of the ANN process the data signals received from the preceding layer according to a bias function for that layer” and the further operation of “deriving an initial approximation of at least a bias function for the introduced layer using a least squares approximation from the data signals detected for the first position and a second position.” The motivation for doing so would have been to incorporate a bias variable (Yam, § 2, paragraph 1: “the last neuron is a bias node with a constant output of 1.0.”). Furthermore, since the function of the bias is known in Yam, such a combination of teachings would have been an obvious combination of known elements according to known methods to yield predictable results.
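For illustration only, the treatment of a bias as the weight of a node with a constant output of 1.0 can be sketched as follows. This is a minimal sketch under assumptions introduced here (a linear layer, synthetic signals, and hypothetical function and variable names); it is not presented as the procedure of Yam. A column of ones is appended to the signals from the first position, so that a single least squares solve yields both the weights and the bias.

```python
import numpy as np

def least_squares_init_with_bias(signals_first, signals_second):
    """Estimate weights and bias jointly by appending a constant-1.0 column."""
    ones = np.ones((signals_first.shape[0], 1))      # the constant-output "bias node"
    augmented = np.hstack([signals_first, ones])
    solution, *_ = np.linalg.lstsq(augmented, signals_second, rcond=None)
    # The last row multiplies the constant 1.0 and therefore acts as the bias.
    return solution[:-1], solution[-1]

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))            # hypothetical first-position signals
true_W = rng.normal(size=(4, 3))
true_b = np.array([0.5, -1.0, 2.0])
Y = X @ true_W + true_b                  # hypothetical second-position signals

W_init, b_init = least_squares_init_with_bias(X, Y)
```

In this synthetic case the second-position signals are an exact affine function of the first-position signals, so the solve recovers the generating weights and bias to numerical precision.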

As to claim 11, the combination of Lee and Yam teaches computer software which, when executed by a computer, causes the computer to implement the method of claim 1. [Lee, [0074]: “processing operations for generating an insertion layer and/or a replacement layer in a neural network may be included in data storage 820 as program instructions.” Lee, [0075]: “computer-executable instructions or data structures stored thereon.”]

As to claim 12, the combination of Lee and Yam teaches a non-transitory machine-readable medium which stores computer software according to claim 11. [Lee, [0075]: “such computer-readable storage media may include tangible or non-transitory computer-readable storage media.”]

As to claim 13, the combination of Lee and Yam teaches an artificial neural network (ANN) generated by the method of claim 1. [As set forth in the rejection of claim 1. See, e.g., Lee, abstract: “A method of updating a neural network may be provided.”]

As to claim 14, the combination of Lee and Yam teaches data processing apparatus comprising one or more processing elements to implement the ANN of claim 13. [Lee, 

2.	Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Yam, and further in view of Li et al. (US 10,885,900 B2) (“Li”).
As to claim 5, the combination of Lee and Yam teaches a method according to claim 4, but does not teach the further limitations of “in which, for each instance of input data the set of known input data, the corresponding known output data are output data of the base ANN for that instance of input data.”
Li, in an analogous art, teaches the above limitations. Li generally pertains to “speech recognition via teacher-student learning” (title), involving the use of a “neural network” (abstract) model. Therefore, Li is in the same field of endeavor as the claimed invention, namely machine learning.
In particular, Li teaches “in which, for each instance of input data the set of known input data, the corresponding known output data are output data of the base ANN for that instance of input data” [In general, Li teaches “A student model for a new domain is created based on the teacher model trained in an existing domain,” where the student model is analogous to the modified ANN and the “teacher model” is analogous to the base ANN. Col. 1, lines 39-42: “As the teacher model receives inputs that conform to the first domain, the student model is fed (in parallel) equivalent inputs that conform to the second domain.” Abstract, last sentence: “The outputs from the teacher model are compared with the outputs of the student model and the differences are used to 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Lee and Yam with the teachings of Li by modifying the combination of Lee and Yam such that for each instance of input data the set of known input data, the corresponding known output data are output data of the base ANN for that instance of input data. The motivation would have been to make the behavior of the derived model converge to that of the base model, as suggested by Li, col. 8, lines 1-4 (“to make the behavior of the student model 160 for the target domain converge to that of the teacher model 150”). 
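For illustration only, training in which the known output data are the outputs of the base network can be sketched as follows. This is a minimal sketch and not the system of Li: the single tanh layer, the names base_ann and modified_ann, the learning rate, and the synthetic inputs are assumptions introduced here. Unlike Li, which feeds the two models equivalent inputs from different domains, the sketch feeds both networks the same inputs.

```python
import numpy as np

rng = np.random.default_rng(2)
base_W = rng.normal(size=(6, 3))         # hypothetical weights of the base network

def base_ann(x):
    """Stand-in for the base (teacher) network."""
    return np.tanh(x @ base_W)

def modified_ann(x, W):
    """Stand-in for the modified (student) network."""
    return np.tanh(x @ W)

inputs = rng.normal(size=(256, 6))
targets = base_ann(inputs)               # known output data = base network outputs

def mse(W):
    return float(np.mean((modified_ann(inputs, W) - targets) ** 2))

W = np.zeros((6, 3))
loss_before = mse(W)
for _ in range(200):
    out = modified_ann(inputs, W)
    # Gradient of the mean squared error through the tanh nonlinearity.
    grad_pre = 2.0 * (out - targets) * (1.0 - out ** 2) / out.size
    W -= 0.5 * (inputs.T @ grad_pre)
loss_after = mse(W)
```

The difference between the two networks' outputs drives the weight updates, so the output of the modified network moves closer to that of the base network as training proceeds.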

3.	Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Yam, and further in view of Srivastava et al., “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, Journal of Machine Learning Research 15, June 2014, pp. 1929-1958 (“Srivastava”) (cited by applicant in the 2/20/2019 IDS).
As to claim 9, the combination of Lee and Yam teaches a method according to claim 1, but does not teach the further limitations of “comprising adding a further weighting to the least squares approximation of the weights to simulate the addition of dropout noise in the ANN.”
Srivastava, in an analogous art, teaches the above limitations. Srivastava teaches “dropout…to prevent neural networks from overfitting” (title). Therefore, Srivastava is in the same field of endeavor as the claimed invention, namely machine learning.
In particular, Srivastava teaches “adding a further weighting to the least squares approximation of the weights to simulate the addition of dropout noise in the ANN” [§ 10 (p. 1951), paragraph 1: “This new form of dropout amounts to adding a Gaussian distributed random variable with zero mean and standard deviation equal to the activation of the unit. That is, each hidden activation h_i is perturbed to h_i + h_i r where r ∼ N(0, 1), or equivalently h_i r′ where r′ ∼ N(1, 1).” That is, the term h_i r, when applied to the activations in Yam, creates a further weighting. The examiner notes that the instant claim limitation merely specifies a further “weighting” to the process of the least squares approximation, and does not require a more specific form of adding dropout noise than what is taught in this reference.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Lee and Yam with the teachings of Srivastava by modifying the method of Lee (as modified by Yam) to include the further operation of “adding a further weighting to the least squares approximation of the weights to simulate the addition of dropout noise in the ANN.” The motivation would have been to reduce overfitting (Srivastava, § 11: “Dropout is a technique for improving neural networks by reducing overfitting”). 
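For illustration only, the two equivalent forms of Gaussian noise quoted from Srivastava § 10 can be sketched as follows. This is a minimal sketch: the activation array and the variable names are assumptions introduced here, and the sketch shows only the perturbation of activations, not its incorporation into a least squares approximation.

```python
import numpy as np

rng = np.random.default_rng(3)
h = rng.normal(size=(1000, 4))           # hypothetical hidden activations h_i
r = rng.standard_normal(h.shape)         # r ~ N(0, 1)

# Additive form: each activation is perturbed to h_i + h_i * r.
additive = h + h * r

# Multiplicative form: h_i * r', where r' = 1 + r is distributed as N(1, 1).
r_prime = 1.0 + r
multiplicative = h * r_prime
```

Both forms give the same perturbed activations for the same noise draw, which is why the noise can be viewed as a random multiplicative weighting applied to each activation.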


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The following references depict the state of the art.
US 2019/0012594 A1 (Fukuda) teaches a neural network expansion technique in which a layer is decomposed and replaced by other layers. 
US 2013/0138436 A1 (Yu) teaches adding a new hidden layer that replaces the current output layer along with a new output layer (see, e.g., paragraph 5).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YAO DAVID HUANG whose telephone number is (571)270-1764. The examiner can normally be reached Monday - Friday 8:30 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Y.D.H./ Examiner, Art Unit 2124

/MIRANDA M HUANG/ Supervisory Patent Examiner, Art Unit 2124