Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
	This Office Action is in response to applicant’s amendment filed on May 18, 2022, under which claims 1-10, 12, and 14 are pending and under consideration.

Response to Arguments
	Applicant’s amendments have overcome the previous claim objections, the previous double patenting rejection, and the previous rejections under § 112(b), § 101, and § 103. Therefore, these objections and rejections have been withdrawn. Furthermore, as a result of applicant’s amendments, the previous claim interpretation under § 112(f) for claim 14 is no longer applicable. 
	However, the claims remain rejected for double patenting and obviousness under the new grounds of rejection set forth below. Applicant’s remarks directed to the previous rejections are generally moot under the new grounds of rejection.
	With regard to Yam (which remains cited in the new ground of rejection but is applied in a different manner than in the previous rejection), applicant observed that “Yam…teaches optimizing initial weights for each layer” (applicant’s remarks, page 7). However, Yam is not relied upon for the technique of modifying an ANN to include a new layer, but is instead relied upon for teaching the general technique of least squares weight initialization. This technique is applicable to any arbitrary portion of a neural network, since any sub-portion of a neural network can be weight-initialized and trained in the manner of a full neural network. In the new grounds of rejection, Ter-Sarkisov teaches the addition of a new layer, along with initialization of the weights of the new layer. Therefore, Yam, together with Ter-Sarkisov, renders obvious the technique of applying least squares weight initialization to the new layer for which weight initialization is performed. 

Claim Objections
Claims 4-5 are objected to because of the following informalities:  
In claim 4, “having set of known input data” should be “having a set of known input data”
In claim 4, “for an instances of known input data” should be “for instances of known input data”
In claim 5, “for each instance of input data the set of known input data” should be “for each instance of input data in the set of known input data”
For purposes of examination, the phrases objected to have been interpreted as having the meaning of the suggested revisions.
Appropriate correction is required.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-2, 4, 6-7, 9-10, 12, and 14 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 9, 11-13, 15-16, 18, and 20 of copending Application No. 16/280,065 in view of Islam et al., “A New Constructive Algorithm for Architectural and Functional Adaptation of Artificial Neural Networks,” in IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39, no. 6, pp. 1590-1605, Dec. 2009 (“Islam”).
Although the claims at issue are not identical, they are not patentably distinct from each other. Corresponding features are shown in the table below.
In each pair of entries below, the claim of the current application is listed first, followed by the corresponding claim of copending Application No. 16/280,065 (claims filed on 4/26/2022).
	1. A computer-implemented method of generating a modified artificial neural network (ANN) from a base ANN having an ordered series of two or more successive layers of neurons, each layer passing data signals to a next layer in the ordered series, the neurons of each layer processing the data signals received from a preceding layer according to an activation function and weights for that layer,
	the method comprising:
	detecting the data signals for a first position and a second position in the ordered series of layers of neurons in the base ANN;
	generating the modified ANN from the base ANN by providing an introduced layer of neurons to provide processing between the first position and the second position with respect to the ordered series of layers of neurons of the base ANN;
	deriving an initial approximation of at least a set of weights for the introduced layer using a least squares approximation from the data signals detected for the first position and the second position; and
	processing training data using the modified ANN to train the modified ANN .
	1. A computer-implemented method of generating a derived artificial neural network (ANN) from a base ANN, the method comprising: 
	initialising a set of parameters of the derived ANN in dependence upon parameters of the base ANN; 
	inferring a set of output data from a set of input data using the base ANN; 
	quantising the set of output data; and 	training the derived ANN using training data comprising the set of input data and the quantised set of output data, 
	wherein the derived ANN has a different network structure to the base ANN, the base ANN having an ordered series of two or more successive layers of neurons, the two or more successive layers or the ordered series being fully connected layers, each layer passing data signals to the next layer in the ordered series, the neurons of each layer processing the data signals received from the preceding layer according to an activation function and weights for that layer, wherein the method for processing the data signals received from the preceding layer according to an activation function and weights for that layer includes 	detecting the data signals for a first position and a second position in the ordered series of layers of neurons, 
	approximate an insertion layer using weight parameters and a bias term that approximate a sub-network;	generating the derived ANN from the base ANN by providing the insertion layer of neurons to provide processing between the first position and the second position with respect to the ordered series of layers of neurons of the base ANN, 	initialising at least a set of weights for the insertion layer using a least squares approximation from the data signals detected for the first position and a second position; and 	in response to at least one of the two or more successive layers in the derived ANN being a convolutional layer, reformulate the convolutional layer as a fully connected layer.
	2. The method according to claim 1, in which the two or more successive layers are fully connected layers in which each neuron in a fully connected layer is connected to receive data signals from each neuron in a preceding layer and to pass data signals to each neuron in a following layer.
	9. The method according to claim 1, wherein the two or more successive layers are fully connected layers in which each neuron in a fully connected layer is connected to receive data signals from each neuron in a preceding layer and to pass data signals to each neuron in a following layer.
	4. The method according to claim 1, in which the training data comprises a set of data having set of known input data and corresponding output data, and in which processing includes varying at least the weighting of at least the introduced layer to so that, for an instances of known input data, the output data of the modified ANN is closer to corresponding known output data.
	11. The method of according to claim 1, in which the training step comprises varying at least the weighting of at least the insertion layer so that, for instances of known input data, an error function of the output data of the derived ANN is reduced where reducing the error function brings the output data of the derived ANN closer to matching the quantised set of output data.	See also claim 1 (“…training data comprising the set of input data and the quantised set of output data…	“)
	6. The method according to claim 1, in which generating includes providing the introduced layer to replace one or more layers of the base ANN.
	12. The method according to claim 1, in which the generating step comprises providing the insertion layer to replace one or more layers of the base ANN.
	7. The method according to claim 6, in which the introduced layer has a different layer size to that of the one or more layers it replaces.
	13. The method according to claim 12, in which the insertion layer has a different layer size to that of the one or more layers it replaces.
	9. The method according to claim 1, comprising adding a further weighting to the least squares approximation of the weights to simulate addition of dropout noise in the modified ANN.
	15. The method according to claim 1, comprising adding a further weighting to the least squares approximation of the set of weights to simulate the addition of dropout noise in the ANN.
	10. The method according to claim 1, in which the neurons of each layer of the modified ANN process the data signals received from the preceding layer according to a bias function for that layer, the method comprising deriving an initial approximation of at least a bias function for the introduced layer using a least squares approximation from the data signals detected for the first position and the second position.
	16. The method according to claim 1, in which the neurons of each layer of the base ANN process the data signals received from the preceding layer according to a bias function for that layer, the method comprising deriving an initial approximation of at least a bias function for the insertion layer using a least squares approximation from the data signals detected for the first position and a second position
	12. A non-transitory machine-readable medium storing computer software which, when executed by a computer, causes the computer to implement the method of claim 1.
	18. A non-transitory computer-readable storage medium storing computer readable instructions thereon which, when executed by a computer, cause the computer to perform a method, the method comprising [operations  that are analogous to those of claim 1]
	14. A data processing apparatus comprising one or more processing circuits to implement the modified artificial neural network (ANN) generated by the method of claim 1.
	20. A data processing apparatus, comprising circuitry configured to [perform operations  that are analogous to those of claim 1]


	It is noted that in the copending application claims, the elements of “derived ANN” and “insertion layer” correspond to the elements of “modified ANN” and “introduced layer” since the derived ANN is a modified ANN. 
	The copending claims 1, 18, and 20 do not explicitly include the limitation that the processing of the training data includes “training the weights of the introduced layer from their initial approximation and at least one layer of the base ANN,” as recited in claim 1 of the instant application. 
However, these differences are obvious. Islam teaches, in the context of a neural network whose structure has been modified from a base ANN, “training the weights of the introduced layer from their initial approximation and at least one layer of the base ANN” [§ III.B, paragraphs 1-2: “The second important issue of constructive algorithms is the way that an ANN is trained after adding hidden neurons…we briefly summarize here two major schemes that are widely used in constructive approaches. One scheme is to train all weights of an ANN…, and the other scheme is to train only the weights that are associated with a newly added hidden neuron, keeping all other weights unchanged (fixed)… The former scheme is very simple and straightforward, because it trains all weights of the ANN after each addition step.” Note that training all the weights includes training at least one of the original layers prior to the introduced elements as well as the weights of the newly added neurons, which are analogous to an introduced layer, especially since new neurons in the method of Islam can be added in the form of a new layer (see FIG. 2).]
It would have been obvious to one of ordinary skill in the art to have modified the co-pending claims such that the processing training data includes “training the weights of the introduced layer from their initial approximation and at least one layer of the base ANN.” The motivation would have been to train the neural network using a widely used training method that is suitable for training a modified neural network and is also very simple and straightforward, as suggested by Islam (§ III.B, paragraphs 1-2, see parts quoted and underlined above).
	As noted above, a timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome a provisional rejection based on nonstatutory double patenting provided the reference application either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement.

	Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

1.	Claims 1-2, 4, 8, 10, 12, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Ter-Sarkisov et al., “Incremental Adaptation Strategies for Neural Network Language Models,” arXiv:1412.6650v4 [cs.NE] 7 Jul 2015 (“Ter-Sarkisov”) in view of Yam et al., “A new method in determining initial weights of feedforward neural networks for training enhancement,” Neurocomputing 16 (1997) 23-32 (“Yam”) and Islam et al., “A New Constructive Algorithm for Architectural and Functional Adaptation of Artificial Neural Networks,” in IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39, no. 6, pp. 1590-1605, Dec. 2009 (“Islam”).
As to claim 1, Ter-Sarkisov teaches a computer-implemented method of generating a modified artificial neural network (ANN) from a base ANN [Abstract: “We present efficient techniques to adapt a neural network language model to new data. …we propose…insertion of adaptation layers.” See 3rd page, FIG. 1, which shows an “inserted adaptation layer,” resulting in a modified ANN.] having an ordered series of two or more successive layers of neurons, [FIG. 1, which shows input, hidden, and output layers, i.e., successive layers of neurons, as described in § 4, paragraph 1 (“We will use three tanh hidden and a softmax output layer as depicted in Figure 1”) and § 5.1, paragraph 1 (“The projection layer of the CSLM was of dimension 320, followed by three tanh hidden layers of size 1024 and a softmax output layer of 32k neurons (short-list)”).] each layer passing data signals to a next layer in the ordered series, the neurons of each layer processing the data signals received from a preceding layer according to an activation function and weights for that layer, [Taught in FIG. 1, which shows that the layers are neural network layers that feed signals from one layer to the next layer. In the neural network as shown, inputs in the input layer are fed through the hidden layers, and eventually to the output layer. The limitation of “according to an activation function and weights for that layer” is taught by, e.g., § 4.1: “weights between two tanh layers in Figure 1,” where “Tanh” here refers to “the hyperbolic tangent activation function” (Table 4 activation function).]
the method comprising: 
detecting the data signals for a first position and a second position in the ordered series of layers of neurons in the base ANN; [First and second positions are shown in FIG. 1, which shows that the inserted adaptation layer is inserted between the second and third tanh layers. Thus, the output of the second tanh layer (or the input/output of any preceding layer) corresponds to a “first position,” and the input to the third tanh layer corresponds to a “second position.” The inputs/outputs at these positions are detected because the base neural network has been trained prior to the insertion of the adaptation layer, as described in § 4.1, paragraph 2, which refers to “new adaptation data and the original training data” and teaches that in the insertion case, “only the weights of this layer are updated,” indicating that the original weights were already trained. Furthermore, Table 4 teaches that the “original network architecture” is functional (i.e. is trained) without further adaptation. Since training is based on the input and output of neurons, the act of “detecting the data signals for a first position and a second position…” is taught by the reference.]
generating the modified ANN from the base ANN by providing an introduced layer of neurons to provide processing between the first position and the second position with respect to the ordered series of layers of neurons of the base ANN; [§ 4.1: “In the second method, adaptation layers are inserted in the neural network as outlined in red in Figure 1. This additional layer is initialized with the identity matrix and only the weights of this layer are updated.” As shown in FIG. 1, the inserted layer is between first and second positions with respect to the ordered series of layers of neurons of the base ANN as described above.]
deriving an initial approximation of at least a set of weights for the introduced layer […]; [§ 4.1: “In the second method, adaptation layers are inserted in the neural network as outlined in red in Figure 1. This additional layer is initialized with the identity matrix and only the weights of this layer are updated.”] and 
processing training data using the modified ANN to train the modified ANN including training the weights of the introduced layer from their initial approximation […]. [§ 4.1: “In the second method, adaptation layers are inserted in the neural network as outlined in red in Figure 1. This additional layer is initialized with the identity matrix and only the weights of this layer are updated.” 5th page, item 2b: “These additional layers are initialized with the identity matrix and only these layers are updated using backpropagation function.” Note that “backpropagation” here refers to training the weights of the introduced layer by use of backpropagation.]
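For illustration only, the identity-matrix initialization quoted above from Ter-Sarkisov can be sketched in a few lines of Python. The sketch is not taken from the reference: the weight values, the helper names (`matvec`, `identity`, `forward`, `linear`), and the assumption that the inserted layer applies no nonlinearity are all hypothetical. Under those assumptions, the modified network initially produces the same output as the base network, which is why the inserted layer can then be trained from that starting point.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def identity(n):
    """Return the n-by-n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def linear(v):
    """Assumed activation of the inserted layer: no nonlinearity."""
    return v

def forward(layers, x):
    """Pass x through each (weights, activation) layer in order."""
    for W, act in layers:
        x = [act(v) for v in matvec(W, x)]
    return x

# Hypothetical base network: two tanh layers with made-up weights.
base = [
    ([[0.5, -0.2], [0.1, 0.3]], math.tanh),
    ([[0.7, 0.4], [-0.6, 0.2]], math.tanh),
]

# Modified network: an identity-initialized layer inserted between
# the first position (output of layer 0) and the second position
# (input of layer 1).
modified = [base[0], (identity(2), linear), base[1]]

x = [0.25, -1.0]
base_out = forward(base, x)
modified_out = forward(modified, x)
```

Because the inserted weights start as the identity, `base_out` and `modified_out` agree before any training of the inserted layer takes place.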
Ter-Sarkisov does not explicitly teach: 
(1)	the limitation that the initial approximation of the weights is derived “using a least squares approximation from the data signals detected for the first position and the second position”; and 
(2)	the limitation that the training includes “training…at least one layer of the base ANN.” 
Yam, in an analogous art, teaches limitation (1) listed above. Yam teaches “determining initial weights of feedforward neural networks for training enhancement” (title). Therefore, Yam is in the same field of endeavor as the claimed invention, namely machine learning.  
In particular, Yam teaches determining initial weights using a least squares approximation from the data signals detected for the first position and a second position; [Abstract: “The optimal initial weights are evaluated by using a least squares method at each layer.” The algorithm is described in § 2. Specifically, page 27, which teaches the least squares minimization problem represented by expression (13), i.e., “$\text{minimize } \|A^{l} W^{l} - S^{l}\|_{2}$” and teaches that “This linear least squares problem can be solved by QR factorisation using Householder reflections” (text below expression (12)). Note that the notation of $\|x\|_{2}$ denotes the L2-norm, which includes squared terms. Here, $W^{l}$ is the weight of a layer (see § 2, paragraph 1), the inputs used to compute $A^{l}$, including $o_{i,j}^{L-1}$ as described in expression (5) correspond to the data signals of the “first position,” and the targets t (e.g., $t^{l+1}$) that are incorporated into $S^{l}$ in accordance with expression (11) correspond to data signals of the “second position.” Note that since the method of Yam is applicable to each layer, a particular layer in the method of Yam corresponds to an “introduced layer.”] 
Yam is relied upon for teaching the general technique of least squares weight initialization. This technique is applicable to any arbitrary portion of a neural network, regardless of how many layers it is applied to in the specific examples in Yam, given that a sub-portion of a neural network can be weight-initialized and trained in the manner of a full neural network. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Ter-Sarkisov with the teachings of Yam by modifying the operation of “deriving an initial approximation of at least a set of weights for the introduced layer” so as to be performed “using a least squares approximation from the data signals detected for the first position and a second position” as taught by Yam. The motivation would have been to obtain optimal initial weights and to perform further training with those weights as a starting point, such that the speed of training is increased, as suggested by Yam, § 1, last paragraph: “By starting with a set of optimal initial weights, the speed of training a neural network can be substantially increased.”
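For illustration only, the combination described above (least squares initialization of an introduced layer from signals detected at two positions) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the procedure of either reference: the signal matrices `A` and `S`, the helper names, and the row-per-pattern layout are hypothetical; the sketch treats the detected second-position signals directly as the target matrix; and it solves the normal equations by Gauss-Jordan elimination rather than the QR factorisation with Householder reflections that Yam mentions.

```python
def transpose(M):
    """Return the transpose of matrix M (a list of rows)."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Return the matrix product X @ Y for lists of rows."""
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def solve(M, B):
    """Solve M X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def least_squares_init(A, S):
    """Return W minimizing ||A W - S||_2 via the normal equations."""
    At = transpose(A)
    return solve(matmul(At, A), matmul(At, S))

# Hypothetical signals detected at the first position (one row per pattern).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
# Hypothetical mapping between the two positions, used only to
# generate the second-position signals S for this example.
W_true = [[0.5, -1.0], [2.0, 0.25]]
S = matmul(A, W_true)

# Initial approximation of the introduced layer's weights.
W0 = least_squares_init(A, S)
```

In this noise-free example the initial weights `W0` recover `W_true` up to rounding; with real detected signals the result would only be a best fit, to be refined by subsequent training.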
Islam, in an analogous art, teaches the remaining limitations. Islam relates to a “constructive algorithm for architectural and functional adaptation of artificial neural networks” (see title). As shown in FIG. 2, the algorithm may include the addition of a new neuron or a new hidden layer into the neural network. See also § II.A, paragraph 2: “The novelty of our criterion is that it can automatically add a hidden neuron to an existing hidden layer or a new hidden layer. Each hidden layer.” Therefore, Islam is in the same field of endeavor as the claimed invention, namely machine learning, and also pertains to the specific problem of modifying the structure of a neural network.
In particular, Islam teaches, in the context of a neural network whose structure has been modified from a base ANN, “training…at least one layer of the base ANN” [§ III.B, paragraphs 1-2: “The second important issue of constructive algorithms is the way that an ANN is trained after adding hidden neurons…we briefly summarize here two major schemes that are widely used in constructive approaches. One scheme is to train all weights of an ANN…, and the other scheme is to train only the weights that are associated with a newly added hidden neuron, keeping all other weights unchanged (fixed)… The former scheme is very simple and straightforward, because it trains all weights of the ANN after each addition step.” Note that training all the weights includes training at least one of the original layers prior to the introduced elements.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Ter-Sarkisov and Yam with the teachings of Islam by modifying the training of Ter-Sarkisov such that the training further includes “training…at least one layer of the base ANN.” The motivation would have been to implement a widely used training method, suitable for a modified neural network, that is very simple and straightforward, as suggested by Islam (§ III.B, paragraphs 1-2, see parts quoted and underlined above).
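For illustration only, the two training schemes summarized in the quoted passage of Islam (training only the newly added weights versus training all weights) can be contrasted with a short Python sketch. The three-weight linear chain, the numeric values, and the function names are hypothetical and are not drawn from Islam; the sketch only shows how a per-weight "trainable" flag distinguishes the two schemes.

```python
def forward(w, x):
    """Tiny linear chain: base weight w[0], introduced weight w[1], base weight w[2]."""
    return w[2] * (w[1] * (w[0] * x))

def gradients(w, x, target):
    """Gradient of the squared error 0.5 * (y - target)**2 for each weight."""
    err = forward(w, x) - target
    return [
        err * w[1] * w[2] * x,
        err * w[0] * w[2] * x,
        err * w[0] * w[1] * x,
    ]

def train_step(w, x, target, trainable, lr=0.1):
    """One gradient-descent step that updates only weights flagged as trainable."""
    g = gradients(w, x, target)
    return [wi - lr * gi if flag else wi for wi, gi, flag in zip(w, g, trainable)]

# Hypothetical values; w[1] plays the role of the introduced layer.
w = [0.8, 1.0, 0.5]
x, target = 1.0, 1.0

# Scheme keeping the base weights fixed: only the introduced weight moves.
only_new = train_step(w, x, target, trainable=[False, True, False])

# Scheme training all weights: base weights are updated as well.
all_weights = train_step(w, x, target, trainable=[True, True, True])
```

In the second scheme at least one weight of the base network changes together with the introduced weight, which corresponds to the limitation of “training…at least one layer of the base ANN” discussed above.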

As to claim 2, the combination of Ter-Sarkisov, Yam, and Islam teaches the method according to claim 1, as set forth above, but Ter-Sarkisov as modified thus-far does not explicitly teach the further limitations of the instant claim. 
Yam teaches the further limitations of claim 2 of the instant application (Application No. 16/280,059), namely “in which the two or more successive layers are fully connected layers in which each neuron in a fully connected layer is connected to receive data signals from each neuron in a preceding layer and to pass data signals to each neuron in a following layer.” [§ 2, paragraph 1: “a multilayer neural network with L fully interconnected layers.” Note that a plurality of fully interconnected layers has the features of “each neuron in a fully connected layer is connected to receive data signals from each neuron in a preceding layer and to pass data signals to each neuron in a following layer,” since such is the definition of fully connected layers. See also § 1, paragraph 1: “Multilayer feedforward neural networks are probably the most widely used neural networks.”]. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated, into the thus-far combination of references, the above-described further teachings of Yam by modifying the neural network of Ter-Sarkisov to have the feature of “in which the two or more successive layers are fully connected layers in which each neuron in a fully connected layer is connected to receive data signals from each neuron in a preceding layer and to pass data signals to each neuron in a following layer.” The motivation for doing so would have been to implement a common type of neural network structure (Yam, § 1, paragraph 1: “Multilayer feedforward neural networks are probably the most widely used neural networks.”) suitable for tasks such as character recognition (Yam, § 3, text below expression (16): “In the character recognition problem”). 

As to claim 4, the combination of Ter-Sarkisov, Yam, and Islam teaches the method according to claim 1, but Ter-Sarkisov as modified thus far does not explicitly teach the further limitations of the instant claim. 
Yam teaches the further limitations of “in which the training data comprises a set of data having set of known input data and corresponding output data,” [Yam, § 2, paragraph 1: “For a training set with P patterns, all given inputs can be represented… Similarly, the targets can be represented by.” Note that the “targets” correspond to the target output for training purposes, i.e., the “known output data” recited in the later part of the claim. Regarding “output data,” the output of the layers is represented by Ol, as taught with respect to expression (1).] “and in which processing includes varying at least the weighting of at least the introduced layer to so that, for an instances of known input data, the output data of the modified ANN is closer to corresponding known output data.” [Yam, § 1, last paragraph: “To achieve an even smaller learning error, a training session can be continued with the conventional backpropagation algorithm (BP). By starting with a set of optimal initial weights, the speed of training a neural network can be substantially increased.” It is noted that the training of a neural network by backpropagation implies, to one of ordinary skill in the art, the act of “varying at least the weighting of at least the introduced layer to so that, for an instances of known input data, the output data of the modified ANN is closer to the corresponding known output data,” because the function of backpropagation is to optimize the weights to minimize loss (i.e., to bring the output closer to the known output data, also referred to as the “target” as discussed above). This is especially the case since the context makes clear that the initialization only creates “initial weights,” i.e., weights that are further adjusted during later training.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated, into the thus-far combination of references, the above-described further teachings of Yam by modifying the neural network of Ter-Sarkisov to have the feature of “in which the training data comprises a set of data having set of known input data and corresponding output data and in which processing includes varying at least the weighting of at least the introduced layer to so that, for an instances of known input data, the output data of the modified ANN is closer to corresponding known output data.” The motivation would have been to perform training of the neural network based on a set of targets, as suggested by Yam (§ 1, paragraph 1: “Multilayer feedforward neural networks are probably the most widely used neural networks. These kinds of networks are usually trained by the Backpropagation (BP) algorithm.” Yam, § 2, paragraph 1: “For a training set with P patterns, all given inputs can be represented… Similarly, the targets can be represented by.”)

As to claim 8, the combination of Ter-Sarkisov, Yam, and Islam teaches the method according to claim 1, in which the first position and the second position are the same, and generating includes providing the introduced layer in addition to the layers of the base ANN. [As shown in FIG. 1 of Ter-Sarkisov, the inserted layer is between two hidden layers. Here, the position between the two hidden layers is considered to constitute two positions that are the same.]

As to claim 10, the combination of Ter-Sarkisov, Yam, and Islam teaches the method according to claim 1, as set forth in the rejection above, but Ter-Sarkisov as modified thus far does not teach the further limitations of the instant claim.
Yam teaches the further limitations of “in which the neurons of each layer of the ANN process the data signals received from the preceding layer according to a bias function for that layer,” [Yam, § 2, paragraph 1: “The layer l consists of nl + 1 neurons (l = 1, . . . , L - 1) in which the last neuron is a bias node with a constant output of 1.0.”] “the method comprising deriving an initial approximation of at least a bias function for the introduced layer using a least squares approximation from the data signals detected for the first position and a second position.” [As noted in the rejection of claim 1, Yam teaches estimating the weights of a layer. Since the layer includes a neuron that serves as a bias, the estimating of the initial weights of the bias neuron satisfies the instant limitation.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated, into the thus-far combination of references, the above further teachings of Yam by modifying the method of Ter-Sarkisov (as modified thus far) to include the further feature that “the neurons of each layer of the ANN process the data signals received from the preceding layer according to a bias function for that layer” and the further operation of “deriving an initial approximation of at least a bias function for the introduced layer using a least squares approximation from the data signals detected for the first position and a second position.” The motivation for doing so would have been to incorporate a bias variable into the neural network such that the neural network has the benefit of a node with a constant output (Yam, § 2, paragraph 1: “the last neuron is a bias node with a constant output of 1.0.”). Furthermore, since the function of the bias is known in Yam, such a combination of teachings would have been an obvious combination of known elements according to known methods to yield predictable results. 
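By way of illustration only, the treatment of the bias as a node with a constant output of 1.0, with its weights estimated together with the other weights by least squares, can be sketched in Python with NumPy. The sketch assumes a purely linear introduced layer and synthetic data signals. The function name fit_layer_least_squares and the arrays x_first and x_second are hypothetical, and the sketch is not the full initialization algorithm of Yam.

```python
import numpy as np


def fit_layer_least_squares(x_first, x_second):
    """Estimate weights and bias mapping signals at one position to another.

    A column of ones is appended to the inputs so that the bias is fitted
    as the weight of a node with constant output 1.0 (an assumption made
    for this sketch, following the bias-node description quoted above).
    """
    ones = np.ones((x_first.shape[0], 1))
    augmented = np.hstack([x_first, ones])
    solution, *_ = np.linalg.lstsq(augmented, x_second, rcond=None)
    weights, bias = solution[:-1], solution[-1]
    return weights, bias


# Synthetic signals standing in for the data detected at the two positions.
rng = np.random.default_rng(0)
x_first = rng.normal(size=(200, 4))
true_w = rng.normal(size=(4, 3))
true_b = np.array([0.5, -1.0, 2.0])
x_second = x_first @ true_w + true_b

weights, bias = fit_layer_least_squares(x_first, x_second)
```

In this sketch the last row of the least squares solution plays the role of the bias weights, so a single solve yields both the weights and the bias of the introduced layer.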

As to claim 12, the combination of Ter-Sarkisov, Yam, and Islam teaches a non-transitory machine-readable medium storing computer software which, when executed by a computer, causes the computer to implement the method of claim 1. [Ter-Sarkisov, § 6, paragraph 2: “very fast training of the neural network language model: a couple of minutes on a standard GPU.” Since the neural network is implemented using a GPU, Ter-Sarkisov implicitly teaches a computer that includes a non-transitory machine-readable medium storing software for implementing its method.]

As to claim 14, the combination of Ter-Sarkisov, Yam, and Islam teaches a data processing apparatus comprising one or more processing circuits to implement the modified artificial neural network (ANN) generated by the method of claim 1. [Ter-Sarkisov, § 6, paragraph 2: “very fast training of the neural network language model: a couple of minutes on a standard GPU.” Since the neural network is implemented using a GPU, Ter-Sarkisov implicitly teaches a computer that includes one or more processing circuits to implement the modified artificial neural network (ANN). Moreover, Table 4 (3rd page) teaches measured results for “test corpora.” Therefore, the neural network that is created is implemented to test the effectiveness of the neural network.]

2.	Claims 3 and 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Ter-Sarkisov in view of Yam and Islam, and further in view of Lee (US 2018/0204115 A1).
As to claim 3, the combination of Ter-Sarkisov, Yam, and Islam teaches the method according to claim 1, but does not teach the further limitations of “in which at least one of the two or more successive layers is a convolutional layer, the method comprising deriving a fully connected layer from the convolutional layer.”  
Lee, in an analogous art, teaches the above limitations. Lee teaches a method that, as described in [0004], performs “at least one of inserting the layer between the first layer and the second layer of the neural network and replacing one of the first layer and the second layer with the layer to reduce a number of connections in the neural network.” Therefore, Lee is in the same field of endeavor as the claimed invention, namely machine learning, and also pertains to the problem of modifying the structure of a neural network.
In particular, Lee teaches “in which at least one of the two or more successive layers is a convolutional layer, the method comprising deriving a fully connected layer from the convolutional layer” [Lee, [0021]: “layer may be inserted between two adjacent layers of a neural network. For example, a layer (also referred to herein as an “insertion layer”) may be inserted between one layer (e.g., the last convolutional layer) and another layer (e.g., the first FC (fully-connected) layer) in a conventional convolutional neural network (CNN).” With respect to the limitation of “deriving,” this limitation is taught because the fully-connected layer follows the convolutional layer, and is thus trained based on the convolutional layer (note that training is described in [0018]: “The weight tuning…”)].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Ter-Sarkisov, Yam, and Islam with the teachings of Lee by modifying the neural network of Ter-Sarkisov such that at least one of the two or more successive layers is a convolutional layer, the method comprising deriving a fully connected layer from the convolutional layer. The motivation would have been to use a conventional type of neural network, namely a convolutional neural network (CNN) that includes a convolutional layer and a fully-connected layer, as suggested by Lee ([0021]: “in a conventional convolutional neural network (CNN)”). Moreover, since this modification would have yielded no more than the predictable result of utilizing a known type of neural network, the instant modification would have been obvious as a combination of prior art elements according to known methods to yield predictable results.

As to claim 6, the combination of Ter-Sarkisov, Yam, and Islam teaches the method according to claim 1, but does not teach the further limitation of “in which generating includes providing the introduced layer to replace one or more layers of the base ANN.”  
Lee, in an analogous art, teaches the above limitations. Lee teaches a method that, as described in [0004], performs “at least one of inserting the layer between the first layer and the second layer of the neural network and replacing one of the first layer and the second layer with the layer to reduce a number of connections in the neural network.” Therefore, Lee is in the same field of endeavor as the claimed invention, namely machine learning, and also pertains to the problem of modifying the structure of a neural network.
In particular, Lee teaches “in which generating includes providing the introduced layer to replace one or more layers of the base ANN” [Lee, [0023]: “a layer of a neural network may be replaced by another layer (a “replacement layer”). For example, a layer (e.g., the first FC layer) in a neural network (e.g., a CNN) may be replaced by a replacement layer, wherein the replacement layer has a number of neurons less than a number of neurons of the layer being replaced.” [0034]: “In one example, layer 258 includes X neurons, wherein X<N. Thus, for example, layer 104 of neural network 100 (see FIG. 1) has been replaced by layer 258, wherein a number of neurons in layer 258 is less than a number of neurons in layer 104.” In the context of a replacement layer, the “first position” and the “second position” may be regarded as the input and output of the layer that is being replaced.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Ter-Sarkisov, Yam, and Islam with the teachings of Lee by modifying Ter-Sarkisov (as modified thus far) to include the layer replacement technique of Lee, so as to arrive at the feature of “in which generating includes providing the introduced layer to replace one or more layers of the base ANN.” The motivation would have been to reduce memory requirements and/or improve efficiency of a neural network, as suggested by Lee ([0016]: “a number of neurons in the replacement layer may be selected such that the number of connections in the neural network is reduced upon insertion of the insertion layer and/or the replacement layer. Accordingly, various embodiments disclosed herein may reduce memory requirements and/or improve efficiency of a neural network, while maintaining an acceptable level of performance (e.g., degree of accuracy) of the neural network.”). 

As to claim 7, the combination of Ter-Sarkisov, Yam, Islam, and Lee teaches the method according to claim 6, in which the introduced layer has a different layer size to that of the one or more layers it replaces. [Lee, [0034]: “layer 104 of neural network 100 (see FIG. 1) has been replaced by layer 258, wherein a number of neurons in layer 258 is less than a number of neurons in layer 104.” Note that “number of neurons” here corresponds to the size of the layer.]
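For illustration only, the effect of a smaller replacement layer on the number of connections can be sketched as follows. The layer sizes are hypothetical and are not taken from Lee, and the helper count_connections is an assumed name. The sketch counts only the weights between successive fully connected layers and ignores biases.

```python
def count_connections(layer_sizes):
    """Count weights between successive fully connected layers (biases ignored)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical sizes: a middle layer of N = 4096 neurons is replaced
# by a layer of X = 512 neurons, where X < N.
base_sizes = [256, 4096, 1000]
modified_sizes = [256, 512, 1000]

base_connections = count_connections(base_sizes)
modified_connections = count_connections(modified_sizes)
```

Under these assumed sizes, the base network has 5,144,576 connections and the modified network has 643,072, which is consistent with the reduction in connections that Lee attributes to a replacement layer having fewer neurons.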

3.	Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Ter-Sarkisov in view of Yam and Islam, and further in view of Li et al. (US 10,885,900 B2) (“Li”).
As to claim 5, the combination of Ter-Sarkisov, Yam, and Islam teaches the method according to claim 4, as set forth above, but does not teach the further limitation of “in which, for each instance of input data the set of known input data, the corresponding known output data are output data of the base ANN for that instance of input data.”  
Li, in an analogous art, teaches the above limitations. Li generally pertains to “speech recognition via teacher-student learning” (title), involving the use of a “neural network” (abstract) model. Therefore, Li is in the same field of endeavor as the claimed invention, namely machine learning.
In particular, Li teaches “in which, for each instance of input data the set of known input data, the corresponding known output data are output data of the base ANN for that instance of input data” [In general, Li teaches “A student model for a new domain is created based on the teacher model trained in an existing domain,” where the student model is analogous to the modified ANN and the “teacher model” is analogous to the base ANN. Col. 1, lines 39-42: “As the teacher model receives inputs that conform to the first domain, the student model is fed (in parallel) equivalent inputs that conform to the second domain.” Abstract, last sentence: “The outputs from the teacher model are compared with the outputs of the student model and the differences are used to adjust the parameters of the student model to better recognize speech in the second domain.” That is, the outputs from the teacher model correspond to “output data of the base ANN for that instance of input data.”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Ter-Sarkisov, Yam, and Islam with the teachings of Li by further modifying Ter-Sarkisov, as already modified in the combination of references thus far, such that, for each instance of input data in the set of known input data, the corresponding known output data are output data of the base ANN for that instance of input data. The motivation would have been to make the behavior of the derived model converge to that of the base model, as suggested by Li, col. 8, lines 1-4 (“to make the behavior of the student model 160 for the target domain converge to that of the teacher model 150”). 
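For illustration only, the use of a base model's outputs as the known output data for training a derived model can be sketched in Python with NumPy. The sketch makes simplifying assumptions that are not found in Li: both models are linear, both receive the same inputs (rather than equivalent inputs from two domains), and training is plain gradient descent on a squared error. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
inputs = rng.normal(size=(256, 5))

# The "teacher" stands in for the base ANN; its outputs serve as the
# known output data for each instance of input data.
teacher_w = rng.normal(size=(5, 2))
targets = inputs @ teacher_w

# The "student" stands in for the modified ANN, starting from zero weights.
student_w = np.zeros((5, 2))


def mse(w):
    """Mean squared difference between student outputs and teacher outputs."""
    return float(np.mean((inputs @ w - targets) ** 2))


initial_error = mse(student_w)
for _ in range(500):
    # Gradient of the squared error summed over outputs and averaged over
    # samples (a positive multiple of the gradient of mse).
    grad = 2.0 * inputs.T @ (inputs @ student_w - targets) / len(inputs)
    student_w -= 0.05 * grad
final_error = mse(student_w)
```

Each update varies the student's weights so that its outputs move closer to the teacher's outputs for the same inputs, which is the sense in which the derived model's behavior converges to that of the base model in this sketch.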

4.	Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Ter-Sarkisov in view of Yam and Islam, and further in view of Srivastava et al., “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, Journal of Machine Learning Research 15, June 2014, pp. 1929-1958 (“Srivastava”) (cited by applicant in the 2/20/2019 IDS).
As to claim 9, the combination of Ter-Sarkisov, Yam, and Islam teaches the method according to claim 1, but does not teach the further limitations of “comprising adding a further weighting to the least squares approximation of the weights to simulate addition of dropout noise in the modified ANN.”  
Srivastava, in an analogous art, teaches the above limitations. Srivastava teaches “dropout…to prevent neural networks from overfitting” (title). Therefore, Srivastava is in the same field of endeavor as the claimed invention, namely machine learning. 
In particular, Srivastava teaches “adding a further weighting to the least squares approximation of the weights to simulate the addition of dropout noise in the ANN” [§ 10 (p. 1951), paragraph 1: “This new form of dropout amounts to adding a Gaussian distributed random variable with zero mean and standard deviation equal to the activation of the unit. That is, each hidden activation h_i is perturbed to h_i + h_i r where r ∼ N(0, 1), or equivalently h_i r′ where r′ ∼ N(1, 1).” That is, the term h_i r, when applied to the activations in Yam, creates a further weighting. The examiner notes that the instant claim limitation merely specifies a further “weighting” to the process of the least squares approximation, and does not require a more specific form of adding dropout noise than what is taught in this reference.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Ter-Sarkisov, Yam, and Islam with the teachings of Srivastava by modifying the method of Ter-Sarkisov (as modified thus far) to include the further operation of “adding a further weighting to the least squares approximation of the weights to simulate the addition of dropout noise in the ANN.” The motivation would have been to reduce overfitting (Srivastava, § 11: “Dropout is a technique for improving neural networks by reducing overfitting”). 
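For illustration only, the equivalence stated in the quoted passage of Srivastava (perturbing an activation h_i to h_i + h_i r with r ∼ N(0, 1), or equivalently to h_i r′ with r′ ∼ N(1, 1)) can be checked numerically in Python with NumPy. The activation value, sample count, and variable names are hypothetical. The sketch illustrates only the noise model and does not show how the noise would be folded into a least squares approximation.

```python
import numpy as np

rng = np.random.default_rng(2)
activation = 3.0          # hypothetical hidden activation h_i
num_samples = 200_000

# Multiplicative form: h_i * r' with r' ~ N(1, 1).
multiplicative = activation * rng.normal(loc=1.0, scale=1.0, size=num_samples)

# Additive form: h_i + h_i * r with r ~ N(0, 1).
additive = activation + activation * rng.normal(size=num_samples)

mult_mean, mult_std = multiplicative.mean(), multiplicative.std()
add_mean, add_std = additive.mean(), additive.std()
```

Both forms leave the mean of the activation unchanged and produce a standard deviation equal to the activation itself, consistent with the quoted description of Gaussian noise "with zero mean and standard deviation equal to the activation of the unit."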

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The following document depicts the state of the art.
Sharma et al., “Constructive Neural Networks: A Review,” International Journal of Engineering Science and Technology Vol. 2(12), 2010, 7847-7855 reviews conventional techniques in constructive neural networks, including adding/replacing layers (§ 2) and training the whole neural network after modification to its structure (§ 4). 

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YAO DAVID HUANG whose telephone number is (571)270-1764. The examiner can normally be reached Monday - Friday 9:00 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Y.D.H./ Examiner, Art Unit 2124

/MIRANDA M HUANG/ Supervisory Patent Examiner, Art Unit 2124