Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicants' Amendment filed on June 15, 2022, in which claims 1, 18, and 20 are currently amended. Claims 7 and 14 are canceled. Claims 1-5, 7, 9, 11-13, 15-16, 18, and 20 are currently pending. 

Response to Arguments
The rejections of claims 1-5, 7, 9, 11-13, 15-16, 18, and 20 under 35 U.S.C. § 112(b) are hereby withdrawn in view of applicant's amendments and remarks addressing those rejections.
Applicant’s arguments with respect to the rejection of claims 1-5, 7, 9, 11-13, 15-16, 18, and 20 under 35 U.S.C. § 103, based on the amendment, have been considered and are persuasive. However, the arguments are moot in view of the new ground of rejection set forth below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

	Claims 1-5, 9, 15, 18, and 20 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of El-Yaniv (US 2017/0286830 A1) and Fukuda (US 2019/0012594 A1), and in further view of hav4ik (“Removing then Inserting a New Middle Layer in a Keras Model”, 2017).

	 Regarding claim 1, EL-YANIV teaches A computer-implemented method of generating a derived artificial neural network (ANN) from a base ANN, the method comprising:([¶0040] "The present invention may be a system, a method, and/or a computer program product.")
	initialising a set of parameters of the derived ANN in dependence upon parameters of the base ANN; ([¶0003] "When training a neural network, training data is put into the first layer of the network, and the network parameters are changed to as to fit the task at hand, for example how correct or incorrect it is, based on the task being performed." Changing a neural network's parameters is interpreted as synonymous with initializing parameters of a second neural network dependent on the first. This is the foundation of the mutation step in evolutionary neural networks.)
	inferring a set of output data from a set of input data using the base ANN;([¶0033] "inferring conclusions regarding new data by using a trained quantized neural network having quantized weight values, optionally binary, for each connection and a quantized activation functions associated with each neuron. During the training, quantized values of both the connections and the activation are used for example for inference.")
	quantising the set of output data; and training the derived ANN using training data comprising the set of input data and the quantised set of output data.(See FIG. 1 [¶ 0005] "The method comprises constructing a neural network model having a plurality of neurons each associated with a quantized activation function adapted to output a quantized activation value selected from a first finite set, the plurality of neurons are arranged in a plurality of layers and being connected by a plurality of connections each associated with a quantized connection weight function adapted to output a quantized connection weight value selected from a second finite set, receiving a training set dataset, using the training set dataset to train the neural network model according to respective the quantized connection weight values")
	wherein the derived ANN has a different network structure to the base ANN([¶0003] "When training a neural network, training data is put into the first layer of the network, and the network parameters are changed to as to fit the task at hand, for example how correct or incorrect it is, based on the task being performed." Derived neural network is interpreted as synonymous with changed neural network.)
	 the base ANN having an ordered series of two or more successive layers of neurons, ([¶0057] "The neurons are arranged in a plurality of layers and are connected by connections. Each connection has a quantized connection weight function such as a binary connection weight function." [¶0057] "Optionally, a quantized function is a binary activation function which is implemented as a deterministic function.")
	the two or more successive layers or the ordered series being fully connected layers, each layer passing data signals to the next layer in the ordered series([¶0057] "The neurons are arranged in a plurality of layers and are connected by connections. Each connection has a quantized connection weight function such as a binary connection weight function." [¶0050] "The neural network may be any DNN, including any feed-forward artificial neural network such as a convolutional neural network (CNN), fully connected neural network (FNN) and/or recurrent neural network (RNN).")
	the neurons of each layer processing the data signals received from the preceding layer according to an activation function and weights for that layer ([¶0019] "The system comprises a storage comprising a neural network model having a plurality of neurons each associated with a quantized activation function adapted to output a quantized activation value selected from a first finite set, the plurality of neurons are arranged in a plurality of layers and being connected by a plurality of connections each associated with a quantized connection weight function adapted to output a quantized connection weight value selected from a second finite set". The quantized activation value selected from a first finite set is interpreted as the first position. The quantized connection weight value selected from a second finite set is interpreted as the second position. Both positions are in the ordered series of layers as described in ¶0057.)
	wherein the method for processing the data signals received from the preceding layer according to an activation function and weights for that layer includes detecting the data signals for a first position and a second position in the ordered series of layers of neurons ([¶0019] "The system comprises a storage comprising a neural network model having a plurality of neurons each associated with a quantized activation function adapted to output a quantized activation value selected from a first finite set, the plurality of neurons are arranged in a plurality of layers and being connected by a plurality of connections each associated with a quantized connection weight function adapted to output a quantized connection weight value selected from a second finite set". The quantized activation value selected from a first finite set is interpreted as the first position. The quantized connection weight value selected from a second finite set is interpreted as the second position. Both positions are in the ordered series of layers as described in ¶0057.)
	initialising at least a set of weights for the insertion layer using a least squares approximation from the data signals detected for the first position and a second position ([¶0067] "A normalization function, referred to herein as BatchNorm(), batch-normalizes floating point activation values of neurons, by a batch normalization (BN)." [¶0069] "Optionally a shift-based batch normalization (SBN) technique is used for approximating the BN". The Adamax and Adam learning rules in ¶0070 are least-squares methods for batch normalization, as is evident from the equation in ¶0070.)
	However, EL-YANIV does not explicitly teach approximating an insertion layer using weight parameters and a bias term that approximate a sub-network;
	wherein the approximated sub-network is a starting point for subsequent training of the derived ANN;
	generating the derived ANN from the base ANN by providing an insertion layer of neurons to provide processing between the first position and the second position with respect to the ordered series of layers of neurons of the base ANN; and
	wherein the insertion layer replaces at least two successive layers of the ordered series of layers of neurons of the base ANN and the insertion layer is a differently sized layer compared to the other layers in the base ANN.

	Fukuda, in the same field of endeavor, teaches approximating an insertion layer using weight parameters and a bias term that approximate a sub-network([¶0047] "The layer replacement module 136 is configured to increase the number of the hidden layers in the neural network 160 based at least in part on the plurality of the obtained new parameter sets... the layer replacement module 136 may set the first and second new parameter sets to upper and lower layers of the two new hidden layers, respectively, as initial conditions for subsequent pre-training, in a manner such that the two new hidden layers becomes equivalent to (at least approximates) the original pre-trained hidden layer" [¶0050] "In the particular embodiment where the weight matrix is used as the parameter set to be decomposed, new bias vectors can be calculated from the original bias vector B to set to the upper and lower layers. In a particular embodiment, a new bias vector B1 for the upper layer may be preferably set to be identical to the original bias vector B of the one pre-trained hidden layer 160 a and a new bias vector B2 for the lower layer may be set to be zero (B1=B; B2=0; TYPE 1)." Increasing the number of hidden layers interpreted as synonymous with inserting layers.  Fukuda explicitly teaches that the weight parameters and bias term are used to approximate a sub-network (upper or lower).)
	wherein the approximated sub-network is a starting point for subsequent training of the derived ANN ([¶0047] "the layer replacement module 136 may set the first and second new parameter sets to upper and lower layers of the two new hidden layers, respectively, as initial conditions for subsequent pre-training")
	generating the derived ANN from the base ANN by providing an insertion layer of neurons to provide processing between the first position and the second position with respect to the ordered series of layers of neurons of the base ANN([¶0003] "In conventional pre-training processes, a new layer initialized with random parameters is inserted to the top of the hidden layers just below the output layer. Then, the neural network is pre-trained using the training data." derived ANN interpreted as resultant ANN after inserting new layer.).

	El-Yaniv and Fukuda are both directed towards generating artificial neural networks. Therefore, El-Yaniv and Fukuda are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of El-Yaniv with the teachings of Fukuda by decomposing a layer into multiple layers approximating the original.  Fukuda provides as an exemplary motivation for combination ([¶0064] “By inserting the new layers into the fixed position instead of inserting on a top hidden layer just below the output layer as done in standard discriminative pre-training, good quality error signals are expected to be back propagated in back propagation procedure of the discriminative pre-training.”).  This motivation also applies to the remaining claims depending on the combination.
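	For illustration only, the bias assignment quoted above from Fukuda ¶0050 (B1=B; B2=0) can be checked numerically. The following is a minimal sketch under stated assumptions: the layer is treated as purely linear with no activation between the two new layers, the "upper" layer is taken to be the one nearer the output, and the factor matrices `w1` and `w2` and all function names are hypothetical. It does not purport to reproduce Fukuda's decomposition procedure.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    columns = list(zip(*right))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in left]


def linear(weights, bias, x):
    """Affine layer: weights @ x + bias (no activation, by assumption)."""
    return [s + b for s, b in zip(matvec(weights, x), bias)]


# Hypothetical factors chosen so that w = w1 @ w2 holds exactly.
w1 = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]   # upper layer, 3x2
w2 = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]    # lower layer, 2x3
w = matmul(w1, w2)                           # original layer, 3x3
b = [0.1, -0.2, 0.3]                         # original bias vector B

b1 = list(b)       # B1 = B for the upper layer
b2 = [0.0, 0.0]    # B2 = 0 for the lower layer


def original_layer(x):
    return linear(w, b, x)


def two_layer(x):
    # Lower layer first, then upper layer.
    return linear(w1, b1, linear(w2, b2, x))
```

	Under these assumptions the two new layers reproduce the output of the original layer, which is consistent with the quoted statement that the new layers are equivalent to, or at least approximate, the original layer.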
	However, the combination of EL-YANIV and Fukuda does not explicitly teach wherein the insertion layer replaces at least two successive layers of the ordered series of layers of neurons of the base ANN and the insertion layer is a differently sized layer compared to the other layers in the base ANN.

	Hav4ik, in the same field of endeavor, teaches wherein the insertion layer replaces at least two successive layers of the ordered series of layers of neurons of the base ANN and the insertion layer is a differently sized layer compared to the other layers in the base ANN ([p. 2] "I'd like to take the two Conv layers in Block 1 and replace them with just one Conv layer, after loading the original weights into all of the other layers" See solution on p. 2 including ("new_conv = Conv2D(filters=64,kernel_size=(5,5),name='new_conv',padding='same')(layers[0].output)") which shows the insertion layer being a differently sized layer compared to the other layers and the layers it replaced.).

	The combination of EL-YANIV and Fukuda, as well as hav4ik, are directed towards generating artificial neural networks. Therefore, the combination of EL-YANIV and Fukuda, as well as hav4ik, are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of EL-YANIV and Fukuda with the teachings of hav4ik by replacing two consecutive layers with a third layer of different dimensions. It would have been obvious to one of ordinary skill in the art that reducing the number of layers could improve model performance, and hav4ik's response to RACKGNOME on StackOverflow showed that there is not only a desire in the CNN community to perform this method, but also a known solution. This motivation for combination also applies to the remaining claims which depend on this combination.
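	For reference, the claim limitation of initialising the insertion layer by a least squares approximation from the data signals detected at the first and second positions can be sketched as follows. This is a minimal illustration of the claim language only, assuming an affine insertion layer fitted through the normal equations; the function names are hypothetical and the sketch is not drawn from the application or from any cited reference.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is a square matrix and b a matrix, both given as lists of rows.
    """
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc
                          for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_insertion_layer(first_signals, second_signals):
    """Least-squares fit of weights and bias so that
    second_signals ~= first_signals @ weights + bias.

    first_signals: samples detected at the first position (list of rows).
    second_signals: samples detected at the second position (list of rows).
    """
    # Append a constant 1 so the bias is fitted as an extra weight row.
    x = [list(row) + [1.0] for row in first_signals]
    d = len(x[0])
    k = len(second_signals[0])
    # Normal equations: (X^T X) theta = X^T Y.
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(d)]
           for i in range(d)]
    xty = [[sum(r[i] * y[j] for r, y in zip(x, second_signals))
            for j in range(k)] for i in range(d)]
    theta = solve_linear(xtx, xty)
    return theta[:-1], theta[-1]   # (weights, bias)
```

	In this sketch the fitted weights and bias would serve only as initial values; subsequent training of the derived ANN would adjust them further.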

	Regarding claim 2, the combination of EL-YANIV, Fukuda, and hav4ik teaches A method according to claim 1, in which: the set of output data comprises one or more output data vectors each having a plurality of data values; and (EL-YANIV [¶0048] "Quantized activation functions and quantized weight functions are functions having a finite set of outputs." [¶0049] "Optionally, the activation functions and quantized weight functions having an output selected from a group of 4, 8, 16, 32, 64, 128, 256, 512 and 1024 possible outputs which are represented in bits, optionally 2, 3, 5, 6, 7, 8, 9, and 10 bits." In the case of binarization, which is expected from the neural network quantization, an 8-bit string is interpreted as an 8-element vector.)
	the quantising step comprises replacing each data value other than a data value having a highest value amongst the plurality of data values, by a first predetermined value.(EL-YANIV [¶Summary] "each the neuron gradient is of an output of a respective the quantized activation function in one layer of the plurality of layers with respect to an input of the respective quantized activation function and is calculated such that when an absolute value of the input is smaller than a positive constant threshold value, the respective neuron gradient is set as a positive constant value and when the absolute value of the input is smaller than the positive constant threshold value the neuron gradient is set to zero...when an absolute value of said input is smaller than a positive constant threshold value, said respective neuron gradient is set as a positive constant value and when the absolute value of said input is larger than said positive constant threshold value said neuron gradient is set to zero" Data value having a highest value interpreted as synonymous with positive constant threshold value).
	
	 Regarding claim 3, the combination of EL-YANIV, Fukuda, and hav4ik teaches A method according to claim 2, in which the first predetermined value is zero.(EL-YANIV [¶Summary] "each the neuron gradient is of an output of a respective the quantized activation function in one layer of the plurality of layers with respect to an input of the respective quantized activation function and is calculated such that when an absolute value of the input is smaller than a positive constant threshold value, the respective neuron gradient is set as a positive constant value and when the absolute value of the input is smaller than the positive constant threshold value the neuron gradient is set to zero...when an absolute value of said input is smaller than a positive constant threshold value, said respective neuron gradient is set as a positive constant value and when the absolute value of said input is larger than said positive constant threshold value said neuron gradient is set to zero").
	
	Regarding claim 4, the combination of EL-YANIV, Fukuda, and hav4ik teaches A method according to claim 2, in which the quantising step comprises replacing a data value having a highest value amongst the plurality of data values, by a second predetermined value (EL-YANIV: Each of the neuron gradients is calculated such that when an absolute value of the input is smaller than a positive constant threshold value, for instance 1, the respective neuron gradient is set as a positive constant output value. Examples of 1 and 0 are both given as potential predetermined values for quantization.).
	
	Regarding claim 5, the combination of EL-YANIV, Fukuda, and hav4ik teaches A method according to claim 4, in which the second predetermined value is 1. (EL-YANIV: Each of the neuron gradients is calculated such that when an absolute value of the input is smaller than a positive constant threshold value, for instance 1, the respective neuron gradient is set as a positive constant output value. Examples of 1 and 0 are both given as potential predetermined values for quantization.).
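	For reference, the quantising step as recited in claims 2-5 can be sketched as follows. This is a minimal illustration of the claim language only, not of any cited reference; the function name, the parameter names, and the tie-breaking rule (the first occurrence of the highest value is kept) are assumptions.

```python
def quantise_one_hot(values, first_value=0, second_value=1):
    """Quantise one output data vector.

    Every data value other than the highest is replaced by first_value
    (claims 2 and 3: zero), and the highest data value is replaced by
    second_value (claims 4 and 5: one).
    """
    if not values:
        raise ValueError("output data vector must not be empty")
    # Assumption: on a tie, the first occurrence counts as the highest.
    top = max(range(len(values)), key=lambda i: values[i])
    return [second_value if i == top else first_value
            for i in range(len(values))]


# Example: an output vector of class scores from the base ANN.
print(quantise_one_hot([0.1, 0.7, 0.2]))  # prints [0, 1, 0]
```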
	
	 Regarding claim 9, the combination of EL-YANIV, Fukuda, and hav4ik teaches A method according to claim 1, in which the two or more successive layers are fully connected layers in which each neuron in a fully connected layer is connected to receive data signals from each neuron in a preceding layer and to pass data signals to each neuron in a following layer.(EL-YANIV [¶0057] "The neurons are arranged in a plurality of layers and are connected by connections. Each connection has a quantized connection weight function such as a binary connection weight function." [¶0050] "The neural network may be any DNN, including any feed-forward artificial neural network such as a convolutional neural network (CNN), fully connected neural network (FNN) and/or recurrent neural network (RNN).").
	
	 Regarding claim 15, the combination of EL-YANIV, Fukuda, and hav4ik teaches A method of claim 1, comprising adding a further weighting to the least squares approximation of the weights to simulate the addition of dropout noise in the ANN.(EL-YANIV [¶0064] "stochastic gradient descent (SGD). The SGD requires exploring a space of parameters in small and noisy process steps where noise is averaged out by stochastic gradient contributions accumulated in each connection weight.").

Regarding claims 18 and 20, claims 18 and 20 are directed towards an apparatus implementing the method of claim 1. Therefore, the rejection applied to claim 1 also applies to claims 18 and 20. Claim 18 further recites a non-transitory machine-readable medium storing computer readable instructions (El-Yaniv [¶0040] "The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.").

	Claims 12, 13, and 16 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of EL-YANIV, Fukuda, hav4ik, and Nakahara (“A Fully Connected Layer Elimination for a Binarized Convolutional Neural Network on an FPGA”, 2017).
	 Regarding claim 12, the combination of EL-YANIV, Fukuda, and hav4ik teaches A method of claim 1.
	However, the combination of EL-YANIV, Fukuda, and hav4ik does not explicitly teach that the generating step comprises providing the insertion layer to replace one or more layers of the base ANN.

	Nakahara, in the same field of endeavor, teaches the generating step comprises providing the insertion layer to replace one or more layers of the base ANN. ([p.1 col. 2] "we introduce the multiply accumulation (MAC) operation on the binarized CNN is almost the same as the binarized average pooling operation by a trick of the training algorithm. Thus, the internal FC layers are replaced into an average pooling layer" Pooling layer interpreted as insertion layer).

	El-Yaniv, Fukuda, hav4ik, and Nakahara are all directed towards generating artificial neural networks, with the combination of El-Yaniv, Fukuda, and hav4ik teaching inserting layers into an artificial neural network model analogous to Nakahara. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of the combination of El-Yaniv, Fukuda, and hav4ik with the teachings of Nakahara by inserting a hidden layer between set points in a series of neural network layers and reformulating a convolutional layer as a fully connected layer. Nakahara teaches as motivation “Compared with the conventional binarized implementations on an FPGA, the classification accuracy was almost the same, the performance per power efficiency is 5.1 better, as for the performance per area efficiency, it is 8.0 times better, and as for the performance per memory, it is 8.2 times better.” This motivation also applies to the remaining claims depending on this combination.

	 Regarding claim 13, the combination of EL-YANIV, Fukuda, hav4ik, and Nakahara teaches A method according to claim 12, in which the insertion layer has a different layer size to that of the one or more layers it replaces.(Nakahara [p. 2 Col. 1] "In the CNN, almost parameters are focused on the FC layers. To remove them, we replace the internal FC layers into an average pooling one." See eqn. 2 for size calculation of pooling layer).
	
	 Regarding claim 16, the combination of EL-YANIV, Fukuda, and hav4ik teaches A method of claim 1.
	However, the combination of EL-YANIV, Fukuda, and hav4ik does not explicitly teach the neurons of each layer of the base ANN process the data signals received from the preceding layer according to a bias function for that layer, the method comprising deriving an initial approximation of at least a bias function for the insertion layer using a least squares approximation from the data signals detected for the first position and a second position.

	Nakahara, in the same field of endeavor, teaches the neurons of each layer of the base ANN process the data signals received from the preceding layer according to a bias function for that layer, the method comprising deriving an initial approximation of at least a bias function for the insertion layer using a least squares approximation from the data signals detected for the first position and a second position (See FIG. 1 and Eqn. 1 "X denotes an input, W denotes a weight, Y denotes a bias, U denotes an internal output, f denotes an activation function, and Z denotes an output value to be mapped to (x, y) at the output feature map i + 1").

	El-Yaniv, Fukuda, hav4ik, and Nakahara are all directed towards generating artificial neural networks, with the combination of El-Yaniv, Fukuda, and hav4ik teaching inserting layers into an artificial neural network model analogous to Nakahara. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of the combination of El-Yaniv, Fukuda, and hav4ik with the teachings of Nakahara by inserting a hidden layer between set points in a series of neural network layers and reformulating a convolutional layer as a fully connected layer. Nakahara teaches as motivation “Compared with the conventional binarized implementations on an FPGA, the classification accuracy was almost the same, the performance per power efficiency is 5.1 better, as for the performance per area efficiency, it is 8.0 times better, and as for the performance per memory, it is 8.2 times better.” This motivation also applies to the remaining claims depending on this combination.

	Claim 11 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of EL-YANIV, Fukuda, hav4ik, and Choi (“TOWARDS THE LIMIT OF NETWORK QUANTIZATION”, 2017).

	 Regarding claim 11, the combination of EL-YANIV, Fukuda, and hav4ik teaches A method of claim 1.
	However, the combination of EL-YANIV, Fukuda, and hav4ik does not explicitly teach the training step comprises varying at least the weighting of at least the insertion layer so that, for an instances of known input data, the output data of the derived ANN is closer to the quantised set of output data.

	Choi, in the same field of endeavor, teaches the training step comprises varying at least the weighting of at least the insertion layer so that, for an instances of known input data, the output data of the derived ANN is closer to the quantised set of output data. (The method of claim 1, in which the training step comprises varying at least the weighting of at least the insertion layer so that, for an instances of known input data, an error function of the output data of the derived ANN is reduced where reducing the error function brings the output data of the derived ANN closer to matching the quantised set of output data [p. 3 §3.1] "we propose Hessian-weighted k-means clustering for network quantization to minimize the performance loss due to quantization in neural networks. We consider a general non-linear neural network that yields output y = f(x; w) from input x, where w = [w1 · · · wN ]T is the vector consisting of all trainable network parameters in the network; N is the total number of trainable parameters in the network. A loss function loss(y, yˆ) is defined as the objective function that we aim to minimize in average, where yˆ = yˆ(x) is the expected (groundtruth) output for input x. Cross entropy or m").

	El-Yaniv, Fukuda, hav4ik, and Choi are all directed towards generating derived artificial neural networks.  Therefore, El-Yaniv, Fukuda, hav4ik, and Choi are analogous art in the same field of endeavor.  It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of the combination of El-Yaniv, Fukuda, and hav4ik with the teachings of Choi by minimizing an error function relative to the quantized weights.  It would be obvious to one of ordinary skill in the art that minimizing error improves accuracy and is one of the fundamental aspects of machine learning.  Choi provides as a motivation for combination ([p. 13 §A.2] "We compare uniform quantization with non-weighted mean and uniform quantization with Hessian-weighted mean in Figure 3, which shows that uniform quantization with Hessian-weighted mean slightly outperforms uniform quantization with non-weighted mean.").
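	For illustration only, the relationship relied upon above, that reducing an error function brings the output data closer to the quantised set of output data, can be sketched as follows. This is a minimal example assuming a single linear layer, a squared-error function, and one plain gradient-descent step; it is not Choi's Hessian-weighted k-means method, and the function names, learning rate, and sample values are hypothetical.

```python
def forward(weights, bias, x):
    """Output of a single linear layer: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def squared_error(output, target):
    """Sum of squared differences between the output and the target."""
    return sum((o - t) ** 2 for o, t in zip(output, target))


def train_step(weights, bias, x, target, learning_rate=0.05):
    """One gradient-descent step on the squared error for one sample."""
    output = forward(weights, bias, x)
    # Derivative of the squared error with respect to each output value.
    grads = [2.0 * (o - t) for o, t in zip(output, target)]
    new_weights = [[w - learning_rate * g * xi for w, xi in zip(row, x)]
                   for row, g in zip(weights, grads)]
    new_bias = [b - learning_rate * g for b, g in zip(bias, grads)]
    return new_weights, new_bias


# Known input data and its quantised (one-hot) output from the base ANN.
x = [1.0, 2.0]
quantised_target = [0.0, 1.0]
weights = [[0.0, 0.0], [0.0, 0.0]]
bias = [0.0, 0.0]

before = squared_error(forward(weights, bias, x), quantised_target)
weights, bias = train_step(weights, bias, x, quantised_target)
after = squared_error(forward(weights, bias, x), quantised_target)
```

	In this sketch the error after the step is smaller than the error before it, meaning the layer output has moved toward the quantised target for that input.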
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SB/Examiner, Art Unit 2124                                                                                                                                                                                                        
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124