Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicants' Amendment filed on November 8, 2021, in which claims 1-20 are amended. Claims 1-20 are currently pending.

Specification
Applicant's amendments to the specification are acknowledged. The Examiner’s objections to the specification are hereby withdrawn, as necessitated by those amendments.

Response to Arguments
Applicant’s arguments with respect to the rejection of claims 1-7 and 16-20 under 35 U.S.C. 101, based on the amendment, have been considered and are persuasive.
Applicant’s arguments with respect to the rejection of claim 15 under 35 U.S.C. 112(b) have been considered and are persuasive. However, Applicant’s arguments with respect to the rejection of claim 11 under 35 U.S.C. 112(b) have been considered and are not deemed persuasive.
Applicant’s arguments with respect to the rejection of claims 1-20 under 35 U.S.C. 102/103 have been considered but are not deemed persuasive.

layer to the output of the one dimensional neurons generated by the average pooling layer. This is equivalent to eliminate the internal FC layers and replace into a binarized average pooling one.”). Eliminating and replacing existing convolutional layers with fully connected layers is interpreted as synonymous with reformulating said convolutional layers into fully connected layers. Furthermore, doing so for each of the last convolutional layers is interpreted as synonymous with “in response to at least one of the two or more successive layers being a convolutional layer”.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 6 and 11 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Regarding claim 6, claim 6 recites “in which: the derived ANN has the same network structure as the base ANN”; however, claim 6 depends from claim 1, which recites “wherein the derived ANN has a different network structure to the base ANN”. These limitations are contradictory, and claim 6 is therefore indefinite. In the interest of further examination, claim 6 is interpreted as having a different network structure despite the limitations of claim 1.

Regarding claim 11, claim 11 recites “closer to the quantized data set” without any further support for the intended meaning. The phrase “closer to the quantized data set” is indefinite because the metric of comparison is unknown and, in the case of vector magnitudes, the difference between the outputs could theoretically be infinitesimally small. For the sake of further examination, any change in the base neural network with the intent of quantization is being interpreted as closer to the quantized data set.
In claim 11, “the weighting” lacks antecedent basis.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: 
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-7, 9, 11-16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over EL-YANIV (US 2017/0286830 A1) in view of Nakahara (“A Fully Connected Layer Elimination for a Binarized Convolutional Neural Network on an FPGA”, 2017).

Regarding claim 1, EL-YANIV teaches A computer-implemented method of generating a derived artificial neural network (ANN) from a base ANN, the method comprising: ([¶0040] "The present invention may be a system, a method, and/or a computer program product.").
initialising a set of parameters of the derived ANN in dependence upon parameters of the base ANN; ([¶0003] "When training a neural network, training data is put into the first layer of the network, and the network parameters are changed so as to fit the task at hand, for example how correct or incorrect it is, based on the task being performed." Changing a neural network's parameters is interpreted as synonymous with initializing parameters of a second neural network dependent on the first. This is the foundation of the mutation step in evolutionary neural networks.).
inferring a set of output data from a set of input data using the base ANN; ([¶0033] "inferring conclusions regarding new data by using a trained quantized neural network having quantized weight values, optionally binary, for each connection and a quantized activation functions associated with each neuron. During the training, quantized values of both the connections and the activation are used for example for inference.").
quantising the set of output data; and training the derived ANN using training data comprising the set of input data and the quantised set of output data. (See FIG. 1 [¶ 0005] "The method comprises constructing a neural network model having a plurality of neurons each associated with a quantized activation function adapted to output a quantized activation value selected from a first finite set, the plurality of neurons are arranged in a plurality of layers and being connected by a plurality of connections each associated with a quantized connection weight function adapted to output a quantized connection weight value selected from a second finite set, receiving a training set dataset, using the training set dataset to train the neural network model according to respective the quantized connection weight values").
wherein the derived ANN has a different network structure to the base ANN ([¶0003] "When training a neural network, training data is put into the first layer of the network, and the network parameters are changed so as to fit the task at hand, for example how correct or incorrect it is, based on the task being performed." Derived neural network is interpreted as synonymous with changed neural network.).
the base ANN having an ordered series of two or more successive layers of neurons, ([¶0057] "The neurons are arranged in a plurality of layers and are connected by connections. Each connection has a quantized connection weight function such as a binary connection weight function." [¶0057] "Optionally, a quantized function is a binary activation function which is implemented as a deterministic function.").
the two or more successive layers or the ordered series being fully connected layers, each layer passing data signals to the next layer in the ordered series ([¶0057] "The neurons are arranged in a plurality of layers and are connected by connections. Each connection has a quantized connection weight function such as a binary connection weight function." [¶0050] "The neural network may be any DNN, including any feed-forward artificial neural network such as a convolutional neural network (CNN), fully connected neural network (FNN) and/or recurrent neural network (RNN).").
the neurons of each layer processing the data signals received from the preceding layer according to an activation function and weights for that layer ([¶0019] "The system comprises a storage comprising a neural network model having a plurality of neurons each associated with a quantized activation function adapted to output a quantized activation value selected from a first finite set, the plurality of neurons are arranged in a plurality of layers and being connected by a plurality of connections each associated with a quantized connection weight function adapted to output a quantized connection weight value selected from a second finite set" Quantized activation value selected from a first finite set is interpreted as first position. Quantized connection weight value selected from a second finite set is interpreted as second position. Both positions are in the ordered series of layers as described in ¶0057).
wherein the method for processing the data signals received from the preceding layer according to an activation function and weights for that layer includes detecting the data signals for a first position and a second position in the ordered series of layers of neurons ([¶0019] "The system comprises a storage comprising a neural network model having a plurality of neurons each associated with a quantized activation function adapted to output a quantized activation value selected from a first finite set, the plurality of neurons are arranged in a plurality of layers and being connected by a plurality of connections each associated with a quantized connection weight function adapted to output a quantized connection weight value selected from a second finite set" Quantized activation value selected from a first finite set is interpreted as first position. Quantized connection weight value selected from a second finite set is interpreted as second position. Both positions are in the ordered series of layers as described in ¶0057).
initialising at least a set of weights for the insertion layer using a least squares approximation from the data signals detected for the first position and a second position ([¶0067] "A normalization function, referred to herein as BatchNorm( ), batch-normalizes floating point activation values of neurons, by a batch normalization (BN)." [¶0069] "Optionally a shift-based batch normalization (SBN) technique is used for approximating the BN" Adamax and Adam learning rules are least-squares methods for batch-normalization in ¶0070, which is evident from the equation in ¶0070). However, EL-YANIV does not explicitly teach generating the derived ANN from the base ANN by providing an insertion layer of neurons to provide processing between the first position and the second position with respect to the ordered series of layers of neurons of the base ANN.
  

Nakahara teaches generating the derived ANN from the base ANN by providing an insertion layer of neurons to provide processing between the first position and the second position with respect to the ordered series of layers of neurons of the base ANN ([Abstract] "In the paper, we eliminate internal FC layers excluding the last one, then, insert a binarized average pooling layer" See Fig. 1 in Nakahara.).
and in response to at least one of the two or more successive layers in the derived ANN being a convolutional layer, reformulate the convolutional layer as a fully connected layer. ([p.2 Col. 2] "For each L × L feature map of the last layer of the convolutional ones that extracts image feature, binarized averaging pooling with a L × L kernel is performed. We attach a FC layer to the output of the one dimensional neurons generated by the average pooling layer. This is equivalent to eliminate the internal FC layers and replace into a binarized average pooling one" Doing so for each of the last convolutional layers is interpreted as synonymous with in response to at least one of the two or more successive layers being a convolutional layer.).
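
As an illustration of the claimed reformulation (not drawn from either reference), a convolutional layer can be written as a fully connected layer whose weight matrix repeats the kernel along shifted rows. The sketch below assumes a one-dimensional, single-channel input, stride 1, no padding, and the cross-correlation convention common in CNN frameworks; all function names are hypothetical.

```python
def conv1d(signal, kernel):
    """Valid, stride-1 cross-correlation of a 1-D signal with a kernel."""
    n, k = len(signal), len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(n - k + 1)]


def conv_as_fc_weights(kernel, input_len):
    """Build the dense weight matrix equivalent to conv1d for a fixed input length.

    Row i holds the kernel shifted right by i positions, with zeros elsewhere.
    """
    k = len(kernel)
    rows = []
    for i in range(input_len - k + 1):
        row = [0.0] * input_len
        row[i:i + k] = kernel
        rows.append(row)
    return rows


def fully_connected(signal, weights):
    """Dense layer without bias: one dot product per output neuron."""
    return [sum(w * x for w, x in zip(row, signal)) for row in weights]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 0.0, -1.0]
weights = conv_as_fc_weights(kernel, len(signal))
# Both layers produce the same outputs for the same input.
assert fully_connected(signal, weights) == conv1d(signal, kernel)
```

The dense form has far more parameters than the kernel, most of them zero, which is why the two forms are equivalent in output but not in storage.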

EL-YANIV and Nakahara are both directed towards generating artificial neural networks. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of EL-YANIV and Nakahara by inserting a hidden layer between set points in a series of neural network layers. Nakahara teaches as motivation “Compared with the conventional binarized implementations on an FPGA, the classification accuracy was almost the same, the performance per power efficiency is 5.1 times better, as for the performance per area efficiency, it is 8.0 times better, and as for the performance per memory, it is 8.2 times better.”.
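
The data flow recited in claim 1 (infer outputs with the base ANN, quantise them, then train the derived ANN on the resulting pairs) can be sketched as below. The base model is a stand-in function and the quantiser is passed in as a parameter; both are hypothetical and neither is taken from EL-YANIV or Nakahara.

```python
def build_training_set(base_model, inputs, quantise):
    """Pair each input with the quantised output the base model infers for it."""
    return [(x, quantise(base_model(x))) for x in inputs]


def toy_base_model(x):
    """Hypothetical stand-in for a trained base ANN: two fixed linear scores."""
    return [0.5 * x[0] + 0.1 * x[1], 0.2 * x[0] + 0.9 * x[1]]


def sign_quantise(outputs):
    """Assumed quantiser for illustration only: map each value to +1 or -1."""
    return [1 if v >= 0 else -1 for v in outputs]


inputs = [[1.0, -2.0], [-1.0, 0.5]]
training_set = build_training_set(toy_base_model, inputs, sign_quantise)
# Each entry is (input, quantised target); a trainer for the derived ANN
# would consume these pairs in place of the original labels.
```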

Regarding claim 2, the combination of EL-YANIV and Nakahara teaches A method according to claim 1, in which: the set of output data comprises one or more output data vectors each having a plurality of data values; and (EL-YANIV [¶0048] "Quantized activation functions and quantized weight functions are functions having a finite set of outputs." [¶0049] "Optionally, the activation functions and quantized weight functions having an output selected from a group of 4, 8, 16, 32, 64, 128, 256, 512 and 1024 possible outputs which are represented in bits, optionally 2, 3, 5, 6, 7, 8, 9, and 10 bits." In the case of binarization, which is expected from the neural network quantization, an 8-bit string is interpreted as an 8-element vector.).
the quantising step comprises replacing each data value other than a data value having a highest value amongst the plurality of data values, by a first predetermined value. (EL-YANIV [¶Summary] "each the neuron gradient is of an output of a respective the quantized activation function in one layer of the plurality of layers with respect to an input of the respective quantized activation function and is calculated such that when an absolute value of the input is smaller than a positive constant threshold value, the respective neuron gradient is set as a positive constant value and when the absolute value of the input is smaller than the positive constant threshold value the neuron gradient is set to zero...when an absolute value of said input is smaller than a positive constant threshold value, said respective neuron gradient is set as a positive constant value and when the absolute value of said input is larger than said positive constant threshold value said neuron gradient is set to zero" Data value having a highest value is interpreted as synonymous with positive constant threshold value).

Regarding claim 3, the combination of EL-YANIV and Nakahara teaches A method according to claim 2, in which the first predetermined value is zero. (EL-YANIV [¶Summary] "each the neuron gradient is of an output of a respective the quantized activation function in one layer of the plurality of layers with respect to an input of the respective quantized activation function and is calculated such that when an absolute value of the input is smaller than a positive constant threshold value, the respective neuron gradient is set as a positive constant value and when the absolute value of the input is smaller than the positive constant threshold value the neuron gradient is set to zero...when an absolute value of said input is smaller than a positive constant threshold value, said respective neuron gradient is set as a positive constant value and when the absolute value of said input is larger than said positive constant threshold value said neuron gradient is set to zero").
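
The gradient rule quoted from EL-YANIV (a positive constant when the absolute input is below a positive threshold, zero when it is above) can be sketched as follows. The threshold of 1 and the constant of 1 are assumptions chosen for illustration, and the function name is hypothetical. The quoted passage does not say what happens exactly at the threshold; this sketch returns zero there.

```python
def neuron_gradient(x, threshold=1.0, constant=1.0):
    """Gradient of a quantised activation with respect to its input x.

    Returns the positive constant inside the threshold band and zero outside it.
    """
    if abs(x) < threshold:
        return constant
    return 0.0


gradients = [neuron_gradient(x) for x in (-2.0, -0.5, 0.0, 0.9, 1.5)]
```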

Regarding claim 4, the combination of EL-YANIV and Nakahara teaches A method according to claim 2, in which the quantising step comprises replacing a data value having a highest value amongst the plurality of data values, by a second predetermined value. (EL-YANIV "Each of the neuron gradients is calculated such that when an absolute value of the input is smaller than a positive constant threshold value, for instance 1, the respective neuron gradient is set as a positive constant output value" Examples of 1 and 0 are both given as potential predetermined values for quantization.).

Regarding claim 5, the combination of EL-YANIV and Nakahara teaches A method according to claim 4, in which the second predetermined value is 1. (EL-YANIV "Each of the neuron gradients is calculated such that when an absolute value of the input is smaller than a positive constant threshold value, for instance 1, the respective neuron gradient is set as a positive constant output value" Examples of 1 and 0 are both given as potential predetermined values for quantization.).
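
Read together, claims 2 through 5 recite a quantising step that replaces the highest value in an output vector with 1 and every other value with 0. A minimal sketch of that step follows; the function name is hypothetical, and breaking ties in favour of the first maximum is an assumption not stated in the claims.

```python
def quantise_one_hot(vector, low=0, high=1):
    """Replace the highest value with `high` and every other value with `low`."""
    # max() returns the first index on ties, which is the assumed tie-break.
    top = max(range(len(vector)), key=lambda i: vector[i])
    return [high if i == top else low for i in range(len(vector))]


quantised = quantise_one_hot([0.1, 0.7, 0.2])
```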

Regarding claim 6, the combination of EL-YANIV and Nakahara teaches
A method according to claim 1, in which: the derived ANN (EL-YANIV [¶0019] “outputting a trained quantized neural network formed as an outcome of the training process.” Trained network interpreted as derived neural network.).
has the same network structure as the base ANN; and (EL-YANIV [¶0019] “The system comprises a storage comprising a neural network model having a plurality of neurons” Stored neural network is interpreted as base neural network.).
the initialising step comprises setting the parameters of the derived ANN to be the same as respective parameters of the base ANN. (EL-YANIV [¶0032] “During the training, floating-point values of the connections are stored and used for the training.” See also FIG. 1). 

Regarding claim 7, the combination of EL-YANIV and Nakahara teaches
A method according to claim 1, in which the derived ANN has a different network structure to the base ANN. (EL-YANIV [¶0003] "When training a neural network, training data is put into the first layer of the network, and the network parameters are changed so as to fit the task at hand, for example how correct or incorrect it is, based on the task being performed." Derived neural network is interpreted as synonymous with changed neural network.).

Regarding claim 9, the combination of EL-YANIV and Nakahara teaches A method according to claim 1, in which the two or more successive layers are fully connected layers in which each neuron in a fully connected layer is connected to receive data signals from each neuron in a preceding layer and to pass data signals to each neuron in a following layer. (EL-YANIV [¶0057] "The neurons are arranged in a plurality of layers and are connected by connections. Each connection has a quantized connection weight function such as a binary connection weight function." [¶0050] "The neural network may be any DNN, including any feed-forward artificial neural network such as a convolutional neural network (CNN), fully connected neural network (FNN) and/or recurrent neural network (RNN).").

Regarding claim 11, the combination of EL-YANIV and Nakahara teaches A method of claim 1, in which the training step comprises varying at least the weighting of at least the insertion layer to so that, for an instances of known input data, the output data of the derived ANN is closer to the quantised set of output data. (EL-YANIV See FIG. 1 [¶ 0005] "The method comprises constructing a neural network model having a plurality of neurons each associated with a quantized activation function adapted to output a quantized activation value selected from a first finite set, the plurality of neurons are arranged in a plurality of layers and being connected by a plurality of connections each associated with a quantized connection weight function adapted to output a quantized connection weight value selected from a second finite set, receiving a training set dataset, using the training set dataset to train the neural network model according to respective the quantized connection weight values" Any change in the base neural network with the intent of quantization is being interpreted as closer to the quantized data set.).

Regarding claim 12, the combination of EL-YANIV and Nakahara teaches
A method of claim 1, in which the generating step comprises providing the insertion layer to replace one or more layers of the base ANN. (Nakahara [p.1 col. 2] "we introduce the multiply accumulation (MAC) operation on the binarized CNN is almost the same as the binarized average pooling operation by a trick of the training algorithm. Thus, the internal FC layers are replaced into an average pooling layer" Pooling layer interpreted as insertion layer). 

Regarding claim 13, the combination of EL-YANIV and Nakahara teaches A method according to claim 12, in which the insertion layer has a different layer size to that of the one or more layers it replaces. (Nakahara [p. 2 Col. 1] "In the CNN, almost parameters are focused on the FC layers. To remove them, we replace the internal FC layers into an average pooling one." See eqn. 2 for size calculation of pooling layer).

Regarding claim 14, the combination of EL-YANIV and Nakahara teaches A method of claim 1, in which the generating step comprises providing the insertion layer in addition to the layers of the base ANN. (Nakahara [p. 2 Col. 1] "Suppose that the input x is a binary value, that is x ∈ {0, 1}, and the output of the binarized average pooling layer is also the binarized one. Expr. (2) introduce a binarized average pooling into a majority operation." [p. 2 Col. 2] "We attach a FC layer to the output of the one dimensional neurons generated by the average pooling layer.").
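
The majority-operation reading of binarized average pooling that Nakahara describes can be sketched as below. The sketch assumes inputs in {0, 1} and an output of 1 when at least half of the L × L window is 1; that tie threshold and the function name are assumptions, and Nakahara's Expr. (2) should be consulted for the exact form.

```python
def binarized_average_pool(feature_map):
    """Pool one L x L binary feature map to a single bit by majority vote."""
    values = [v for row in feature_map for v in row]
    # Average >= 0.5 is the same test as 2 * (count of ones) >= window size.
    return 1 if 2 * sum(values) >= len(values) else 0


pooled = [binarized_average_pool(fm) for fm in (
    [[1, 0], [1, 1]],
    [[0, 0], [1, 0]],
)]
```

Applying the function to each feature map of the last convolutional layer yields the one-dimensional vector of bits to which the remaining fully connected layer is attached.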

Regarding claim 15, the combination of EL-YANIV and Nakahara teaches A method of claim 1, comprising adding a further weighting to the least squares approximation of the weights to simulate the addition of dropout noise in the ANN. (EL-YANIV [¶0064] "stochastic gradient descent (SGD). The SGD requires exploring a space of parameters in small and noisy process steps where noise is averaged out by stochastic gradient contributions accumulated in each connection weight.").

Regarding claim 16, the combination of EL-YANIV and Nakahara teaches A method of claim 1, in which the neurons of each layer of the base ANN process the data signals received from the preceding layer according to a bias function for that layer, the method comprising deriving an initial approximation of at least a bias function for the insertion layer using a least squares approximation from the data signals detected for the first position and a second position (Nakahara See FIG. 1 and Eqn. 1 "X denotes an input, W denotes a weight, Y denotes a bias, U denotes an internal output, f denotes an activation function, and Z denotes an output value to be mapped to (x, y) at the output feature map i + 1").
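
For context, a least squares initialisation of an insertion layer's weights and bias from signals detected at two positions can be sketched as follows. This is a generic ordinary least squares fit solved through the normal equations, assuming a single output neuron and a small, well-conditioned (non-singular) problem; it is not taken from EL-YANIV, Nakahara, or the application, and all names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def least_squares_init(first_signals, second_signals):
    """Fit weights w and bias b minimising sum((w . x + b - y) ** 2).

    first_signals: input vectors detected at the first position.
    second_signals: scalar targets detected at the second position.
    """
    rows = [list(x) + [1.0] for x in first_signals]  # trailing 1 carries the bias
    d = len(rows[0])
    # Normal equations: (A^T A) theta = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    aty = [sum(r[i] * y for r, y in zip(rows, second_signals)) for i in range(d)]
    theta = solve_linear(ata, aty)
    return theta[:-1], theta[-1]


xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ys = [2.0 * x[0] - 1.0 * x[1] + 0.5 for x in xs]  # synthetic targets
weights, bias = least_squares_init(xs, ys)
```

Because the synthetic targets are exactly linear in the inputs, the fit recovers the generating weights and bias up to floating-point error; with real layer activations the result would only be an initial approximation to refine by training.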

Regarding claim 18, EL-YANIV teaches A non-transitory machine-readable medium which stores computer software according to claim 17. ([¶0040] "The present invention may be a system, a method, and/or a computer program product."). 

Regarding claim 19, claim 19 effectively mirrors claim 1 and is therefore rejected under a similar interpretation.

Regarding claim 20, claim 20 effectively mirrors claim 1 and is therefore rejected under a similar interpretation.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SB/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124