DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 06/08/2020 and 04/21/2022 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 10 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 10 recites the limitation "the second residual error" in “wherein the second residual error comprises a second difference between the first residual error and a second value corresponding to the second binary representation of the first residual error”.  There is insufficient antecedent basis for this limitation in the claim.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 12-17, 19, 21, 23-24, 26, and 50 are rejected under 35 U.S.C. 103 as being unpatentable over Courbariaux et al. (“Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or −1”) in view of Toderici et al. (US 10192327 B1).
Regarding Claim 1,
Courbariaux teaches a system, comprising: at least one processor; and at least one memory including program code which when executed by the at least one processor provides operations comprising: 
Training (pg. 1, Abs. We introduce a method to train Binarized Neural Networks (BNNs) - neural networks with binary weights and activations at run-time.), based at least on a training dataset (pg. 1, Abs. On both, BNNs achieved nearly state-of-the-art results over the MNIST, CIFAR-10 and SVHN datasets.), a machine learning model, the machine learning model including a first neuron configured to generate an output by at least applying, to one or more inputs to the first neuron (pg. 6, section 3.2; Artificial neurons are basically multiply-accumulators computing weighted sums of their inputs.), an activation function (pg. 3, section 1.3; seen as propagating the gradient through hard tanh, which is the following piece-wise linear activation function: 
[image: media_image1.png]
), the output of the activation function being subject to a multi-level binarization function configured to generate an estimate of the output (pg. 2, section 1.1; Our first binarization function is deterministic: 
[image: media_image2.png]
 The binarization function is performed at different layers (i.e., multi-level); pg. 4, section 1.6; As the output of one layer is the input of the next, all the layers inputs are binary, with the exception of the first layer.), and the estimate of the output including a first bit providing a first binary representation of the output (pg. 2, section 1.1; where x^b is the binarized variable (weight or activation) and x the real-valued variable.); and 
Courbariaux does not explicitly disclose 
and a second bit providing a second binary representation of a first residual error associated with the first binary representation of the output;
in response to determining that the training of the machine learning model is complete, deploying the trained machine learning model to perform a cognitive task.
However, Toderici (US 10192327 B1) teaches
and a second bit providing a second binary representation of a first residual error associated with the first binary representation of the output (col. 8 lines 36-42 where E.sub.t and D.sub.t represent the encoder network and decoder network at iteration t, respectively, B represents the binarizer, b.sub.t represents the progressive binary code representation for the iteration, 
[image: media_image3.png]
 represents the progressive reconstruction of the original image x with γ=0 for “one shot” reconstruction, or γ=1 for additive reconstruction, and r.sub.t represents the residual error of x and the reconstruction 
[image: media_image3.png]
);
in response to determining that the training of the machine learning model is complete, deploying (col. 15 lines 46-51; A computer program (which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form,) the trained machine learning model to perform a cognitive task (Col. 10 lines 54-58; Once the neural network system 100 has been trained, the system can be used to perform variable rate image compression by varying the number of iterations performed by the system to generate a compressed representation of a received input image.).
Courbariaux and Toderici are analogous because they are both directed towards the same field of endeavor of binary neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the binary neural network of Courbariaux with the residual error of Toderici.
Doing so would allow for calculating a difference between the expected output and the actual output of the neural network. The differences can be minimized through each iteration, thus improving the accuracy of the neural network (Col. 5 lines 4-15).
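For illustration only, the residual-based, multi-level binarization recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Courbariaux, Toderici, or the application: the function names `multilevel_binarize` and `reconstruct` and the per-level scale values are hypothetical, and a sign-based bit at each level is assumed.

```python
def multilevel_binarize(x, scales):
    """Binarize a real value x over several levels.

    Level 1 encodes sign(x); each later level encodes the sign of the
    residual error left by the previous levels. `scales` holds one
    hypothetical positive scale per level (an assumption of this sketch).
    """
    bits = []
    residuals = []
    r = x
    for s in scales:
        b = 1 if r >= 0 else -1   # one bit per level: +1 or -1
        bits.append(b)
        r = r - s * b             # residual error after this level
        residuals.append(r)
    return bits, residuals


def reconstruct(bits, scales):
    """Estimate of x recovered from the per-level bits."""
    return sum(s * b for s, b in zip(scales, bits))


# Hypothetical two-level example.
scales = [0.5, 0.25]
bits, residuals = multilevel_binarize(0.7, scales)
estimate = reconstruct(bits, scales)
```

In this sketch the first entry of `bits` plays the role of the first bit, and the second entry is a binary representation of the first residual error; adding a third scale would yield a third bit encoding the second residual error.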
Regarding Claim 2,
Courbariaux and Toderici teach the system of claim 1. Courbariaux further teaches wherein the first neuron is further configured to apply, to the one or more inputs, at least one binary weight having one of two values prior to applying the activation function (pg. 2, section 1.1; When training a BNN, we constrain both the weights and the activations to either +1 or −1. Those two values are very advantageous from a hardware perspective, as we explain in Section 4.).
Regarding Claim 12,
Courbariaux and Toderici teach the system of claim 1. Toderici further teaches wherein the estimate of the output further includes a third bit providing a third binary representation of a second residual error associated with the second binary representation of the first residual error (col. 8 lines 36-42; where E.sub.t and D.sub.t represent the encoder network and decoder network at iteration t, respectively, B represents the binarizer, b.sub.t represents the progressive binary code representation for the iteration, 
[image: media_image3.png]
 represents the progressive reconstruction of the original image x with γ=0 for “one shot” reconstruction, or γ=1 for additive reconstruction, and r.sub.t represents the residual error of x and the reconstruction 
[image: media_image3.png]
).
Courbariaux and Toderici are analogous because they are both directed towards the same field of endeavor of binary neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the binary neural network of Courbariaux with the residual error of Toderici.
Doing so would allow for calculating a difference between the expected output and the actual output of the neural network. The differences can be minimized through each iteration, thus improving the accuracy of the neural network (Col. 5 lines 4-15).
Regarding Claim 13,
Courbariaux and Toderici teach the system of claim 1. Courbariaux further teaches wherein the machine learning model further includes a second neuron configured to receive (pg. 6, section 3.2; Artificial neurons are basically multiply-accumulators computing weighted sums of their inputs.), as an input, the estimate of the output of the activation function applied at the first neuron (pg. 3, section 1.3; seen as propagating the gradient through hard tanh, which is the following piece-wise linear activation function: 
[image: media_image1.png]
), and wherein the second neuron is further configured to apply, to the estimate of the output of the activation function, one or more binary weights (pg. 1, Abs.; At training-time the binary weights and activations are used for computing the parameters gradients.).
Regarding Claim 14,
Courbariaux and Toderici teach the system of claim 13. Courbariaux further teaches wherein the one or more binary weights are applied to the estimate of the output of the activation function by determining a dot product between the one or more binary weights (pg. 4. Algorithm 5;
[image: media_image4.png]
) and the estimate of the output of the activation function (pg. 3, section 1.3; seen as propagating the gradient through hard tanh, which is the following piece-wise linear activation function: 
[image: media_image1.png]
).
Regarding Claim 15,
Courbariaux and Toderici teach the system of claim 14. Courbariaux further teaches wherein the dot product is determined by performing an exclusive NOR (XNOR) operation between the one or more binary weights and the estimate of the output of the activation function (pg. 4. Algorithm 5;
[image: media_image4.png]
), and wherein the dot product is further determined by performing a pop-count operation to determine a quantity of bits set by the exclusive NOR (XNOR) operation (pg. 7; section 3.3; When necessary, we apply each filter on the map and perform the required multiply-accumulate (MAC) operations (in our case, using XNOR and popcount operations).).
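For illustration only, the XNOR and pop-count computation of a dot product between two vectors of +1/-1 values can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Courbariaux: the helper names `pack_bits` and `xnor_popcount_dot` are hypothetical, and the sketch assumes +1 is encoded as bit 1 and -1 as bit 0.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int; bit i is 1 when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_word, b_word, n):
    """Dot product of two n-element +1/-1 vectors given in packed form.

    XNOR sets a bit wherever the two vectors agree; each agreement adds +1
    to the dot product and each disagreement adds -1, so the result is
    2 * (number of agreements) - n.
    """
    mask = (1 << n) - 1                  # keep only the n valid bit positions
    agree = ~(a_word ^ b_word) & mask    # XNOR restricted to n bits
    matches = bin(agree).count("1")      # pop-count of the XNOR result
    return 2 * matches - n


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
dot = xnor_popcount_dot(pack_bits(a), pack_bits(b), len(a))
```

The packed result agrees with the ordinary sum of element-wise products of `a` and `b`, which is the sense in which the multiply-accumulate operation reduces to XNOR and pop-count.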
Regarding Claim 16,
Courbariaux and Toderici teach the system of claim 15. Courbariaux further teaches wherein a fixed quantity of hardware blocks are used to perform the exclusive NOR (XNOR) operation and the pop-count operation (pg. 7; section 3.3; When necessary, we apply each filter on the map and perform the required multiply-accumulate (MAC) operations (in our case, using XNOR and popcount operations).).
Regarding Claim 17,
Courbariaux and Toderici teach the system of claim 15. Courbariaux further teaches wherein a quantity of hardware blocks used to perform the exclusive NOR (XNOR) operation and the pop-count operation are determined based at least on a quantity of levels of binarization associated with the multi-level binarization function (pg. 7; section 3.3; When necessary, we apply each filter on the map and perform the required multiply-accumulate (MAC) operations (in our case, using XNOR and popcount operations).).
Regarding Claim 19,
Courbariaux and Toderici teach the system of claim 15. Courbariaux further teaches wherein multiple hardware blocks are configured to perform the exclusive NOR (XNOR) operation and the pop-count operation on the first bit comprising the estimate of the output of the activation function and the second bit comprising the estimate of the output of the activation function at least partially in parallel (pg. 7; section 3.3; When necessary, we apply each filter on the map and perform the required multiply-accumulate (MAC) operations (in our case, using XNOR and popcount operations). Since we now have binary filters, many 2D filters of size k×k repeat themselves. By using dedicated hardware/software, we can apply only the unique 2D filters on each feature map and sum the result wisely to receive each 3D filter’s convolutional result.).
Regarding Claim 21,
Courbariaux and Toderici teach the system of claim 1. Courbariaux further teaches wherein the machine learning model comprises a binary neural network (pg. 1, Abs. We introduce a method to train Binarized Neural Networks (BNNs) - neural networks with binary weights and activations at run-time.).
Regarding Claim 23,
Courbariaux and Toderici teach the system of claim 1.  Courbariaux further teaches wherein the activation function comprises a sigmoid function and/or a rectified linear unit (ReLU) function (pg. 2, section 1.1; where σ is the “hard sigmoid” function).
Regarding Claim 24,
Courbariaux and Toderici teach the system of claim 1. Toderici further teaches further comprising: 
performing the cognitive task by at least applying the trained machine learning model (Col. 10 lines 54-58; Once the neural network system 100 has been trained, the system can be used to perform variable rate image compression by varying the number of iterations performed by the system to generate a compressed representation of a received input image.); and 
providing, as a result of the cognitive task, an output of the trained machine learning model (Col. 10 lines 54-58; Once the neural network system 100 has been trained, the system can be used to perform variable rate image compression by varying the number of iterations performed by the system to generate a compressed representation of a received input image.).
Courbariaux and Toderici are analogous because they are both directed towards the same field of endeavor of binary neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the binary neural network of Courbariaux with the residual error of Toderici.
Doing so would allow for calculating a difference between the expected output and the actual output of the neural network. The differences can be minimized through each iteration, thus improving the accuracy of the neural network (Col. 5 lines 4-15).
Regarding Claim 26,
Claim 26 is the method claim corresponding to the system of claim 1. Claim 26 is substantially similar to claim 1 and is rejected on the same grounds.
Regarding Claim 50,
Claim 50 is the computer readable-medium claim corresponding to the system of claim 1. Claim 50 is substantially similar to claim 1 and is rejected on the same grounds.

Claims 3, 4, and 7 are rejected under 35 U.S.C. 103 as being unpatentable over the combination Courbariaux/Toderici, as applied above, and further in view of Choi et al. (US-20190122116-A1).
Regarding Claim 3,
Courbariaux and Toderici teach the system of claim 2. 
Courbariaux and Toderici do not explicitly disclose
processing, with the machine learning model, the training dataset during a first training epoch using a function having a first slope to approximate the at least one binary weight; and 
processing, with the machine learning model, the training dataset during a second training epoch using the function having a second slope to approximate the at least one binary weight.
However, Choi (US 20190122116 A1) teaches
processing, with the machine learning model, the training dataset during a first training epoch using a function having a first slope (para [0092] As depicted, an activation function can be expressed as: 
[image: media_image5.png]
, where actFn( ) refers to an activation function, Clip( ) refers to a clipping function, and m is the slope of the activation (with a smaller m value indicating a steeper slope). Through repeated training epochs, the clipping activation function approaches binarization. That is, as m decreases through repeated training epochs, the slope becomes steeper, and the activation function approaches a binarization function.) to approximate the at least one binary weight (para [0109] Based on back propagation, a weight of one or more neurons can be adjusted for the next training epoch.); and 
processing, with the machine learning model, the training dataset during a second training epoch (para [0095] Graph 1400 shows that after a sufficient number of training epochs (i.e. approximately 250 training epochs),) using the function having a second slope (para [0092] As depicted, an activation function can be expressed as: 
[image: media_image5.png]
, where actFn( ) refers to an activation function, Clip( ) refers to a clipping function, and m is the slope of the activation (with a smaller m value indicating a steeper slope). Through repeated training epochs, the clipping activation function approaches binarization. That is, as m decreases through repeated training epochs, the slope becomes steeper, and the activation function approaches a binarization function.) to approximate the at least one binary weight (para [0109] Based on back propagation, a weight of one or more neurons can be adjusted for the next training epoch.).
Courbariaux, Toderici, and Choi are analogous because they are all directed towards the same field of endeavor of binary neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the binary neural network of Courbariaux and Toderici with the binarization function of Choi.
Doing so would allow for incorporating lower-precision processing units. Lower-precision representations, such as 8-, 4-, or 2-bit representations, improve the performance of the neural network while reducing the computational cost (para [0034]).
Regarding Claim 4,
Courbariaux, Toderici, and Choi teach the system of claim 3.  Courbariaux further teaches wherein the first training epoch and/or the second training epoch comprises a forward pass and a backward pass of the training dataset through the machine learning model (pg. 8, section 5; Moreover, Lin et al. (2015) quantize the neurons only during the back propagation process, and not during forward propagation.).
Regarding Claim 7,
Courbariaux, Toderici, and Choi teach the system of claim 3. Choi further teaches wherein the second slope is greater than the first slope to increase a conformance between the function and a step function representative of the at least one binary weight (para [0092] As depicted, an activation function can be expressed as: 
[image: media_image5.png]
, where actFn( ) refers to an activation function, Clip( ) refers to a clipping function, and m is the slope of the activation (with a smaller m value indicating a steeper slope). Through repeated training epochs, the clipping activation function approaches binarization. That is, as m decreases through repeated training epochs, the slope becomes steeper, and the activation function approaches a binarization function.).
Courbariaux, Toderici, and Choi are analogous because they are all directed towards the same field of endeavor of binary neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the binary neural network of Courbariaux and Toderici with the binarization function of Choi.
Doing so would allow for incorporating lower-precision processing units. Lower-precision representations, such as 8-, 4-, or 2-bit representations, improve the performance of the neural network while reducing the computational cost (para [0034]).
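For illustration only, a clipping function whose slope steepens across training epochs can be sketched as follows. This is a minimal sketch under stated assumptions: the exact expression in Choi para [0092] is not reproduced in this action, so the form Clip(x / m, -1, 1), the clipping range, the geometric schedule, and the names `soft_binarize` and `slope_schedule` are all hypothetical.

```python
def clip(v, lo, hi):
    """Clamp v to the interval [lo, hi]."""
    return max(lo, min(hi, v))


def soft_binarize(x, m):
    """Assumed form Clip(x / m, -1, 1); a smaller m gives a steeper slope."""
    return clip(x / m, -1.0, 1.0)


def slope_schedule(m_start, m_end, num_epochs):
    """Hypothetical geometric decay of m from m_start to m_end over the epochs."""
    if num_epochs == 1:
        return [m_start]
    ratio = (m_end / m_start) ** (1.0 / (num_epochs - 1))
    return [m_start * ratio ** e for e in range(num_epochs)]


# With m = 1.0 the function is nearly linear around zero; with m = 0.05 the
# same input saturates, so the function is closer to a step function.
early = soft_binarize(0.1, 1.0)
late = soft_binarize(0.1, 0.05)
schedule = slope_schedule(1.0, 0.01, 3)
```

Under these assumptions the slope in the linear region is 1 / m, so a later epoch with a smaller m has a greater slope than an earlier epoch, and the function conforms more closely to a step function.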

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over the combination Courbariaux/Toderici/Choi, as applied above, and further in view of Thorpe et al. (US-20190286944-A1).
Regarding Claim 8, 
Courbariaux, Toderici, and Choi teach the system of claim 3. 
Courbariaux, Toderici, and Choi do not explicitly disclose
wherein using the function to approximate the at least one binary weight during the training of the machine learning model generates the trained machine learning model to include one or more semi-binarized weights, and wherein the one or more semi-binarized weights are replaced with one or more corresponding binary weights prior to the deployment of the trained machine learning model to perform the cognitive task.
However, Thorpe et al. (US 20190286944 A1) teaches 
wherein using the function to approximate the at least one binary weight during the training of the machine learning model generates the trained machine learning model to include one or more semi-binarized weights, and wherein the one or more semi-binarized weights are replaced with one or more corresponding binary weights prior to the deployment of the trained machine learning model to perform the cognitive task (para [0015] Weighted or un-weighted synapses are replaced by set of binary weights. Learning only requires flipping some of these binary weights and performing sums and comparisons, thus minimizing the computational burden.).
Courbariaux, Toderici, Choi, and Thorpe are analogous because they are all directed towards the same field of endeavor of binary neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the binary neural network of Courbariaux, Toderici, and Choi with the method of replacing weights with binary weights of Thorpe.
Doing so would allow for minimizing the computational burden of the learning process. Reduced computation improves the efficiency of the neural network (para [0013]).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over the combination Courbariaux/Toderici, as applied above, and further in view of Tai et al. ("Image super-resolution via deep recursive residual network.").
Regarding Claim 10,
Courbariaux and Toderici teach the system of claim 1. Toderici further teaches
wherein the first residual error comprises a first difference between the output and a first value corresponding to the first binary representation of the output (col. 8 lines 36-42; where E.sub.t and D.sub.t represent the encoder network and decoder network at iteration t, respectively, B represents the binarizer, b.sub.t represents the progressive binary code representation for the iteration, 
[image: media_image3.png]
 represents the progressive reconstruction of the original image x with γ=0 for “one shot” reconstruction, or γ=1 for additive reconstruction, and r.sub.t represents the residual error of x and the reconstruction 
[image: media_image3.png]
), 
Courbariaux and Toderici are analogous because they are both directed towards the same field of endeavor of binary neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the binary neural network of Courbariaux with the residual error of Toderici.
Doing so would allow for calculating a difference between the expected output and the actual output of the neural network. The differences can be minimized through each iteration, thus improving the accuracy of the neural network (Col. 5 lines 4-15).
Courbariaux and Toderici do not explicitly disclose
and wherein the second residual error comprises a second difference between the first residual error and a second value corresponding to the second binary representation of the first residual error.
However, Tai teaches
and wherein the second residual error comprises a second difference (pg. 3148, section 1; All of the three models learn the residual image between the input Interpolated LR (ILR) image and the ground truth HR image in the residual branch.) between the first residual error and a second value corresponding to the second binary representation of the first residual error (where u = 1, 2, ..., U, U is the number of residual units in a recursive block, Hu−1 and Hu are the input and output of the u-th residual unit, and F denotes the residual function. Instead of directly using the above residual unit, we modify Eq. 3 so that the inputs to the identity branch and the residual branch are different. As described in the beginning of Sec. 3, the inputs to all of the identity branches of the residual units in one recursive block are kept the same, i.e., H0 in Fig. 3. As a result, there are multiple paths between the input and output of our recursive block, as shown in Fig. 4.).
Courbariaux, Toderici, and Tai are analogous because they are all directed towards the same field of endeavor of neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the neural network of Courbariaux and Toderici with the residual neural network of Tai.
Doing so would allow for recursive learning to improve the accuracy by increasing depth without adding any weight parameters. The neural network achieves the best performance with fewer parameters (pg. 3148, section 1; Last but not least, through recursive learning, DRRN can improve accuracy by increasing depth without adding any weight parameters.) 

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over the combination Courbariaux/Toderici, as applied above, and further in view of Garbin et al. (US 20180144240 A1).
Regarding Claim 18,
Courbariaux and Toderici teach the system of claim 15. 
Courbariaux and Toderici do not explicitly disclose
wherein a single hardware block is configured to perform the exclusive NOR (XNOR) operation and the pop-count operation on the first bit comprising the estimate of the output of the activation function and the second bit comprising the estimate of the output of the activation function sequentially.
However, Garbin et al. (US 20180144240 A1) teaches
wherein a single hardware block is configured to perform the exclusive NOR (XNOR) operation and the pop-count operation on the first bit comprising the estimate of the output of the activation function and the second bit comprising the estimate of the output of the activation function sequentially (para [0006] However, these approaches have not demonstrated that they can efficiently reduce the high energy that is involved for each classification run on a GPU, e.g., the high energy associated with leakage energy component related to the storage of the NN weights. A benefit of assuming weights and activations of two possible values each (either +1 or −1) is that the multiply-accumulate operation (i.e., dot-product) that is typically encountered in NNs boils down to a popcount of element-wise XNOR or XOR operations.).
Courbariaux, Toderici, and Garbin are analogous because they are all directed towards the same field of endeavor of neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the neural network of Courbariaux and Toderici with the popcount operation of Garbin.
Doing so would allow for reducing the high energy associated with the leakage energy component related to storing weights. The overall efficiency of the neural network can be improved by reducing the energy consumption (para [0006]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Su et al. (US 20200356011 A1) – discloses a neural network with a binarization function.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HENRY K NGUYEN whose telephone number is (571)272-0217. The examiner can normally be reached Mon - Fri 7:00am-4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen, can be reached at (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/H.N./Examiner, Art Unit 2121
/NICHOLAS KLICOS/Primary Examiner, Art Unit 2145