DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to the application filed on 1/10/2020.
Claims 1-24 are pending and have been examined.

Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged. The present application claims foreign priority based on Korean application number KR10-2019-0087099 filed on 7/18/2019, and claims priority to U.S. Provisional application No. 62/791,237 filed on 1/11/2019. The examiner notes that a certified copy of application number KR10-2019-0087099 was retrieved on 2/10/2020. 
Applicant has not complied with one or more conditions for receiving the benefit of an earlier filing date under 35 U.S.C. 119(e) as follows:
The later-filed application must be an application for a patent for an invention which is also disclosed in the prior application (the parent or original nonprovisional application or provisional application). The disclosure of the invention in the parent application and in the later-filed application must be sufficient to comply with the requirements of 35 U.S.C. 112(a) or the first paragraph of pre-AIA  35 U.S.C. 112, except for the best mode requirement. See Transco Products, Inc. v. Performance Contracting, Inc., 38 F.3d 551, 32 USPQ2d 1077 (Fed. Cir. 1994).
The disclosure of the prior-filed application, U.S. Provisional application No. 62/791,237 (hereinafter “the ‘237 provisional application”), fails to provide adequate support or enablement in the manner provided by 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112, first paragraph for one or more claims of this application. Independent claims 1 and 16 recite, using respective similar language, “determining layer contraction parameters for determining an affine transformation relationship between the input layer and the output layer, for approximation of the inference process; and performing inference on one or more other sequential input samples among the sequential input samples using affine transformation based on the layer contraction parameters determined with respect to the reference sample”, and claims 4 and 19 both recite “wherein the affine transformation is a transformation of a multiply-accumulate (MAC) operation and an operation of an activation function in the hidden layers, based on a form of a Hadamard product using the layer contraction parameters”. The as-filed specification of the ‘237 provisional application fails to provide adequate support or enablement for at least these elements of claims 1, 4, 16 and 19.
Based on their respective dependencies from independent claims 1 and 16, the specification of the ‘237 provisional application also fails to provide adequate support or enablement for dependent claims 2-15 and 17-24.
For example, the original specification of the ‘237 provisional application is silent regarding any “affine transformation” let alone “determining an affine transformation relationship between the input layer and the output layer, for approximation of the inference process; and performing inference on one or more other sequential input samples among the sequential input samples using affine transformation” as recited in independent claims 1 and 16, using respective similar language. Further, for example, the original specification of the ‘237 provisional application also fails to mention any “affine transformation” or any “Hadamard product”, much less “wherein the affine transformation is a transformation of a multiply-accumulate (MAC) operation and an operation of an activation function in the hidden layers, based on a form of a Hadamard product using the layer contraction parameters” as recited in claims 4 and 19. 
Therefore, the effective filing date for claims 1-7, 9-11 and 15-20 of the instant application is the filing date of the Korean priority application, 7/18/2019. Examiner will consider whether the ‘237 provisional application supports each of the other claims if a rejection would need to rely upon an intervening reference between the 1/11/2019 filing date of the ‘237 provisional application and the 7/18/2019 filing date of the Korean priority application.
Each claim will receive benefit of the earliest filing date above for which a continuous chain of support can be established for the entirety of the claim. As discussed above, the effective filing date for at least claims 1-24 of the instant application is the filing date of the Korean priority application, 07/18/2019.
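For orientation only, the claim 4 and 19 language quoted above (an affine transformation expressed through a Hadamard product with a binary mask) can be pictured with a small ReLU network. The sketch below is an illustrative reading, not the applicant's disclosed implementation: the two-layer network, the function names (`relu_mask`, `contract`, `full_forward`) and all numeric values are assumptions. It relies on the fact that, for a fixed activation pattern, ReLU acts as an element-wise (Hadamard) product with a 0/1 mask, so the layers collapse into a single weight matrix and bias vector.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def relu_mask(W1, b1, x_ref):
    """Binary mask of hidden units that are active for the reference sample."""
    pre = [z + b for z, b in zip(matvec(W1, x_ref), b1)]
    return [1.0 if p > 0 else 0.0 for p in pre]

def contract(W1, b1, W2, b2, mask):
    """Collapse y = W2 * (mask ⊙ (W1 x + b1)) + b2 into y = W x + b."""
    masked_W1 = [[m * w for w in row] for m, row in zip(mask, W1)]
    masked_b1 = [m * b for m, b in zip(mask, b1)]
    n_in, n_hid = len(W1[0]), len(mask)
    W = [[sum(row2[k] * masked_W1[k][j] for k in range(n_hid))
          for j in range(n_in)] for row2 in W2]
    b = [c + d for c, d in zip(matvec(W2, masked_b1), b2)]
    return W, b

def full_forward(W1, b1, W2, b2, x):
    """Ordinary inference: MAC, ReLU, MAC."""
    h = [max(0.0, z + b) for z, b in zip(matvec(W1, x), b1)]
    return [y + b for y, b in zip(matvec(W2, h), b2)]

# Hypothetical weights and samples (assumptions for illustration).
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 0.0]]
b1 = [0.0, -0.25, 0.5]
W2 = [[1.0, 2.0, -1.0]]
b2 = [0.1]

x_ref = [1.0, 0.5]     # reference sample
x_near = [1.1, 0.6]    # later sample with the same activation pattern
x_far = [-1.0, 0.0]    # later sample with a different activation pattern

mask = relu_mask(W1, b1, x_ref)
W, b = contract(W1, b1, W2, b2, mask)

y_near_fast = [y + c for y, c in zip(matvec(W, x_near), b)]
y_far_fast = [y + c for y, c in zip(matvec(W, x_far), b)]
```

In this sketch the contracted map reproduces full inference for `x_near` and diverges for `x_far`, which is one way to picture why a reference sample might need to be updated.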

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 5/26/2020 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement has been considered by the examiner.

Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(3) because Figures 2 and 5 include letters which do not measure at least .32 cm. (1/8 inch) in height (i.e., many of the subscript and superscript characters in FIGs. 2 and 5, including the subscript and superscript characters in element 210 in FIG. 2 and in elements/steps 501 and 503 in FIG. 5).
The drawings are also objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference characters not mentioned in the description: 
Reference characters 641 and 642 shown in Figure 6D are not found in the detailed description. 
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification
The disclosure is objected to because of the following informalities:
Reference characters 641 and 642 shown in Figure 6D are not described in applicant’s specification (see, e.g., paragraph 120 describing FIG. 6D). Appropriate correction is required.
The Brief Description of the Drawings does not include a reference to and brief description of each of the drawings as set forth in 37 CFR 1.74. In particular, at paragraph 35, in the “BRIEF DESCRIPTION OF THE DRAWINGS”, the specification recites “FIGS. 6A to 6D are views for describing methods of determining whether to update a reference sample, according to one or more embodiments.” 
As set forth in 37 CFR 1.74, “[w]hen there are drawings, there shall be a brief description of the several views of the drawings and the detailed description of the invention shall refer to the different views by specifying the numbers of the figures, and to the different parts by use of reference letters or numerals (preferably the latter).” See MPEP § 608.01(f). Appropriate correction is required.
In paragraphs 9 and 24 of the specification, the recitations of “to update of the reference sample” are grammatically incorrect and appear to either include the extraneous word “of” or are missing one or more words between “update” and “of”. Appropriate correction is required.
Also, in paragraphs 9, 11 and 24 of the specification, the recitations of “an input sample proceeding the reference sample among the sequential input samples” are grammatically incorrect. As discussed in the section 112(b) rejections below, it is unclear whether these recitations are typographical errors (i.e., “proceeding” should read “preceding” as in paragraph 15) or translation errors (i.e., “proceeding” should read “following” or “subsequent to”). Appropriate correction is required.

Claim Objections
Claims 5-12 and 20-24 are objected to because of the following informalities: 
Claims 5, 7 and 20 each recite “to update of the reference sample” (see, e.g., line 3 in both of claims 5 and 20, and lines 1-2 of claim 7). These recitations are grammatically incorrect and appear to either include the extraneous word “of” or are missing one or more words between “update” and “of”. For examination purposes, recitations of “to update of the reference sample” are being interpreted as “to update [[of]] the reference sample”. Appropriate correction is required.
	Also, claims 6-12 and 21-24 are objected to based on their dependencies from claims 5 and 20, respectively.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 5-12 and 20-24 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Lines 6-7 of both claims 5 and 20 recite “wherein the current input sample is an input sample proceeding the reference sample among the sequential input samples.” These recitations are grammatically incorrect and unclear. In particular, based on paragraphs 9, 15 and 24 of the specification, which alternatively recite “an input sample proceeding the reference sample among the sequential input samples” and “an input sample preceding the current input sample”, it is unclear whether the term “proceeding” in claims 5 and 20 is a typographical error that should read “preceding” or whether it is a translation error and should read “subsequent to” or “following”. That is, it is unclear whether the recitations of “wherein the current input sample is an input sample proceeding the reference sample among the sequential input samples” should read “wherein the current input sample is an input sample preceding the reference sample among the sequential input samples” or “wherein the current input sample is an input sample following the reference sample among the sequential input samples”. For the purposes of determining patent eligibility and comparison with the prior art, the examiner is interpreting the recitation “wherein the current input sample is an input sample proceeding the reference sample among the sequential input samples” as the current input sample being an input sample preceding or following the reference sample in the sequential input samples. Appropriate correction is required.
Also, claims 6-12 and 21-24 which depend from claims 5 and 20, respectively, are rejected under 35 U.S.C. 112(b) as being indefinite under the same rationale as claims 5 and 20.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 5-8, 13-16 and 20-21 are rejected under 35 U.S.C. 103 as being unpatentable over Das et al. (U.S. Patent Application Pub. No. 2018/0322606 A1, hereinafter “Das”) in view of non-patent literature Rueckauer et al. ("Conversion of continuous-valued deep networks to efficient event-driven networks for image classification." Frontiers in neuroscience 11 (2017): 682: 1-12, hereinafter “Rueckauer”).
With respect to claim 1, Das discloses the invention as claimed including a processor-implemented neural network method (see, e.g., Abstract and paragraph 141, “method comprising multi-dimensionally partitioning data of a feature map across multiple nodes for distributed training of a convolutional neural network; performing a parallel convolution operation on the multiple partitions to train weight data of the neural network”, “the general-purpose processing unit (GPGPU) 700 can be configured to be particularly efficient in processing the type of computational workloads associated with training deep neural networks” [i.e., a processor-implemented neural network method]), the method comprising:
determining a reference sample among sequential input samples to be processed by a neural network (aside from repeating the claim language in paragraphs 5, 9-17, 20-21 and 24-28 of applicant’s specification, and stating “determine a reference sample from among the sequential input samples. For example, when the sequential input samples are individual frames of video data, the processor 110 may determine image data of a first frame of the frames to be a first reference sample.” in paragraph 82, the specification does not define what is meant by “a reference sample”. The plain meaning of sample is “a small part of anything or one of a number” or “a subset of a population” (see, dictionary.com/browse/sample). Therefore, “a reference sample”, under the broadest reasonable interpretation (BRI), in light of the specification, is any subset or part of a sequence of input data items) (see, e.g., paragraphs 181, 188 and 190, “parse the decoded video and perform preliminary processing operations on the frames of the decoded video in preparation of processing the frames using a trained image recognition model … for a CNN that is used to perform image recognition on the high-resolution video data”, “The mini-batch is split … using a subset of the samples in the mini-batch.”, “For a layer of a neural network, the input data 1402 … is partitioned … Node 0 receives a first block of input data 1402A” [i.e., determine a first reference block/frame sample in a sequence of input images/frames]), the neural network comprising an input layer, one or more hidden layers, and an output layer (see, e.g., paragraph 162, “RNN 1000 can be described has having an input layer 1002 that receives an input vector, hidden layers 1004 to implement a recurrent function, …, and an output layer 1006 to output a result.”);
performing an inference process of obtaining an output activation of the output layer based on operations in the hidden layers corresponding to the reference sample input to the input layer (see, e.g., paragraphs 135, 162, 182 and 224, “The hidden layer transforms input received by the input layer into a representation that is useful for generating output in the output layer … Data received at the nodes of an input layer … are propagated … to the nodes of the output layer via an activation function that calculates the states of the nodes”, “input layer 1002 that receives an input vector, hidden layers 1004 to implement a recurrent function, …, and an output layer 1006 to output a result.”, “GPGPU 1306 can support instruction that are specifically optimized to perform inferencing computations on a trained neural network”, “An inferencing pass through the neural network can generate output in which each pixel of an input image is annotated with a classification or label identifying the object associated with the pixel” [i.e., perform an inferencing process to obtain a classification/label from the output layer based on hidden layer functions/operations corresponding to the sample input to the input layer]);
determining layer … parameters for determining an affine transformation relationship between the input layer and the output layer, for approximation of the inference process (see, e.g., paragraphs 150 and 158, “RNNs enable modeling of sequential data by sharing parameter data across different parts of the neural network.”, “convolution stage 916 can include an affine transformation, which is any transformation that can be specified as a linear transformation plus a translation … The convolution stage computes the output of functions (e.g., neurons) that are connected to specific regions in the input … The neurons compute a dot product between the weights of the neurons and the region in the local input to which the neurons are connected. The output from the convolution stage 916 defines a set of linear activations” [i.e., determine neuron layer weights/parameters to determine an affine transformation between the input and output layers to approximate the inference/activation output]); and
performing inference on one or more other sequential input samples among the sequential input samples using affine transformation based on the layer … parameters determined with respect to the reference sample (as indicated above, “the reference sample”, under the BRI, in light of the specification, is any subset or part of a sequence of input data items) (see, e.g., paragraphs 158, 162 and 224, “convolution stage 916 performs several convolutions … to produce … activations. The convolution stage 916 can include an affine transformation, … Affine transformations include rotations, translations, scaling, and combinations of these transformations. The convolution stage computes the output of functions (e.g., neurons) that are connected to specific regions in the input, … The neurons compute a dot product between the weights of the neurons and the region in the local input … The output from the convolution stage 916 defines a set of linear activations that are processed by successive stages of the convolutional layer 914.”, “input layer 1002 that receives an input vector …, and an output layer 1006 to output a result. … A second input (x2) [i.e., one or more other sequential input samples] can be processed by the hidden layer 1004 using state information that is determined during the processing of the initial input (x1) [i.e., the reference sample]. A given state can be computed as s_t = f(U x_t + W s_{t-1}), where U and W are parameter matrices”, “An inferencing pass through the neural network can generate output in which each pixel of an input image is annotated with a classification or label identifying the object associated with the pixel” [i.e., perform inference/compute output using affine transformation based on the layer weights/parameters determined with respect to the initial input/reference sample]).
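As a minimal sketch of the two Das passages relied on above, the following shows an affine transformation (a linear transformation plus a translation, per paragraph 158) and a recurrent state update of the kind described in paragraph 162, in which a second input is processed using state determined from an initial input. The symbols `U`, `W`, `x_t` and `s_t` follow the conventional form of this recurrence; the choice of tanh as the nonlinearity, the helper names, and all numeric values are assumptions made for illustration and are not taken from Das.

```python
import math

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def affine(weights, bias, x):
    """Affine map: linear transformation (weights * x) plus translation (bias)."""
    return [y + b for y, b in zip(matvec(weights, x), bias)]

def rnn_step(U, W, x_t, s_prev):
    """One recurrent update s_t = f(U x_t + W s_{t-1}), with f assumed to be tanh."""
    pre = [u + w for u, w in zip(matvec(U, x_t), matvec(W, s_prev))]
    return [math.tanh(p) for p in pre]

# Hypothetical 2x2 parameter matrices and two sequential inputs.
U = [[0.5, -0.2], [0.1, 0.3]]
W = [[0.4, 0.0], [0.0, 0.4]]
x1, x2 = [1.0, 2.0], [0.5, -1.0]

s0 = [0.0, 0.0]
s1 = rnn_step(U, W, x1, s0)   # state from the initial input x1
s2 = rnn_step(U, W, x2, s1)   # x2 is processed using the state from x1

# For a fixed previous state, the pre-activation is affine in x_t:
# the linear part is U and the translation is W * s_{t-1}.
pre2 = affine(U, matvec(W, s1), x2)
```

Note that the affine part here is only the pre-activation; the nonlinearity `f` applied afterwards makes the overall state update non-affine in general.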
Although Das substantially discloses the claimed invention, Das is not relied on for explicitly disclosing determining layer contraction parameters for determining a … relationship between the input layer and the output layer, for approximation of the inference process; and
performing inference … based on the layer contraction parameters.
In the same field, analogous art Rueckauer teaches determining layer contraction parameters (aside from repeating the claim language in paragraphs 5, 8-11, 20 and 23-24 and stating “The layer contraction parameters determined with respect to the current reference sample may include a single weight matrix indicating weights, a bias vector indicating biases, and a binary mask” in paragraphs 6 and 21, applicant’s specification does not explicitly define “layer contraction parameters”. The plain meaning of contraction is “an act or instance of contracting or the quality or state of being contracted” (see, dictionary.com/browse/contraction), and the plain meaning of the verb contract is “to draw together; make shorter, thinner, narrower, etc.” or “to shorten (a word, phrase, etc.) by combining or omitting some of its elements” (see, dictionary.com/browse/contract). Therefore, “layer contraction parameters”, under the BRI, in light of the specification, can include compressing, pruning, reducing, downscaling, combining, merging or omitting any weights, bias values, or a binary mask associated with a neural network layer) (see, e.g., pages 2, 4 and 10, “take the parameters of a pre-trained ANN and to map them to an equivalent-accurate SNN … we show that the conversion to spiking networks is synergistic with ANN network compression techniques such as parameter quantization”, “weight normalization mechanism … can simply be extended to biases by linearly rescaling all weights and biases such that the ANN activation a … is smaller than 1 for all training examples. In order to preserve the information encoded within a layer, the parameters of a layer need to be scaled jointly.”, “our work builds upon and complements the recent advances in … network compression.” [i.e., determine network layer compression/contraction parameters for layer weights and biases]) for determining a … relationship between the input layer and the output layer, for approximation of the inference process (see, e.g., pages 5-6 and 9-10, “Batch-normalization … BN introduces additional layers where affine transformations of inputs are performed … transformations can be integrated into the weight vectors, thereby preserving the effect of BN, but eliminating the need to compute the normalization repeatedly for each sample during inference … the input to the neurons in the first hidden layer is obtained by multiplying the corresponding kernels with the analog input image x”, “Since softmax is applied at the last layer of the network, one could simply infer the classification output from the softmax”, “network is trained using full-precision weights in combination with binarized weights. Either set of weights can be used during inference.”, “SNNs during inference lie close to the original ANNs” [i.e., for determining a mapping/relationship between the input layer and the classification output of the output layer for inference approximation that is close to the original ANN]); and
performing inference … based on the layer contraction parameters (see, e.g., pages 6 and 10, “infer the classification output from the softmax computed on the membrane potentials, without another spike generation mechanism. This simplification could speed up inference time and possibly improve the accuracy”, “the final error rate in a spiking network drops off rapidly during inference when an increasing number of operations is used to classify a sample. The network classification error rate can be tailored to the number of operations that are available during inference, allowing for accurate classification” [i.e., performing inference with the SNN/spiking neural network based on the compression/contraction parameters]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Das to incorporate the teachings of Rueckauer to provide techniques for implementing “Spiking neural networks (SNNs)” by converting “common operations such as max-pooling, softmax, batch-normalization and Inception-modules” into “spiking equivalents of these operations” so that “continuous-valued deep Convolutional Neural Networks (CNNs) can be converted into accurate spiking equivalents.” (See, e.g., Rueckauer, Abstract, page 1). Doing so would have allowed Das to use Rueckauer’s techniques to convert continuous-valued deep networks and CNNs into SNNs, thus “allowing conversion of nearly arbitrary CNN architectures” whereby “the SNNs can achieve more than 2x reductions in operations compared to the original CNNs”, as suggested by Rueckauer (See, e.g., Rueckauer, Abstract and page 1). This is an example of “use of known technique to improve similar devices (methods, or products) in the same way.” See MPEP 2143.
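The Rueckauer passage quoted above, in which batch-normalization “transformations can be integrated into the weight vectors”, can be sketched as follows. This is a minimal illustration of folding a per-unit affine normalization into the preceding layer's weights and biases so that it need not be recomputed per sample at inference; the function name `fold_batch_norm`, the parameter names and the numeric values are assumptions for illustration, and `sigma` is taken to be the already-computed per-unit standard deviation.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def layer_then_bn(W, b, gamma, beta, mu, sigma, x):
    """Reference path: affine layer followed by a separate normalization step."""
    z = [y + c for y, c in zip(matvec(W, x), b)]
    return [g * (zi - m) / s + be
            for g, zi, m, s, be in zip(gamma, z, mu, sigma, beta)]

def fold_batch_norm(W, b, gamma, beta, mu, sigma):
    """Fold the normalization into the layer: W' = (gamma/sigma) W,
    b' = (gamma/sigma)(b - mu) + beta, applied per output unit."""
    scale = [g / s for g, s in zip(gamma, sigma)]
    W_folded = [[sc * w for w in row] for sc, row in zip(scale, W)]
    b_folded = [sc * (bi - m) + be
                for sc, bi, m, be in zip(scale, b, mu, beta)]
    return W_folded, b_folded

# Hypothetical layer and normalization statistics.
W = [[1.0, 2.0], [3.0, 4.0]]
b = [0.5, -0.5]
gamma, beta = [2.0, 0.5], [0.1, 0.2]
mu, sigma = [1.0, 2.0], [2.0, 4.0]
x = [1.0, -1.0]

W_f, b_f = fold_batch_norm(W, b, gamma, beta, mu, sigma)
y_ref = layer_then_bn(W, b, gamma, beta, mu, sigma, x)
y_folded = [y + c for y, c in zip(matvec(W_f, x), b_f)]
```

Both paths compute the same output for any input, because the composition of two affine maps is itself an affine map.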

Regarding claim 13, as discussed above, Das in view of Rueckauer teaches the method of claim 1.
Das further discloses wherein each of the sequential input samples corresponds to each of consecutive frames of video data (see, e.g., FIG. 15A – depicting “Inputs (Video … 1502” for “Video Summarization” and paragraph 181, “vision processor 1304 can work in concert to accelerate computer vision operation … vision processor 1304 can then parse the decoded video and perform preliminary processing operations on the frames of the decoded video in preparation of processing the frames using a trained image recognition model.” [i.e., the sequential input samples correspond to consecutive frames of video data]), and
the determining of the reference sample comprises determining image data of a first frame of the consecutive frames to be the reference sample (see, e.g., FIG. 15A – depicting input video data for “Image Understanding”, “Video Summarization” and “Speech/NLP Understanding” and paragraphs 190 and 197, “Node 0 receives a first block of input data 1402A”, “Input data 1502 is provided to a layer of applications 1504. In one embodiment the input data 1502 is multi-modal input including but not limited to video and/or image data, data” [i.e., determining the reference sample/first block of input data 1402A includes determining image data of a first frame of the consecutive video frames]).

Regarding claim 14, as discussed above, Das in view of Rueckauer teaches the method of claim 1.
Das further discloses wherein the performing of the inference comprises determining either one of an image recognition result and a voice recognition result (see, e.g., paragraph 197, “applications 1504 include multi-modal fusion and decision-making applications that can process the input to enable machine learning tasks such as image understanding, video summarization, speech and natural language processing” [i.e., the inference includes determining either image understanding/recognition or speech processing/voice recognition]).

Regarding claim 15, as discussed above, Das in view of Rueckauer teaches the method of claim 1.
Examiner’s Note: claim 15, as drafted, depends from claim 1. If applicant intended for claim 15 to be an independent claim, the examiner suggests that one way to do so is to amend the last step of claim 15 to explicitly recite the steps of claim 1 instead of the current recitation of a “non-transitory computer-readable storage medium storing instructions that, when executed by at least one processor, configure the at least one processor to perform the method of claim 1.”
	Das further discloses a non-transitory computer-readable storage medium storing instructions that, when executed by at least one processor, configure the at least one processor to perform the method (see, e.g., paragraphs 330, 347 and claim 18, “one embodiment may be implemented by representative code stored on a machine-readable medium which represents and/or defines logic within an integrated circuit such as a processor. For example, the machine-readable medium may include instructions which represent various logic within the processor. When read by a machine, the instructions may cause the machine to fabricate the logic to perform the techniques described herein.”, “a non-transitory machine-readable medium to store instructions for execution by the heterogeneous processing system”, “a non-transitory machine-readable medium to store instructions for execution by the heterogeneous processing system to cause the heterogeneous processing system to:” [perform method operations/steps]) (as indicated above, Das in view of Rueckauer teaches the method of claim 1, see above citations to Das and Rueckauer regarding the limitations of claim 1).

With respect to independent claim 16, claim 16 is substantially similar to claim 1 and therefore is rejected on the same ground as claim 1, discussed above. In particular, claim 16 is an apparatus claim that corresponds to the method of claim 1. 
In addition, Das further discloses a neural network data apparatus comprising: at least one processor configured (see, e.g., paragraphs 341-343, “Examples may include subject matter such as … an apparatus or system according to embodiments and examples described herein.”, “computer programs can be configured to perform particular operations or actions based on instructions that, when executed by a data processing apparatus, cause the data processing apparatus to perform the actions indicated actions.”, “One embodiment provides for a system to compute and distribute data for distributed training of a neural network, the system comprising … a first set of general-purpose processor cores to execute instructions … ; and a general-purpose graphics processor to perform compute operations associated with machine learning framework workflow” [i.e., a neural network apparatus comprising a processor configured to perform operations]).

Regarding claims 5 and 20, as discussed above, Das in view of Rueckauer teaches the method of claim 1 and the apparatus of claim 16.
Das further discloses determining whether to update the reference sample (as indicated above, “the reference sample”, under the BRI, in light of the specification, is any subset or part of a sequence of input data items) (see, e.g., paragraphs 140, 172 and 190-191, “machine learning framework 604 can process input data received from the machine learning application 602 and generate the appropriate input to a compute framework 606.” [i.e., determine if appropriate to update reference sample in input data by generating/updating input data], “combining data include[s] parameter averaging and update based data … can be performed in a decentralized manner, where the updates are compressed and transferred between nodes.” [i.e., data is updated and transferred between network nodes], “For a layer of a neural network, the input data 1402, weight data 1404, and/or activation data 1406 is partitioned … Node 0 receives a first block of input data 1402A … Compute operations are performed at Node 0 to generate a first partial activation 1406A … Node 1 receives a second block of input data 1402B”, “a set of partial activations 1406A-1406B is generated by based on the application of a mathematical operation (e.g., convolution) to the input data 1402A-1402B” [i.e., determine whether to update the reference sample 1402A in input data 1402 based on applying mathematical operations to the input data]); and 
in response to determining to update of the reference sample, updating the reference sample to a current input sample, and updating the layer … parameters based on the updated reference sample (as indicated above, “to update of the reference sample” has been interpreted as to update the reference sample) (see, e.g., paragraphs 154, 188 and 190-191, “The network can then learn from those errors using an algorithm … to update the weights of the of the neural network.”, “input data 1402 is split along a mini-batch dimension and the same model is replicated across the nodes. The mini-batch is split across several compute nodes, with each node responsible for computing gradients with respect to all model parameters using a subset of the samples in the mini-batch” [i.e., updating model layer parameters and weights], “a set of partial activations 1406A-1406B is generated by based on the application of a mathematical operation (e.g., convolution) to the input data 1402A-1402B and weight data 1404A-1404B.” [i.e., updating based on the updated input data 1402A-B with updated reference sample/block]),
wherein the current input sample is an input sample proceeding the reference sample among the sequential input samples (as indicated above, “wherein the current input sample is an input sample proceeding the reference sample among the sequential input samples” has been interpreted as the current input sample being an input sample preceding or following the reference sample in the sequential input samples) (see, e.g., paragraph 190, “For a layer of a neural network, the input data 1402, weight data 1404, and/or activation data 1406 is partitioned … Node 0 receives a first block of input data 1402A … Compute operations are performed at Node 0 to generate a first partial activation 1406A … Node 1 receives a second block of input data 1402B … Node 2 can perform compute operations on third input data 1402C” [i.e., the current input sample/block precedes or follows the reference sample/block among the sequential input samples 1402A, B and C]).
Although Das substantially discloses the claimed invention, Das is not relied on for explicitly disclosing updating the layer contraction parameters.
In the same field, analogous art Rueckauer teaches updating the layer contraction parameters (as indicated above, “the layer contraction parameters”, under the BRI, in light of the specification, can include compressing, pruning, reducing, downscaling, combining, merging or omitting any weights, bias values, or a binary mask associated with a neural network layer) (see, e.g., pages 3-4, “For a network with L layers let Wl, l ∈ {1, . . . , L} denote the weight matrix connecting units in layer l − 1 to layer l, with biases bl .”, “weight normalization mechanism … can simply be extended to biases by linearly rescaling all weights and biases such that the ANN activation a … is smaller than 1 for all training examples. In order to preserve the information encoded within a layer, the parameters of a layer need to be scaled jointly.” [i.e., updating/normalizing the downscaling/contraction parameters]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Das to incorporate the teachings of Rueckauer to provide techniques for implementing “Spiking neural networks (SNNs)” by converting “common operations such as max-pooling, softmax, batch-normalization and Inception-modules” into “spiking equivalents of these operations” so that “continuous-valued deep Convolutional Neural Networks (CNNs) can be converted into accurate spiking equivalents.” (See, e.g., Rueckauer, Abstract, page 1). Doing so would have allowed Das to use Rueckauer’s techniques to convert continuous-valued deep networks and CNNs into SNNs, thus “allowing conversion of nearly arbitrary CNN architectures” whereby “the SNNs can achieve more than 2x reductions in operations compared to the original CNNs”, as suggested by Rueckauer (See, e.g., Rueckauer, Abstract and page 1). This is an example of “use of known technique to improve similar devices (methods, or products) in the same way.” See MPEP 2143.

Regarding claim 6, as discussed above, Das in view of Rueckauer teaches the method of claim 5.
Das further discloses performing inference on the current input sample based on the updated layer … parameters (see, e.g., paragraphs 162 and 224, “input layer 1002 that receives an input vector …, and an output layer 1006 to output a result. … A second input (x2) [i.e., the current input sample] can be processed by the hidden layer 1004 using state information that is determined during the processing of the initial input (x1) [i.e., based on updated state information]. A given state can be computed as s_t=f(Ux_t+Ws_(t-1)), where U and W are parameter matrices”, “An inferencing pass through the neural network can generate output in which each pixel of an input image is annotated with a classification or label identifying the object associated with the pixel” [i.e., perform inference/compute output for the current input sample x2 based on the updated layer weights/parameters]).
Although Das substantially discloses the claimed invention, Das is not relied on for explicitly disclosing performing inference … based on the updated layer contraction parameters.
In the same field, analogous art Rueckauer teaches performing inference … based on the updated layer contraction parameters (as indicated above, the “layer contraction parameters”, under the BRI, in light of the specification, can include compressing, pruning, reducing, downscaling, combining, merging or omitting any weights, bias values, or a binary mask associated with a neural network layer) (see, e.g., pages 3-4, 6 and 10, “For a network with L layers let Wl, l ∈ {1, . . . , L} denote the weight matrix connecting units in layer l − 1 to layer l, with biases bl .”, “weight normalization mechanism … can simply be extended to biases by linearly rescaling all weights and biases such that the ANN activation a … is smaller than 1 for all training examples. In order to preserve the information encoded within a layer, the parameters of a layer need to be scaled jointly.” [i.e., updating the normalization and downscaling/contraction parameters], “infer the classification output from the softmax computed on the membrane potentials, without another spike generation mechanism. This simplification could speed up inference time and possibly improve the accuracy”, “the final error rate in a spiking network drops off rapidly during inference when an increasing number of operations is used to classify a sample. The network classification error rate can be tailored to the number of operations that are available during inference, allowing for accurate classification” [i.e., performing inference with the SNN/spiking neural network based on the updated layer compression/contraction parameters]).
The motivation to combine Das and Rueckauer is the same as discussed above with respect to claim 5.
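For illustration only, the recurrent state computation quoted from Das for claim 6 has the general form s_t = f(U x_t + W s_(t-1)), in which the state for a second input depends on the state computed for the first. The following is a minimal sketch of that recurrence and is not taken from Das; the function name `rnn_step`, the default choice of tanh for f, and the example matrices are assumptions made for the illustration.

```python
import math


def rnn_step(U, W, x_t, s_prev, f=math.tanh):
    """Compute one recurrent state s_t = f(U x_t + W s_prev).

    U and W are parameter matrices given as lists of rows; x_t is the current
    input vector and s_prev is the state from the previous input.
    The nonlinearity f (tanh by default) is an assumption of this sketch.
    """
    state = []
    for u_row, w_row in zip(U, W):
        from_input = sum(u * x for u, x in zip(u_row, x_t))
        from_state = sum(w * s for w, s in zip(w_row, s_prev))
        state.append(f(from_input + from_state))
    return state


# Hypothetical parameters: identity input matrix, half-strength recurrence.
U = [[1.0, 0.0], [0.0, 1.0]]
W = [[0.5, 0.0], [0.0, 0.5]]

s0 = [0.0, 0.0]
s1 = rnn_step(U, W, [1.0, 2.0], s0)  # state after the initial input x1
s2 = rnn_step(U, W, [3.0, 4.0], s1)  # state after the second input x2, uses s1
```

The second call reuses `s1`, which mirrors the quoted statement that the second input is processed using state information determined during processing of the initial input.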

Regarding claim 7, as discussed above, Das in view of Rueckauer teaches the method of claim 5.
Das further discloses in response to determining not to update of the reference sample, performing inference on the current input sample based on the layer … parameters determined with respect to the reference sample (as indicated above, “the reference sample”, under the BRI, in light of the specification, is any subset or part of a sequence of input data items, and “to update of the reference sample” has been interpreted as to update the reference sample) (see, e.g., paragraphs 162 and 224, “input layer 1002 that receives an input vector …, and an output layer 1006 to output a result. … A second input (x2) [i.e., one or more other sequential input samples] can be processed by the hidden layer 1004 using state information that is determined during the processing of the initial input (x1) [i.e., the reference sample]. A given state can be computed as s_t=f(Ux_t+Ws_(t-1)), where U and W are parameter matrices”, “An inferencing pass through the neural network can generate output in which each pixel of an input image is annotated with a classification or label identifying the object associated with the pixel” [i.e., perform inference/compute output based on the layer weights/parameters determined with respect to the initial input/reference sample]).
Although Das substantially discloses the claimed invention, Das is not relied on for explicitly disclosing performing inference on the current input sample based on the layer contraction parameters.
In the same field, analogous art Rueckauer teaches performing inference on the current input sample based on the layer contraction parameters (as indicated above, “the layer contraction parameters”, under the BRI, in light of the specification, can include compressing, pruning, reducing, downscaling, combining, merging or omitting any weights, bias values, or a binary mask associated with a neural network layer) (see, e.g., pages 6 and 10, “infer the classification output from the softmax computed on the membrane potentials, without another spike generation mechanism. This simplification could speed up inference time and possibly improve the accuracy”, “the final error rate in a spiking network drops off rapidly during inference when an increasing number of operations is used to classify a sample. The network classification error rate can be tailored to the number of operations that are available during inference, allowing for accurate classification” [i.e., performing inference with the SNN/spiking neural network based on the compression/contraction parameters]).
The motivation to combine Das and Rueckauer is the same as discussed above with respect to claim 5.

Regarding claims 8 and 21, as discussed above, Das in view of Rueckauer teaches the method of claim 5 and the apparatus of claim 20.
Das further discloses wherein the determining of whether to update the reference sample comprises determining to update the reference sample in response to performing inference on an n-number of the sequential input samples following the reference sample (aside from repeating the claim language in paragraphs 12 and 25, applicant’s specification states “inference has been performed on n-number of input samples, where n is a natural number” in paragraph 117. Therefore, “an n-number”, under the BRI, in light of the specification, is any number n of input samples, where n is a natural number) (see, e.g., paragraphs 162, 190 and 224, “input layer 1002 that receives an input vector …, and an output layer 1006 to output a result. … A second input (x2) [i.e., the current input sample] can be processed by the hidden layer 1004 using state information that is determined during the processing of the initial input (x1). A given state can be computed as s_t=f(Ux_t+Ws_(t-1)), where U and W are parameter matrices”, “For a layer of a neural network, the input data 1402 … is partitioned … Node 0 receives a first block of input data 1402A … Compute operations are performed at Node 0 to generate a first partial activation 1406A … Node 1 receives a second block of input data 1402B … Node 2 can perform compute operations on third input data 1402C” [i.e., an n-number, 2, of input samples/blocks 1402B and C following the first reference block/sample 1402A], “An inferencing pass through the neural network can generate output in which each pixel of an input image is annotated with a classification or label identifying the object associated with the pixel” [i.e., perform inference/compute output for n=2 input samples based on the updated layer weights/parameters]).
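For illustration only, the limitation of claims 8 and 21 (updating the reference sample in response to performing inference on an n-number of samples following it) can be sketched as a simple counter. This is a minimal sketch and is not taken from Das, Rueckauer, or applicant's specification. It assumes one possible reading of the claim language, namely that the sample arriving after n inferences becomes the new reference; the function name `reference_indices` is hypothetical.

```python
def reference_indices(num_samples, n):
    """Return the positions in a sequence at which the reference sample is set.

    Assumed policy: the first sample is the reference; after inference has been
    performed on n samples following the current reference, the next sample
    becomes the new reference.
    """
    if n < 1:
        raise ValueError("n must be a natural number")
    refs = []
    inferred_since_ref = 0
    for index in range(num_samples):
        if index == 0 or inferred_since_ref == n:
            refs.append(index)        # this sample becomes the reference
            inferred_since_ref = 0
        else:
            inferred_since_ref += 1   # inference performed on a following sample
    return refs


# With n = 2, samples 1 and 2 follow reference 0, so sample 3 becomes the next reference.
schedule = reference_indices(7, 2)
```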

Claims 2-4, 9-12, 17-19 and 22-24 are rejected under 35 U.S.C. 103 as being unpatentable over Das in view of Rueckauer as applied to claims 1 and 16 above, and further in view of non-patent literature Nguyen et al. (“Deep learning sparse ternary projections for compressed sensing of images.” 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2017: 1125-1129, hereinafter “Nguyen”).
Regarding claims 2 and 17, as discussed above, Das in view of Rueckauer teaches the method of claim 1 and the apparatus of claim 16.
Although Das substantially discloses the claimed invention, Das is not relied on for explicitly disclosing wherein the layer contraction parameters determined with respect to the current reference sample comprise a single weight matrix indicating weights, a bias vector indicating biases.
In the same field, analogous art Rueckauer teaches wherein the layer contraction parameters determined with respect to the current reference sample comprise a single weight matrix indicating weights, a bias vector indicating biases (as indicated above, “the layer contraction parameters”, under the BRI, in light of the specification, can include compressing, pruning, reducing, downscaling, combining, merging or omitting any weights, bias values, or a binary mask associated with a neural network layer) (see, e.g., pages 3-4 and 9, “For a network with L layers let Wl, l ∈ {1, . . . , L} denote the weight matrix connecting units in layer l − 1 to layer l, with biases bl .” [i.e., a single weight matrix Wl and a bias vector bl], “weight normalization mechanism … can simply be extended to biases by linearly rescaling all weights and biases such that the ANN activation a … is smaller than 1 for all training examples. In order to preserve the information encoded within a layer, the parameters of a layer need to be scaled jointly.” [i.e., layer normalization and downscaling/contraction parameters comprise weights and biases], “We expect that the transient of the network could be reduced by training the network with constraints on the biases or the β parameter of the batch-normalization layers.”). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Das to incorporate the teachings of Rueckauer to provide techniques for implementing “Spiking neural networks (SNNs)” by converting “common operations such as max-pooling, softmax, batch-normalization and Inception-modules” into “spiking equivalents of these operations” so that “continuous-valued deep Convolutional Neural Networks (CNNs) can be converted into accurate spiking equivalents.” (See, e.g., Rueckauer, Abstract, page 1). Doing so would have allowed Das to use Rueckauer’s techniques to convert continuous-valued deep networks and CNNs into SNNs, thus “allowing conversion of nearly arbitrary CNN architectures” whereby “the SNNs can achieve more than 2x reductions in operations compared to the original CNNs”, as suggested by Rueckauer (See, e.g., Rueckauer, Abstract and page 1). This is an example of “use of known technique to improve similar devices (methods, or products) in the same way.” See MPEP 2143.
Although Das in view of Rueckauer substantially teaches the claimed invention, Das in view of Rueckauer is not relied on to teach wherein the layer contraction parameters … comprise … a binary mask.
In the same field, analogous art Nguyen teaches wherein the layer contraction parameters … comprise … a binary mask (see, e.g., page 1126, “neural networks are trained with binary weights … Another direction in simplifying deep neural networks is to compress pre-trained networks. … connection pruning, weight quantization and Huffman coding are employed to compress deep neural networks … we construct a sparse binary mask M ∈ {0,1}^(n×m) with entries equal to 1 corresponding to the largest weights in Θ. The sparse sensing weights are updated according to Θs = M ⊙ Θ, where ⊙ represents the Hadamard product. The binarization step involves a mapping of the sparse continuous valued weights to sparse binary weights Θs ∈ {-1,0,+1}^(n×m).” [i.e., layer contraction/compression/pruning parameters comprise a binary mask]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Das in view of Rueckauer to incorporate the teachings of Nguyen to provide “a deep learning approach to obtain very sparse ternary projections for compressed sensing” and a “deep learning architecture [that] jointly learns a pair of a projection matrix and a reconstruction operator in an end-to-end fashion.” (See, e.g., Nguyen, Abstract, page 1125). Doing so would have allowed Das in view of Rueckauer to use Nguyen’s deep learning approach and architecture for “compressed sensing”, which “allows reconstruction of sparse (or compressible) signals from an incomplete number of measurements, using of a sensing mechanism implemented by an appropriate projection matrix” where “results on real images demonstrate the effectiveness of the proposed approach compared to state-of-the-art methods, with significant advantage in terms of complexity”, as suggested by Nguyen (See, e.g., Nguyen, Abstract, page 1125). This is an example of “use of known technique to improve similar devices (methods, or products) in the same way.” See MPEP 2143.
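For illustration only, the masking and binarization steps quoted from Nguyen (a sparse binary mask M, the Hadamard product of M with Θ, and a mapping to values in {-1,0,+1}) can be sketched as follows. This is a minimal sketch and not Nguyen's implementation. Selecting the "largest weights" by absolute value, mapping the retained weights to their signs, and the names `sparse_binary_mask`, `hadamard`, `ternarize` and `keep` are all assumptions made for the illustration.

```python
def sparse_binary_mask(theta, keep):
    """Return a 0/1 mask with 1s at the `keep` largest-magnitude entries of theta.

    Ranking by absolute value is an assumption of this sketch.
    """
    ranked = sorted(
        ((abs(w), r, c) for r, row in enumerate(theta) for c, w in enumerate(row)),
        key=lambda item: item[0],
        reverse=True,
    )
    chosen = {(r, c) for _, r, c in ranked[:keep]}
    return [
        [1 if (r, c) in chosen else 0 for c in range(len(row))]
        for r, row in enumerate(theta)
    ]


def hadamard(a, b):
    """Element-wise (Hadamard) product of two equally shaped matrices."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def ternarize(theta_s):
    """Map each sparse weight to -1, 0 or +1 by its sign (assumed mapping)."""
    return [[(w > 0) - (w < 0) for w in row] for row in theta_s]


# Hypothetical 2x3 weight matrix; keep the two largest-magnitude weights.
theta = [[0.9, -0.1, 0.05], [-0.7, 0.2, 0.0]]
mask = sparse_binary_mask(theta, keep=2)
theta_s = hadamard(mask, theta)   # sparse weights: mask applied element-wise
theta_sb = ternarize(theta_s)     # sparse weights mapped to {-1, 0, +1}
```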

Regarding claims 3 and 18, as discussed above, Das in view of Rueckauer and Nguyen teaches the method of claim 2 and the apparatus of claim 17.
Although Das in view of Rueckauer substantially teaches the claimed invention, Das in view of Rueckauer is not relied on to teach wherein the binary mask is a vector determined to perform activation masking by replacing an operation of an activation function performed in each of the hidden layers.
In the same field, analogous art Nguyen teaches wherein the binary mask is a vector determined to perform activation masking by replacing an operation of an activation function performed in each of the hidden layers (see, e.g., page 1126, “full binary neural networks (BNNs), with binary weights and binary hidden unit activations. … The scaling layer is followed by L hidden layers. These hidden layers employ the Rectified Linear Unit (ReLU) activation function … Implementation-wise, we construct a sparse binary mask M ∈ {0,1}^(n×m) with entries equal to 1 corresponding to the largest weights in Θ. The sparse sensing weights are updated according to Θs = M ⊙ Θ, where ⊙ represents the Hadamard product. The binarization step involves a mapping of the sparse continuous valued weights to sparse binary weights Θs ∈ {-1,0,+1}^(n×m).” [i.e., binary mask is a vector M with entries to perform hidden unit weight binarization replacing the activation function in each hidden layer]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Das in view of Rueckauer to incorporate the teachings of Nguyen to provide “a deep learning approach to obtain very sparse ternary projections for compressed sensing” and a “deep learning architecture [that] jointly learns a pair of a projection matrix and a reconstruction operator in an end-to-end fashion.” (See, e.g., Nguyen, Abstract, page 1125). Doing so would have allowed Das in view of Rueckauer to use Nguyen’s deep learning approach and architecture for “compressed sensing”, which “allows reconstruction of sparse (or compressible) signals from an incomplete number of measurements, using of a sensing mechanism implemented by an appropriate projection matrix” where “results on real images demonstrate the effectiveness of the proposed approach compared to state-of-the-art methods, with significant advantage in terms of complexity”, as suggested by Nguyen (See, e.g., Nguyen, Abstract, page 1125). This is an example of “use of known technique to improve similar devices (methods, or products) in the same way.” See MPEP 2143.

Regarding claims 4 and 19, as discussed above, Das in view of Rueckauer and Nguyen teaches the method of claim 2 and the apparatus of claim 17.
Das further discloses wherein the affine transformation is a transformation of a multiply … operation and an operation of an activation function in the hidden layers (see, e.g., paragraphs 152, 155 and 158-159, “neural networks used in deep learning are artificial neural networks composed of multiple hidden layers”, “activations within the fully connected layers 908 can be computed using matrix multiplication” [i.e., a multiply operation], “convolution stage 916 performs several convolutions in parallel to produce a set of linear activations. The convolution stage 916 can include an affine transformation, which is any transformation that can be specified as a linear transformation plus a translation. Affine transformations include rotations, translations, scaling, and combinations of these transformations. The convolution stage computes the output of functions (e.g., neurons) … The neurons compute a dot product between the weights of the neurons and the region in the local input to which the neurons are connected. The output from the convolution stage 916 defines a set of linear activations”, “each linear activation is processed by a non-linear activation function” [i.e., the affine transformation is a transformation of a dot product/multiplication operation and of an activation in the hidden layers]). 
Although Das substantially discloses the claimed invention, Das is not relied on for explicitly disclosing wherein the affine transformation is a transformation of a multiply-accumulate (MAC) operation and an operation of an activation function in the hidden layers.
In the same field, analogous art Rueckauer teaches wherein the affine transformation is a transformation of a multiply-accumulate (MAC) operation and an operation of an activation function in the hidden layers (see, e.g., pages 5-7 and 9, “Batch-normalization … BN introduces additional layers where affine transformations of inputs are performed”, “each fan-in operation consist of a multiplication and addition”, “additions required in SNNs are cheaper than multiply accumulates needed in ANNs … the cost of performing a 32-bit floating-point addition is about 14 X lower than that of a MAC operation”, “By virtue of the quantized activations, these two SNNs are able to approximate the ANN activations” [i.e., the affine transformation for the SNN is a transformation of a multiply-accumulate/multiply-add operation of an ANN and an approximation of operation of an activation function in the hidden layers]).
The motivation to combine Das and Rueckauer is the same as discussed above with respect to claims 2 and 17.
Although Das in view of Rueckauer substantially teaches the claimed invention, Das in view of Rueckauer is not relied on to teach a transformation of … an operation of an activation function in the hidden layers, based on a form of a Hadamard product using the layer contraction parameters.
In the same field, analogous art Nguyen teaches a transformation of … an operation of an activation function in the hidden layers, based on a form of a Hadamard product using the layer contraction parameters (see, e.g., page 1126, “binary hidden unit activations. … The scaling layer is followed by L hidden layers. These hidden layers employ the Rectified Linear Unit (ReLU) activation function … we construct a sparse binary mask M ∈ {0,1}^(n×m) with entries equal to 1 corresponding to the largest weights in Θ. The sparse sensing weights are updated according to Θs = M ⊙ Θ, where ⊙ represents the Hadamard product. The binarization step involves a mapping of the sparse continuous valued weights to sparse binary weights Θs ∈ {-1,0,+1}^(n×m).” [i.e., transformation of an operation of the activation function in the hidden layers L based on a Hadamard product using the layer contraction parameters/sparse binary weights]).
The motivation to combine Das in view of Rueckauer with Nguyen is the same as discussed above with respect to claims 2 and 17.
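For illustration only, the relationship recited in claims 4 and 19 between a multiply-accumulate operation, an activation function and a Hadamard product can be sketched for a single layer with a ReLU activation. This is a minimal sketch and is not taken from Das, Rueckauer, Nguyen, or applicant's specification. It relies only on the definition quoted from Das that an affine transformation is a linear transformation plus a translation; the function names and the example values are hypothetical.

```python
def affine(weights, bias, x):
    """Multiply-accumulate per output unit: linear transformation plus translation."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, z) for z in v]


def activation_mask(v):
    """Binary vector with 1 where the pre-activation is positive, else 0."""
    return [1 if z > 0 else 0 for z in v]


# Hypothetical layer parameters and input.
W = [[1.0, -2.0], [0.5, 0.5]]
b = [0.0, -1.0]
x = [1.0, 2.0]

pre = affine(W, b, x)                         # MAC result before activation
mask = activation_mask(pre)                   # binary mask for this input
masked = [m * z for m, z in zip(mask, pre)]   # Hadamard product of mask and MAC result

# Folding the mask into the weights and biases leaves a purely affine map.
W_masked = [[m * w for w in row] for m, row in zip(mask, W)]
b_masked = [m * bi for m, bi in zip(mask, b)]
folded = affine(W_masked, b_masked, x)
```

For a fixed mask, `masked`, `relu(pre)` and `folded` agree, which shows how the ReLU of this layer can be written as an element-wise product and absorbed into an affine transformation for inputs that produce the same mask.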

Regarding claims 9 and 22, as discussed above, Das in view of Rueckauer teaches the method of claim 5 and the apparatus of claim 20.
Das further discloses wherein the determining of whether to update the reference sample comprises comparing a … error … value between the current input sample and the reference sample with a threshold value (see, e.g., paragraphs 154 and 159, “An input vector is presented to the network for processing. The output of the network is compared to the desired output using a loss function and an error value is calculated for each of the neurons in the output layer. The error values are then propagated backwards until each neuron has an associated error value which roughly represents its contribution to the original output. The network can then learn from those errors using an algorithm, such as the stochastic gradient descent algorithm, to update the weights of the of the neural network”, “which uses an activation function defined as f(x)=max(0, x), such that the activation is thresholded at zero.” [i.e., the determining whether to update the reference sample in the input vector includes comparing an error value between the current and reference input samples with a threshold]).
Although Das in view of Rueckauer substantially teaches the claimed invention, Das in view of Rueckauer is not relied on to teach a mean-square error (MSE) value between the current input sample and the reference sample.
In the same field, analogous art Nguyen teaches a mean-square error (MSE) value between the current input sample and the reference sample (see, e.g., pages 1126-1127, “our network training follows the standard mini-batch gradient descent method. Denote xi, x̂i the input and reconstructed patches, respectively, with xi, x̂i ∈ R^n, n = S^2. We employ the mean squared error between the input and the reconstruction as our loss function: [equation image: media_image1.png]”, “We approximate Θ(j) with αj, Θsb(j), where αj ∈ R+ is a scale factor, corresponding to the jth entry of the scaling weights α. The values of Θsb(j) and αj can be determined by minimizing the following mean square error with respect to Θsb(j), αj: [equation image: media_image2.png]” [i.e., determining whether to update/scale the reference sample xi in the mini-batch of input data comprises comparing the mean square error value between the current sample x̂i and the reference sample xi]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Das in view of Rueckauer to incorporate the teachings of Nguyen to provide “a deep learning approach to obtain very sparse ternary projections for compressed sensing” and a “deep learning architecture [that] jointly learns a pair of a projection matrix and a reconstruction operator in an end-to-end fashion.” (See, e.g., Nguyen, Abstract, page 1125). Doing so would have allowed Das in view of Rueckauer to use Nguyen’s deep learning approach and architecture for “compressed sensing”, which “allows reconstruction of sparse (or compressible) signals from an incomplete number of measurements, using of a sensing mechanism implemented by an appropriate projection matrix” where “results on real images demonstrate the effectiveness of the proposed approach compared to state-of-the-art methods, with significant advantage in terms of complexity”, as suggested by Nguyen (See, e.g., Nguyen, Abstract, page 1125). This is an example of “use of known technique to improve similar devices (methods, or products) in the same way.” See MPEP 2143.

Regarding claim 10, as discussed above, Das in view of Rueckauer and Nguyen teaches the method of claim 9.
Das further discloses wherein the determining of whether to update the reference sample comprises determining to update the reference sample to be the current input sample, in response to the [error] value being greater than or equal to a predetermined threshold value (see, e.g., paragraphs 154 and 159, “An input vector is presented to the network for processing. The output of the network is compared to the desired output using a loss function and an error value is calculated for each of the neurons in the output layer. The error values are then propagated backwards until each neuron has an associated error value which roughly represents its contribution to the original output. The network can then learn from those errors using an algorithm, such as the stochastic gradient descent algorithm, to update the weights of the of the neural network”, “which uses an activation function defined as f(x)=max(0, x), such that the activation is thresholded at zero.” [i.e., the determining whether to update the reference sample in the input vector includes determining to update the reference sample to the current input sample in the input vector responsive to the error value being greater than or equal to a predetermined threshold]).
Although Das in view of Rueckauer substantially teaches the claimed invention, Das in view of Rueckauer is not relied on to teach the MSE value.
In the same field, analogous art Nguyen teaches the MSE value (see, e.g., pages 1126-1127, “our network training follows the standard mini-batch gradient descent method. Denote xi, x̂i the input and reconstructed patches, respectively, with xi, x̂i ∈ R^n, n = S^2. We employ the mean squared error between the input and the reconstruction as our loss function: [equation image: media_image1.png]”, “We approximate Θ(j) with αj, Θsb(j), where αj ∈ R+ is a scale factor, corresponding to the jth entry of the scaling weights α. The values of Θsb(j) and αj can be determined by minimizing the following mean square error with respect to Θsb(j), αj: [equation image: media_image2.png]” [i.e., the mean square error (MSE) value]).
The motivation to combine Das in view of Rueckauer with Nguyen is the same as discussed above with respect to claim 9.
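For illustration only, the comparison recited in claims 9 and 10 (a mean-square error value between the current input sample and the reference sample, compared with a threshold value) can be sketched as follows. This is a minimal sketch and is not taken from Das, Rueckauer, Nguyen, or applicant's specification. The function names, the representation of samples as flat lists of numbers, and the example threshold values are assumptions made for the illustration.

```python
def mean_square_error(a, b):
    """Mean of squared element-wise differences between two equal-length samples."""
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def should_update_reference(current, reference, threshold):
    """Return True when the MSE is greater than or equal to the threshold."""
    return mean_square_error(current, reference) >= threshold


# Hypothetical samples: only the last element differs, by 2.
reference_sample = [1.0, 2.0, 3.0]
current_sample = [1.0, 2.0, 5.0]

update_low = should_update_reference(current_sample, reference_sample, threshold=1.0)
update_high = should_update_reference(current_sample, reference_sample, threshold=2.0)
```

Here the MSE is 4/3, so the lower threshold triggers an update of the reference sample and the higher threshold does not.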
 
Regarding claims 11 and 23, as discussed above, Das in view of Rueckauer teaches the method of claim 5 and the apparatus of claim 20.
Das further discloses wherein the determining of whether to update the reference sample comprises comparing a … error … value between an inference result of an input sample preceding the current input sample and an inference result of the reference sample with a threshold value (see, e.g., paragraphs 154 and 159, “An input vector is presented to the network for processing. The output of the network is compared to the desired output using a loss function and an error value is calculated for each of the neurons in the output layer. The error values are then propagated backwards until each neuron has an associated error value which roughly represents its contribution to the original output. The network can then learn from those errors using an algorithm, such as the stochastic gradient descent algorithm, to update the weights of the of the neural network”, “which uses an activation function defined as f(x)=max(0, x), such that the activation is thresholded at zero.” [i.e., the determining whether to update the reference sample in the input vector includes comparing an error value between an inference result/original output of the network for an input sample preceding the current input sample in the input vector and an inference result/output result of the network of the reference sample with a threshold]).
Although Das in view of Rueckauer substantially teaches the claimed invention, Das in view of Rueckauer is not relied on to teach comparing a mean-square error (MSE) value between an inference result of an input sample preceding the current input sample and an inference result of the reference sample.
In the same field, analogous art Nguyen teaches comparing a mean-square error (MSE) value between an inference result of an input sample preceding the current input sample and an inference result of the reference sample (see, e.g., pages 1126-1127, “The output layer is a linear fully connected layer with size equal to the input dimension [i.e., outputs of output layer are inferences corresponding to sequence of input values, including current input sample and reference sample] … our network training follows the standard mini-batch gradient descent method. Denote xi; ^xi the input and reconstructed patches, respectively, with xi; ^xi ϵ Rn, n = S2. We employ the mean squared error between the input and the reconstruction as our loss function: 
[Equation image media_image1.png: mean squared error loss between the input and the reconstruction]
”, “We approximate Θ(j) with αj, Θsb(j), where αj ϵ R+ is a scale factor, corresponding to the jth entry of the scaling weights α. The values of Θsb(j) and αj can be determined by minimizing the following mean square error with respect to Θsb(j), αj: 
[Equation image media_image2.png: mean square error minimized with respect to Θsb(j), αj]
” [i.e., determining whether to update/scale the reference sample xi in the mini-batch of input data comprises comparing the mean square error value between inference results/outputs for the current sample ^xi and the reference sample xi]).
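For illustration only, and not as a characterization of the code or method of any cited reference, the claimed comparison of an MSE value with a threshold value can be sketched as follows. The function names, the direction of the comparison (update when the MSE exceeds the threshold), and the example values are assumptions made for this sketch; they do not appear in the application or in the cited art.

```python
from typing import Sequence


def mean_square_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean of the squared element-wise differences of two equal-length vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def should_update_reference(
    prev_output: Sequence[float],
    ref_output: Sequence[float],
    threshold: float,
) -> bool:
    """Hypothetical rule: update the reference sample when the MSE between the
    inference result of the preceding sample and the inference result of the
    reference sample exceeds the threshold."""
    return mean_square_error(prev_output, ref_output) > threshold


# Assumed example values: squared differences are 0 and 4, so the MSE is 2.0.
prev_result = [1.0, 2.0]
ref_result = [1.0, 4.0]
print(mean_square_error(prev_result, ref_result))
print(should_update_reference(prev_result, ref_result, threshold=1.0))
```

Under these assumptions, a larger threshold tolerates more drift between the preceding sample and the reference sample before an update is triggered.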
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Das in view of Rueckauer to incorporate the teachings of Nguyen to provide “a deep learning approach to obtain very sparse ternary projections for compressed sensing” and a “deep learning architecture [that] jointly learns a pair of a projection matrix and a reconstruction operator in an end-to-end fashion.” (See, e.g., Nguyen, Abstract, page 1125). Doing so would have allowed Das in view of Rueckauer to use Nguyen’s deep learning approach and architecture for “compressed sensing”, which “allows reconstruction of sparse (or compressible) signals from an incomplete number of measurements, using of a sensing mechanism implemented by an appropriate projection matrix” where “results on real images demonstrate the effectiveness of the proposed approach compared to state-of-the-art methods, with significant advantage in terms of complexity”, as suggested by Nguyen (See, e.g., Nguyen, Abstract, page 1125). This is an example of “use of known technique to improve similar devices (methods, or products) in the same way.” See MPEP 2143.

Regarding claims 12 and 24, as discussed above, Das in view of Rueckauer teaches the method of claim 5 and the apparatus of claim 20.
Although Das in view of Rueckauer substantially teaches the claimed invention, Das in view of Rueckauer is not relied on to teach wherein the determining of whether to update the reference sample is based on whether signs of intermediate activations of each layer of the neural network are changed by a determined frequency by a binary mask determined for each layer of the neural network.
In the same field, analogous art Nguyen teaches wherein the determining of whether to update the reference sample is based on whether signs of intermediate activations of each layer of the neural network are changed by a determined frequency by a binary mask determined for each layer of the neural network (see, e.g., page 1126, “neural networks are trained with binary weights {-1,+1} The study in [26] extends [25] to full binary neural networks (BNNs), with binary weights and binary hidden unit activations [i.e., binary values of -1 or +1 change signs of intermediate, hidden unit activations of each layer in the neural network] … we construct a sparse binary mask M ∈ {0,1}nxm with entries equal to 1 corresponding to the largest weights in Θ. The sparse sensing weights are updated according to Θs = M ʘ Θ, where ʘ represents the Hadamard product. The binarization step involves a mapping of the sparse continuous valued weights to sparse binary weights Θs ∈ {-1,0,+1}nxm.” [i.e., signs of activations are changed by a determined frequency by a binary mask and binary weights, -1, +1 for each layer in the neural network]).
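For illustration only, the quoted mask construction (Θs = M ʘ Θ) and a per-layer sign-change check can be sketched as follows. This is a minimal sketch under stated assumptions: selecting the "largest weights" by absolute value, the parameter k, and the helper names top_k_mask, hadamard, and sign_change_fraction are hypothetical and are not taken from the application or from any cited reference.

```python
from typing import List

Matrix = List[List[float]]


def top_k_mask(theta: Matrix, k: int) -> List[List[int]]:
    """Binary mask with 1 at the k entries of theta that are largest in
    absolute value (assumption: 'largest' means largest magnitude)."""
    entries = sorted(
        ((abs(v), i, j) for i, row in enumerate(theta) for j, v in enumerate(row)),
        key=lambda t: (-t[0], t[1], t[2]),  # descending magnitude, stable ties
    )
    keep = {(i, j) for _, i, j in entries[:k]}
    return [
        [1 if (i, j) in keep else 0 for j in range(len(row))]
        for i, row in enumerate(theta)
    ]


def hadamard(mask: List[List[int]], theta: Matrix) -> Matrix:
    """Element-wise (Hadamard) product of the mask and the weights."""
    return [[m * v for m, v in zip(mrow, trow)] for mrow, trow in zip(mask, theta)]


def sign_change_fraction(ref_acts: List[float], cur_acts: List[float]) -> float:
    """Fraction of activations in one layer whose sign (positive vs.
    non-positive) differs between the reference and the current sample."""
    if len(ref_acts) != len(cur_acts) or not ref_acts:
        raise ValueError("activation lists must be non-empty and of equal length")
    changed = sum((r > 0) != (c > 0) for r, c in zip(ref_acts, cur_acts))
    return changed / len(ref_acts)


theta = [[0.5, -2.0], [0.1, 1.5]]   # assumed example weights
mask = top_k_mask(theta, k=2)       # keeps the entries -2.0 and 1.5
sparse_theta = hadamard(mask, theta)
frac = sign_change_fraction([1.0, -1.0, 2.0, -3.0], [1.0, 1.0, -2.0, -3.0])
```

In such a sketch, the sign-change fraction for a layer could be compared against a chosen frequency to decide whether the reference sample should be updated; that decision rule is likewise an assumption.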
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Das in view of Rueckauer to incorporate the teachings of Nguyen to provide “a deep learning approach to obtain very sparse ternary projections for compressed sensing” and a “deep learning architecture [that] jointly learns a pair of a projection matrix and a reconstruction operator in an end-to-end fashion.” (See, e.g., Nguyen, Abstract, page 1125). Doing so would have allowed Das in view of Rueckauer to use Nguyen’s deep learning approach and architecture for “compressed sensing”, which “allows reconstruction of sparse (or compressible) signals from an incomplete number of measurements, using of a sensing mechanism implemented by an appropriate projection matrix” where “results on real images demonstrate the effectiveness of the proposed approach compared to state-of-the-art methods, with significant advantage in terms of complexity”, as suggested by Nguyen (See, e.g., Nguyen, Abstract, page 1125). This is an example of “use of known technique to improve similar devices (methods, or products) in the same way.” See MPEP 2143.


Conclusion

The prior art made of record, listed on form PTO-892, and not relied upon, is considered pertinent to applicant's disclosure. 
For example, Molchanov et al. (U.S. Patent Application Pub. No. 2018/0114114 A1, hereinafter “Molchanov”) discloses the invention as claimed including “a method for neural network pruning … the method 100 may also be performed by a program, custom circuitry, or by a combination of custom circuitry and a program. For example, the method 100 may be executed by a GPU (graphics processing unit), CPU (central processing unit), neural network, or any processor capable of implementing a neural network” [i.e., a processor-implemented method for pruning/contracting a neural network] (see, e.g., Molchanov, paragraph 17).
The examiner requests, in response to this office action, support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line no(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.
When responding to this office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the reference cited or the objections made. He or she must also show how the amendments avoid such references or objections. See 37 CFR 1.111(c).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RANDY K BALDWIN whose telephone number is (571)270-5222. The examiner can normally be reached on Mon - Fri 9:00-6:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at 571-272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/R.K.B./Examiner, Art Unit 2125

/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125