DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are presented for examination.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on December 3, 2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Drawings
The drawings are objected to because in Figure 5, the “no” and the “yes” arrows emanating from reference character 509 appear to be reversed (i.e., the “no” should be a “yes” and vice versa).  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Specification
The disclosure is objected to because of the following informalities: 
In paragraph 9, “generating … generate” should be merely “generating”.
In paragraph 54, “hybolic” should be “hyperbolic”.
In paragraph 80, “cause CPU” should be “cause the CPU”.
Paragraph 85 abruptly ends without the last sentence being completed.
In paragraph 90, “extant technique” should be “extant techniques”; “assign to the PEs” should be “assign them to the PEs”.
In paragraph 112, “compares normalized” should be “compares a normalized”.
In paragraph 118, “example only, with a true scope” should be “examples only, with the true scope”.
Appropriate correction is required.
The abstract of the disclosure is objected to because it begins with the implied phrase “The present disclosure relates to”.  Correction is required.  See MPEP § 608.01(b).

Claim Objections
Claim 3 is objected to because of the following informalities:  “any one of claims 1” should be “claim 1”.  Appropriate correction is required.
Claim 12 is objected to because of the following informalities:  “plurality of processing element” should be “plurality of processing elements”.  Claim 13 is objected to for dependency on claim 12.  Appropriate correction is required.
Claim 16 is objected to because of the following informalities:  “different … than” should be “different … from”.  Claim 17 is objected to for dependency on claim 16.  Appropriate correction is required.
Claims 19-20 are objected to because of the following informalities:  the word “generate” should be deleted from the generating limitation.  Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 13 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The term “approximately” in claim 13 is a relative term which renders the claim indefinite. The term “approximately” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.  As an initial matter, note that “approximately” is similar to “about” and “substantially,” both of which the courts have held to be indefinite when not accompanied by an explanation in the specification of the requisite degree or a general understanding by ordinary artisans of what the metes and bounds of the term are.  MPEP § 2173.05(b)(III).  Here, “approximately” is not defined by the specification, and the only paragraph in the specification in which it appears, paragraph 89, merely repeats the claim language.  Examiner can find no evidence that an ordinary artisan would understand how close two numbers of calculations must be to constitute “approximately the same number,” or that there is a general understanding of the maximum permissible discrepancy between the numbers of calculations.  As such, the claim is indefinite.  For purposes of examination, the same number of calculations or up to a 10% difference in the number of calculations will be deemed to read on the claim.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-7, 10-12, 14, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sze et al., “Efficient Processing of Deep Neural Networks: A Tutorial and Survey,” 105(12) Proc. IEEE (“Sze”), in view of Achlioptas, Proc. 20th ACM SIGMOD-SIGACT-SIGART Symp. Principles of Database Sys. 274-81 (2001) (“Achlioptas”).
Regarding claim 1, Sze discloses “[a] system for dynamic sparse execution of a neural network, comprising:
at least one global buffer configured to receive inputs for the neural network (Sze Fig. 25 shows a global buffer connected to processing elements; p. 2311, first two paragraphs on right-hand column discloses that the filter weights and input activations are read from the global buffer, processed by MAC units, and the resulting partial sums [inputs] are put back into the global buffer; see also Fig. 31); 
a plurality of processing elements configured to execute activation functions for nodes of the neural network (Sze p. 2311, third paragraph indicates that in one type of neural network accelerator, each processing element handles the processing for each output activation value by fetching the corresponding input activations from neighboring PEs; see also Figs. 25 (showing processing elements connected to the global buffer), 11 (showing various activation functions used in CNNs), 31 (showing that the result of calculations by the PEs is sent to a ReLU (activation function) unit to generate an output feature map (output))); and 
at least one processor (Sze Fig. 31 contains an RLC decoder to decode the input feature map and an RLC encoder to encode the output feature map, which collectively comprise a processor) configured to: 
… reduce at least one dimension of the inputs from the at least one global buffer and generate a corresponding predictable output neuron map for use by the plurality of processing elements (in a CNN, a variety of computations that reduce the dimensionality of a feature map are referred to as pooling; a stride of greater than one is typically used so that there is a reduction in the dimension of the representation [feature map] – Sze, paragraph spanning pp. 2302-03; see also Fig. 10 (showing that the output of the pooling layer goes either to another CONV layer or to a fully connected layer – i.e., to another processing element), Figs. 22, 31 (showing that the global buffer exchanges data with the PEs), p. 2312, last paragraph (disclosing that the input fmap decoder unit is a compression unit)), and 
receive outputs from the plurality of processing elements (Sze Fig. 31 shows an RLC encoder that receives output from a ReLU unit, which receives outputs from the PEs via the global buffer), reduce at least one dimension of the outputs (Sze p. 2312, last paragraph, discloses that the fmap units are compression [dimensionality reduction] units; see also paragraph spanning pp. 2302-03 (disclosing that the pooling layer of the CNN reduces the dimension of the feature maps), Fig. 31 (showing the RLC encoder that compresses the output feature map)), and update the corresponding predictable output neuron map for use by the plurality of processing elements based on the reduced outputs (Sze p. 2312, last paragraph and Fig. 31 disclose that the chip that contains the RLC decoder and encoder communicates with an off-chip DRAM using a 64-b bidirectional data bus [i.e., data, including the output feature map, may flow from the RLC encoder to the DRAM and back to the RLC decoder for further decoding/updating of the feature map]; see also p. 2302, first full paragraph (disclosing that the output feature map is calculated by passing a stack of filters over an input feature map [thereby updating the feature map])).”
Sze appears not to disclose explicitly the remaining limitations of the claim.  However, Achlioptas discloses “execut[ing] ternary random projection to reduce at least one dimension of the inputs (given a high-dimensional pointset, the pointset could be embedded into a lower dimensional space without suffering great distortion – Achlioptas, sec. 1, first two paragraphs; one can replace projections onto random hyperplanes with simpler and faster operations, requiring extremely simple probability distributions such as sqrt(3) with probability 1/6, 0 with probability 2/3, and –sqrt(3) with probability 1/6 [ternary random projection] – id. at sec. 1.1, first three paragraphs and Theorem 2)….”
Achlioptas and the instant application both relate to dimensionality reduction of datasets and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sze to use ternary random projection to reduce the inputs’ dimensionality, as disclosed by Achlioptas, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would render the operation of dimensionality reduction simpler and faster relative to projection onto random hyperplanes without any sacrifice in the quality of the embedding.  See Achlioptas, sec. 1.1, third paragraph.
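For illustration only, the following minimal Python sketch shows the ternary projection distribution quoted above from Achlioptas, sec. 1.1 (entries of +√3, 0, and −√3 with probabilities 1/6, 2/3, and 1/6). The function names are hypothetical, and the 1/√k scaling is an assumption stated here as the usual normalization (each entry has unit variance, so squared lengths are preserved in expectation); the sketch is not presented as the implementation of Sze, Achlioptas, or the instant application.

```python
import math
import random


def ternary_projection_matrix(d, k, seed=None):
    """Return a d x k matrix with entries +sqrt(3), 0, or -sqrt(3),
    drawn with probabilities 1/6, 2/3, and 1/6 respectively."""
    rng = random.Random(seed)
    s = math.sqrt(3.0)
    matrix = []
    for _ in range(d):
        row = []
        for _ in range(k):
            u = rng.random()
            if u < 1 / 6:
                row.append(s)       # probability 1/6
            elif u < 5 / 6:
                row.append(0.0)     # probability 2/3
            else:
                row.append(-s)      # probability 1/6
        matrix.append(row)
    return matrix


def project(x, matrix):
    """Reduce a d-dimensional vector x to k dimensions as (1/sqrt(k)) * x R.

    Because two thirds of the entries are zero and the rest are +/-sqrt(3),
    most of the multiply-accumulate work can be skipped or replaced by
    additions and subtractions followed by a single scaling step.
    """
    k = len(matrix[0])
    scale = 1.0 / math.sqrt(k)
    return [scale * sum(x[i] * matrix[i][j] for i in range(len(x)))
            for j in range(k)]
```

For example, `project(activations, ternary_projection_matrix(64, 16, seed=0))` would map a hypothetical 64-element input vector to 16 dimensions.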
  
Regarding claim 2, Sze, as modified by Achlioptas, discloses that “the at least one processor iteratively receives current outputs from the plurality of processing elements (Sze Fig. 31 shows that the RLC encoder receives outputs from the ReLU unit, which in turn receives output from the global buffer that received partial sums from the processing elements), reduces at least one dimension of the current outputs (Sze p. 2312, last full paragraph and Fig. 31 show that the RLC encoder compresses the output feature map; see also paragraph spanning pp. 2302-03 (disclosing that the pooling layer of a convolutional network reduces the dimensionality of the feature map)), and updates the corresponding predictable output neuron map, based on the reduced current outputs (Sze p. 2312, last paragraph and Fig. 31 disclose that the chip that contains the RLC decoder and encoder communicates with an off-chip DRAM using a 64-b bidirectional data bus [i.e., data, including the output feature map, may flow from the RLC encoder to the DRAM and back to the RLC decoder for further decoding/updating of the feature map]; see also p. 2302, first full paragraph (disclosing that the output feature map is calculated by passing a stack of filters over an input feature map [thereby updating the feature map])), for use by the plurality of processing elements in generating next outputs until the plurality of processing elements have executed each layer of the neural network (Sze Fig. 31 shows that data may flow from the processing elements to the global buffer, to the ReLU unit, to the RLC encoder, and to the DRAM, and then from the DRAM to the RLC decoder, back to the global buffer, and back to the processing elements [i.e., the output feature map computed by the PEs for the current layer and compressed by the RLC encoder is fed back into the PEs for calculation of the next layer]; see also paragraph spanning pp. 2312-13 (disclosing that the fixed-size PE array may accommodate different layer shapes – i.e., the PE array is iteratively used to process each layer until all layers are processed)).”  

Regarding claim 3, Sze, as modified by Achlioptas, discloses that “each processing element comprises a control logic and a multiply-accumulate accelerator (the fundamental components of both the CONV and the fully connected layers of a CNN are multiply-and-accumulate operations – Sze, p. 2307, first full paragraph on right-hand column; the cost of a chip depends on area efficiency, which accounts for the amount of control logic – id. at p. 2324, third bullet point; filter weights and input activations may be processed by MAC units, and the resulting sums or output activations are put back into the global buffer (implying the existence of control logic in the PE to perform these calculations) – id. at p. 2311, second full paragraph on right-hand column; see also Figs. 25 (showing that each PE performs a multiply-and-accumulate operation), 31 (showing that each PE has a MAC operation and a control)).”

Regarding claim 4, Sze, as modified by Achlioptas, discloses that “the at least one processor comprises a plurality of adder trees (one example of an accelerator reads input activations and filter weights from a buffer and processes them through MAC units with custom adder trees – Sze, p. 2311, second full paragraph on right-hand column).”  

Regarding claim 5, Sze, as modified by Achlioptas, discloses that “the global buffer is further configured to transmit the predictable output neuron map from the at least one processor to the plurality of processing elements and to transmit the outputs from the plurality of processing elements to the at least one processor (in one example of a neural network accelerator, an output of an input feature map compression unit is fed into the global buffer and then sent to the PE array [processing elements]; the global buffer then sends the PE output to a ReLU unit, whose output is in turn sent to an output feature map compression unit [the compression units and ReLU collectively comprise a processor] – Sze, last full paragraph on p. 2312 and Fig. 31).”  

Regarding claim 6, Sze, as modified by Achlioptas, discloses that “the plurality of processing elements are organized in an array along a first dimension and a second dimension (Sze Fig. 31 shows that at least one neural network accelerator has a processing element array arranged in two dimensions).”  

Regarding claim 7, Sze, as modified by Achlioptas, discloses that “the plurality of processing elements share a first bus along the first dimension and communicate with the global buffer using a second bus along the second dimension (one neural network accelerator chip communicates with off-chip DRAM using a 64-b bidirectional data bus to fetch data into the global buffer – Sze, p. 2312, last full paragraph; Fig. 31 shows that each PE in a row communicates with other PEs in the same row via another set of horizontal buses).”  

Regarding claim 10, Sze, as modified by Achlioptas, discloses that “the plurality of processing elements and the at least one processor are configured to execute instructions in parallel (multiply-and-accumulate operations can be easily parallelized; highly-parallel compute paradigms are commonly used, including both spatial and temporal architectures – Sze, p. 2307, first full paragraph on right-hand column).”  

Regarding claim 11, Sze, as modified by Achlioptas, discloses that “the at least one processor reduces at least one dimension of the outputs and updates the corresponding predictable output neuron map with execution of one or more of the activation functions by the plurality of processing elements (Sze Fig. 31 shows that the ReLU unit [here considered one of the processing elements] applies a ReLU function to the partial sums and passes the result to the RLC encoder [part of the processor], which compresses the output, thereby updating the output feature map; since the RLC encoder performs the compression directly on the results of the ReLU operation, the two operations occur concurrently).”  

Regarding claim 12, Sze, as modified by Achlioptas, discloses that “the at least one processor re-assigns the nodes to the plurality of processing element[s] whenever the predictable output neuron map is updated (each CONV layer in a CNN is composed of high-dimensional convolutions; the input activations of a layer are structured as a set of input feature maps that are convolved with a 2-D filter; the result of this computation is output activations that comprise one channel of an output feature map [i.e., once the input feature map is updated to become the output feature map, the active nodes/neurons become those of the next layer] – Sze, p. 2302, first full paragraph; a fixed-size PE array can be used to accommodate different layer shapes [i.e., the same PEs are used to calculate each layer] – id. at paragraph spanning pp. 2312-13).”  

Regarding claim 14, Sze, as modified by Achlioptas, discloses “a quantizer configured to truncate the inputs before reducing at least one dimension of the inputs (Sze Fig. 39 shows that each MAC contains a quantizer after the multiply-and-accumulate operation [since this operation occurs in each PE, where computation for all neural network layers takes place, quantization in the PEs performing the operations of the CONV layer may occur before the processing by the PEs that perform the operations of the max pooling layer]; see also p. 2317, first paragraph (disclosing that the quantization may be fixed or variable)).”  

Regarding claim 16, Sze, as modified by Achlioptas, discloses that “the global buffer receives the inputs from a memory that is on a different chip than the global buffer (Sze Fig. 31 and p. 2312, last full paragraph disclose that in at least one neural network accelerator, an off-chip DRAM [memory] communicates with the chip using a 64-b bidirectional data bus to fetch data into the global buffer).”  

Regarding claim 17, Sze, as modified by Achlioptas, discloses that “the global buffer is further configured to transmit final outputs to the memory (Sze Fig. 31 and p. 2312, last full paragraph disclose that in at least one neural network accelerator, an off-chip DRAM [memory] communicates with the chip using a 64-b bidirectional data bus to fetch data into the global buffer; Fig. 31 also shows that the global buffer sends the output to a ReLU unit, whose output is sent to an RLC encoder, which is then sent to the off-chip DRAM).”

Regarding claim 18, Sze, as modified by Achlioptas, discloses that “the plurality of processing elements further comprise local buffers for storing inputs and outputs (Sze p. 2310, last paragraph discloses that the weights [inputs] may be stored in a register file (RF) [local buffer] in the PE; p. 2311, second full paragraph discloses keeping the accumulation of partial sums for the same output activation value local in the RF).”

Regarding claim 20, Sze discloses “[a] non-transitory computer-readable storage medium storing a set of instructions that is executable by a computing device to cause the computing device to perform a method for dynamic sparse execution of a neural network (Sze Fig. 31 and p. 2312, last full paragraph disclose a neural network accelerator consisting of a processing element (PE) array, a global buffer, and ReLU and feature map compression units connected to an off-chip DRAM [non-transitory computer-readable medium]), the method comprising: 
providing, via a buffer, inputs for a neural network to at least one processor (Sze Fig. 31 shows that a global buffer provides a filter, an input feature map, and partial sums [inputs] to the PEs [the PEs, RLC decoder, RLC encoder, and ReLU unit collectively comprise a processor]); …
generating, via the at least one processor, … a corresponding predictable output neuron map (in a CNN, a variety of computations that reduce the dimensionality of a feature map are referred to as pooling; a stride of greater than one is typically used so that there is a reduction in the dimension of the representation [output map] – Sze, paragraph spanning pp. 2302-03; see also p. 2312, last paragraph (disclosing that the input fmap decoder unit is a compression unit)); 
executing, via a plurality of processing elements, one or more first activation functions of the neural network using the reduced inputs to generate first outputs (Sze p. 2311, third paragraph indicates that each processing element may handle the processing for each output activation value by fetching the corresponding input activations from neighboring PEs; see also Figs. 25 (showing processing elements connected to the global buffer), 11 (showing various activation functions used in CNNs), 31 (showing that the result of calculations by the PEs is sent to a ReLU (activation function) unit to generate an output feature map (output))); 
providing, via the buffer, the first outputs to the at least one processor (Sze Fig. 31 shows that the partial sum outputs of the PE array are sent to the global buffer, which then provides those outputs to the ReLU unit and the RLC encoder [part of the processor]); 
reducing, via the at least one processor, at least one dimension of the first outputs (Sze p. 2312, last paragraph and Fig. 31 show that the outputs, after passing through the ReLU unit, pass through an RLC encoder that compresses [reduces a dimension of] the output feature map; see also paragraph spanning pp. 2302-03 (disclosing that pooling is used to reduce the dimensionality of the feature map)); 
… (Sze p. 2312, last paragraph and Fig. 31 disclose that the chip that contains the RLC decoder and encoder communicates with an off-chip DRAM using a 64-b bidirectional data bus [i.e., data, including the output feature map, may flow from the RLC encoder to the DRAM and back to the RLC decoder for further decoding/updating of the feature map]; see also p. 2302, first full paragraph (disclosing that the output feature map is calculated by passing a stack of filters over an input feature map [thereby updating the feature map])); and
executing, via the plurality of processing elements, one or more second activation functions of the neural network using the reduced first outputs to generate second outputs (Sze p. 2311, third paragraph indicates that each processing element may handle the processing for each output activation value by fetching the corresponding input activations from neighboring PEs [i.e., execute the activation function]; see also Figs. 25 (showing processing elements connected to the global buffer), 11 (showing various activation functions used in CNNs), 31 (showing that the (reduced) input feature map is passed from the RLC decoder to the global buffer and from the global buffer to the PE array and that the outputs are passed through a ReLU (activation function) layer to produce second outputs)).”
Sze appears not to disclose explicitly the further limitations of the claim.  However, Achlioptas discloses “executing, via the at least one processor, ternary random projection to reduce at least one dimension of the inputs (given a high-dimensional pointset, the pointset could be embedded into a lower dimensional space without suffering great distortion – Achlioptas, sec. 1, first two paragraphs; one can replace projections onto random hyperplanes with simpler and faster operations, requiring extremely simple probability distributions such as sqrt(3) with probability 1/6, 0 with probability 2/3, and –sqrt(3) with probability 1/6 [ternary random projection] – id. at sec. 1.1, first three paragraphs and Theorem 2)….”
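It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sze to use ternary random projection to reduce the inputs’ dimensionality, as disclosed by Achlioptas, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would render the operation of dimensionality reduction simpler and faster relative to projection onto random hyperplanes without any sacrifice in the quality of the embedding.  See Achlioptas, sec. 1.1, third paragraph.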

Claim 19 is a method claim corresponding to non-transitory computer-readable medium claim 20 and is rejected for the same reasons as given in the rejection of that claim.

Claims 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over Sze in view of Achlioptas and further in view of Lee et al. (US 20200285944) (“Lee”).
Regarding claim 8, Sze, as modified by Achlioptas, discloses “reduced inputs (Sze Fig. 31 and p. 2312, last full paragraph disclose an RLC decoder and RLC encoder that function as compression units; paragraph spanning pp. 2302-03 discloses that the pooling layer of a CNN reduces the dimensionality of the feature map [input to next layer]).”  
Neither Sze nor Achlioptas appears to disclose explicitly the further limitations of the claim.  However, Lee discloses that “the at least one processor further comprises a systolic array configured to reduce at least one dimension of a set of weights for the neural network based on the … inputs (neural network may be implemented by processing element arrays (PE arrays) [systolic arrays] – Lee, paragraph 41; the operation of each graph convolutional layer is a propagation function of a feature matrix for the neural network’s previous layer [i.e., input into subsequent layer]; the weight matrix in the propagation function may be reduced in dimensionality to correspond to the number of features at the next layer [i.e., the dimensionality of the layer into which the matrix is being input determines how the matrix is reduced] – id. at paragraph 63).”  
Lee and the instant application both relate to dimensionality reduction in neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Sze and Achlioptas to reduce the dimensionality of the weights based on the inputs, as disclosed by Lee, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would reduce the number of operations that the system would need to perform.  See Lee, paragraph 63.

Regarding claim 9, Sze, as modified by Achlioptas, discloses “reduced outputs (Sze Fig. 31 and p. 2312, last full paragraph disclose an RLC decoder and RLC encoder that function as compression units; paragraph spanning pp. 2302-03 discloses that the pooling layer of a CNN reduces the dimensionality of the feature map [output of present layer]).”
Neither Sze nor Achlioptas appears to disclose explicitly the further limitations of the claim.  However, Lee discloses that “the systolic array is further configured to reduce at least one dimension of the set of weights for the neural network based on the … outputs (neural network may be implemented by processing element arrays (PE arrays) [systolic arrays] – Lee, paragraph 41; the operation of each graph convolutional layer is a propagation function of a feature matrix for the neural network’s previous layer [i.e., output of previous layer]; the weight matrix in the propagation function may be reduced in dimensionality to correspond to the number of features at the next layer [i.e., the initial dimensionality of the output determines how the matrix is reduced] – id. at paragraph 63).” 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Sze and Achlioptas to reduce the dimensionality of the network weights based on an output, as disclosed by Lee, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would reduce the number of operations that the system would need to perform.  See Lee, paragraph 63. 

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Sze in view of Achlioptas and further in view of Proshin et al. (US 10783433) (“Proshin”).
Regarding claim 13, Sze, as modified by Achlioptas, discloses that “re-assigning comprises grouping the activation functions based on the predictable output neuron map (in a CNN, a nonlinear activation function is typically applied after each CONV or FC layer [i.e., the activation functions are grouped by layer] – Sze, p. 2302, last full paragraph; see also Fig. 10 (showing that each CONV layer contains convolution, non-linearity, normalization, and pooling operations, so that the activation function of the next layer is based on the feature map produced by pooling in the previous layer))….” 
Neither Sze nor Achlioptas appears to disclose explicitly the further limitations of the claim.  However, Proshin discloses “grouping the activation functions … such that each group has approximately the same number of calculations (method for training and self-organization of a neural network includes dividing a set of adjustable parameters p, which includes activation function parameters, into several groups of n parameters for each factor that have to be calculated simultaneously in m points [since each group has n parameters, the same number of calculations is involved for each activation function parameter group] – Proshin, col. 8, ll. 47-53).”
Proshin and the instant application both relate to neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Sze and Achlioptas to group the activation functions so that the same number of calculations is performed in each group, as disclosed by Proshin, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow for multiple activation functions to be computed in parallel, thereby saving processing time.  See Proshin, col. 6, ll. 23-28 (disclosing that a subset of parameters from the entire parameter set may be calculated in parallel).
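For illustration only, the following minimal Python sketch shows one way activation functions with known per-function calculation counts could be grouped so that each group has approximately the same number of calculations. The greedy longest-processing-time heuristic is a swapped-in technique chosen for brevity and is not taken from Proshin or the instant application; the function names are hypothetical, and the 10% tolerance mirrors the interpretation adopted above for claim 13, under the assumption that the difference is measured relative to the largest group.

```python
import heapq


def group_balanced(costs, num_groups):
    """Assign items (e.g., activation functions with per-item calculation
    counts) to num_groups groups using the greedy longest-processing-time
    heuristic: visit items in decreasing cost order and place each one in
    the group whose running total is currently smallest."""
    heap = [(0, g) for g in range(num_groups)]  # (running total, group id)
    groups = [[] for _ in range(num_groups)]
    for idx in sorted(range(len(costs)), key=lambda i: costs[i], reverse=True):
        total, g = heapq.heappop(heap)
        groups[g].append(idx)
        heapq.heappush(heap, (total + costs[idx], g))
    return groups


def within_tolerance(costs, groups, tol=0.10):
    """Return True if every group's total number of calculations is within
    tol (as a fraction) of the largest group's total."""
    totals = [sum(costs[i] for i in grp) for grp in groups]
    largest = max(totals)
    return all(largest - t <= tol * largest for t in totals)
```

Under this reading, eight hypothetical activation functions with calculation counts `[8, 7, 6, 5, 4, 3, 2, 1]` split into two groups totaling 18 calculations each, which satisfies the 10% standard; a split of `[8]` versus the remaining 28 calculations would not.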

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Sze in view of Achlioptas and further in view of Yao (US 20180046894) (“Yao”).
Regarding claim 15, the rejection of claim 14 is incorporated.  Sze further discloses truncation from fixed-point values (using dynamic fixed point, the bitwidth can be reduced [truncated] to 8 b for the weights and 10 b for the activations without any fine tuning of the weights; both can reach 8 b with fine tuning of the weights – Sze, paragraph spanning pp. 2317-18).
Neither Sze nor Achlioptas appears to disclose explicitly the further limitations of the claim.  However, Yao discloses that “the truncation comprises a truncation from 16-bit … values to 4-bit fixed-point values (fixed-point quantizing may comprise converting 16-bit floating-point numbers into 4-bit fixed-point numbers – Yao, claim 6).”
Yao and the instant application both relate to hardware acceleration of neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Sze and Achlioptas to truncate the values from 16-bit to 4-bit, as disclosed by Yao, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would significantly reduce the memory footprint and the computational resources required.  See Yao, paragraph 94.


Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to RYAN C VAUGHN whose telephone number is (571)272-4849. The examiner can normally be reached M-R 7a-5:30p ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at 571-272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/R.C.V./             Examiner, Art Unit 2125           

/KAMRAN AFSHAR/                      Supervisory Patent Examiner, Art Unit 2125

1 While the specification uses the term “predictable output neuron map” frequently, it does not appear to define the term explicitly, and Examiner can find no evidence that it was an accepted term of art before the effective filing date.  Paragraphs 54-55 appear to suggest, albeit not explicitly state, that the PON map is used to sparsify the weights, and Figure 3 appears to show that it is then applied to the original inputs to generate a full, non-sparse output.  Therefore, for purposes of examination, any feature map that has reduced dimensionality relative to the full input space and is used to produce a non-sparse output will be deemed a “predictable output neuron map”.
2 “Concurrently” is not defined in the specification (it appears only once in paragraph 81); thus, Examiner interprets it according to its ordinary dictionary definition, namely “acting in conjunction; cooperating”.  Dictionary.com, definition 2 of “concurrent”, https://www.dictionary.com/browse/concurrent.  See also MPEP § 2111.01(I).