Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending for examination. Claims 1, 14, and 15 are independent.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 01/13/2022 has been entered.
 
Response to Amendment
This office action is responsive to the amendment filed on 01/13/2022. As directed by the amendment, claims 1-2, 7, 14, and 15 are amended.

Response to Arguments
Applicant's arguments filed 01/13/2022 have been fully considered but they are not persuasive. 
Applicant argues: No combination of Zhou, Xu, and Seide describes or suggests each and every feature recited in Claim 1. In particular, the Office relies on Zhou for allegedly describing encoding a plurality of gradients and points to equation 12 in Zhou as allegedly describing this feature. However, Zhou does not appear to describe or suggest encoding a plurality of gradients according to a probability related to at least the magnitude of the individual gradient divided by the magnitude of a vector of the plurality of gradients. See Applicant's specification, paragraphs [0044], [0074], and [0092]. Further, neither Xu nor Seide describes encoding a plurality of gradients according to a probability related to at least the magnitude of the individual gradient divided by the magnitude of a vector of the plurality of gradients. Thus, to further prosecution, Applicant has amended Claim 1 to further define encoding a plurality of gradients according to a probability related to at least the magnitude of the individual gradient divided by the magnitude of a vector of the plurality of gradients. As such, Applicant submits that Claim 1 is patentable over the cited references.
Further, to the extent independent Claims 14 and 15 include recitations similar to the recitations of Claim 1, Applicant submits that independent Claims 14 and 15 are also patentable over the cited references. 
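
For illustration only, one possible reading of the limitation argued above can be sketched in Python. The sketch assumes that the probability is exactly |g_i| / ||g||_2 and that each retained gradient is mapped to the single level sign(g_i) * ||g||_2. Neither assumption is taken from the claims or from the specification paragraphs cited by Applicant, and the function name stochastic_encode is hypothetical.

```python
import math
import random


def stochastic_encode(gradients, rng):
    """Keep each gradient with probability |g_i| / ||g||_2, otherwise set it to zero.

    Kept entries are mapped to sign(g_i) * ||g||_2 (an assumption made for brevity).
    """
    norm = math.sqrt(sum(g * g for g in gradients))
    if norm == 0.0:
        return [0.0 for _ in gradients]
    encoded = []
    for g in gradients:
        # Magnitude of the individual gradient over the magnitude of the vector.
        keep_probability = abs(g) / norm
        if rng.random() < keep_probability:
            encoded.append(math.copysign(norm, g))
        else:
            encoded.append(0.0)
    return encoded


rng = random.Random(0)  # fixed seed so that repeated runs give the same draw
print(stochastic_encode([3.0, -4.0, 0.0], rng))
```

Under these assumptions every output entry is either zero or plus or minus the vector norm, and an entry that is already zero is never retained.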

Examiner's response: Examiner respectfully disagrees. Under broadest reasonable interpretation, the amended limitation is still disclosed by Zhou. Zhou describes on page 5 that dr is a gradient tensor (i.e. vector) and equation 12 discloses $\frac{dr}{2\max(|dr|)}$. Under broadest reasonable interpretation, dr contains scalars of the individual gradients that have a magnitude and |dr| is the magnitude vector by which dr is divided. The limitation states “a probability related to at least a magnitude of the individual gradient divided by the magnitude of a vector of the plurality of gradients” and equation 12 of Zhou discloses the described related probability.
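
For reference, the affine-transform-then-quantize form that the Examiner attributes to equation 12 of Zhou can be sketched in Python as follows. The sketch is a simplification under stated assumptions: it omits any noise term, it takes the maximum over the entire list rather than over particular tensor axes, and the name affine_quantize is chosen here for illustration only.

```python
def quantize_k(r, k):
    """Round r in [0, 1] to the nearest of 2**k evenly spaced levels."""
    levels = 2 ** k - 1
    return round(r * levels) / levels


def affine_quantize(dr, k):
    """Map gradients into [0, 1], quantize to k bits, then undo the mapping."""
    # Assumption: one scale for the whole list, equal to 2 * max(|dr|).
    scale = 2.0 * max(abs(g) for g in dr)
    if scale == 0.0:
        return [0.0 for _ in dr]
    return [scale * (quantize_k(g / scale + 0.5, k) - 0.5) for g in dr]


print(affine_quantize([0.5, -1.0, 0.25, 0.0], k=2))
```

In this sketch each gradient is divided by twice the largest absolute value in the list before rounding, so the quantized output stays within the range of the input.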

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
In Claim 14, line 2, "means for storing a plurality of gradients of a loss function of the neural network" has been interpreted under 112(f) as a means plus function limitation because of the combination of a non-structural term "means" and functional language "for storing a plurality of gradients of a loss function of the neural network" without reciting sufficient structure to achieve the function. The specification discloses (paragraph 82) what the means for storing a plurality of gradients of a loss function of the neural network is.
In Claim 14, line 4, "means for encoding the plurality of gradients" has been interpreted under 112(f) as a means plus function limitation because of the combination of a non-structural term "means" and functional language "for encoding the plurality of gradients" without reciting sufficient structure to achieve the function. The specification discloses (paragraph 82) what the means for encoding the plurality of gradients is.
In Claim 14, "means for encoding distances between individual ones of the plurality of gradients which are not set to zero" has been interpreted under 112(f) as a means plus function limitation. The specification discloses (paragraph 82) a means for encoding distances.
In Claim 14, line 7, "means for sending the encoded plurality of gradients to one or more other computation nodes of the neural network training system over a communications network" has been interpreted under 112(f) as a means plus function limitation because of the combination of a non-structural term "means" and functional language "for sending the encoded plurality of gradients to one or more other computation nodes of the neural network training system over a communications network" without reciting sufficient structure to achieve the function. The specification discloses (paragraph 82) what the means for sending the encoded plurality of gradients to one or more other computation nodes of the neural network training system over a communications network is.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1
According to the first part of the analysis, in the instant case, claims 1-13 are directed to a system, claim 14 is directed to a system, and claims 15-20 are directed to a method. Thus, each of the claims falls within one of the four statutory categories (i.e. process, machine, manufacture, or composition of matter).

Step 2A, Prong 1
Following the determination of whether or not the claims fall within one of the four categories (Step 1), it must be determined if the claims recite a judicial exception (e.g. mathematical concepts, mental processes, certain methods of organizing human activity) (Step 2A, Prong 1). In this case, the claims are determined to recite a judicial exception as explained below.

Regarding Claims 1, 14, and 15
encodes the plurality of gradients according to a probability related to at least a magnitude of the individual gradient divided by the magnitude of a vector of the plurality of gradients and setting the first set of the plurality of gradients to zero and a second set of the plurality of gradients to one of a plurality of quantization levels (This step for encoding gradients is understood to be a recitation of math.); and
encodes, each distance between individual ones of the plurality of gradients which are not set to zero (This step for encoding distances is understood to be a recitation of math.).

Step 2A, Prong 2
Following the determination that the claims recite a judicial exception, it must be determined if the claims recite additional elements that integrate the exception into a practical application of the exception (Step 2A, Prong 2). In this case, after considering all claim elements individually and as an ordered combination, it is determined that the claims do not include additional elements that integrate the exception into a practical application of the exception as explained below.

Regarding Claims 1, 14, and 15
an encoder which: (encoder is generic computer equipment)
a memory storing a plurality of gradients of a loss function of the neural network (This step for storing and transmitting is extra-solution activity and memory is generic computer equipment.);
a processor which sends the encoded plurality of gradients to one or more other computation nodes of the neural network training system over a communications network (This step appears to be directed to transmitting information, which is understood to be extra-solution activity.).

Step 2B
Based on the determination in Step 2A of the analysis that the claims are directed to a judicial exception, it must be determined if the claims contain any element or combination of elements sufficient to ensure that the claim amounts to significantly more than the judicial exception (Step 2B). In this case, after considering all claim elements individually and as an ordered combination, it is determined that the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception for the same reasons given above in the Step 2A, Prong 2 analysis. Furthermore, each additional element identified above as being insignificant extra-solution activity is also well-known, routine, conventional as described below.

Regarding Claims 1, 14, and 15
an encoder which (The encoder is understood to be generic computer equipment. See MPEP 2106.05(f).):
a memory storing a plurality of gradients of a loss function of the neural network (This step appears to be directed to storing and transmission. See MPEP 2106.05(d). Memory is understood to be generic computer equipment. See MPEP 2106.05(f).);
a processor which sends the encoded plurality of gradients to one or more other computation nodes of the neural network training system over a communications network (This step appears to be directed to transmitting information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g). The processor is understood to be generic computer equipment. See MPEP 2106.05(f).).

Step 2A, Prong 1

	Regarding claim 2
	wherein each distance between individual ones of the plurality of gradients which are not set to zero is encoded using Elias recursive encoding (This step is understood to be a recitation of mathematical calculations.).

	Regarding Claim 3
wherein the encoder encodes the plurality of gradients according to a probability related to at least the magnitude of the individual gradient divided by the magnitude of a vector of the plurality of gradients (This step is understood to be a recitation of mathematical calculations.).
Regarding Claims 4 and 16
wherein the encoder sets individual ones of the gradients to zero according to the outcome of a biased process, the bias being calculated from at least the magnitude of the individual gradient (This step is understood to be a recitation of mathematical calculations.).

Regarding Claim 5
wherein the encoder outputs a magnitude of the plurality of gradients, a list of signs of a plurality of gradients which are not set to zero by the encoder, and relative positions of the plurality of gradients which are not set to zero by the encoder. (This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process and math.)

Regarding Claim 6
wherein the encoder further comprises an integer encoder which compresses a plurality of integers (This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process and math.)

Regarding Claim 7
wherein the encoding uses Elias recursive coding and comprises encoding a position of a first nonzero entry of a set of quantization levels in an interval [0,1]. (This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process and math.)
Regarding Claim 8
wherein the encoder encodes the plurality of gradients according to a probability related to a tuning parameter which controls a trade-off between training time of the neural network and the amount of data sent to the other computation nodes. (This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process and math.)

Regarding Claims 9 and 20
wherein the tuning parameter is selected according to user input. (This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process.)

Regarding Claims 12 and 17
comprising a decoder which decodes encoded gradients received from other computation nodes (This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process.), and wherein the processor updates weights of the neural network using the stored gradients and the decoded gradients (This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process.).

Step 2A, Prong 2
	
	Regarding Claims 10 and 18
	wherein the tuning parameter is automatically selected according to bandwidth availability. (The specification of data to be stored is understood to be a field of use limitation.)

	Regarding Claims 11 and 19
wherein a value of the tuning parameter in use by the computation node is displayed at a user interface. (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity.)
	
	Regarding Claims 12 and 17
	“Decoder” and “processor” (The decoder/processor are understood to be generic computer equipment.)
	
	Regarding Claim 13
the memory storing weights of the neural network and wherein the processor updates the weights using the plurality of gradients and gradients received from the other computation nodes. (This step for storing and transmitting is extra-solution activity and memory/processor are generic computer equipment.)

Step 2B

Regarding Claims 10 and 18
	wherein the tuning parameter is automatically selected according to bandwidth availability. (The specification of data to be stored is understood to be a field of use limitation. See MPEP 2106.05(h).)

	Regarding Claims 11 and 19
wherein a value of the tuning parameter in use by the computation node is displayed at a user interface. (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).)
	
	Regarding Claims 12 and 17
	“Decoder” and “processor” (The decoder/processor are understood to be generic computer equipment. See MPEP 2106.05(f).)
	
	Regarding Claim 13
the memory storing weights of the neural network and wherein the processor updates the weights using the plurality of gradients and gradients received from the other computation nodes. (This step appears to be directed to storing and transmission. See MPEP 2106.05(d). Memory/processor are understood to be generic computer equipment. See MPEP 2106.05(f).)


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 3-4, 8-9, 12-17, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. ("DOREFA-NET: TRAINING LOW BITWIDTH CONVOLUTIONAL NEURAL NETWORKS WITH LOW BITWIDTH GRADIENTS", hereafter "Zhou") in view of Xu et al. ("Image Smoothing via L0 Gradient Minimization", hereafter "Xu"), and Seide et al. ("1-Bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs", In Proceedings of 15th Annual Conference of the International Speech Communication Association, hereafter "Seide").

Regarding Claim 1
Zhou discloses: A computation node of a neural network training system comprising (Zhou Abstract): a memory storing a plurality of gradients of a loss function of the neural network (Zhou p. 2 paragraph 5 2-bit gradients [gradients of loss function are stored in memory. Zhou's "objective function" is equivalent to this application's "loss function"].);
an encoder which: 
encodes the plurality of gradients according to a probability related to at least a magnitude of the individual gradient divided by the magnitude of a vector of the plurality of gradients ([Page 5 first two para and Equation 12] Zhou discloses that dr is a gradient tensor (i.e. vector). Under broadest reasonable interpretation, in equation 12 dr is a magnitude of the individual gradients and |dr| is the magnitude vector of the plurality of gradients.) and setting the first set of the plurality of gradients to zero and a second set of the plurality of gradients to one of a plurality of quantization levels (Zhou p.5 equation 12 [equation 12 renders different quantization levels]. Setting gradients to zero and a non-zero quantized level inherently creates a first set and a second set.), and
Zhou does not explicitly disclose: encodes, each distance between individual ones of the plurality of gradients which are not set to zero;
However, Xu discloses in the same field of endeavor: encodes, each distance between individual ones of the plurality of gradients which are not set to zero ([Section 2.1 1D Smoothing] “Our method counts amplitude changes discretely, written as c(f) = #{p | | fp − fp+1| ≠ 0}, (1) where p and p + 1 index neighboring samples (or pixels). | fp − fp+1| is a gradient w.r.t. p in the form of forward difference. #{} is the counting operator, outputting the number of p that satisfies | fp− fp+1| ≠ 0, that is, the L0 norm of gradient.” Examiner interprets c(f) as encoding each distance.);
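
For reference, the counting measure c(f) quoted from Xu above can be sketched in Python. The function name count_amplitude_changes and the sample signal are hypothetical and are used here only to show what the quoted formula computes for a one-dimensional signal.

```python
def count_amplitude_changes(f):
    """Return c(f): the number of indices p with |f[p] - f[p+1]| != 0."""
    return sum(1 for p in range(len(f) - 1) if abs(f[p] - f[p + 1]) != 0)


signal = [1, 1, 2, 2, 2, 5]  # hypothetical 1D samples
print(count_amplitude_changes(signal))
```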
It would have been obvious to one of skill in the art at the time of filing to combine Zhou and Xu. Doing so can provide an optimization framework making use of L0 gradient minimization (Abstract Xu).
Zhou in view of Xu does not explicitly disclose: a processor which sends the encoded plurality of gradients to one or more other computation nodes of the neural network training system over a communications network.
However, Seide discloses in the same field of endeavor: a processor (Seide abstract GPU) which sends the encoded plurality of gradients to one or more other computation nodes of the neural network training system over a communications network (Seide p. 1060 section 3.1 The algorithm we use for aggregating the sub-gradients over computer nodes [send gradients to other nodes].).
	It would have been obvious to one of skill in the art at the time of filing to combine Zhou, Xu, and Seide. One would be motivated to combine the gradient quantization of Zhou with the quantized gradient being sent to other nodes as taught by Seide to speed up the learning of neural networks.

Regarding Claim 14
Zhou in view of Xu and Seide discloses: A computation node of a neural network training system comprising: (Claim 14 corresponds to claim 1 and the rest of the limitations are rejected on the same ground)

Regarding Claim 15
Zhou in view of Xu and Seide discloses: A computer implemented method at a computation node of a neural network training system comprising: (Claim 15 corresponds to claim 1 and the rest of the limitations are rejected on the same ground)

Regarding Claim 3
Zhou in view of Xu and Seide discloses: The computation node of claim 1, wherein the encoder encodes the plurality of gradients according to a probability related to at least the magnitude of the individual gradient divided by the magnitude of a vector of the plurality of gradients (Zhou p.5 equation 12 [dr / (max_0(|dr|))].).

Regarding Claim 4
Zhou in view of Xu and Seide discloses: The computation node of claim 1 wherein the encoder sets individual ones of the gradients to zero according to the outcome of a biased process, the bias being calculated from at least the magnitude of the individual gradient (Zhou p.5 equation 12. [Biased process is interpreted as a discrete random variable.]).

Regarding Claim 8
Zhou in view of Xu and Seide discloses: The computation node of claim 1 wherein the encoder encodes the plurality of gradients according to a probability related to a tuning parameter (Zhou equation 12 [k]) which controls a trade-off between training time of the neural network and the amount of data sent to the other computation nodes (Zhou p.8 balancing between multiple factors like training time; Zhou Fig. 1 [Fig. 1 shows epochs (an epoch is an iteration; more iterations means more training time) and each plot has a designation of W-A-G where G is the number of bits of the gradient. Fig. 1 shows the tradeoff between training time and number of bits. As the number of bits is reduced, the training time is increased for a given accuracy.]).

Regarding Claim 9
Zhou in view of Xu and Seide discloses: The computation node of claim 8 wherein the tuning parameter is selected according to user input (Zhou equation 12 [user selects k]).

Regarding Claim 12
Zhou in view of Xu and Seide discloses: The computation node of claim 1 comprising a decoder which decodes encoded gradients received from other computation nodes (Seide p. 1060 left column paragraph 2 The algorithm we use for aggregating the sub-gradients [gradient] over compute nodes ... Each compute node which it will receive in quantized form from all peer nodes [other computation nodes]), and wherein the processor updates weights of the neural network using the stored gradients and the decoded gradients (Seide p. 1059 eq. 1 [lambda is model weight]. Equations in right column [G_{ijl}(t) is stored gradient; Q^{-1}(G^{quant}_{ijl}(t)) is decoded gradient]).

Regarding Claim 13
Zhou in view of Xu and Seide discloses: The computation node of claim 1 the memory storing weights (Seide p. 1059 eq. 1 [lambda is model weight]) of the neural network and wherein the processor updates the weights using the plurality of gradients and gradients received from the other computation nodes (Seide p. 1059 eq. 1 [lambda is model weight]. Equations in right column [G_{ijl}(t) is stored gradient; Q^{-1}(G^{quant}_{ijl}(t)) is decoded gradient]).
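
For illustration, the relationship between a stored gradient, its quantized form, and its decoded form, as referenced in the Seide citations above, can be sketched in Python as one-bit quantization with an error residual carried to the next step. This is a minimal sketch under assumptions: the reconstruction levels (means of the non-negative and of the negative entries) and all function names are chosen here for illustration and are not taken from Seide.

```python
def one_bit_quantize(values):
    """Return (bits, low, high): one sign bit per value plus two reconstruction levels."""
    bits = [1 if v >= 0 else 0 for v in values]
    positives = [v for v in values if v >= 0]
    negatives = [v for v in values if v < 0]
    high = sum(positives) / len(positives) if positives else 0.0
    low = sum(negatives) / len(negatives) if negatives else 0.0
    return bits, low, high


def one_bit_dequantize(bits, low, high):
    """Decode sign bits back to values using the two reconstruction levels."""
    return [high if b else low for b in bits]


def quantize_with_error_feedback(gradient, residual):
    """Quantize gradient + residual and return the message and the new residual."""
    corrected = [g + r for g, r in zip(gradient, residual)]
    bits, low, high = one_bit_quantize(corrected)
    decoded = one_bit_dequantize(bits, low, high)
    # The quantization error is kept locally and added to the next gradient.
    new_residual = [c - d for c, d in zip(corrected, decoded)]
    return (bits, low, high), new_residual


message, residual = quantize_with_error_feedback([0.5, -1.0, 1.5, -3.0], [0.0] * 4)
print(message, residual)
```

A receiving node in this sketch would call one_bit_dequantize on the message to obtain the decoded gradient before any weight update.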

Regarding Claim 16
Zhou in view of Xu and Seide discloses: The method of claim 15 wherein the encoder sets individual ones of the gradients to zero according to the outcome of a biased coin flip process, the bias being calculated from at least the magnitude of the individual gradient (Zhou p.5 “To further compensate the potential bias introduced by gradient quantization, we introduce an extra noise function” and Equation 12. Examiner interprets |dr| as the magnitude of the individual gradient.).

Regarding Claim 17
Zhou in view of Xu and Seide discloses: The method of claim 15 comprising further decoding encoded gradients received from other computation nodes (Seide p. 1060 left column paragraph 2 The algorithm we use for aggregating the sub-gradients [gradient] over compute nodes ... Each compute node which it will receive in quantized form from all peer nodes [other computation nodes]), and updating weights of the neural network using the stored gradients and the decoded gradients (Seide p. 1059 eq. 1 [lambda is model weight]. Equations in right column [G_{ijl}(t) is stored gradient; Q^{-1}(G^{quant}_{ijl}(t)) is decoded gradient]).

Regarding Claim 20
Zhou in view of Xu and Seide discloses: The method of claim 15 comprising selecting the value of the tuning parameter according to user input (Zhou equation 12 [user selects k]).

Claim(s) 2 and 6-7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. ("DOREFA-NET: TRAINING LOW BITWIDTH CONVOLUTIONAL NEURAL NETWORKS WITH LOW BITWIDTH GRADIENTS", hereafter "Zhou") in view of Xu et al. ("Image Smoothing via L0 Gradient Minimization", hereafter "Xu"), Seide et al. ("1-Bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs", In Proceedings of 15th Annual Conference of the International Speech Communication Association, hereafter "Seide"), and Au (US 20100067688, hereafter "Au").

Regarding Claim 2
Zhou in view of Xu and Seide discloses: The computation node of claim 1 
Zhou in view of Xu and Seide does not explicitly disclose: wherein each distance between individual ones of the plurality of gradients which are not set to zero is encoded using Elias recursive encoding.
However, Au discloses in the same field of endeavor: wherein each distance between individual ones of the plurality of gradients which are not set to zero is encoded using Elias recursive encoding (Au paragraph 0012, “A universal code is a prefix code that maps positive integers to their corresponding binary codewords ... That is, given an arbitrary source with nonzero entropy, a universal code achieves average codeword length, which is at most a constant times the optimal possible for that source. Typical universal codes include Elias gamma coding, Elias delta coding, Elias omega coding [Spec states Elias omega coding is same as Elias recursive coding], Fibonacci coding, Levenstein coding, and Exp-Golomb coding.”).
It would have been obvious to one of skill in the art at the time of filing to combine Zhou, Xu, Seide, and the method for encoding with coding schemes taught by Au. One would be motivated to use the methods of Au in order to provide data compression using Elias coding (Para 0012, Au).
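
For reference, Elias omega coding of positive integers, which the rejection above equates with Elias recursive coding, can be sketched in Python and applied to the distances between non-zero entries of a vector. The helper gaps_between_nonzeros and its convention of measuring the first distance from index -1 are assumptions made here for illustration; they are not taken from Au or from the claims.

```python
def elias_omega_encode(n):
    """Encode a positive integer as an Elias omega bit string."""
    if n < 1:
        raise ValueError("Elias omega coding is defined for positive integers")
    code = "0"
    while n > 1:
        binary = bin(n)[2:]
        code = binary + code
        n = len(binary) - 1
    return code


def elias_omega_decode(bits):
    """Decode one codeword from the start of bits; return (value, bits consumed)."""
    n = 1
    pos = 0
    while bits[pos] == "1":
        width = n + 1
        n = int(bits[pos:pos + width], 2)
        pos += width
    return n, pos + 1


def gaps_between_nonzeros(values):
    """Distances between consecutive non-zero entries (first gap counted from index -1)."""
    gaps, previous = [], -1
    for index, value in enumerate(values):
        if value != 0:
            gaps.append(index - previous)
            previous = index
    return gaps


gaps = gaps_between_nonzeros([0, 5, 0, 0, -5, 5])
print(gaps, "".join(elias_omega_encode(g) for g in gaps))
```

Because every distance in this convention is at least 1, each one is a valid input to the encoder, and the codewords can be concatenated and decoded one after another without separators.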

Regarding Claim 6
Zhou in view of Xu, Seide, and Au discloses: The computation node of claim 1 wherein the encoder further comprises an integer encoder which compresses a plurality of integers (Au paragraph 0012 A universal code is a prefix code that maps positive integers to their corresponding binary codewords).

Regarding Claim 7
Zhou in view of Xu, Seide, and Au discloses: The computation node of Claim 1 wherein the encoding uses Elias recursive coding (Au paragraph 0012 A universal code is a prefix code that maps positive integers to their corresponding binary codewords ... That is, given an arbitrary source with nonzero entropy, a universal code achieves average codeword length, which is at most a constant times the optimal possible for that source. Typical universal codes include Elias gamma coding, Elias delta coding, Elias omega coding [Spec states Elias omega coding is same as Elias recursive coding], Fibonacci coding, Levenstein coding, and Exp-Golomb coding.) and comprises encoding a position of a first nonzero entry of a set of quantization levels in an interval [0,1] ([Page 4-5], Zhou equations 11-12 “The above function first applies an affine transform on the gradient, to map it into [0, 1], and then inverts the transform after quantization.”).

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Zhou, Xu, Seide, in further view of Tseng ("Sparse Vectors", 3/15/1999, https://www.cs.umd.edu/Outreach/hsContest99/questions/node3.html, hereafter "Tseng").

Regarding Claim 5
Zhou in view of Xu and Seide discloses: The computation node of claim 1.
Zhou in view of Xu and Seide does not explicitly disclose: wherein the encoder outputs a magnitude of the plurality of gradients, a list of signs of a plurality of gradients which are not set to zero by the encoder, and relative positions of the plurality of gradients which are not set to zero by the encoder.
However, Tseng discloses in the same field of endeavor: wherein the encoder outputs a magnitude of the plurality of gradients, a list of signs of a plurality of gradients which are not set to zero by the encoder, and relative positions of the plurality of gradients which are not set to zero by the encoder (Tseng paragraph 2 Generally, the entries of the list will correspond to the non-zero elements of the vector in order, with each entry containing the index [relative position] and value [magnitude] for that entry [Each value includes both a value and sign. For example, look above heading "Test data used in judging"].).
It would have been obvious to one of skill in the art at the time of filing to combine Zhou, Xu, Seide, and Tseng. One would be motivated to use the methods of Tseng to compress data to improve the efficiency of storing a vector (Tseng inefficient to use a one-dimensional array to store a sparse vector).

Claims 10, 11, and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou in view of Xu and Seide, in further view of Lin (US 20160328644, hereafter "Lin").

Regarding Claim 10
Zhou in view of Xu and Seide discloses: The computation node of claim 8.
Zhou in view of Xu and Seide does not explicitly disclose: wherein the tuning parameter is automatically selected according to bandwidth availability.
However, Lin discloses in the same field of endeavor: wherein the tuning parameter is automatically selected according to bandwidth availability (Lin Abstract A new configuration [tuning parameter] for the machine learning process is determined based at least in part on the current system resources [bandwidth] and the performance specifications. The method also includes dynamically selecting between a current configuration and the new configuration based at least in part on the current system resources and the performance specifications).
It would have been obvious to one of skill in the art at the time of filing to combine Zhou, Xu, Seide, and the method for adaptive neural networks taught by Lin. Doing so allows for configuring parameters of a machine learning process based on the current system resources (Abstract, Lin).

Regarding Claim 11
Zhou in view of Xu, Seide, and Lin discloses: The computation node of claim 8 wherein a value of the tuning parameter in use by the computation node is displayed at a user interface (Lin [0030] local processing unit 202 may comprise a local state memory 204 and a local parameter memory 206 that may store parameters [tuning parameter] of a neural network [0092] a user interface (e.g., keypad, display, mouse, joystick, etc.)).

Regarding Claim 18
Zhou in view of Xu, Seide, and Lin discloses: The method of claim 15 comprising automatically selecting the value of the tuning parameter according to bandwidth availability (Lin Abstract A new configuration [tuning parameter] for the machine learning process is determined based at least in part on the current system resources [bandwidth] and the performance specifications. The method also includes dynamically selecting between a current configuration and the new configuration based at least in part on the current system resources and the performance specifications).

Regarding Claim 19
Zhou in view of Xu, Seide and Lin discloses: The method of claim 15 comprising outputting the value of the tuning parameter at a graphical user interface (Lin [0030] local processing unit 202 may comprise a local state memory 204 and a local parameter memory 206 that may store parameters [tuning parameter] of a neural network [0092] a user interface (e.g., keypad, display, mouse [graphical user interface], joystick, etc.)).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Chattopadhyay et al. (US 20160004976 A1, hereinafter "Chattopadhyay") also describes a probability related to gradients divided by the magnitude (Para 0058).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TEWODROS E MENGISTU whose telephone number is (571)270-7714. The examiner can normally be reached Mon-Fri 9:30-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ABDULLAH KAWSAR, can be reached on (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/TEWODROS E MENGISTU/Examiner, Art Unit 2127

/ABDULLAH AL KAWSAR/Supervisory Patent Examiner, Art Unit 2127