DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). However, the certified copies of the following foreign priority applications have not been successfully retrieved: Nos. CN201810479540.0, filed on 5/18/2018; CN201811040961.X, filed on 9/6/2018; and CN201811592249.0, filed on 12/25/2018.
Applicants should check to see whether the Office has received a copy of the foreign application under the priority document exchange program because successful retrieval of priority documents cannot be guaranteed. To be entitled to priority, the Office must receive a copy of the foreign application from the participating foreign intellectual property office within the pendency of the application and before the patent is granted, or receive a paper certified copy of the foreign application during that time period.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 2, 3, 9, 10, and 11 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a mathematical concept without significantly more. The claims recite a mathematical algorithm used to obtain the gradient update precision T according to the input neuron precision, the weight precision, and the output neuron gradient precision.
PEG Analysis:
Step 1
Claim 1 is directed to a neural network operation module configured to perform operations of a multi-layer neural network (see claim 1, preamble; see also Fig. 1B and page 49, lines 7-12, of the specification for the structural components of the neural network operation module). Therefore, claim 1 is directed to a machine (i.e., a system), which is a statutory category of invention.
Step 2A Prong 1:
The claim recites a mathematical formula or calculation that is used to obtain the gradient update precision T according to the obtained input neuron precision, weight precision, and output neuron gradient precision and, if the gradient update precision T is greater than the preset precision Tr, to adjust the input neuron precision, the weight precision, and the output neuron gradient precision so as to minimize the absolute value of the difference between the gradient update precision T and the preset precision Tr. As is also evident from dependent claim 2 and the specification (page 42, lines 14-20), the mathematical concept or computation can be represented by:
T = Sx(l) + S∇x(l) - Sw(l), for L > 0, with the adjustment seeking |T - Tr| = 0
 
“…obtain input neuron precision Sx(l), weight precision Sw(l), and output neuron gradient precision S∇x(l) of an Lth layer of the multi-layer neural network, wherein L is an integer greater than 0, obtain gradient update precision T according to the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l), and if the gradient update precision T is greater than preset precision Tr, adjust the input neuron precision Sx(l), the weight precision Sw(l) and the output neuron gradient precision S∇x(l) to minimize an absolute value of a difference between the gradient update precision T and the preset precision Tr…” (Claim 1, lines 5-11)
“…represent an input neuron and a weight of the Lth layer according to the adjusted input neuron precision Sx(l) and the weight precision Sw(l) and represent an output neuron gradient of the Lth layer obtained from computations according to the adjusted output neuron gradient precision S∇x(l) for subsequent computations.” (Claim 1, lines 12-15)
Step 2A Prong 2
The additional elements recited in the claim are:
“1. A neural network operation module configured to perform operations of a multi-layer neural network, comprising: 
a storage unit configured to store input neuron precision, weight precision, and output neuron gradient precision; 
a controller unit configured to…
and an operating unit configured to…” (Claim 1, lines 1-5, 12)
The additional elements do not integrate the exception into a practical application. In particular, the combination of additional elements, such as the storage unit, the controller unit, and the operating unit, does not use the mathematical formula or calculation in a specific manner that sufficiently limits the use of the mathematical concept to a practical application of neural network operations.
Step 2B
As noted previously, the claim as a whole merely describes how to generally “apply” the concept of storing, updating, and adjusting the input/output and precision variables in a neural network environment. The claimed components, such as the storage unit, the controller unit, and the operating unit, are recited at a high level of generality and are merely invoked as tools to perform an existing precision calculation. Thus, even when viewed as a whole, nothing in the claim adds significantly more to the mathematical concept. The claim is ineligible.
As to claim 2, 
Step 1
Claim 2 is dependent from claim 1 and is directed to a system.
Step 2A Prong 1
As discussed with respect to claim 1, claim 2 recites a mathematical formula or calculation:
“…computations on the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) according to a preset formula to obtain the gradient update precision T, wherein
T = Sx(l) + S∇x(l) - Sw(l)” (see claim 2, lines 4-7)
Step 2A Prong 2
The additional elements in the claim are:
“The module of claim 1, wherein obtaining, by the controller unit, the gradient update precision T according to the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) includes:
the controller unit performs…” (see claim 2, preamble)
The additional elements do not integrate the exception into a practical application for the same reasons discussed for claim 1 above. Furthermore, obtaining the gradient update precision T according to the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) is recited at a high level of generality (i.e., as a general means of gathering the update precision, the input neuron precision, and the output neuron gradient data for use in the computations) and amounts to mere data gathering, which is a form of insignificant extra-solution activity. Each of the additional limitations is no more than mere instructions to apply the exception using generic neural network components.
Step 2B 
The same reasoning in Step 2B in claim 1 is also applicable in claim 2.
As to claim 3,
Step 1
Claim 3 is dependent from claim 2, and is directed to a system as above.
Step 2A Prong 1
The mathematical concept is:
“…wherein adjusting … the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) includes:
… keeps the input neuron precision Sx(l) and the weight precision Sw(l) unchanged, and decreases the output neuron gradient precision S∇x(l).” (Claim 3, lines 1-4)
Again, this relates to the formula of claim 2, “T = Sx(l) + S∇x(l) - Sw(l)”, in which the input neuron precision and the weight precision are held constant while the output neuron gradient precision S∇x(l) is the variable (i.e., “adjustable”) term.
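Under the simplifying assumption that the precisions are continuous real values, the following Python sketch illustrates the adjustment of claim 3: Sx(l) and Sw(l) are held fixed and only S∇x(l) is moved so that T meets Tr. The function name is hypothetical, and the specification may instead change precision in discrete steps (for example, through the bit width), which this sketch does not model.

```python
def adjust_gradient_precision(s_x: float, s_w: float, s_grad: float, t_r: float) -> float:
    """Return a new S∇x(l) with Sx(l) and Sw(l) held fixed (hypothetical sketch).

    T = Sx(l) + S∇x(l) - Sw(l) is linear in S∇x(l) with slope 1, so the value
    that minimizes |T - Tr| is S∇x(l) = Tr - Sx(l) + Sw(l), giving |T - Tr| = 0.
    """
    t = s_x + s_grad - s_w
    if t <= t_r:
        # Current claim 1 adjusts only when T is greater than Tr.
        return s_grad
    # T > Tr, so subtracting (T - Tr) lowers S∇x(l): the "decrease" of claim 3.
    return s_grad - (t - t_r)


new_grad = adjust_gradient_precision(s_x=3.0, s_w=2.0, s_grad=4.0, t_r=4.0)
print(new_grad)              # 3.0
print(3.0 + new_grad - 2.0)  # 4.0, i.e., T now equals Tr
```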
Step 2A Prong 2
The additional elements in the claim are:
“The module of claim 2, …  by the controller unit … the controller unit…” (Claim 3 lines 1,3)
The claim as a whole merely describes how to generally “apply” the concept of keeping the input neuron precision and the weight precision unchanged while adjusting the output neuron gradient precision. The claimed component, such as the controller unit, is recited at a high level of generality and is merely invoked as a tool to perform an existing precision calculation. Therefore, the additional elements do not integrate the exception into a practical application.
Step 2B
The same reasoning in Step 2B for claim 1 is also applicable to claim 3 and is not repeated herein.
As to claim 4, claim 4 recites:
“wherein when the controller unit increases the output neuron gradient precision S∇x(l), the controller unit decreases a bit width of a fixed point data format representing the output neuron gradient.”

Step 1
Claim 4 is dependent from claim 3 and is directed to a system.
Step 2A Prong 1
The mathematical concept is:
“… increases the output neuron gradient precision S∇x(l) … decreases a bit width of a fixed point data format representing the output neuron gradient.”
Step 2A Prong 2
However, the claim as a whole (“wherein when the controller unit increases the output neuron gradient precision S∇x(l), the controller unit decreases a bit width of a fixed point data format representing the output neuron gradient.”), read in view of applicant’s teaching in the specification (page 44, lines 11-23), integrates the exception into a practical application. In particular, the combination of additional elements, such as the controller unit, uses the mathematical formula in a specific manner that sufficiently limits the use of the mathematical concept to the practical application of decreasing the bit width of the fixed point data format in order to reduce precision redundancy, since precision redundancy may increase computational overhead and waste computing resources. Thus, the claim is not directed to the recited judicial exception, and the claim as a whole is eligible. The relevant teaching of applicant’s specification is shown below:
“Note that, a reason why the controller unit 102 increases the output neuron gradient precision S∇x(l) is that the output neuron gradient precision S∇x(l) is smaller than the required precision, and in this case, precision redundancy occurs which may increase computational overhead and waste computing resources. For the purpose of reducing computational overhead and avoiding wasting of computing resources, the precision of the output neuron gradient S∇x(l) needs to be increased.
Specifically, it can be known from the description above that after the controller unit 102 increases the output neuron gradient precision S∇x(l), a determination of whether precision redundancy occurs needs to be made. In other words, it needs to be determined whether the output neuron gradient precision S∇x(l) is smaller than the required precision. When it is determined that the output neuron gradient precision is less than the required precision, the bit width of the fixed point data format representing the output neuron gradient may be decreased to increase the output neuron gradient precision and reduce precision redundancy.”
(Applicant’s specification, page 44, lines 11-23)
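As a hedged illustration of the relationship described in the quoted passage, the Python sketch below assumes a hypothetical signed fixed-point layout (one sign bit, a fixed number of integer bits, the remaining bits fractional) in which the representable step is 2 raised to the negative number of fractional bits. Under that assumption, narrowing the bit width while keeping the integer bits fixed removes fractional bits and enlarges the step, which is one way to read the statement that decreasing the bit width increases the output neuron gradient precision. The layout and the function names are assumptions, not the applicant's disclosed format.

```python
def fixed_point_precision(bit_width: int, int_bits: int) -> float:
    """Step size of a hypothetical signed fixed-point format.

    Assumed layout: 1 sign bit, `int_bits` integer bits, and the remaining
    bits fractional, so the step is 2 ** -(fractional bits).
    """
    frac_bits = bit_width - 1 - int_bits
    if frac_bits < 0:
        raise ValueError("bit width too small for the integer part")
    return 2.0 ** -frac_bits


def quantize(value: float, bit_width: int, int_bits: int) -> float:
    """Round `value` to the nearest multiple of the step (no range saturation)."""
    step = fixed_point_precision(bit_width, int_bits)
    return round(value / step) * step


print(fixed_point_precision(16, 3))  # 0.000244140625 (2 ** -12)
print(fixed_point_precision(8, 3))   # 0.0625 (2 ** -4): narrower width, larger step
print(quantize(0.3, 8, 3))           # 0.3125
```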
As to claim 5, claim 5 is eligible under similar analysis of claim 4 above. Particularly the feature shown below integrates the exception into a practical application. Thus, the claim is not directed to the recited judicial exception, and the claim as a whole is eligible.  
“if the output neuron gradient precision S∇x(l) is less than the required precision, the controller unit decreases the bit width of the fixed point data format representing the output neuron gradient.”
As to claim 6, claim 6 is dependent from and further limits claim 4 and the same analysis in claim 4 is also applicable to claim 6. Claim 6 as a whole is eligible.  For the sake of simplicity, the details of the analysis in claim 4 are not being repeated herein.
Similarly, claim 7 is dependent from and further limits claim 4 and the analysis in claim 4 is also applicable to claim 7. Claim 7 as a whole is eligible.  
As to claim 8, claim 8 recites:
“The module of claim 1, wherein the controller unit is further configured to: obtain the preset precision Tr according to a method of machine learning, or obtain the preset precision Tr according to a count of output neurons, a learning rate, and a count of samples during batch processing of an L-1th layer, wherein the greater the count of output neurons, the count of samples during batch processing, and the learning rate of the L-1th layer are, the greater the preset precision Tr is.”
Step 1
Claim 8 is directed to a system as in the parent claim 1.
Step 2A Prong 1
The mathematical concept is:
“… wherein the greater the count of output neurons, the count of samples during batch processing, and the learning rate of the L-1th layer are, the greater the preset precision Tr is.”  (Claim 8, lines 4-6)
It can be readily seen that the preset precision Tr increases as each of the above claim elements (the count of output neurons, the count of samples, and the learning rate) increases, i.e., these elements are positively correlated with Tr. Therefore, the limitation is directed to a mathematical concept.
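Claim 8 specifies only that Tr grows with the count of output neurons, the batch size, and the learning rate; it recites no formula. The Python sketch below uses a purely hypothetical function (a sum of logarithms with an arbitrary scale) to illustrate such a monotonic relationship and checks it numerically. It is not the applicant's method for obtaining Tr.

```python
import math


def preset_precision(n_out: int, batch_size: int, lr: float, scale: float = 1.0) -> float:
    """Hypothetical Tr that increases with output-neuron count, batch size, and learning rate.

    The log form is an arbitrary illustrative choice, not a formula from the claims.
    """
    if n_out <= 0 or batch_size <= 0 or lr <= 0:
        raise ValueError("all inputs must be positive")
    return scale * (math.log2(n_out) + math.log2(batch_size) + math.log2(1.0 + lr))


base = preset_precision(256, 32, 0.01)
assert preset_precision(512, 32, 0.01) > base  # more output neurons -> larger Tr
assert preset_precision(256, 64, 0.01) > base  # larger batch -> larger Tr
assert preset_precision(256, 32, 0.10) > base  # higher learning rate -> larger Tr
```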
Step 2A Prong 2:
However, the remaining features recite:
“The module of claim 1, wherein the controller unit is further configured to: obtain the preset precision Tr according to a method of machine learning, or obtain the preset precision Tr according to a count of output neurons, a learning rate, and a count of samples during batch processing of an L-1th layer…”  (claim 8, lines 1-4)
These features integrate the exception into a practical application. In particular, the combination of additional elements uses the mathematical concept in a specific manner that sufficiently limits the use of the mathematical concept to the practical application of obtaining the preset precision according to a method of machine learning or according to a count of output neurons, a learning rate, and a count of samples (i.e., the batch size) during batch processing. Thus, the claim is not directed to the recited judicial exception, and the claim as a whole is eligible.
As to claim 9, claim 9 includes limitations similar to those of claim 1, except that it is directed to a method. The analysis of claim 1 is also applicable to claim 9. Therefore, claim 9 is rejected for the same reasons as claim 1 above. The details of the analysis are not repeated herein.
Dependent claims 10 and 11 correspond to dependent claims 2 and 3, respectively, and are rejected for the same reasons as claims 2 and 3 above. The details of the rejection are not repeated herein.
Dependent claims 12, 13, 14, 15, and 16 correspond to dependent claims 4, 5, 6, 7, and 8, respectively, and are not rejected under 35 U.S.C. 101 for the reasons already set forth above.


Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

Claims 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 14, 15, and 16 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 14, 15, and 16 of copending Application No. 16720145 (20200183752). Although the claims at issue are not identical, they are not patentably distinct from each other for the following reasons.
As to current claim 1, although copending claim 1 does not recite:
“if the gradient update precision T is greater than preset precision Tr” as claimed (current claim 1, lines 8-9, emphasis added), copending claim 1 recites:
“if the gradient update precision T is less than preset precision Tr” (copending claim 1, lines 8-9, emphasis added).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adjust the precisions if the gradient update precision T is greater than the preset precision Tr, as claimed, because one of ordinary skill in the art would recognize the use of a known technique, such as adjusting the input neuron precision, the weight precision, and the output neuron gradient precision based on a comparison of the gradient update precision T with the preset precision Tr.
Current dependent claims 2, 3, 4, 6, 7, and 8 include similar limitations and correspond to copending dependent claims 2, 3, 4, 6, 7, and 8, and are rejected for the reasons set forth for claim 1 above. Similar analysis and reasoning in claim 1 above is also applicable to the dependent limitations of increasing/decreasing the output neuron gradient precision S∇x(l) (claim 3) and increasing/decreasing the bit width of the fixed point data format representing the output neuron gradient (claims 4, 6, and 7), which are recognizable by one of ordinary skill in the art. For the sake of simplicity, the same analysis is not repeated herein.
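The two bit-width update policies recited in claims 6 and 7 (and method claims 14 and 15) can be sketched in Python as follows. The function names are hypothetical, and reading “an increment of 2 times” as doubling (or halving) is an assumption. The sketch shows that the same stride-based or factor-of-two update applies whether the bit width is increased, as in the current claims, or decreased, as in the copending claims, differing only in direction.

```python
def step_by_stride(bit_width: int, n1: int, increase: bool) -> int:
    """Change the bit width by a preset stride N1 (hypothetical helper)."""
    if n1 <= 0:
        raise ValueError("N1 must be a positive integer")
    new_width = bit_width + n1 if increase else bit_width - n1
    if new_width <= 0:
        raise ValueError("bit width must stay positive")
    return new_width


def step_by_factor_of_two(bit_width: int, increase: bool) -> int:
    """One assumed reading of 'an increment of 2 times': double or halve the width."""
    new_width = bit_width * 2 if increase else bit_width // 2
    if new_width <= 0:
        raise ValueError("bit width must stay positive")
    return new_width


print(step_by_stride(8, 4, increase=True))        # 12 (current-application direction)
print(step_by_stride(16, 4, increase=False))      # 12 (copending direction)
print(step_by_factor_of_two(8, increase=True))    # 16
print(step_by_factor_of_two(16, increase=False))  # 8
```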
As to current independent claim 9, claim 9 includes the same issue as set forth for claim 1 above and is rejected under the same reasoning as claim 1 above. The details of the rejection are not repeated herein (see also the claim mapping below).
Current dependent claims 10, 11, 12, 14, 15, and 16 include similar limitations and correspond to copending dependent claims 10, 11, 12, 14, 15, and 16, and are rejected for the reasons set forth for claim 9 above. Similar analysis and reasoning in claim 9 above is also applicable to increasing/decreasing the output neuron gradient precision S∇x(l) (claim 11) and increasing/decreasing the bit width of the fixed point data format representing the output neuron gradient (claims 12, 14, and 15), which are recognizable by one of ordinary skill in the art. For the sake of simplicity, the same analysis is not repeated herein.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

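Claim mapping: the claims of copending Application 16720145 are reproduced first, followed by the corresponding claims of the current Application 16720171.

Copending Application 16720145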
1. A neural network operation module configured to perform operations of a multi-layer neural network, comprising: 
a storage unit configured to store input neuron precision, weight precision, and output neuron gradient precision; 
a controller unit configured to obtain input neuron precision Sx(l), weight precision Sw(l), and output neuron gradient precision S∇x(l) of an Lth layer of the multi-layer neural network, 
wherein L is an integer greater than 0, 
obtain gradient update precision T according to the input neuron precision Sx(l), 
the weight precision Sw(l), and the output neuron gradient precision S∇x(l), and 
if the gradient update precision T is less than preset precision Tr, adjust the input neuron precision Sx(l) the weight precision Sw(l) and the output neuron gradient precision S∇x(l) to minimize an absolute value of a difference between the gradient update precision T and the preset precision Tr; and 
an operating unit configured to represent an input neuron and a weight of the Lth layer according to the adjusted input neuron precision Sx(l) and the weight precision Sw(l) and 
represent an output neuron gradient of the Lth layer obtained from computations according to the adjusted output neuron gradient precision S∇x(l) for subsequent computations.

2. The module of claim 1, wherein obtaining, by the controller unit, the gradient update precision T according to the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) includes: the controller unit performs computations on the input neuron precision Sx(l) the weight precision Sw(l), and the output neuron gradient precision S∇x(l) according to a preset formula to obtain the gradient update precision T, 
wherein the preset formula is 
T = Sx(l) + S∇x(l) - Sw(l)

3. The module of claim 2, wherein adjusting, by the controller unit, the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) includes: the controller unit keeps the input neuron precision Sx(l) and the weight precision Sw(l) unchanged, and increases the output neuron gradient precision S∇x(l).

4. The module of claim 3, wherein when the controller unit increases the output neuron gradient precision S∇x(l), the controller unit decreases a bit width of a fixed point data format representing the output neuron gradient.

6. The module of claim 4, wherein decreasing, by the controller unit, the bit width of the fixed point data format representing the output neuron gradient includes: the controller unit decreases the bit width of the fixed point data format representing the output neuron gradient according to a first preset stride N1, wherein the first preset stride N1 can be 1, 2, 4, 6, 7, 8, or another positive integer.
7. The module of claim 4, wherein decreasing, the controller unit, the bit width of the fixed point data format representing the output neuron gradient includes: the controller unit decreases the bit width of the fixed point data format representing the output neuron gradient with an increment of 2 times.
8. The module of claim 1, wherein the controller unit is further configured to: obtain the preset precision Tr according to a method of machine learning, or obtain the preset precision Tr according to a count of output neurons, a learning rate, and a count of samples during batch processing of an L-1th layer, wherein the greater the count of output neurons, the count of samples during batch processing, and the learning rate of the L-1th layer are, the greater the preset precision Tr is.

9. A neural network operation method, comprising: 
obtaining input neuron precision Sx(l), weight precision Sw(l), and output neuron gradient precision S∇x(l) of an Lth layer of a neural network; 
obtaining gradient update precision T by performing computations according to the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l); 
if the gradient update precision T is less than preset precision Tr, adjusting the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient S∇x(l) to minimize an absolute value of a difference between the gradient update precision T and the preset precision Tr; 
representing an input neuron and a weight of the Lth layer according to the adjusted input neuron precision Sx(l) and the weight precision Sw(l); and 
representing an output neuron gradient of the Lth layer obtained from computations according to the adjusted output neuron gradient precision S∇x(l) for subsequent computations.
10. The method of claim 9, 
wherein obtaining the gradient update precision T by performing computations according to the input neuron precision Sx(l) the weight precision Sw(l) and the output neuron gradient precision S∇x(l) includes: performing computations on the input neuron precision Sx(l) the weight precision Sw(l), and 
the output neuron gradient precision S∇x(l) according to a preset formula to obtain the gradient update precision T,
wherein the preset formula is 
T = Sx(l) + S∇x(l) - Sw(l)

11. The method of claim 10, 
wherein adjusting the input neuron precision Sx(l), the weight precision Sw(l) and 
the output neuron gradient precision S∇x(l) includes: keeping the input neuron precision Sx(l) and the weight precision Sw(l) unchanged, and increasing the output neuron gradient precision S∇x(l).


12. The method of claim 11, 
wherein when increasing the output neuron gradient precision S∇x(l), the method further includes decreasing a bit width of a fixed point data format representing the output neuron gradient.


14. The method of claim 12, wherein, the decreasing the bit width of the fixed point data format representing the output neuron gradient includes: decreasing the bit width of the fixed point data format representing the output neuron gradient according to a first preset stride N1, wherein the first preset stride N1 can be 1, 2, 4, 6, 7, 8, or another positive integer.

15. The method of claim 12, wherein decreasing the bit width of the fixed point data format representing the output neuron gradient includes: decreasing the bit width of the fixed point data format representing the output neuron gradient with an increment of 2 times.

16. The method of claim 9, further comprising: obtaining the preset precision Tr according to a method of machine learning, or obtaining the preset precision Tr according to a count of output neurons, a learning rate, and a count of samples during batch processing of an L-1th layer, wherein the greater the count of output neurons, the count of samples during batch processing, and the learning rate of the L-1th layer are, the greater the preset precision Tr is.

Current Application 16720171
1. A neural network operation module configured to perform operations of a multi-layer neural network, comprising: 
a storage unit configured to store input neuron precision, weight precision, and output neuron gradient precision; 
a controller unit configured to obtain input neuron precision Sx(l), weight precision Sw(l) and output neuron gradient precision S∇x(l) of an Lth layer of the multi-layer neural network, 
wherein L is an integer greater than 0; 
obtain gradient update precision T according to the input neuron precision Sx(l), 
the weight precision Sw(l), and the output neuron gradient precision S∇x(l); and 
if the gradient update precision T is greater than preset precision Tr, adjust the input neuron precision Sx(l) the weight precision Sw(l) and the output neuron gradient precision S∇x(l) to minimize an absolute value of a difference between the gradient update precision T and the preset precision Tr; and 
an operating unit configured to represent an output neuron and a weight of the Lth layer according to the adjusted input neuron precision Sx(l) and the adjusted weight precision Sw(l) and 
represent an output neuron gradient of the Lth layer obtained from computations according to the adjusted output neuron gradient precision S∇x(l) for subsequent computations.

2. The module of claim 1, wherein obtaining, by the controller unit, gradient update precision T according to the input neuron precision Sx(l) the weight precision Sw(l), and the output neuron gradient precision S∇x(l) includes: the controller unit performs computations on the input neuron precision Sx(l) the weight precision Sw(l) and the output neuron gradient precision S∇x(l) according to a preset formula to obtain the gradient update precision T, 
wherein the preset formula is 
T = Sx(l) + S∇x(l) - Sw(l)

3. The module of claim 2, wherein adjusting, by the controller unit, the input neuron precision Sx(l), the weight precision Sw(l) and the output neuron gradient precision S∇x(l) includes: the controller unit keeps the input neuron precision Sx(l) and the weight precision Sw(l) unchanged, and decreases the output neuron gradient precision S∇x(l).

4. The module of claim 3, wherein when the controller unit decreases the output neuron gradient precision S∇x(l), the controller unit increases a bit width of a fixed point data format representing the output neuron gradient.

6. The module of claim 4, wherein increasing, by the controller unit, the bit width of the fixed point data format representing the output neuron gradient includes: the controller unit increases the bit width of the fixed point data format representing the output neuron gradient according to a first preset stride N1, wherein the first preset stride N1 can be 1, 2, 4, 6, 7, 8, or another positive integer.
7. The module of claim 4, wherein increasing, by the controller unit, the bit width of the fixed point data format representing the output neuron gradient includes: the controller unit increases the bit width of the fixed point data format representing the output neuron gradient with an increment of 2 times.
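The adjustment recited in claims 3, 4, 6, and 7 (keep Sx(l) and Sw(l) fixed, decrease S∇x(l) by widening the fixed point format of the output neuron gradient, either by a stride N1 or by doubling) can be sketched as below. This is a hypothetical illustration under stated assumptions: the integer bit count is held fixed so every added bit is fractional and S∇x(l) = 2^-(width - int_bits); "an increment of 2 times" is read as doubling the width; and `max_width` is an assumed hardware limit. None of these names or parameters come from the application.

```python
def grad_precision(width: int, int_bits: int) -> float:
    """Assumed model: S∇x(l) = 2**-(width - int_bits), i.e. added bits are fractional."""
    return 2.0 ** -(width - int_bits)


def adjust_gradient_width(s_x, s_w, width, int_bits, t_r, stride=None, max_width=64):
    """Hypothetical sketch of claims 3-7 (names and parameters are illustrative).

    Sx(l) and Sw(l) stay unchanged; only the output neuron gradient format is
    widened, which lowers S∇x(l) and therefore T = Sx(l) + S∇x(l) - Sw(l).
    stride=N1 adds N1 bits per step (claim 6); stride=None doubles the width,
    one reading of "an increment of 2 times" (claim 7).
    Returns the visited width with the smallest |T - Tr| and its precision.
    """
    if width <= 0:
        raise ValueError("width must be a positive integer")
    if stride is not None and stride <= 0:
        raise ValueError("stride N1 must be a positive integer")

    def t_of(w):
        return s_x + grad_precision(w, int_bits) - s_w

    best = width
    while t_of(width) > t_r:  # adjust only while T > Tr
        width = width * 2 if stride is None else width + stride
        if width > max_width:  # stop at the assumed hardware limit
            break
        if abs(t_of(width) - t_r) < abs(t_of(best) - t_r):
            best = width
    return best, grad_precision(best, int_bits)


print(adjust_gradient_width(2 ** -3, 2 ** -4, 6, 4, 0.1, stride=1))  # prints (9, 0.03125)
print(adjust_gradient_width(2 ** -3, 2 ** -4, 6, 4, 0.1))            # prints (12, 0.00390625)
```

Because T decreases monotonically as the width grows under this model, the loop only needs to track the best width visited until T no longer exceeds Tr; a coarser step (doubling) converges in fewer iterations but can land further from Tr than a unit stride.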
8. The module of claim 1, wherein the controller unit is further configured to: obtain the preset precision Tr according to a method of machine learning, or obtain the preset precision Tr according to a count of output neurons, a learning rate, and a count of samples during batch processing of an L-1th layer, wherein the greater the count of output neurons, the count of samples during batch processing, and the learning rate of the L-1th layer are, the greater the preset precision Tr is.

9. A neural network operation method, comprising: 
obtaining input neuron precision Sx(l), weight precision Sw(l), and output neuron gradient precision S∇x(l) of an Lth layer of a neural network; 
obtaining gradient update precision T by performing computations according to the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l); 
if the gradient update precision T is greater than preset precision Tr, adjusting the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) to minimize an absolute value of a difference between the gradient update precision T and the preset precision Tr; 
representing an output neuron and a weight of the Lth layer according to the adjusted input neuron precision Sx(l) and the weight precision Sw(l); and 
representing an output neuron gradient of the Lth layer obtained from computations according to the adjusted output neuron gradient precision S∇x(l) for subsequent computations.
10. The method of claim 9, 
wherein obtaining the gradient update precision T by performing computations according to the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) includes: performing computations on the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) according to a preset formula to obtain the gradient update precision T,
wherein the preset formula is 
T = Sx(l) + S∇x(l) - Sw(l)

11. The method of claim 10, 
wherein adjusting the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) includes: keeping the input neuron precision Sx(l) and the weight precision Sw(l) unchanged, and decreasing the output neuron gradient precision S∇x(l).


12. The method of claim 11, 
wherein when decreasing the output neuron gradient precision S∇x(l), the method further includes increasing a bit width of a fixed point data format representing the output neuron gradient.


14. The method of claim 12, wherein increasing the bit width of the fixed point data format representing the output neuron gradient includes: increasing the bit width of the fixed point data format representing the output neuron gradient according to a first preset stride N1, wherein the first preset stride N1 can be 1, 2, 4, 6, 7, 8, or another positive integer.

15. The method of claim 12, wherein increasing the bit width of the fixed point data format representing the output neuron gradient includes: increasing the bit width of the fixed point data format representing the output neuron gradient with an increment of 2 times.

16. The method of claim 9, further comprising: obtaining the preset precision Tr according to a method of machine learning, or obtaining the preset precision Tr according to a count of output neurons, a learning rate, and a count of samples during batch processing of an L-1th layer, wherein the greater the count of output neurons, the count of samples during batch processing, and the learning rate of the L-1th layer are, the greater the preset precision Tr is.



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Gaborski (US 5,052,043) in view of Jamie Hanlon, “Why is so much memory needed for deep neural networks?” (2017), https://www.graphcore.ai/posts/why-is-so-much-memory-needed-for-deep-neural-networks.
As to claim 1, Gaborski teaches a neural network operation module configured to perform operations of a multi-layer neural network, comprising (see fig. 1 for the block diagram of a neural network operation module [400][170]; see fig. 4 for the details of the multi-layer neural network, i.e., input layer 220, hidden layer 230, and output layer 240, of the neural network operation module [400]): 
a controller unit [400] configured to obtain input neuron precision Sx(l) [Θ9/Θ10/Θ11], weight precision Sw(l) [neural weights Wkj], and output neuron gradient precision S∇x(l) [δj] of an Lth layer (e.g., the output layer that produced the actual output neuron values) of the multi-layer neural network (see the network of the input layer 220, hidden layer 230, and output layer 240 in fig. 4; see also col. 16, lines 51-55, equation (6), where the error [δj] is determined by the difference between the actual value tk and the target value Ok for each output neuron), 
wherein L is an integer greater than 0 (see the input layer 220, hidden layer 230, and output layer 240, which are three layers, in fig. 4), obtain gradient update precision T [retrained actual output neuron values] according to the input neuron precision Sx(l) [Θ9/Θ10/Θ11], the weight precision Sw(l) [neural weights Wkj], and the output neuron gradient precision S∇x(l) [δj] (see col. 19, lines 49-69, col. 20, lines 1-4, and fig. 6, which show the algorithm for adjusting the neural weight and bias values to retrain the output values; see col. 23, lines 13-22: after these errors have been completely determined, block 653 calculates new values for all the neural weight and bias values, as described above, and then adjusts all the values for the neural weights and biases accordingly; once execution is completed for this block, the neural network will have been re-trained using the results of the recognition of the actual unknown character presently applied to the network, and execution then proceeds, via paths 655 and 658, to block 667; see also col. 17, lines 43-54, for the recursive process of adjusting the network weights and bias values based upon the error between the target output vector and the actual output vector, which implements a process of gradient descent that minimizes the sum-squared error), and 
if the gradient update precision T is greater than preset precision Tr (not explicitly shown, but see Note 1 below), adjust the input neuron precision Sx(l) [bias changes ∆Θj], the weight precision Sw(l) [∆Wkj], and the output neuron gradient precision S∇x(l) [δj] to minimize an absolute value of a difference (the error) between the gradient update precision T [actual output value] and the preset precision Tr [target output value] (the weights and bias for each neuron are adjusted in a direction and by an amount that minimizes the total network error for this input pattern. Once all the network weights have been adjusted for one training pattern, the next training pattern is presented to the network, and the error determination and weight adjusting process iteratively repeats, and so on for each successive training pattern. Typically, once the total network error for each of these patterns reaches a pre-defined limit, these iterations stop and training halts. At this point, all the network weight and bias values are fixed at their then current values. See also fig. 4 [260][270]; col. 19, lines 57-69; col. 20, lines 1-4; col. 23, lines 13-22); and 
an operating unit [260][270] configured to represent an input neuron and a weight of the Lth layer according to the adjusted input neuron precision Sx(l) [bias changes ∆Θj] and the weight precision Sw(l) [∆Wkj], and represent an output neuron gradient of the Lth layer [output layer] obtained from computations [retraining] according to the adjusted output neuron gradient precision S∇x(l) [δj] for subsequent computations [re-training] (see fig. 4 [260][270]).
Note 1: Gaborski does not explicitly show “if the gradient update precision T is greater than preset precision Tr,” as claimed. However, Gaborski teaches the error (i.e., the gradient) between the actual neuron output value and the target output value (see col. 16, lines 51-56; col. 19, lines 57-63). The examiner holds that this error encompasses the absolute value whether T (the actual output) is greater or less than Tr (the target output). For example, the error T - Tr has an absolute value |T - Tr| that equals the magnitude of T - Tr whether T is greater or less than Tr. Therefore, Gaborski implicitly teaches “if the gradient update precision T [actual output value] is greater than preset precision Tr [target output value],” as claimed.
Gaborski does not teach, but Hanlon teaches, a storage unit [memory] configured to store input neuron precision [input data], weight precision [weight parameters], and output neuron gradient precision [activations: error gradients] (see Hanlon, 6th paragraph), as claimed.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include a storage unit configured to store input neuron precision, weight precision, and output neuron gradient precision, as claimed, because one of ordinary skill in the art should be able to recognize the application of a known technique, such as the memory for storing the input data, the weight parameters, and the activations for calculating the gradients, as taught by Hanlon, to a known device/method, such as the neural network of Gaborski in which the weights and bias (the input neurons) for each neuron are th paragraph. MPEP 2143 KSR Example D).
As to claim 9, claim 9 is directed to a method that includes limitations similar to those of system claim 1 and is rejected for the same reasons set forth for claim 1 above. The details of the rejection are not repeated herein.
Allowable Subject Matter
Claims 2-8 and 10-16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and if all applicable rejections under 35 U.S.C. 101 set forth in this action are overcome. None of the prior art of record teaches:
a) Obtaining, by the controller unit, the gradient update precision T according to the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) includes: the controller unit performs computations on the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) according to a preset formula to obtain the gradient update precision T, wherein the preset formula is T = Sx(l) + S∇x(l) - Sw(l) (Claim 2. See similarly recited method claim 10)
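b) The controller unit obtains the preset precision Tr according to a method of machine learning, or obtains the preset precision Tr according to a count of output neurons, a learning rate, and a count of samples during batch processing of an L-1th layer, wherein the greater the count of output neurons, the count of samples during batch processing, and the learning rate of the L-1th layer are, the greater the preset precision Tr is (Claim 8. See similarly recited method claim 16)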
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  
a)  Lin et al. 20160328645 is cited for the teaching of a fixed point neural network with weights (see fig.1, para [0030]-[0032]).
b) Tee 20190138372 is cited for the teaching of a multilayer neural network for performing the gradient descent method of solving for the weights of neurons (see [0273]).
c) Doeding et al. 20150134581 is cited for the teaching of back-propagating the errors of the output neurons into the network using different processes (gradient descent, or heuristic methods such as particle swarm optimization or evolution processes), after which the synaptic weights of all neurons of the network are changed so that the neural network approximates the desired functionality with an arbitrary degree of precision (see [0006]).
d) Wu et al. 20180322391 is cited for the method of computing the weight gradients of the neurons (see [0111][0112]).
e) Taesik et al., “Speeding up Convolutional Neural Network Training with Dynamic Precision Scaling and Flexible Multiplier Accumulator” (2016 ACM), is cited for the teaching of dynamic precision computation in neural network training (see Section 2, PROPOSED APPROACH: CONCEPT).

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

DANIEL H. PAN
Examiner
Art Unit 2182