DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-25 are pending and have been examined.

Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 8/22/2018 and 5/4/2020 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements have been considered by the examiner.

Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference characters not mentioned in the description: 
Reference character 1010 shown in Figure 8 is not found in the detailed description.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, 

Specification
The disclosure is objected to because of the following informalities: 
Reference character 1010 shown in Figure 8 is not described in applicant’s specification (see, e.g., paragraphs 40-48 describing FIG. 8, which do not mention reference character 1010 or the “Battery Port” depicted in FIG. 8 as corresponding to reference character 1010). Appropriate correction is required.
The use of the terms CAFFE, THEANO, APACHE SPARK®, MICROSOFT® AZURE®, PYTHON®, PERL®, JAVA® and SMALLTALK, which are trade names or marks used in commerce, has been noted in this application. They should be capitalized wherever they appear and be accompanied by the generic terminology. For instance, the terms CAFFE, THEANO, APACHE SPARK, and MICROSOFT AZURE appear in paragraph 2 of the specification, and the terms PYTHON, PERL, JAVA and SMALLTALK appear in paragraph 14 of the specification.


Claim Objections
Claim 25 is objected to because of the following informalities: 
Line 1 of claim 25 recites “The at least one computer readable medium storage medium of claim 20”. However, claim 20 is directed to “At least one computer readable storage medium” (see line 1 of claim 20). Thus, it appears that the preamble of claim 25 should read “The at least one computer readable storage medium of claim 20”. Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-5, 7-11, 13-18 and 20-24 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The analysis of the claims below follows the 2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50 (“2019 PEG”). 
When considering subject matter eligibility under 35 U.S.C. 101, it must be determined whether the claim is directed to one of the four statutory categories of 
Regarding independent claims 1, 7, 14 and 20, these claims are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claims 1 and 7 are both directed to an apparatus, claim 14 is directed to a method/process, and claim 20 is directed to a computer readable storage medium, corresponding to an article of manufacture. Each of these falls within one of the four statutory categories of invention.
Step 2A Prong One Analysis: The claims are directed to an abstract idea. In particular, claims 1, 7, 14 and 20 recite, using respective similar language, inter alia: logic (claims 1 and 7), method steps (claim 14) and instructions (claim 20) to:
process one or more vectors with a sum of squares operation, and
determine a fixed-point approximation for the sum of squares operation.
Under its broadest reasonable interpretation in light of the specification, these processing and determining limitations encompass the mathematical concepts of calculating a sum of squares for vector values, and then calculating a fixed-point approximation for the result of the sum of squares operation as described in the specification in paragraphs 22 and 24-34.
Regarding determining a fixed-point approximation for the sum of squares operation, this “determining” step recites an “approximation” and summing operation which are mathematical concepts. 
If the claim limitations, under their broadest reasonable interpretations, cover mathematical relationships, mathematical formulas or equations, or mathematical calculations but for the recitation of generic computer components, then they fall within the “Mathematical Concepts” grouping of abstract ideas. Accordingly, claims 1, 7, 14 and 20 each recite an abstract idea. 
Therefore, the claims are directed to an abstract idea (mathematical concept). 
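For illustration only, the mathematical character of the two recited limitations can be seen in a minimal sketch. The Q-format with 8 fractional bits, the function names, and the truncating rescale are assumptions chosen for the example; none of them is taken from the claims or from the specification.

```python
FRAC_BITS = 8  # assumed Q-format: 8 fractional bits (hypothetical choice)


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize a real value to an integer that stores x * 2**frac_bits."""
    return int(round(x * (1 << frac_bits)))


def to_float(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a real value."""
    return q / (1 << frac_bits)


def sum_of_squares_fixed(values, frac_bits=FRAC_BITS):
    """Approximate sum(v * v) for a vector using integer arithmetic only."""
    acc = 0
    for v in values:
        q = to_fixed(v, frac_bits)
        # The product carries 2 * frac_bits fractional bits, so shift it
        # back down to frac_bits before accumulating.
        acc += (q * q) >> frac_bits
    return acc


vector = [0.5, 1.5, -2.0]
approx = sum_of_squares_fixed(vector)
print(approx, to_float(approx))
```

Every step in the sketch is an arithmetic calculation (scaling, multiplying, shifting, adding), which is consistent with the characterization of the limitations as mathematical calculations.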
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. Claims 1, 7, 14 and 20 do not recite any additional limitations or elements which integrate the abstract idea into a practical application. 
In the context of claims 1, 7, 14 and 20, the “processing” (instructions) is considered to be mere instructions to apply the judicial exception (abstract idea).
In particular, the claims only recite these additional elements – a “multi-layer neural network apparatus, comprising: a first computational layer; and a second computational layer communicatively coupled to the first computational layer, wherein one or more of the first and second computational layers include logic” (claim 1), a “semiconductor package apparatus, comprising: one or more substrates; and logic coupled to the one or more substrates, wherein the logic is at least partly implemented in one or more of configurable logic and fixed-functionality hardware logic, the logic coupled to the one or more substrates” (claim 7), and “at least one computer readable storage medium, comprising a set of instructions” and “a computing device” (claim 20), which are recited at a high level of generality as mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea (i.e., as generic computer components performing generic computer functions). See MPEP 2106.05(f).
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea. 
Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
With respect to claim 1, the claim only recites the additional element of a “multi-layer neural network apparatus, comprising: a first computational layer; and a second computational layer communicatively coupled to the first computational layer”. Simply 



Regarding claims 2, 8, 15 and 21, these claims are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claims 2 and 8 are directed to apparatuses as depending from claims 1 and 7, respectively, claim 15 is directed to a method as depending from claim 14, and claim 21 is directed to a computer readable storage medium as depending from claim 20. Thus, the analysis for patent eligibility of claims 1, 7, 14 and 20, respectively, is incorporated herein.
Step 2A Prong 1: The claims each recite “provide overflow protection for the sum of squares operation.” This limitation does nothing to alter the fundamental nature of the claims as a mathematical concept. Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mathematical concept of calculating an overflow for the sum of squared values (See, e.g., paragraphs 24, 26-30 and 32).
Step 2A Prong 2 Analysis: Mere instructions to apply the mathematical concept electronically do not meaningfully integrate the judicial exception into a practical application. See MPEP 2106.05(f).
The claims do not recite any additional elements that integrate the abstract idea into a practical application or provide significantly more than the abstract idea, and thus the claims are subject-matter ineligible.
For example, claims 2, 8, 15 and 21 only recite the additional elements of “wherein the logic is further to: provide overflow protection for the sum of squares operation” and “a further set of instructions, which when executed by the computing device, cause the computing device to: provide overflow protection for the sum of squares operation”, which are mere instructions to apply the mathematical concept. Mere instructions to apply the mathematical concept electronically do not meaningfully integrate the judicial exception into a practical application. See MPEP 2106.05(f).
Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Mere instructions to apply the mathematical concept electronically (i.e., with the “logic” and “instructions” recited in claims 2, 8, 15 and 21, respectively) do not amount to significantly more than the judicial exception.
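As a purely illustrative sketch of why overflow protection for a sum of squares reads on a calculation, the following clamps an integer accumulator at the largest value an assumed word width can hold. The 16-bit unsigned width, the saturating (clamping) policy and the function name are hypothetical; the specification may describe a different mechanism.

```python
def saturating_sum_of_squares(q_values, width_bits=16):
    """Sum squares of integers, clamping at the largest unsigned value
    representable in width_bits (an assumed accumulator width)."""
    max_val = (1 << width_bits) - 1
    acc = 0
    overflowed = False
    for q in q_values:
        acc += q * q
        if acc > max_val:
            # Compare against the limit and clamp: a purely numerical check.
            acc = max_val
            overflowed = True
            break
    return acc, overflowed


print(saturating_sum_of_squares([100, 200]))
print(saturating_sum_of_squares([200, 200]))
```

The overflow check in the sketch is itself a comparison of a computed value against a numeric limit.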


Regarding claims 3, 9, 16 and 22, these claims are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claims 3 and 9 are directed to apparatuses as depending from claims 1 and 7, respectively, claim 16 is directed to a method as depending from claim 14, and claim 22 is directed to a computer readable storage medium as depending from claim 20. Thus, the analysis for patent eligibility of claims 1, 7, 14 and 20, respectively, is incorporated herein.
Step 2A Prong 1: The claims each recite “provide batch normalization for the one or more vectors.” This limitation does nothing to alter the fundamental nature of the claims as a mathematical concept. Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mathematical concept of calculating a normalization for a batch of vector values (See, e.g., paragraphs 22, 24, 30-31 and 33-34).
Step 2A Prong 2 Analysis: Mere instructions to apply the mathematical concept electronically do not meaningfully integrate the judicial exception into a practical application. See MPEP 2106.05(f).
The claims do not recite any additional elements that integrate the abstract idea into a practical application or provide significantly more than the abstract idea, and thus the claims are subject-matter ineligible.
For example, claims 3, 9, 16 and 22 only recite the additional elements of “wherein the logic is further to: provide batch normalization for the one or more vectors” and “a further set of instructions, which when executed by the computing device, cause the computing device to: provide batch normalization for the one or more vectors”, which are mere instructions to apply the mathematical concept. Mere instructions to apply the mathematical concept electronically do not meaningfully integrate the judicial exception into a practical application. See MPEP 2106.05(f).
Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Mere instructions to apply the mathematical concept electronically (i.e., with the “logic” and “instructions” recited in claims 3, 9, 16 and 22, respectively) do not amount to significantly more than the judicial exception.
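For illustration only, the commonly used textbook form of batch normalization subtracts the batch mean from each value and divides by the square root of the batch variance plus a small constant. The sketch below assumes that textbook form, a one-dimensional batch, and a hypothetical `eps` parameter; it is not asserted to be the normalization described in the specification.

```python
import math


def batch_normalize(batch, eps=1e-5):
    """Normalize a batch of scalars to zero mean and (about) unit variance."""
    n = len(batch)
    mean = sum(batch) / n
    # The variance is a sum of squared deviations divided by the batch size.
    var = sum((x - mean) ** 2 for x in batch) / n
    scale = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * scale for x in batch]


print(batch_normalize([1.0, 2.0, 3.0]))
```

Note that the variance term is itself computed from a sum of squares, so the normalization builds directly on the calculation recited in the independent claims.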

Regarding claims 4, 10, 17 and 23, these claims are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claims 4 and 10 are directed to apparatuses as depending from claims 1 and 7, respectively, claim 17 is directed to a method as depending from claim 14, and claim 23 is directed to a computer readable storage medium as depending from claim 20. Thus, the analysis for patent eligibility of claims 1, 7, 14 and 20, respectively, is incorporated herein.
Step 2A Prong 1: The claims each recite “accumulate a running value corresponding to a square root of the sum of squares operation.” This limitation does nothing to alter the fundamental nature of the claims as a mathematical concept. Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mathematical concept of calculating a running value for the square root of the sum of squares operation (See, e.g., paragraphs 22 and 33-34).
Step 2A Prong 2 Analysis: Mere instructions to apply the mathematical concept electronically do not meaningfully integrate the judicial exception into a practical application. See MPEP 2106.05(f).
The claims do not recite any additional elements that integrate the abstract idea into a practical application or provide significantly more than the abstract idea, and thus the claims are subject-matter ineligible.
For example, claims 4, 10, 17 and 23 only recite the additional elements of “wherein the logic is further to: accumulate a running value corresponding to a square root of the sum of squares operation” and “a further set of instructions, which when executed by the computing device, cause the computing device to: accumulate a running value corresponding to a square root of the sum of squares operation”, which are mere instructions to apply the mathematical concept. Mere instructions to apply the mathematical concept electronically do not meaningfully integrate the judicial exception into a practical application. See MPEP 2106.05(f). 
Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Mere instructions to apply the mathematical concept electronically (i.e., with the “logic” and “instructions” recited in claims 4, 10, 17 and 23, respectively) do not amount to significantly more than the judicial exception.
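As a hedged illustration of accumulating a running value that corresponds to a square root of a sum of squares, one possible approach keeps only the running root r and folds in each new chunk of values as sqrt(r*r + chunk sum of squares). The chunked update and the function name are assumptions made for this sketch and are not drawn from the specification.

```python
import math


def accumulate_running_root(running, chunk):
    """Fold one chunk of values into a running value r, where r equals
    the square root of the sum of squares of all values seen so far."""
    chunk_ss = sum(x * x for x in chunk)
    return math.sqrt(running * running + chunk_ss)


r = 0.0
for chunk in ([3.0], [4.0]):
    r = accumulate_running_root(r, chunk)
print(r)
```

Each update is a closed-form arithmetic expression over the previous running value and the new elements.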

Regarding claims 5, 11, 18 and 24, these claims are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claims 5 and 11 are directed to apparatuses as depending from claims 4 and 10, respectively, claim 18 is directed to a method as depending from claim 17, and claim 24 is directed to a computer readable storage medium as depending from claim 23. Thus, the analysis for patent eligibility of intervening claims 4, 10, 17 and 23, respectively, and base claims 1, 7, 14 and 20, respectively, is incorporated herein.
Step 2A Prong 1: The claims each recite “determine a number of elements for the sum of squares operation based on a threshold value relative to a maximum fixed-point value; and accumulate the running value corresponding to a square root of the sum of squares operation based on the determined number of elements.” These limitations do nothing to alter the fundamental nature of the claims as a mathematical concept. Under their broadest reasonable interpretation in light of the specification, these limitations encompass the mathematical concepts of calculating a number of elements for the sum of squares operation based on a threshold value relative to a maximum fixed-point value, and then calculating a running value for the square root of the sum of squares operation based on the calculated number of elements (See, e.g., paragraphs 22, 26 and 33-34).
Step 2A Prong 2 Analysis: Mere instructions to apply the mathematical concept electronically do not meaningfully integrate the judicial exception into a practical application. See MPEP 2106.05(f).
The claims do not recite any additional elements that integrate the abstract idea into a practical application or provide significantly more than the abstract idea, and thus the claims are subject-matter ineligible.

Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Mere instructions to apply the mathematical concept electronically (i.e., with the recited “logic” and “instructions” recited in claims 5, 11, 18 and 24, respectively) do not amount to significantly more than the judicial exception. 
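For illustration only, one way to determine a number of elements from a threshold relative to a maximum fixed-point value is to divide a fraction of the largest representable value by the worst-case square of a single element. The signed 32-bit accumulator, the 0.5 threshold, the worst-case bound `max_abs_q` and the function name are all hypothetical assumptions for this sketch, not details taken from the specification.

```python
def elements_per_chunk(max_abs_q, width_bits=32, threshold=0.5):
    """Return how many squared elements can be summed before the total
    could exceed threshold * (largest signed width_bits value)."""
    if max_abs_q <= 0:
        raise ValueError("max_abs_q must be a positive integer bound")
    max_fixed = (1 << (width_bits - 1)) - 1  # largest signed value
    budget = int(threshold * max_fixed)
    worst_square = max_abs_q * max_abs_q     # largest possible single term
    return max(1, budget // worst_square)


print(elements_per_chunk(1000))
```

The determination reduces to a multiplication and an integer division over numeric bounds.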

Claims 1-6 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. It appears independent claim 1 would reasonably be interpreted by one of ordinary skill as a system of software per se, failing to fall within a statutory category of invention. Applicants’ disclosure contains no explicit and deliberate definition for the term “a multi-layer neural network apparatus, 
Also, claims 2-6, which each depend directly or indirectly from claim 1, and also each fail to recite actual structure or a hardware component/device, are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter under the same rationale as claim 1.

Claims 20-25 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. Independent claim 20 recites “At least one computer readable storage medium, comprising a set of instructions”, each of claims 21-23 recite “The at least one computer readable storage medium of claim 20”, claim 24 recites “The at least one computer readable storage medium of claim 23” and claim 25 recites “The at least one computer readable medium storage medium of claim 20” [sic - The at least one computer readable non-transitory computer readable storage medium ... ".
Also, claims 21-25, which each depend directly or indirectly from claim 20 and recite “The at least one computer readable storage medium of claim 20” (claims 21-23), “The at least one computer readable storage medium of claim 23” (claim 24), and “The at least one computer readable medium storage medium of claim 20” (claim 25) in their respective preambles (see, e.g., line 1 in each of claims 21-25), are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter under the same rationale as claim 20.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 2, 4-8, 10-15, 17-21 and 23-25 are rejected under 35 U.S.C. 103 as being unpatentable over Ito et al. (U.S. Patent Publication No. 2019/0339939 A1, hereinafter “Ito”) in view of non-patent literature Lian (“A framework for FPGA-based acceleration of neural network inference with limited numerical precision via high-level synthesis with streaming functionality.” University of Toronto (Canada), 2016: i-103, hereinafter “Lian”).

With respect to claim 1, Ito discloses the invention as claimed including a multi-layer neural network apparatus, comprising:	
a first computational layer; and
a second computational layer communicatively coupled to the first computational layer (see, e.g., FIG. 3 – depicting a multi-layer neural network implementation/apparatus with convolution and sub-sampling layers communicatively coupled to each other (dashed lines) and paragraphs 62-63, “deep training in a neural network is illustrated with reference to FIG. 3. The neural network may be a hardware circuit” [i.e., a multi-layer neural network hardware circuit/apparatus], “The neural network of FIG. 3 performs convolution layer processing and pooling layer processing on an input image to extract image features and identify an image. That is, in FIG. 3, processing in the forward direction is illustrated. In FIG. 3, the processing of the convolution layer and the processing of the pooling layer are performed on the input image which is an input layer … The neural network in FIG. 3 outputs the identification result in the fully connected multilayer perceptron (MLP) that is the final layer. The pooling layer is also referred to as a sub-sampling layer. The final layer is also referred to as a fully connected layer.” [i.e., multi-layer neural network apparatus/hardware , wherein one or more of the first and second computational layers include logic to:
process one or more vectors with a sum of squares operation (see, e.g., paragraphs 90 and 76, “Here, Ini is input data (vector) and Ti is correct data (vector). As described below, the information processing device of the comparative example acquires the number of times of overflow of each variable of each layer for each predetermined number of mini-batches during deep training, accumulates them in variables in the computer program, and automatically adjusts the fixed point position of the variable to be used in the deep training”, “Now, for example, when the sum of squares of difference values between the output value y, as the identification result and the correct data Ti is exemplified as the evaluation function as an evaluation function of an error, definition as (Formula 5) can be made. The training processing can be considered as processing of determining the weight w for minimizing the error evaluation function” [i.e., process vector values for Ti with a sum of squares operation]), and determine a fixed-point approximation (see, e.g., paragraphs 90 and 96, “the information processing device … acquires the number of times of overflow of each variable of each layer for each predetermined number of mini-batches during deep training, accumulates them in variables in the computer program, and automatically adjusts the fixed point position of the variable”, “the information processing device calculates the difference in weight while propagating the error in the back method from the fully connected layer 1 (fc1) to the first convolution layer (Conv 1). The information processing device repeats k times of forward propagation and back propagation as described above using k sets of input data. The information processing device updates 
Although Ito substantially discloses the claimed invention, Ito is not relied upon as explicitly disclosing logic to determine a fixed-point approximation for the sum of squares operation.
In the same field, analogous art Lian teaches determining a fixed-point approximation for the sum of squares operation (see, e.g., pages 12 and 15, “The approximation error can be calculated … It computes the sum of squares of differences between each model output o and desired output t”, “in custom hardware, we have the ability to use a more cost-efficient fixed-point representation … We propose to use heterogeneous fixed-point representations during neural network inference” [i.e., determine a fixed-point approximation for the sum of squares operation’s result]).
Alternatively, Lian also teaches a multi-layer neural network apparatus, comprising:
a first computational layer (see, e.g., pages 18-19 – Chapter 3, “We propose to use heterogeneous fixed-point representations during neural network inference, in hope of the maximizing computational throughput of our custom hardware”, “our experiments show that different parts of a neural network can have very diverse value ranges, e.g., one layer’s neuron values can range from −34.8 to 16.9, while another layer’s bias only ranges within ±3.2 × 10−5.” [i.e., a multi-layer neural network apparatus/hardware comprising first and second computational layers for computing values]); and
a second computational layer communicatively coupled to the first computational layer (see, e.g., pages 8 and 13-15 – Chapter 2, “In a fully-connected layer, every neuron is connected to all the neurons in its previous layer.”, “convolutional layer applies the filters on all feature maps in the second layer, and output neurons in fully-connected layers are connected to all input neurons in the previous layers.” [i.e., the first and second computational layers are communicatively coupled/connected – second layer is connected to previous, first layer]), wherein one or more of the first and second computational layers include logic to:
process one or more vectors with a sum of squares operation (see, e.g., pages 6-7 and 12 – Chapter 2, “Training of a neural network is the process of finding a set of parameters (weights and bias) that minimize the model’s approximation error on the training dataset. The approximation error can be calculated … It computes the sum of squares of differences between each model output o and desired output t” [i.e., process vector of values with a sum of squares operation], “the fully-connected layer … takes in the feature maps as an input vector of 6400 (4×4×40) neurons, and produces 10 outputs: one corresponding to each digit class” [i.e., one of the layers processes a vector to produce outputs]).
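It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ito to incorporate the teachings of Lian to provide “an FPGA-based acceleration solution for DNN inference, realized on a SoC device where software controls the execution and off-loads compute-intensive operations to the hardware accelerator.” where “limited precision data representations are investigated for DNN computations, and incorporated in the accelerator design.” (See, e.g., Lian, page ii, Abstract). Doing so would have allowed Ito to use Lian’s acceleration solution and device as “custom hardware for accelerating the DNN computation with a low power budget” and “To minimize the hardware cost,” as suggested by Lian (See, e.g., Lian, page ii, Abstract). This is an example of “use of known technique to improve similar devices (methods, or products) in the same way.” See MPEP 2143.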

With respect to independent claim 7, Ito discloses the invention as claimed including a semiconductor package apparatus, comprising:
one or more substrates; and
logic coupled to the one or more substrates (Aside from repeating the claim language, see, e.g., paragraphs 15 and 57, the specification does not provide examples of, or define what is meant by “a semiconductor package apparatus” comprising “logic coupled to the one or more substrates”. In the context of computer technology and hardware, the plain meaning of a semiconductor package, or a chip package is the housing that holds an integrated circuit - See https://www.pcmag.com/encyclopedia/term/package, the plain meaning of logic is the sequence of operations performed by hardware or software where hardware logic is contained in the electronic circuits - See https://www.pcmag.com/encyclopedia/term/logic, and the plain meaning of a substrate is the base layer of a structure such as a chip, multichip module (MCM), printed circuit board or disk platter. See https://www.pcmag.com/encyclopedia/term/substrate. Therefore, “a semiconductor package apparatus” comprising “logic coupled to the one , wherein the logic is at least partly implemented in one or more of configurable logic and fixed-functionality hardware logic, the logic coupled to the one or more substrates (see, e.g., paragraphs 62, 122, 145 and 214, “The neural network may be a hardware circuit”, “processor 10 includes … an operator for scalar operator (arithmetic logic unit (ALU)) 141, and an accumulator 132 that adds the result of the operator 131 for vector operation”, “The non-sign most significant bit detector is, for example, a logic circuit”, “The integrated circuit includes an LSI, an application specific integrated circuit (ASIC), and a programmable logic device (PLD). The PLD includes, for example, a field-programmable gate array (FPGA)” [i.e., the logic is implemented as configurable/programmable logic or fixed-functionality/application specific logic]), to:
process one or more vectors with a sum of squares operation (see, e.g., paragraphs 90 and 76, “Here, Ini is input data (vector) and Ti is correct data (vector). As described below, the information processing device of the comparative example acquires the number of times of overflow of each variable of each layer for each predetermined number of mini-batches during deep training, accumulates them in variables in the computer program, and automatically adjusts the fixed point position of the variable to be used in the deep training”, “Now, for example, when the sum of squares of difference values between the output value y, as the identification result and the correct data Ti is exemplified as the evaluation function as an evaluation function of an error, definition as (Formula 5) can be made. The training processing can be considered as processing of determining the weight w for minimizing the error evaluation function” [i.e., process vector values for Ti with a sum of squares operation]), and determine a fixed-point approximation (see, e.g., paragraphs 90 and 96, “the information processing device … acquires the number of times of overflow of each variable of each layer for each predetermined number of mini-batches during deep training, accumulates them in variables in the computer program, and automatically adjusts the fixed point position of the variable”, “the information processing device calculates the difference in weight while propagating the error in the back method from the fully connected layer 1 (fc1) to the first convolution layer (Conv 1). The information processing device repeats k times of forward propagation and back propagation as described above using k sets of input data. The information processing device updates the fixed point position of each variable based on the number of times of overflow of the 
Although Ito substantially discloses the claimed invention, Ito is not relied upon as explicitly disclosing logic to determine a fixed-point approximation for the sum of squares operation.
In the same field, analogous art Lian teaches determining a fixed-point approximation for the sum of squares operation (see, e.g., pages 12 and 15, “The approximation error can be calculated … It computes the sum of squares of differences between each model output o and desired output t”, “in custom hardware, we have the ability to use a more cost-efficient fixed-point representation … We propose to use heterogeneous fixed-point representations during neural network inference” [i.e., determine a fixed-point approximation for the sum of squares operation’s result]).
Alternatively, Lian also teaches one or more of configurable logic and fixed-functionality hardware logic (see, e.g., FIG. 5.6 depicting circuit datapath and logic and pages 2, 18, 45 and 86 – Chapters 1, 3 and 5, “describing hardware components in a hardware description language (HDL) (e.g. VHDL or Verilog) allows a designer to adopt existing tools for RTL and logic synthesis into the target technology.”, “We propose to use heterogeneous fixed-point representations during neural network inference, in hope of the maximizing computational throughput of our custom hardware”, “we describe work done to support function pipelining in LegUp, including the pipelined function interface, FIFO support and stall logic implementation.”, “We port the design onto a larger device, an Arria V SoC FPGA (on the Arria V SoC development board), ∼5.5× more logic elements” [i.e., configurable and fixed-functionality hardware logic]) to:
process one or more vectors with a sum of squares operation (see, e.g., pages 6-7 and 12 – Chapter 2, “Training of a neural network is the process of finding a set of parameters (weights and bias) that minimize the model’s approximation error on the training dataset. The approximation error can be calculated … It computes the sum of squares of differences between each model output o and desired output t” [i.e., process vector of values with a sum of squares operation], “the fully-connected layer … takes in the feature maps as an input vector of 6400 (4×4×40) neurons, and produces 10 outputs: one corresponding to each digit class” [i.e., one of the layers processes a vector to produce outputs]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ito to incorporate the teachings of Lian to provide “an FPGA-based acceleration solution for DNN inference, realized on a SoC device where software controls the execution and off-loads compute-intensive operations to the hardware accelerator.” where “limited precision data representations are investigated for DNN computations, and incorporated in the accelerator design.” (See, e.g., Lian, page ii, Abstract). Doing so would have allowed Ito to use Lian’s acceleration solution and device as “custom hardware for accelerating the DNN computation with a low power budget” and “To minimize the hardware cost,” as suggested by Lian (See, e.g., Lian, page ii, Abstract). This is an example of “use of known technique to improve similar devices (methods, or products) in the same way.” See MPEP 2143.
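For illustration only, the two concepts discussed above (a sum of squares of differences between outputs and correct data, and a fixed-point approximation of that operation) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Ito, Lian, or the application: the function names, the Q-format with `frac_bits` fractional bits, and the sample values are all hypothetical.

```python
def to_fixed(x, frac_bits):
    """Quantize a real value to an integer with frac_bits fractional bits (assumed Q-format)."""
    return int(round(x * (1 << frac_bits)))


def sum_sq_error_float(outputs, targets):
    """Reference: floating-point sum of squares of differences."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))


def sum_sq_error_fixed(outputs, targets, frac_bits=8):
    """Fixed-point approximation of the same sum of squares.

    Each squared difference carries 2 * frac_bits fractional bits,
    so the integer accumulator is rescaled once at the end.
    """
    acc = 0
    for o, t in zip(outputs, targets):
        d = to_fixed(o, frac_bits) - to_fixed(t, frac_bits)
        acc += d * d
    return acc / float(1 << (2 * frac_bits))


# Hypothetical sample data: model outputs and correct data.
outputs = [0.5, 0.25]
targets = [1.0, 0.0]
print(sum_sq_error_float(outputs, targets))
print(sum_sq_error_fixed(outputs, targets))
```

With values that are exactly representable in the chosen format, the two results agree; otherwise the fixed-point result differs by a quantization error that shrinks as `frac_bits` grows.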

With respect to independent claim 14, Ito discloses the invention as claimed including a method of machine learning (see, e.g., paragraphs 2, 81, 90, 103 and 200, and claim 14 - “An information processing method comprising: performing, by a computer, deep training”, “The embodiment relates to an operation processing device, an information processing device including the operation processing device, a method”, “In the training processing of the neural network by the maximum gradient descent method”, “Here, the mini-batch is a combination of k pieces of data obtained by dividing a set of input data to be learned” [i.e., machine learning], “an information processing method executed by the information processing device 1”, “the information processing device 1 calculates the difference in weight while propagating the error in the reverse method from the fully connected layer 2 (fc2) to the first convolution layer (Conv_1).” [i.e., a method of machine learning]), comprising:
processing one or more vectors with a sum of squares operation (see, e.g., paragraphs 90 and 76, “Here, Ini is input data (vector) and Ti is correct data (vector). As described below, the information processing device of the comparative example acquires the number of times of overflow of each variable of each layer for each predetermined number of mini-batches during deep training, accumulates them in variables in the computer program, and automatically adjusts the fixed point position of the variable to be used in the deep training”, “Now, for example, when the sum of squares of difference values between the output value y, as the identification result and the correct data Ti is exemplified as the evaluation function as an evaluation function of an error, definition as (Formula 5) can be made. The training processing can be considered as processing of determining the weight w for minimizing the error evaluation function” [i.e., process vector values for Ti with a sum of squares operation]), and determining a fixed-point approximation (see, e.g., paragraphs 90 and 96, “the information processing device … acquires the number of times of overflow of each variable of each layer for each predetermined number of mini-batches during deep training, accumulates them in variables in the computer program, and automatically adjusts the fixed point position of the variable”, “the information processing device calculates the difference in weight while propagating the error in the back method from the fully connected layer 1 (fc1) to the first convolution layer (Conv 1). The information processing device repeats k times of forward propagation and back propagation as described above using k sets of input data. The information processing device updates the fixed point position of each variable based on the number of times of overflow of the counter variable corresponding to each variable after the k mini-batches are finished.” [i.e., determine a fixed-point approximation/position for each variable]).
Although Ito substantially discloses the claimed invention, Ito is not relied on for explicitly disclosing determining a fixed-point approximation for the sum of squares operation.
In the same field, analogous art Lian teaches determining a fixed-point approximation for the sum of squares operation (see, e.g., pages 12 and 15, “The approximation error can be calculated … It computes the sum of squares of differences between each model output o and desired output t”, “in custom hardware, we have the ability to use a more cost-efficient fixed-point representation … We propose to use heterogeneous fixed-point representations during neural network inference” [i.e., determine a fixed-point approximation for the sum of squares operation’s result]).
Alternatively, Lian also teaches a multi-layer neural network apparatus, comprising:	
a first computational layer (see, e.g., pages 18-19 – Chapter 3, “We propose to use heterogeneous fixed-point representations during neural network inference, in hope of the maximizing computational throughput of our custom hardware”, “our experiments show that different parts of a neural network can have very diverse value ranges, e.g., one layer’s neuron values can range from −34.8 to 16.9, while another layer’s bias only ranges within ±3.2 × 10^−5.” [i.e., a multi-layer neural network apparatus/hardware comprising first and second computational layers for computing values]); and
a second computational layer communicatively coupled to the first computational layer (see, e.g., pages 8 and 13-15 – Chapter 2, “In a fully-connected layer, every neuron is connected to all the neurons in its previous layer.”, “convolutional layer applies the filters on all feature maps in the second layer, and output neurons in fully-connected layers are connected to all input neurons in the previous layers.” [i.e., the first and second computational layers are communicatively coupled/connected – second layer is connected to previous, first layer]), wherein one or more of the first and second computational layers include logic to:
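process one or more vectors with a sum of squares operation (see, e.g., pages 6-7 and 12 – Chapter 2, “Training of a neural network is the process of finding a set of parameters (weights and bias) that minimize the model’s approximation error on the training dataset. The approximation error can be calculated … It computes the sum of squares of differences between each model output o and desired output t” [i.e., process vector of values with a sum of squares operation], “the fully-connected layer … takes in the feature maps as an input vector of 6400 (4×4×40) neurons, and produces 10 outputs: one corresponding to each digit class” [i.e., one of the layers processes a vector to produce outputs]).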
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ito to incorporate the teachings of Lian to provide “an FPGA-based acceleration solution for DNN inference, realized on a SoC device where software controls the execution and off-loads compute-intensive operations to the hardware accelerator.” where “limited precision data representations are investigated for DNN computations, and incorporated in the accelerator design.” (See, e.g., Lian, page ii, Abstract). Doing so would have allowed Ito to use Lian’s acceleration solution and device as “custom hardware for accelerating the DNN computation with a low power budget” and “To minimize the hardware cost,” as suggested by Lian (See, e.g., Lian, page ii, Abstract). This is an example of “use of known technique to improve similar devices (methods, or products) in the same way.” See MPEP 2143.
With respect to independent claim 20, Ito discloses the invention as claimed including at least one computer readable storage medium, comprising a set of instructions, which when executed by a computing device, cause the computing device (see, e.g., paragraphs 111 and 216-217, “The processor 10 executes instructions”, “A program that causes a computer or other machine or device (hereinafter referred to as a computer or the like) to realize any of the functions to:
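process one or more vectors with a sum of squares operation (see, e.g., paragraphs 90 and 76, “Here, Ini is input data (vector) and Ti is correct data (vector). As described below, the information processing device of the comparative example acquires the number of times of overflow of each variable of each layer for each predetermined number of mini-batches during deep training, accumulates them in variables in the computer program, and automatically adjusts the fixed point position of the variable to be used in the deep training”, “Now, for example, when the sum of squares of difference values between the output value y, as the identification result and the correct data Ti is exemplified as the evaluation function as an evaluation function of an error, definition as (Formula 5) can be made. The training processing can be considered as processing of determining the weight w for minimizing the error evaluation function” [i.e., process vector values for Ti with a sum of squares operation]), and determine a fixed-point approximation (see, e.g., paragraphs 90 and 96, “the information processing device … acquires the number of times of overflow of each variable of each layer for each predetermined number of mini-batches during deep training, accumulates them in variables in the computer program, and automatically adjusts the fixed point position of the variable”, “the information processing device calculates the difference in weight while propagating the error in the back method from the fully connected layer 1 (fc1) to the first convolution layer (Conv 1). The information processing device repeats k times of forward propagation and back propagation as described above using k sets of input data. The information processing device updates the fixed point position of each variable based on the number of times of overflow of the counter variable corresponding to each variable after the k mini-batches are finished.” [i.e., determine a fixed-point approximation/position for each variable]).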
Although Ito substantially discloses the claimed invention, Ito is not relied on for explicitly disclosing determine a fixed-point approximation for the sum of squares operation.
In the same field, analogous art Lian teaches determine a fixed-point approximation for the sum of squares operation (see, e.g., pages 12 and 15, “The approximation error can be calculated … It computes the sum of squares of differences between each model output o and desired output t”, “in custom hardware, we have the ability to use a more cost-efficient fixed-point representation … We propose to use heterogeneous fixed-point representations during neural network inference” [i.e., determine a fixed-point approximation for the sum of squares operation’s result]).
Alternatively, Lian also teaches a multi-layer neural network apparatus, comprising:	
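a first computational layer (see, e.g., pages 18-19 – Chapter 3, “We propose to use heterogeneous fixed-point representations during neural network inference, in hope of the maximizing computational throughput of our custom hardware”, “our experiments show that different parts of a neural network can have very diverse value ranges, e.g., one layer’s neuron values can range from −34.8 to 16.9, while another layer’s bias only ranges within ±3.2 × 10^−5.” [i.e., a multi-layer neural network apparatus/hardware comprising first and second computational layers for computing values]); and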
a second computational layer communicatively coupled to the first computational layer (see, e.g., pages 8 and 13-15 – Chapter 2, “In a fully-connected layer, every neuron is connected to all the neurons in its previous layer.”, “convolutional layer applies the filters on all feature maps in the second layer, and output neurons in fully-connected layers are connected to all input neurons in the previous layers.” [i.e., the first and second computational layers are communicatively coupled/connected – second layer is connected to previous, first layer]), wherein one or more of the first and second computational layers include logic to:
process one or more vectors with a sum of squares operation (see, e.g., pages 6-7 and 12 – Chapter 2, “Training of a neural network is the process of finding a set of parameters (weights and bias) that minimize the model’s approximation error on the training dataset. The approximation error can be calculated … It computes the sum of squares of differences between each model output o and desired output t” [i.e., process vector of values with a sum of squares operation], “the fully-connected layer … takes in the feature maps as an input vector of 6400 (4×4×40) neurons, and produces 10 outputs: one corresponding to each digit class” [i.e., one of the layers processes a vector to produce outputs]).


	Regarding claims 2, 8, 15 and 21, as discussed above, Ito in view of Lian teaches the apparatuses of claims 1 and 7, the method of claim 14, and the computer readable storage medium of claim 20.
	Ito further discloses provide overflow protection for the sum of squares operation (see, e.g., FIG. 5 – depicting “S3: WHEN OVERFLOW OCCURS DURING TRAINING, PERFORM SATURATION PROCESSING” for the “correct data” and paragraphs 76, 95 and 100, “when the sum of squares of difference values between the output value y, as the identification result and the correct data Ti is exemplified as the evaluation function as an evaluation function of an error, definition as (Formula 5) can be made.”, “the information processing device accumulates errors … The information 

Regarding claims 4, 10, 17 and 23, as discussed above, Ito in view of Lian teaches the apparatuses of claims 1 and 7, the method of claim 14, and the computer readable storage medium of claim 20.
Ito further discloses accumulate a running value corresponding to a square root of the sum of squares operation (see, e.g., FIG. 7, step C5 for “STATISTICAL INFORMATION ACCUMULATION” and paragraphs 80-81 and 90, “The left side of (Formula 10) indicates an error of the l-th layer. The right side of (Formula 10) is a total of the result of multiplying the error of the l+1-th layer by the variable w_{i,j}^{l} of the weight between the pixel i of the l-th layer and the pixel j of the l+1-th layer. This total is the total for the pixel j of the l+1-th layer related to the pixel i of the l-th layer.”, “the information processing device of the comparative example acquires the number of times of overflow of each variable of each layer for each predetermined number of mini-batches during deep training, accumulates them” [i.e., accumulating a total/running value corresponding to a square root of the sum of squares – accumulating the result of the sum of squares]).


Ito further discloses determine a number of elements for the sum of squares operation based on a threshold value relative to a maximum fixed-point value (see, e.g., paragraphs 76, 81 and 89, “Now, for example, when the sum of squares of difference values between the output value y, as the identification result and the correct data Ti is exemplified as the evaluation function as an evaluation function of an error, definition as (Formula 5) can be made. The training processing can be considered as processing of determining the weight w for minimizing the error evaluation function exemplified in (Formula 5)”, “When (Formula 5) is partially differentiated with weight wi,j … by the maximum gradient descent method, the gradient of the evaluation function E of the error … when the maximum gradient descent method is applied to the error evaluation function E” [i.e., determine a number of elements/difference values for the sum of squares operation based on the identification result/threshold relative to a maximum gradient descent value], “The information processing device of the comparative example includes a processor capable of executing the process of the dynamic fixed point number.” [i.e., the maximum value is a fixed-point value]); and
accumulate the running value corresponding to a square root of the sum of squares operation based on the determined number of elements (see, e.g., paragraphs 76, 80-81 and 90, “when the sum of squares of difference values between the output value y, as the identification result and the correct data Ti is exemplified as the evaluation function as an evaluation function of an error”, “The left side of (Formula 10) indicates an error of the l-th layer. The right side of (Formula 10) is a total of the result of multiplying the error of the l+1-th layer by the variable w_{i,j}^{l} of the weight between the pixel i of the l-th layer and the pixel j of the l+1-th layer. This total is the total for the pixel j of the l+1-th layer related to the pixel i of the l-th layer.”, “the information processing device of the comparative example acquires the number of times of overflow of each variable of each layer for each predetermined number of mini-batches during deep training, accumulates them” [i.e., accumulate the total/running value corresponding to a square root of the sum of squares based on the determined number of elements/difference values for the sum of squares]).
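For illustration only, one hypothetical reading of the two limitations above (choosing a number of elements from a threshold relative to a maximum fixed-point value, then accumulating a running value corresponding to a square root of the sum of squares) can be sketched as follows. This is an assumption-laden sketch and is not taken from Ito, Lian, or the application: the names `chunk_size` and `running_norm`, the default 32-bit signed maximum, and the threshold fraction are all invented for the example.

```python
import math


def chunk_size(max_abs, max_fixed, threshold=0.5):
    """Largest element count n such that n * max_abs**2 <= threshold * max_fixed.

    Returns at least 1; if even a single square exceeds the budget,
    the caller would need a wider accumulator (not handled here).
    """
    return max(1, int((threshold * max_fixed) // (max_abs * max_abs)))


def running_norm(values, max_fixed=2**31 - 1, threshold=0.5):
    """Accumulate sqrt(sum of squares) chunk by chunk over integer values."""
    max_abs = max((abs(v) for v in values), default=0)
    if max_abs == 0:
        return 0.0
    n = chunk_size(max_abs, max_fixed, threshold)
    norm = 0.0
    for i in range(0, len(values), n):
        # Partial sum of squares over at most n elements.
        partial = sum(v * v for v in values[i:i + n])
        # Fold the chunk into the running square-root value:
        # sqrt(norm**2 + partial)
        norm = math.hypot(norm, math.sqrt(partial))
    return norm


print(running_norm([3, 4], max_fixed=20, threshold=1.0))
```

Keeping the running value in square-root form means only each bounded partial sum, rather than the full sum of squares, has to fit under the chosen maximum.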

Regarding claims 6, 12, 19 and 25, as discussed above, Ito in view of Lian teaches the apparatuses of claims 1 and 7, the method of claim 14, and the computer readable storage medium of claim 20.
Ito further discloses provide one or more of supervised and unsupervised learning for one or more of ... an image processing, … and a machine learning application (see, e.g., FIG. 4, depicting processing of “INPUT IMAGE” and paragraphs 62, 74 and 90, “The neural network of FIG. 3 performs convolution layer processing and pooling layer processing on an input image to extract image features and identify an image.”, “in the neural network of FIG. 4, the recognition processing in the forward direction is performed by the convolution layer that performs the convolution operation on the input image” [i.e., an image processing application], “The deep training is performed divided into processing units called mini-batches. Here, the mini-batch is a combination of k pieces of data obtained by dividing a set of input data to be learned” 
Although Ito substantially discloses the claimed invention, Ito is not relied on for explicitly disclosing provide one or more of supervised and unsupervised learning for one or more of a speech processing application, an image processing, a pattern processing application, and a machine learning application.
In the same field, analogous art Lian teaches provide one or more of supervised and unsupervised learning for one or more of a speech processing application, an image processing, a pattern processing application, and a machine learning application (see, e.g., pages ii, 1 and 11-12 – Abstract and Chapters 1-2, “Deep neural networks (DNNs) have gained prominence recently by producing state-of-the-art results in pattern recognition, speech synthesis, customer preference elicitation, and other machine learning tasks”, “inference of a DNN refers to the use of an already-trained model to make a prediction for an input that was not seen during training. For example, in the case of image recognition, when the DNN is presented with an image of a tree, the DNNs outputs recognize it as so.” [i.e., an image processing application], “In this dissertation, we experiment with two image recognition benchmarks: a small model designed to recognize the MNIST dataset [26] of handwritten digits and a more sophisticated image classification model for the ImageNet dataset”, “ImageNet [27] is a large image dataset organized primarily by the Standford [sic – Stanford] Vision Lab [4]. The dataset contains more than 15 million high-resolution images belonging to around 22,000 categories. The images are collected from the 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ito to incorporate the teachings of Lian to provide “an FPGA-based acceleration solution for DNN inference, realized on a SoC device where software controls the execution and off-loads compute-intensive operations to the hardware accelerator.” where “limited precision data representations are investigated for DNN computations, and incorporated in the accelerator design.” (See, e.g., Lian, page ii, Abstract). Doing so would have allowed Ito to use Lian’s acceleration solution and device as “custom hardware for accelerating the DNN computation with a low power budget” and “To minimize the hardware cost,” as suggested by Lian (See, e.g., Lian, page ii, Abstract). This is an example of “use of known technique to improve similar devices (methods, or products) in the same way.” See MPEP 2143.

Claims 3, 9, 16 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Ito in view of Lian as applied to claims 1, 7, 14 and 20 above, and further in view of Chen et al. (International Patent Application Pub. No. WO 2018/140294 A1, cited in Applicant’s IDS submitted on 5/04/2020, hereinafter “Chen”).

Although Ito in view of Lian substantially teaches the claimed invention, Ito in view of Lian is not relied on to teach provide batch normalization for the one or more vectors.
In the same field, Chen teaches provide batch normalization for the one or more vectors (see, e.g., paragraphs 55-56, “A normalization sub-layer 316 represents integer batch normalization (IBN) sublayer, which normalizes input tensor within a mini-batch with mean and variance. Different from conventional batch normalization performed in floating-point domain, all intermediate results involved in the sub-layer 316 are either 32-bit integers or low resolution fixed-point values. Since integer is a special fixed-point number, the IBN sub-layer 316 only includes corresponding fixed-point operations. Subsequently, the quantization sublayer 318 converts the output of the IBN sub-layer 316 to a predefined fixed-point format.”, “Specifically, for the IBN sub-layer 316, the input may be fixed-point input in a mini-batch … including N elements. To obtain normalized output … , the sum … and the sum of squares … of all inputs can be determined.” [i.e., integer batch normalization (IBN) sublayer provides batch normalization for the input tensor/vector]).
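For illustration only, the relationship quoted above between batch normalization and the sum and sum of squares of a mini-batch can be sketched as follows. This is a minimal sketch under stated assumptions, not Chen's integer batch normalization sub-layer: it uses ordinary floating-point arithmetic for the final scaling, and the function name and the `eps` stabilizer are hypothetical.

```python
import math


def batch_norm_from_sums(xs, eps=1e-5):
    """Normalize a mini-batch using only its sum and its sum of squares."""
    n = len(xs)
    s = sum(xs)                   # sum of all inputs
    sq = sum(x * x for x in xs)   # sum of squares of all inputs
    mean = s / n
    var = sq / n - mean * mean    # E[x^2] - (E[x])^2
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std for x in xs]


# Hypothetical mini-batch of N = 4 integer inputs.
print(batch_norm_from_sums([1, 2, 3, 4]))
```

The normalized outputs have approximately zero mean and unit variance, which is why a single pass that gathers the sum and the sum of squares is sufficient to normalize the batch.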
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ito in view of Lian to incorporate the teachings of Chen in order to provide “a solution for training a convolutional neural network” where “parameters of the neural network are stored in a fixed-point format, 

Conclusion
The prior art made of record, listed on form PTO-892, and not relied upon, is considered pertinent to applicant's disclosure.
For example, Luu et al. (International Patent Application Pub. No. WO 2016/186813 A1, cited in Applicant’s IDS submitted on 5/04/2020, hereinafter “Luu”) discloses that “normalization units, e.g., normalization units 626, 630, use outputs from the staggered groups to generate a corresponding component, e.g., a square of activation values inside registers of the staggered group, used to compute a normalized 
The examiner requests, in response to this office action, support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line no(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.
When responding to this office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections. See 37 CFR 1.111(c).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RANDY K BALDWIN whose telephone number is (571)270-5222. The examiner can normally be reached on Mon - Fri 9:00-6:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/R.K.B./Examiner, Art Unit 2125 

/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125