DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
This action is in response to the amendments and arguments filed on 02/08/2022.
Claims 1-3, 5-13, and 15-20 are currently pending.
Claims 1, 5, 11, and 15 have been amended.
Claims 4 and 14 have been cancelled.

Response to Amendment
The previous objection to claims 1 and 11 is withdrawn in view of Applicant’s amendment.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1 and 11 have been considered but are moot because the new grounds of rejection are necessitated by Applicant’s amendments to the claims requiring multiplying both the initial weight values and initial bias values by a positive constant depending on the layers of the multi-layer neural network.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-3, 5-13, and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Stemmer (WO 2016039651 A1) in view of Deisher (US 20190042935 A1).

With respect to claim 1, Stemmer teaches an arithmetic processing device (pg. 4 ln. 23-31, “The material disclosed herein may be implemented in hardware, firmware, software, or any combination thereof. The material disclosed herein may also be implemented as instructions stored on a machine-readable medium, which may be read and executed by one or more processors. A machine-readable medium may include any medium and/or mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device). For example, a machine-readable medium may include read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.), and others.”)
to realize a multi-layer convolutional neural network circuit (pg. 11 ln. 13-19, “FIG. 3 is an illustrative diagram of example neural network 204, arranged in accordance with at least some implementations of the present disclosure. Neural network 204 may include any suitable neural network such as an artificial neural network, a deep neural network, a convolutional neural network, or the like. As shown in FIG. 3, neural network 204 may include an input layer 301, hidden layers 302-305, and an output layer 306. Neural network 204 is illustrated as having three input nodes, hidden layers with four nodes each, and six output nodes for the sake of clarity of presentation.”)
to perform a process with fixed-point number format, (pg. 5, ln. 18-19, "In some embodiments discussed herein, neural network weights (e.g., parameters) may be represented as fixed point integer values such as 8 bit fixed point integer values.")
comprising: a processing circuitry (pg. 4 ln. 23-31, “The material disclosed herein may be implemented in hardware, firmware, software, or any combination thereof. The material disclosed herein may also be implemented as instructions stored on a machine-readable medium, which may be read and executed by one or more processors. A machine-readable medium may include any medium and/or mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device). For example, a machine-readable medium may include read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.), and others.”)
and a memory, (pg. 4 ln. 23-31, “The material disclosed herein may be implemented in hardware, firmware, software, or any combination thereof. The material disclosed herein may also be implemented as instructions stored on a machine-readable medium, which may be read and executed by one or more processors. A machine-readable medium may include any medium and/or mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device). For example, a machine-readable medium may include read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.), and others.”)
the processing circuitry conducting: a learning process to perform weight learning or bias learning using learning data stored in the memory to calculate initial weight values and initial bias values for layers of the multi-layer convolutional neural network circuit; (pg. 10 ln. 29-32, "Furthermore, neural network 204 and/or statistical models 207 may be pre-trained based on training sets or the like prior to implementation via system 200 to determine weights and/or biases of neural network. In some examples, pre-training may be implemented via system 200 itself.”)
a trial recognition process to perform a recognition process to part of the learning data or of input data using the initial weight values and the initial bias values; (pg. 10 ln. 29 - pg. 11 ln. 3, "Furthermore, neural network 204 and/or statistical models 207 may be pre-trained based on training sets or the like prior to implementation via system 200 to determine weights and/or biases of neural network. In some examples, pre-training may be implemented via system 200 itself. Also, as is discussed further herein, in some examples, weights of neural network 204 may be converted to fixed point integer values having an associated scaling factor and correction values may be determined for some of the converted weights prior to implementation via system 200.”; Fig. 2: The trial recognition process uses neural network 204 (using weights determined during the learning process) to produce classification scores 205, which in turn yield recognized word sequence 208 as a product of a recognition process.)
a processing treatment process to multiply the initial weight values and the initial bias values by a positive constant depending on the layers to calculate processed weight values and processed bias values for the layers; (Stemmer discloses a processing treatment process to multiply the initial weight values by a positive constant depending on the layers to calculate processed weight values for the layers. Multiplying the initial bias values by the positive constant will be taught below, by Deisher. pg. 14 ln. 25-32, “Process 400 may continue at decision operation 404, "C ≤ L?", where it may be determined whether the correction count, C, is less than or equal to the corrections limit, L. As shown, if the correction count, C, is not less than or equal to the corrections limit, L, process 400 may continue at operation 405, "Increase S", where the scaling factor, S, may be increased by any suitable amount. For example, scaling factor, S, may be increased by a factor of 2 such that S = Sx2). In such examples, process 400 may continue at operation 403, where, as discussed, a corrections count, C, may be determined (continuing the above example, the weights when multiplied by the scaling factor (now 2) must be less than 128 to not require correction).” ; Fig. 4, 5: Illustrate the process of determining a scaling factor and applying corrections. Step 403 is related to modifying S (positive constant) so that neural network weight values may be fitted to a fixed point integer representation by multiplying them with S; pg. 2 ln. 11-15, “FIG. 4 is a flow diagram illustrating an example process for determining a scaling factor for a layer of a neural network; FIG. 5 is a flow diagram illustrating an example process for converting weights of a layer of a neural network to fixed point integer values based on a scaling factor and generating correction values for one or more of the weights;” The process of determining the scaling factor is conducted per layer.)
and a recognition process using the processed weight values and the processed bias values. (pg. 10 ln. 29 - pg. 11 ln. 3, "Furthermore, neural network 204 and/or statistical models 207 may be pre-trained based on training sets or the like prior to implementation via system 200 to determine weights and/or biases of neural network. In some examples, pre-training may be implemented via system 200 itself. Also, as is discussed further herein, in some examples, weights of neural network 204 may be converted to fixed point integer values having an associated scaling factor and correction values may be determined for some of the converted weights prior to implementation via system 200.”; Fig. 9: Classification scores are the results of a recognition process. Neural Network 901 is supplied with Neural Network Weights, Biases, & Corrections (processed weight values and processed bias values).)
	But Stemmer does not explicitly teach a processing treatment process to multiply both the initial weight values and the initial bias values by a positive constant depending on the layers to calculate processed weight values and processed bias values for the layers. Specifically, Stemmer does not explicitly teach multiplying the initial bias values by the positive constant.
	Deisher, however, does teach a processing treatment process to multiply the initial weight values and the initial bias values by a positive constant depending on the layers to calculate processed weight values and processed bias values for the layers. (Deisher discloses a processing treatment process to multiply the initial bias values by a positive constant depending on the layers to calculate processed bias values for the layers. Multiplying the initial weight values by the positive constant has been taught above, by Stemmer. Para. [0034], “At block 322, the floating-point bias of layer 0 is obtained, where the floating-point bias was determined during training. At block 312, the biases from layer 0 (L0) are scaled. To upscale the biases, a target maximum in (TargetMaxIn) is divided by the maximum absolute value (MaxIn) and multiplied by the target weight layer 0 (TargetWeightL0) divided by the maximum weight layer 0 (MaxWeightL0). In examples, there is a target range for the input and a target range for the weight. The target range for the output and the biases is the target range for the input multiplied by the target range for the weight. As a result, the biases are scaled up through multiplication by [TargetMaxIn/MaxIn]* [TargetWgtL0/MaxWgtL0]. The target range for the biases is TargetMaxIn* TargetWgtL0. At block 324, the scaled up floating point biases are rounded and converted to an integer form. At block 326, the dynamically scaled integer bias is provided to layer 0 (L0) of the neural network and can now be used for future execution of layer 0 of the neural network.”; para. [0024], “In this manner, updated scale factors are propagated through the graph of the neural network. The values can then be rounded to the nearest integer value. The present techniques may be applied to neural networks of any size, with any number of layers. In embodiments, a neural network is re-quantized in a layer by layer fashion.” The initial bias values for the layer in Deisher are scaled by multiplying them by a positive constant, and the resulting processed bias values are then used for execution of the layer of the neural network. In multi-layer neural networks, this process is done layer by layer, meaning the positive constant depends on the layers.)
	It would have been obvious to an artisan of ordinary skill before the effective filing date of the claimed invention to combine the arithmetic processing device of Stemmer with a processing treatment process to multiply the initial bias values by a positive constant depending on the layers to calculate processed bias values for the layers in order to calculate initial desired scale factors of a plurality of inputs, weights and a bias. (Deisher, Abstract)
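For illustration only, the bias scaling described in Deisher para. [0034] can be sketched as follows. This is a minimal sketch under stated assumptions, not Deisher's implementation: the function name scale_biases, its parameter names, and the sample values are hypothetical. Only the scale expression [TargetMaxIn/MaxIn]*[TargetWgtL0/MaxWgtL0] and the rounding to integer form are taken from the quoted passage.

```python
def scale_biases(biases, max_in, max_wgt, target_max_in, target_wgt):
    """Scale one layer's floating-point biases and round them to integers.

    Hypothetical helper: the positive constant is
    (target_max_in / max_in) * (target_wgt / max_wgt), per the quoted passage.
    """
    scale = (target_max_in / max_in) * (target_wgt / max_wgt)
    # Multiply every bias by the same positive, layer-specific constant,
    # then round to the nearest integer.
    return scale, [round(b * scale) for b in biases]


# Sample values are invented for illustration.
scale, int_biases = scale_biases(
    [0.5, -0.25, 0.125],
    max_in=2.0,
    max_wgt=0.5,
    target_max_in=16384.0,
    target_wgt=64.0,
)
```

Because MaxIn and the maximum weight are measured per layer, the resulting constant generally differs from layer to layer, which is the sense in which it depends on the layers.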

With respect to claim 2, Stemmer teaches the arithmetic processing device according to claim 1, and Stemmer also teaches wherein the positive constant is determined by a result of the recognition process of the trial recognition process. (Figs. 4 and 5 disclose an iterative process for determining the scaling factor (positive constant). As is apparent in steps 403-405, the scaling factor (positive constant) is modified with respect to the weight values, which are a result of Stemmer’s training, previously established to include the trial recognition process.)

With respect to claim 3, Stemmer teaches the arithmetic processing device according to claim 1, and Stemmer also teaches wherein the positive constant depends on the initial weight values or the initial bias values. (pg. 14 ln. 13-19, “Process 400 may continue at operation 403, "Determine Correction Count, C, as the Number of Weights that do not fit the Fixed Point Integer Representation when Multiplied by S", where a correction count, C, may be determined as a number of weights for the layer that do not fit the fixed point integer representation when multiplied by S. For example, as discussed, weights for a layer may be converted from 32 bit floating point values to fixed point integer values. In some examples, the weights may be converted to 8 bit signed fixed point integer values (e.g., having a scaling factor, S).” The scaling factor S is determined by attempting to fit the weight values for a given layer into fixed point integer representation.)
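For illustration only, the correction count of operation 403 can be sketched as follows. This is a minimal sketch, not Stemmer's implementation: the function name correction_count and the sample weights are hypothetical, and the sketch assumes that "fitting" means the rounded product w*S lies in the 8 bit signed range of -128 to 127.

```python
INT8_MIN, INT8_MAX = -128, 127  # assumed 8 bit signed fixed point range


def correction_count(weights, s):
    """Count the weights of one layer that do not fit int8 when multiplied by s."""
    return sum(1 for w in weights if not INT8_MIN <= round(w * s) <= INT8_MAX)


# Sample layer weights are invented for illustration.
layer_weights = [0.5, -1.25, 3.0, 70.0]
count_at_1 = correction_count(layer_weights, 1)  # every scaled weight fits
count_at_2 = correction_count(layer_weights, 2)  # 70.0 * 2 = 140 no longer fits
```

The count is a function of the layer's own weight values, so any scaling factor chosen from it depends on those weights.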


With respect to claim 5, Stemmer teaches the arithmetic processing device according to claim 1, and Stemmer also teaches wherein the positive constant is determined from an intermediate value in a process of each layer, (pg. 14 ln. 13-19, “Process 400 may continue at operation 403, "Determine Correction Count, C, as the Number of Weights that do not fit the Fixed Point Integer Representation when Multiplied by S", where a correction count, C, may be determined as a number of weights for the layer that do not fit the fixed point integer representation when multiplied by S. For example, as discussed, weights for a layer may be converted from 32 bit floating point values to fixed point integer values. In some examples, the weights may be converted to 8 bit signed fixed point integer values (e.g., having a scaling factor, S).”; Figs. 4 and 5 disclose an iterative process for determining a scaling factor. As is apparent in steps 403-405, the scaling factor (positive constant) is modified with respect to the weight values, which are a result of Stemmer’s training, previously established to include the trial recognition process. According to Applicant’s specification, “In the present specification, all values related to an arithmetic process performed in each-layer processing, that is, input values, weight values, bias values, values obtained by any one of addition, subtraction, multiplication and division of those values, values obtained further by any one of addition, subtraction, multiplication and division of those arithmetic results, and final values in those processes are referred to as intermediate values.” The weights are intermediate values used in determining the scaling factor (positive constant).)
the intermediate value being obtained as a result of the recognition process of the trial recognition process. (pg. 14 ln. 13-19, “Process 400 may continue at operation 403, "Determine Correction Count, C, as the Number of Weights that do not fit the Fixed Point Integer Representation when Multiplied by S", where a correction count, C, may be determined as a number of weights for the layer that do not fit the fixed point integer representation when multiplied by S. For example, as discussed, weights for a layer may be converted from 32 bit floating point values to fixed point integer values. In some examples, the weights may be converted to 8 bit signed fixed point integer values (e.g., having a scaling factor, S).”; Figs. 4 and 5 disclose an iterative process for determining a scaling factor. As is apparent in steps 403-405, the scaling factor (positive constant) is modified with respect to the weight values, which are a result of Stemmer’s training, previously established to include the trial recognition process. According to Applicant’s specification, “In the present specification, all values related to an arithmetic process performed in each-layer processing, that is, input values, weight values, bias values, values obtained by any one of addition, subtraction, multiplication and division of those values, values obtained further by any one of addition, subtraction, multiplication and division of those arithmetic results, and final values in those processes are referred to as intermediate values.” The weights are intermediate values used in determining the scaling factor (positive constant) and, as previously established for claim 1, the initial weight values are determined during the trial recognition process.)

With respect to claim 6, Stemmer teaches the arithmetic processing device according to claim 5, and Stemmer also teaches wherein the positive constant is a value calculated from a maximum value of absolute values of intermediate values in a process in each layer, (pg. 6 ln. 8-14, “For example, weights of the neural network may be converted from floating point values to 8 bit signed fixed point integer values with an associated scaling factor. The scaling factor may be determined based on a predetermined limit of a correction count for a particular layer of the neural network. For example, the predetermined limit may be the number of nodes in the layer increased by a factor (e.g., a factor of 3, 4, or 5 or the like). The scaling factor may then be determined as a maximum scaling factor value that provides for a maximum number of corrected weights for the neural network layer that is just below the predetermined limit." The scaling factor is adjusted to ensure that the maximum weight value for the layer fits into the fixed bit integer representation.)
the intermediate values being obtained as a result of the recognition process of the trial recognition process, (According to Applicant’s specification, “In the present specification, all values related to an arithmetic process performed in each-layer processing, that is, input values, weight values, bias values, values obtained by any one of addition, subtraction, multiplication and division of those values, values obtained further by any one of addition, subtraction, multiplication and division of those arithmetic results, and final values in those processes are referred to as intermediate values.” The initial weight values, being determined during the trial recognition process, are intermediate values.) 
and from a positive number common to the layers of the multi-layer convolutional neural network circuit. (pg. 14 ln. 25 – pg. 15 ln. 6, “Process 400 may continue at decision operation 404, "C ≤ L?", where it may be determined whether the correction count, C, is less than or equal to the corrections limit, L. As shown, if the correction count, C, is not less than or equal to the corrections limit, L, process 400 may continue at operation 405, "Increase S", where the scaling factor, S, may be increased by any suitable amount. For example, scaling factor, S, may be increased by a factor of 2 such that S = Sx2). In such examples, process 400 may continue at operation 403, where, as discussed, a corrections count, C, may be determined (continuing the above example, the weights when multiplied by the scaling factor (now 2) must be less than 128 to not require correction). The corrections count, C, may, via continued iterations as needed, be increased until the correction count, C, is greater than the corrections limit, L, as discussed with respect to decision operation 404 and process 400 may continue at operation 406, "Decrease S", where the scaling factor, S, may be decreased by the amount the scaling factor is increased by at operation 405. For example, scaling factor, S, may be decreased by a factor of 2 at operation 406 such that S = S/2. As shown, process may, subsequent to operation 406, end at ending operation 407.” In Stemmer, scaling factors are increased and decreased by multiplying and dividing by 2, a positive number.  Similarly, in Applicant's invention, the positive constant is an integer power of 2. Positive constant = 2^x.)
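For illustration only, the search over operations 403-406 can be sketched as follows. This is a minimal sketch, not Stemmer's implementation. It reads the quoted passage as doubling S while the correction count C stays within the corrections limit L, then undoing the last doubling once C exceeds L. That reading, the starting value S = 1, the int8 fit test, the max_doublings guard, and all names and sample values are assumptions.

```python
def find_scaling_factor(weights, limit, s=1.0, max_doublings=64):
    """Search a per-layer scaling factor by repeated doubling (hypothetical helper)."""

    def count(scale):
        # Operation 403 (assumed fit test): weights outside the int8 range.
        return sum(1 for w in weights if not -128 <= round(w * scale) <= 127)

    for _ in range(max_doublings):
        if count(s) > limit:
            # Operation 406: step back by the same factor of 2.
            return s / 2
        # Operation 405: increase S by a factor of 2.
        s *= 2
    # Guard for layers whose weights never exceed the limit (e.g. all zeros).
    return s


# Sample layer weights are invented for illustration.
layer_weights = [0.5, -1.25, 3.0, 70.0]
s_limit_1 = find_scaling_factor(layer_weights, limit=1)  # 32.0 for these weights
s_limit_0 = find_scaling_factor(layer_weights, limit=0)  # 1.0 for these weights
```

Starting from 1 and only multiplying or dividing by 2, every value the sketch returns is an integer power of 2.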

With respect to claim 7, Stemmer teaches the arithmetic processing device according to claim 6, and Stemmer also teaches wherein the positive number common to the layers of the multi-layer convolutional neural network circuit is an integer power of 2. (pg. 14 ln. 25 – pg. 15 ln. 6, “Process 400 may continue at decision operation 404, "C ≤ L?", where it may be determined whether the correction count, C, is less than or equal to the corrections limit, L. As shown, if the correction count, C, is not less than or equal to the corrections limit, L, process 400 may continue at operation 405, "Increase S", where the scaling factor, S, may be increased by any suitable amount. For example, scaling factor, S, may be increased by a factor of 2 such that S = Sx2). In such examples, process 400 may continue at operation 403, where, as discussed, a corrections count, C, may be determined (continuing the above example, the weights when multiplied by the scaling factor (now 2) must be less than 128 to not require correction). The corrections count, C, may, via continued iterations as needed, be increased until the correction count, C, is greater than the corrections limit, L, as discussed with respect to decision operation 404 and process 400 may continue at operation 406, "Decrease S", where the scaling factor, S, may be decreased by the amount the scaling factor is increased by at operation 405. For example, scaling factor, S, may be decreased by a factor of 2 at operation 406 such that S = S/2. As shown, process may, subsequent to operation 406, end at ending operation 407.” In Stemmer, scaling factors are increased and decreased by multiplying and dividing by 2, a positive number.  Similarly, in Applicant's invention, the positive constant is an integer power of 2. Positive constant = 2^x.)

With respect to claim 8, Stemmer teaches the arithmetic processing device according to claim 7, and Stemmer also teaches wherein the positive number common to the layers of the multi-layer convolutional neural network circuit is 1. (pg. 14 ln. 25 – pg. 15 ln. 6, “Process 400 may continue at decision operation 404, "C ≤ L?", where it may be determined whether the correction count, C, is less than or equal to the corrections limit, L. As shown, if the correction count, C, is not less than or equal to the corrections limit, L, process 400 may continue at operation 405, "Increase S", where the scaling factor, S, may be increased by any suitable amount. For example, scaling factor, S, may be increased by a factor of 2 such that S = Sx2). In such examples, process 400 may continue at operation 403, where, as discussed, a corrections count, C, may be determined (continuing the above example, the weights when multiplied by the scaling factor (now 2) must be less than 128 to not require correction). The corrections count, C, may, via continued iterations as needed, be increased until the correction count, C, is greater than the corrections limit, L, as discussed with respect to decision operation 404 and process 400 may continue at operation 406, "Decrease S", where the scaling factor, S, may be decreased by the amount the scaling factor is increased by at operation 405. For example, scaling factor, S, may be decreased by a factor of 2 at operation 406 such that S = S/2. As shown, process may, subsequent to operation 406, end at ending operation 407.” In Stemmer, scaling factors are increased and decreased by multiplying and dividing by 2 (factors of 2).  In applicant's invention, the positive constant is an integer power of 2. Positive constant = 2^x.  Further, the positive constant may be based on a positive number, where the positive number is 1.  The claim language does not specify how the positive number is used to calculate positive constant.  
For example, positive number can be x, then positive constant = 2^1 when x = positive number recited in claim 8.  Since Stemmer teaches the scaling factor is a 'factor of 2' and selects '2' as the factor, this is the same as '2^x' where x = positive number = 1. It is obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use a variable (claimed positive number) to keep track of the factor of 2 in Stemmer when calculating the scaling factors.  Using a variable to represent the factor of 2 will allow the user to configure the factor of 2 used to generate the scaling factor by adjusting the variable.)

With respect to claim 9, Stemmer teaches the arithmetic processing device according to claim 1, and Stemmer also teaches wherein the process of the multi-layer convolutional neural network circuit is performed with fixed-point number calculation with a same number of bits over at least two layers. (pg. 5, ln. 18-19, "In some embodiments discussed herein, neural network weights (e.g., parameters) may be represented as fixed point integer values such as 8 bit fixed point integer values." In some embodiments of Stemmer, neural network weights may be represented entirely with 8 bit fixed point integers, meaning the same number of bits is used over at least two layers.)

With respect to claim 10, Stemmer teaches the arithmetic processing device according to claim 1, and Stemmer also teaches wherein the process of the multi-layer convolutional neural network circuit is performed with fixed-point number calculation with a same number of bits over entire layers. (pg. 5, ln. 18-19, "In some embodiments discussed herein, neural network weights (e.g., parameters) may be represented as fixed point integer values such as 8 bit fixed point integer values." In some embodiments of Stemmer, neural network weights may be represented entirely with 8 bit fixed point integers, meaning the same number of bits is used over entire layers.)

With respect to claim 11, Stemmer teaches an arithmetic processing system (pg. 4 ln. 23-31, “The material disclosed herein may be implemented in hardware, firmware, software, or any combination thereof. The material disclosed herein may also be implemented as instructions stored on a machine-readable medium, which may be read and executed by one or more processors. A machine-readable medium may include any medium and/or mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device). For example, a machine-readable medium may include read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.), and others.”)
to realize a multi-layer convolutional neural network circuit (pg. 11 ln. 13-19, “FIG. 3 is an illustrative diagram of example neural network 204, arranged in accordance with at least some implementations of the present disclosure. Neural network 204 may include any suitable neural network such as an artificial neural network, a deep neural network, a convolutional neural network, or the like. As shown in FIG. 3, neural network 204 may include an input layer 301, hidden layers 302-305, and an output layer 306. Neural network 204 is illustrated as having three input nodes, hidden layers with four nodes each, and six output nodes for the sake of clarity of presentation.”)
to perform a process with fixed-point number format, (pg. 5, ln. 18-19, "In some embodiments discussed herein, neural network weights (e.g., parameters) may be represented as fixed point integer values such as 8 bit fixed point integer values.")
comprising a first device, (pg. 4 ln. 23-31, “The material disclosed herein may be implemented in hardware, firmware, software, or any combination thereof. The material disclosed herein may also be implemented as instructions stored on a machine-readable medium, which may be read and executed by one or more processors. A machine-readable medium may include any medium and/or mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device). For example, a machine-readable medium may include read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.), and others.”; pg. 10 ln. 29 – pg. 11 ln. 6, "Furthermore, neural network 204 and/or statistical models 207 may be pre-trained based on training sets or the like prior to implementation via system 200 to determine weights and/or biases of neural network. In some examples, pre-training may be implemented via system 200 itself. Also, as is discussed further herein, in some examples, weights of neural network 204 may be converted to fixed point integer values having an associated scaling factor and correction values may be determined for some of the converted weights prior to implementation via system 200. In some examples, weight conversions and correction value determinations may be performed via system 200 itself. In other examples, any of the pre-training, the weight conversions, or the correction value determinations may be performed by a separate system such that system 200 implements the determined weights, biases, and correction values." The separate system of Stemmer is a first device.)
a second device, (pg. 4 ln. 23-31, “The material disclosed herein may be implemented in hardware, firmware, software, or any combination thereof. The material disclosed herein may also be implemented as instructions stored on a machine-readable medium, which may be read and executed by one or more processors. A machine-readable medium may include any medium and/or mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device). For example, a machine-readable medium may include read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.), and others.”; pg. 10 ln. 29 – pg. 11 ln. 6, "Furthermore, neural network 204 and/or statistical models 207 may be pre-trained based on training sets or the like prior to implementation via system 200 to determine weights and/or biases of neural network. In some examples, pre-training may be implemented via system 200 itself. Also, as is discussed further herein, in some examples, weights of neural network 204 may be converted to fixed point integer values having an associated scaling factor and correction values may be determined for some of the converted weights prior to implementation via system 200. In some examples, weight conversions and correction value determinations may be performed via system 200 itself. In other examples, any of the pre-training, the weight conversions, or the correction value determinations may be performed by a separate system such that system 200 implements the determined weights, biases, and correction values." System 200 in Stemmer is a second device.)
and a memory (pg. 4 ln. 23-31, “The material disclosed herein may be implemented in hardware, firmware, software, or any combination thereof. The material disclosed herein may also be implemented as instructions stored on a machine-readable medium, which may be read and executed by one or more processors. A machine-readable medium may include any medium and/or mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device). For example, a machine-readable medium may include read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.), and others.”)
the first device conducting: a learning process to perform weight learning or bias learning using learning data stored in the memory to calculate initial weight values and initial bias values for layers of the multi-layer convolutional neural network circuit; (pg. 10 ln. 29-32, "Furthermore, neural network 204 and/or statistical models 207 may be pre-trained based on training sets or the like prior to implementation via system 200 to determine weights and/or biases of neural network. In some examples, pre-training may be implemented via system 200 itself."; pg. 10 ln. 29 – pg. 11 ln. 6, "Furthermore, neural network 204 and/or statistical models 207 may be pre-trained based on training sets or the like prior to implementation via system 200 to determine weights and/or biases of neural network. In some examples, pre-training may be implemented via system 200 itself. Also, as is discussed further herein, in some examples, weights of neural network 204 may be converted to fixed point integer values having an associated scaling factor and correction values may be determined for some of the converted weights prior to implementation via system 200. In some examples, weight conversions and correction value determinations may be performed via system 200 itself. In other examples, any of the pre-training, the weight conversions, or the correction value determinations may be performed by a separate system such that system 200 implements the determined weights, biases, and correction values." The pre-training (including learning process) may be performed by a separate system (first device).)
a trial recognition process to perform a recognition process to part of the learning data or of input data using the initial weight values and the initial bias values; (pg. 10 ln. 29 – pg. 11 ln. 3, "Furthermore, neural network 204 and/or statistical models 207 may be pre-trained based on training sets or the like prior to implementation via system 200 to determine weights and/or biases of neural network. In some examples, pre-training may be implemented via system 200 itself. Also, as is discussed further herein, in some examples, weights of neural network 204 may be converted to fixed point integer values having an associated scaling factor and correction values may be determined for some of the converted weights prior to implementation via system 200."; Fig. 2: The trial recognition process uses neural network 204 (using weights determined during the learning process) to produce classification scores 205 and further resulting in recognized word sequence 208 as a product of a recognition process.; pg. 10 ln. 29 – pg. 11 ln. 6, "Furthermore, neural network 204 and/or statistical models 207 may be pre-trained based on training sets or the like prior to implementation via system 200 to determine weights and/or biases of neural network. In some examples, pre-training may be implemented via system 200 itself. Also, as is discussed further herein, in some examples, weights of neural network 204 may be converted to fixed point integer values having an associated scaling factor and correction values may be determined for some of the converted weights prior to implementation via system 200. In some examples, weight conversions and correction value determinations may be performed via system 200 itself. In other examples, any of the pre-training, the weight conversions, or the correction value determinations may be performed by a separate system such that system 200 implements the determined weights, biases, and correction values." The pre-training (including trial recognition process) may be performed by a separate system (first device).)
and a processing treatment process to multiply the initial weight values and the initial bias values by a positive constant depending on the layers to calculate processed weight values and processed bias values for the layers, (Stemmer discloses a processing treatment process to multiply the initial weight values by a positive constant depending on the layers to calculate processed weight values for the layers. Multiplying the initial bias values by the positive constant will be taught below, by Deisher. pg. 14 ln. 25-32, “Process 400 may continue at decision operation 404, "C ≤ L?", where it may be determined whether the correction count, C, is less than or equal to the corrections limit, L. As shown, if the correction count, C, is not less than or equal to the corrections limit, L, process 400 may continue at operation 405, "Increase S", where the scaling factor, S, may be increased by any suitable amount. For example, scaling factor, S, may be increased by a factor of 2 such that S = Sx2). In such examples, process 400 may continue at operation 403, where, as discussed, a corrections count, C, may be determined (continuing the above example, the weights when multiplied by the scaling factor (now 2) must be less than 128 to not require correction).”; Fig. 4, 5: Illustrate the process of determining a scaling factor and applying corrections. Step 403 is related to modifying S (positive constant) so that neural network weight values may be fitted to a fixed point integer representation by multiplying them with S.; pg. 10 ln. 29 – pg. 11 ln. 6, "Furthermore, neural network 204 and/or statistical models 207 may be pre-trained based on training sets or the like prior to implementation via system 200 to determine weights and/or biases of neural network. In some examples, pre-training may be implemented via system 200 itself. Also, as is discussed further herein, in some examples, weights of neural network 204 may be converted to fixed point integer values having an associated scaling factor and correction values may be determined for some of the converted weights prior to implementation via system 200. In some examples, weight conversions and correction value determinations may be performed via system 200 itself. In other examples, any of the pre-training, the weight conversions, or the correction value determinations may be performed by a separate system such that system 200 implements the determined weights, biases, and correction values." The weight conversions and correction value determinations (processing treatment process) may be performed by a separate system (first device). pg. 2 ln. 11-15, “FIG. 4 is a flow diagram illustrating an example process for determining a scaling factor for a layer of a neural network; FIG. 5 is a flow diagram illustrating an example process for converting weights of a layer of a neural network to fixed point integer values based on a scaling factor and generating correction values for one or more of the weights;” The process of determining the scaling factor is conducted per layer.)
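The per-layer weight conversion cited above from Stemmer (Figs. 4 and 5) can be pictured with the following minimal sketch. It is offered only as an illustration under stated assumptions and is not Stemmer's implementation: the signed 8-bit target range is inferred from the "less than 128" example in the quoted passage, and the function names, the clipping step, and the residual format of the correction values are all hypothetical.

```python
import numpy as np

# Assumed signed 8-bit target range, inferred from the "less than 128" example.
INT8_MIN, INT8_MAX = -128, 127


def count_corrections(weights, scale):
    """Count weights whose scaled, rounded value falls outside the int8 range."""
    scaled = np.rint(np.asarray(weights, dtype=np.float64) * scale)
    return int(np.count_nonzero((scaled < INT8_MIN) | (scaled > INT8_MAX)))


def convert_layer(weights, scale):
    """Multiply one layer's weights by a positive constant and fit them to int8.

    Returns the int8 weights and a dict mapping the flat index of each clipped
    weight to the residual lost by clipping (a hypothetical correction format).
    """
    if scale <= 0:
        raise ValueError("scale must be a positive constant")
    scaled = np.rint(np.asarray(weights, dtype=np.float64) * scale)
    clipped = np.clip(scaled, INT8_MIN, INT8_MAX)
    corrections = {
        int(i): float(scaled.flat[i] - clipped.flat[i])
        for i in np.flatnonzero(scaled != clipped)
    }
    return clipped.astype(np.int8), corrections


# Hypothetical weights for a single layer; a different layer may use another scale.
layer_weights = [0.5, -1.25, 3.0, 0.1]
fixed, corrections = convert_layer(layer_weights, scale=64.0)
print(fixed.tolist(), corrections, count_corrections(layer_weights, 64.0))
```

In this sketch the constant is passed in per call, so each layer can receive its own positive scaling factor; how that factor is searched for against the corrections limit L is left out.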
and the second device conducting a recognition process using the processed weight values and the processed bias values. (pg. 10 ln. 29 – pg. 11 ln. 3, "Furthermore, neural network 204 and/or statistical models 207 may be pre-trained based on training sets or the like prior to implementation via system 200 to determine weights and/or biases of neural network. In some examples, pre-training may be implemented via system 200 itself. Also, as is discussed further herein, in some examples, weights of neural network 204 may be converted to fixed point integer values having an associated scaling factor and correction values may be determined for some of the converted weights prior to implementation via system 200."; Fig. 9: Classification scores are the results of a recognition process. Neural Network 901 is supplied with Neural Network Weights, Biases, & Corrections (processed weight values and bias values).; pg. 10 ln. 29 – pg. 11 ln. 6, "Furthermore, neural network 204 and/or statistical models 207 may be pre-trained based on training sets or the like prior to implementation via system 200 to determine weights and/or biases of neural network. In some examples, pre-training may be implemented via system 200 itself. Also, as is discussed further herein, in some examples, weights of neural network 204 may be converted to fixed point integer values having an associated scaling factor and correction values may be determined for some of the converted weights prior to implementation via system 200. In some examples, weight conversions and correction value determinations may be performed via system 200 itself. In other examples, any of the pre-training, the weight conversions, or the correction value determinations may be performed by a separate system such that system 200 implements the determined weights, biases, and correction values." System 200 (second device) implements the determined weights, biases, and correction values (recognition process).)
But Stemmer does not explicitly teach multiplying the initial bias values by the positive constant, and therefore does not explicitly teach a processing treatment process to multiply the initial weight values and the initial bias values by a positive constant depending on the layers to calculate processed weight values and processed bias values for the layers.
	Deisher, however, does teach a processing treatment process to multiply the initial weight values and the initial bias values by a positive constant depending on the layers to calculate processed weight values and processed bias values for the layers. (Deisher discloses a processing treatment process to multiply the initial bias values by a positive constant depending on the layers to calculate processed bias values for the layers. Multiplying the initial weight values by the positive constant has been taught above, by Stemmer. Para. [0034], “At block 322, the floating-point bias of layer 0 is obtained, where the floating-point bias was determined during training. At block 312, the biases from layer 0 (L0) are scaled. To upscale the biases, a target maximum in (TargetMaxIn) is divided by the maximum absolute value (MaxIn) and multiplied by the target weight layer 0 (TargetWeightL0) divided by the maximum weight layer 0 (MaxWeightL0). In examples, there is a target range for the input and a target range for the weight. The target range for the output and the biases is the target range for the input multiplied by the target range for the weight. As a result, the biases are scaled up through multiplication by [TargetMaxIn/MaxIn]* [TargetWgtL0/MaxWgtL0]. The target range for the biases is TargetMaxIn* TargetWgtL0. At block 324, the scaled up floating point biases are rounded and converted to an integer form. At block 326, the dynamically scaled integer bias is provided to layer 0 (L0) of the neural network and can now be used for future execution of layer 0 of the neural network.”; para. [0024], “In this manner, updated scale factors are propagated through the graph of the neural network. The values can then be rounded to the nearest integer value. The present techniques may be applied to neural networks of any size, with any number of layers. In embodiments, a neural network is re-quantized in a layer by layer fashion.” The initial bias values for the layer in Deisher are scaled by multiplying them by a positive constant, and the resulting processed bias values are then to be used for execution of the layer of the neural network. In multi-layer neural networks, this process is done layer by layer, meaning the positive constant depends on the layers.)
	It would have been obvious to an artisan of ordinary skill before the effective filing date of the claimed invention to combine the arithmetic processing device of Stemmer with a processing treatment process to multiply the initial bias values by a positive constant depending on the layers to calculate processed bias values for the layers in order to calculate initial desired scale factors of a plurality of inputs, weights and a bias. (Deisher, Abstract)
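The bias scaling quoted above from Deisher, para. [0034], can be illustrated with the following minimal sketch. It assumes only the arithmetic stated in the quotation, namely multiplication of a layer's floating-point biases by [TargetMaxIn/MaxIn]*[TargetWgtL0/MaxWgtL0] followed by rounding to integer form. The function name, the 64-bit integer output type, and all numeric values are hypothetical and are not taken from Deisher.

```python
import numpy as np


def scale_biases(biases, target_max_in, max_in, target_wgt, max_wgt):
    """Scale one layer's floating-point biases and round them to integers.

    The factor follows the quoted expression
    [TargetMaxIn / MaxIn] * [TargetWgt / MaxWgt] and is computed per layer.
    """
    factor = (target_max_in / max_in) * (target_wgt / max_wgt)
    if factor <= 0:
        raise ValueError("scale factor must be a positive constant")
    scaled = np.rint(np.asarray(biases, dtype=np.float64) * factor)
    return factor, scaled.astype(np.int64)  # integer width is an assumption


# Hypothetical per-layer statistics; each layer yields its own positive constant.
layers = [
    {"biases": [0.25, -0.5, 1.0], "target_max_in": 16384.0, "max_in": 2.0,
     "target_wgt": 127.0, "max_wgt": 0.5},
    {"biases": [0.5], "target_max_in": 16384.0, "max_in": 4.0,
     "target_wgt": 127.0, "max_wgt": 1.0},
]
results = [scale_biases(**layer) for layer in layers]
for index, (factor, int_biases) in enumerate(results):
    print(index, factor, int_biases.tolist())
```

Because the factor is recomputed from each layer's own input and weight ranges, the two hypothetical layers end up with different positive constants, which is the sense in which the constant depends on the layers.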

With respect to claim 12, it is substantially similar to claim 2 and is rejected in the same manner, the same art and reasoning applying.

With respect to claim 13, it is substantially similar to claim 3 and is rejected in the same manner, the same art and reasoning applying.

With respect to claim 15, it is substantially similar to claim 5 and is rejected in the same manner, the same art and reasoning applying.

With respect to claim 16, it is substantially similar to claim 6 and is rejected in the same manner, the same art and reasoning applying.

With respect to claim 17, it is substantially similar to claim 7 and is rejected in the same manner, the same art and reasoning applying.

With respect to claim 18, it is substantially similar to claim 8 and is rejected in the same manner, the same art and reasoning applying.

With respect to claim 19, it is substantially similar to claim 9 and is rejected in the same manner, the same art and reasoning applying.

With respect to claim 20, it is substantially similar to claim 10 and is rejected in the same manner, the same art and reasoning applying.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK J TURNER whose telephone number is (571)272-8469. The examiner can normally be reached Monday-Thursday 9am-7pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.J.T./Examiner, Art Unit 2121


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121