DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
This action is in response to preliminary amendments filed on 10/30/2018. In the current amendments, claims 18-19, 23-24, and 26-30 are cancelled. Claims 1-17, 20-22, 25, and 31-32 are pending and have been examined.

Information Disclosure Statement
The information disclosure statements (IDS) were submitted on 01/21/2019, 05/06/2019, 12/23/2019, 09/05/2020, and 09/09/2020.  The submissions are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged. Applicant has not complied with one or more conditions for receiving the benefit of an earlier filing date under 35 U.S.C. 119(e) as follows:
The later-filed application must be an application for a patent for an invention which is also disclosed in the prior application (the parent or original nonprovisional application or provisional application). The disclosure of the invention in the parent application and in the later-filed application must be sufficient to comply with the requirements of 35 U.S.C. 112(a) or the first paragraph of pre-AIA  35 U.S.C. 112, except for the best mode requirement.  See Transco Products, Inc. v. Performance Contracting, Inc., 38 F.3d 551, 32 USPQ2d 1077 (Fed. Cir. 1994)
 filed on 10/27/2017) fail to provide adequate support or enablement in the manner provided by 35 U.S.C. 112(a) or pre-AIA  35 U.S.C. 112, first paragraph for one or more claims of this application.  In particular, at least the following limitations do not have support in the prior-filed applications:
Claim 1: “restarting the operation, when the layer reports a hardware overflow, using an updated set of output radix points.”
Claim 31: “restarting the operation, when the layer reports a hardware overflow, using an updated set of output radix points.”
Claim 32: “restart the operation, when the layer reports a hardware overflow, using an updated set of output radix points.”
Therefore, the effective filing date of the present application is 10/31/2017.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-17, 20-22, 25, and 31-32 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being incomplete for omitting essential steps, such omission amounting to a gap between the steps.  See MPEP § 2172.01.  The omitted steps are:  “starting” and “stopping”/“pausing” the recited element “the operation.” Claim 1 recites “restarting the operation” in the last limitation; however, the claim does not require that “the operation” has been started and stopped such that a “restarting” is reasonable. Although Claim 1 recites “an operation to be performed” (emphasis added) in line 5, the “operation” is never actually performed in the claim. Therefore, essential steps associated with performing the “operation” and pausing the “operation” are omitted prior to “restarting the operation.” Dependent claims of claim 1 are rejected based on the same rationale. For examination purposes, “restarting the operation” has been interpreted as “starting the operation.”
Claim 7 recites “when it has a fixed output range” (emphasis added). It is unclear to which element “it” refers. For examination purposes, “it” has been interpreted as referring to any element or act in the claim.
Claim 9 recites “when it is a mathematically determinative operation” (emphasis added). It is unclear to which element “it” refers. For examination purposes, “it” has been interpreted as referring to any element or act in the claim.
Claim 11 recites “when it is a min pooling operation” (emphasis added). It is unclear to which element “it” refers. For examination purposes, “it” has been interpreted as referring to any element or act in the claim.
Claim 12 recites “when it is a mathematically non-determinative operation” (emphasis added). It is unclear to which element “it” refers. For examination purposes, “it” has been interpreted as referring to any element or act in the claim.
Claim 12 recites the limitation "the sample data result" in lines 2-3.  There is insufficient antecedent basis for this limitation in the claim. For examination purposes, "the sample data result" has been interpreted as "a sample data result."

Claim 31 recites “an operation to be performed” (emphasis added) in line 6; however, it is unclear whether “an operation” refers to the “operations” in the recitation of “one or more processors to perform operations of...” in line 3. For examination purposes, “an operation to be performed” has been interpreted as distinct from the “operations” in line 3. 
Claim 31 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being incomplete for omitting essential steps, such omission amounting to a gap between the steps.  See MPEP § 2172.01.  The omitted steps are:  “starting” and “stopping”/“pausing” the recited element “the operation.” Claim 31 recites “restarting the operation” in the last limitation; however, the claim does not require that “the operation” has been started and stopped such that a “restarting” is reasonable. Although Claim 31 recites “an operation to be performed” (emphasis added) in line 6, the “operation” is never actually performed in the claim. Therefore, essential steps associated with performing the “operation” and pausing the “operation” are omitted prior to “restarting the operation.” For examination purposes, “restarting the operation” has been interpreted as “starting the operation.”
Claim 32 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being incomplete for omitting essential steps, such omission amounting to a gap between the steps.  See MPEP § 2172.01.  The omitted steps are:  “start” and “stop”/“pause” the recited element “the operation.” Claim 32 recites “restart the operation” in the last limitation; however, the claim does not require that “the operation” has been started and stopped such that a “restart” is reasonable. Although Claim 32 recites “an operation to be performed” (emphasis added) in line 7, the “operation” is never actually performed in the claim. Therefore, essential steps associated with performing the “operation” and pausing the “operation” are omitted prior to “restart the operation.” For examination purposes, “restart the operation” has been interpreted as “start the operation.”
Furthermore, MPEP 2173.05(h) specifies the following,
“Treatment of claims reciting alternatives is not governed by the particular format used (e.g., alternatives may be set forth as "a material selected from the group consisting of A, B, and C" or "wherein the material is A, B, or C"). See, e.g., the Supplementary Examination Guidelines for Determining Compliance with 35 U.S.C. 112 and for Treatment of Related Issues in Patent Applications ("Supplementary Guidelines"), 76 Fed. Reg. 7162, 7166 (February 9, 2011). Claims that set forth a list of alternatives from which a selection is to be made are typically referred to as Markush claims, after the appellant in Ex parte Markush, 1925 Dec. Comm’r Pat. 126, 127 (1924). The listing of specified alternatives within a Markush claim is referred to as a Markush group or Markush grouping. Abbott Labs v. Baxter Pharmaceutical Products, Inc., 334 F.3d 1274, 1280-81, 67 USPQ2d 1191, 1196-97 (Fed. Cir. 2003) (citing to several sources that describe Markush groups)...A Markush grouping is a closed group of alternatives, i.e., the selection is made from a group "consisting of" (rather than "comprising" or "including") the alternative members. Abbott Labs., 334 F.3d at 1280, 67 USPQ2d at 1196. If a Markush grouping requires a material selected from an open list of alternatives (e.g., selected from the group "comprising" or "consisting essentially of" the recited alternatives), the claim should generally be rejected under 35 U.S.C. 112(b) as indefinite because it is unclear what other alternatives are intended to be encompassed by the claim” (emphasis added).

Claim 8 recites “wherein the operation with a fixed output range includes one or more of a sine operation, a cosine operation, a hyperbolic tangent operation, a softmax operation, and a sigmoid operation” (emphasis added).
Claim 10 recites “wherein the mathematically determinative operation includes one or more of a max pooling operation, an average pooling operation, a drop out operation, a concatenation operation, a square root operation, and a rectified linear unit (ReLU) operation” (emphasis added).
Claim 13 recites “wherein the mathematically non-determinative operation includes one or more of an addition operation, a multiplication operation, a convolution operation, a batch norm operation, an exponential linear unit (ELU) operation, or a dense layer operation” (emphasis added).
Each of the limitations identified above in claims 8, 10, and 13 recites a grouping of alternatives (“Markush group”) using the transitional phrase “includes” (synonymous with “including”), which is considered inclusive or open-ended. See MPEP 2111.03(I) (“The transitional term "comprising", which is synonymous with "including," "containing," or "characterized by," is inclusive or open-ended and does not exclude additional, unrecited elements or method steps”).
Therefore, each of claims 8, 10, and 13 is “rejected under 35 U.S.C. 112(b) as indefinite because it is unclear what other alternatives are intended to be encompassed by the claim.” See MPEP 2173.05(h). For examination purposes, it has been interpreted that each of claims 8, 10, and 13 requires one alternative listed in each of the groupings. A recommended amendment is to amend “includes” to a transitional phrase that indicates the “closed” characteristic of the grouping.
Each dependent claim is rejected based on the same rationale as the claim from which it depends.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-6, 9-13, 15-17, 22, 31, and 32 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Koster et al. (US 2017/0316307 A1).
Regarding Claim 1,
Koster et al. teaches A computer-implemented method for computational manipulation comprising (pg. 1 [0003] teaches performing neural network computations (computational manipulation); pg. 8 [0082]-[0083] teaches computer-implemented method):
obtaining a first tensor (pg. 2 [0019]: “FIG. 1 illustrates a processing unit for executing instructions that process input tensors to generate output tensors, in accordance with an embodiment” teaches obtaining and processing a first tensor); 
generating a first set of weights for the first tensor (pg. 3 [0030]: “network characteristics include the weights of the connection between nodes of the neural network. The network characteristics may be any values or parameters associated with connections of nodes of the neural network” and pg. 3 [0031]: “receiving the node characteristics and network characteristics includes determining the node characteristics and network characteristics...the node characteristics and network characteristics are provided as one or more matrices or tensors” teach determining (generating) network characteristics, including the weights of a neural network, for a tensor);
evaluating an operation to be performed by a layer within a deep neural network on the first tensor using the first set of weights (pg. 4 [0041]: “The tensor computation module 520 performs tensor computations specified in the tensor instructions. Examples of tensor computations performed by the tensor computation module 520 include matrix multiplication of tensors, dot product of tensors, multiplication of tensors, addition of tensors, multiplication of a tensor by a scalar, activation functions (sigmoid, rectification) and reductions (sum along an axis), convolution, maximum, minimum, logarithm, sine, cosine, tangent, and so on” teaches various computational operations, such as convolution, that can be performed by a layer of the deep neural network on a first tensor using the weights (see pg. 3 [0031] about how tensors contain weights); pg. 5 [0050]-[0051]: “The tensor computation module 520 performs 730 the tensor computation specified in the tensor instruction...The tensor statistics module 530 collects 740 statistics describing the values of each output tensor. The tensor decimal position determination module 540 determines 750 a new value of the decimal position for each output tensor based on the statistics collected for the output tensor” teaches evaluating the tensor computations to be performed by a layer of the deep neural network; Fig. 4 and pg. 3 [0029]: “The nodes of the neural network may represent input, intermediate, and output data and may be organized as input nodes, hidden nodes, and output nodes. The nodes may also be grouped together in various hierarchy levels” teach a neural network with multiple levels of nodes in which each of the nodes can be “intermediate” or “hidden” nodes, such that the neural network can be a deep neural network because it has multiple layers between the input and output layers (for example, Fig. 4 can contain L0 and L1 as intermediate layers); pg. 7 [0079] further teaches deep learning applications of neural networks);
determining a set of output radix points for the layer within the deep neural network based on the first tensor and the operation (Fig. 6B and pg. 4 [0047]: “the example shown in FIG. 6B shows a decimal position value 630a associated with one subset of the tensor 610b and another decimal position value 630b associated with another subset of the values of the tensor 610b” teach determining a set of output decimal position values (elements 630a and 630b), which corresponds to a set of output radix points, for the first tensor and the computation operation associated with a layer of the deep neural network; also see Fig. 4 and pg. 3 [0029]);
calculating an output tensor for the layer within the deep neural network using the set of output radix points, the first tensor, and the first set of weights (pg. 5 [0051]: “The tensor decimal position determination module 540 receives a previous value of the decimal position for the plurality of values of the output tensor, such that the previous value was determined prior to performing the tensor computation (for example, based on certain initialization procedure or based on a previous iteration that executes the sequence of tensor instructions.) The tensor decimal position determination module 540 determines whether to adjust the received value of the decimal position for the output tensor based on the collected statistics” teaches calculating an output tensor using determined decimal position values (radix points) from a previous iteration; pg. 3 [0036]- [0037] teaches generating an output, using forward and backward propagation, based on a set of weights and previous input (pg. 3 [0031] teaches values, including weights, of neural network are represented by tensors)); and
restarting the operation, when the layer reports a hardware overflow, using an updated set of output radix points (pg. 6 [0069]: “If the tensor decimal position determination module 540 detects that an overflow occurs while executing a tensor instruction, the tensor decimal position determination module 540 moves the decimal position for the output tensor for the tensor instruction to the right by N bits. The processing unit 100 executes the sequence of instructions again and during the next execution of the instruction, the tensor decimal position determination module 540 checks if an overflow occurs again during execution of the tensor instruction. If the tensor decimal position determination module 540 detects an overflow again, the tensor decimal position determination module 540 again moves the decimal position for the output tensor for the tensor instruction to the right by N bits. This process is repeated until the tensor decimal position determination module 540 does not detect an overflow during an execution of the tensor instruction” teaches restarting the tensor instruction (operation) using an updated set of decimal positions (radix points) when the neural network layer reports an overflow (also see pg. 6 [0067]); pg. 6 [0068] teaches an example of hardware overflow; Fig. 4 teaches weight values are established for each level (layer) of neural network).
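For illustration only, the overflow-and-retry behavior quoted above from Koster et al. [0069] can be sketched in Python as follows. The function names (`fits`, `run_with_retry`), the signed 8-bit word width, and the software overflow check are assumptions made for this sketch; none of them is taken from the reference or from the claims. The decimal position is modeled here as a count of fractional bits, so moving it to the right by N bits is modeled as reducing that count by N.

```python
def fits(value, frac_bits, total_bits=8):
    """Return True if `value` fits a signed fixed-point word with `frac_bits` fractional bits."""
    code = round(value * (1 << frac_bits))  # integer code after scaling (assumed encoding)
    limit = 1 << (total_bits - 1)
    return -limit <= code <= limit - 1


def run_with_retry(operation, inputs, frac_bits, step=1, total_bits=8):
    """Re-run `operation`, giving up `step` fractional bits after each detected overflow."""
    while True:
        outputs = operation(inputs)
        if all(fits(v, frac_bits, total_bits) for v in outputs):
            return outputs, frac_bits
        if frac_bits < step:
            # No further shift is possible in this simplified model.
            raise OverflowError("no decimal position avoids overflow")
        frac_bits -= step  # hypothetical stand-in for "move right by N bits"


# Hypothetical operation: double every input value.
outputs, position = run_with_retry(lambda xs: [2 * x for x in xs], [4.0, 10.0], frac_bits=4)
```

In this sketch the loop ends only when an execution completes without overflow, which mirrors the repeated re-execution described in the quoted paragraph; an actual processing unit would detect overflow in hardware rather than by the check assumed here.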
Regarding Claim 2,
Koster et al. teaches the method of claim 1.
Koster et al. further teaches wherein the determining is further based on a radix point for the first tensor (Fig. 6B teaches that determining a set of output radix points is based on at least one radix point for the tensor; see elements 630a and 630b).
Regarding Claim 3,
Koster et al. teaches the method of claim 1.
Koster et al. further teaches wherein the determining is further based on metadata for the first tensor (Fig. 7 Step 750 teaches that determining a set of output radix points is based on the collected statistics (corresponds to metadata); pg. 4 [0042] teaches that the statistics (metadata) are for the first tensor; Fig. 6B teaches a set of decimal positions (radix points)).
Regarding Claim 4,
Koster et al. teaches the method of claim 1.
Koster et al. further teaches wherein the determining is further based on the first set of weights (Fig. 6B teaches a set of decimal positions (radix points) is determined based on the values in the tensor; pg. 3 [0031] teaches that tensor values can represent weights).
Regarding Claim 5,
Koster et al. teaches the method of claim 4. 
Koster et al. further teaches wherein the determining is further based on a radix point for the first set of weights (pg. 6 [0069]: “If the tensor decimal position determination module 540 detects that an overflow occurs while executing a tensor instruction, the tensor decimal position determination module 540 moves the decimal position for the output tensor for the tensor instruction to the right by N bits” teaches the determining of a set of radix points can be based on a radix point; Fig. 6B teaches a set of decimal positions (radix points) is associated with the values in the tensor; pg. 3 [0031] teaches that tensor values can represent weights).
Regarding Claim 6,
Koster et al. teaches the method of claim 1.
Koster et al. further teaches wherein the determining is further based on a preceding radix point for a preceding output tensor (pg. 6 [0069]: “If the tensor decimal position determination module 540 detects that an overflow occurs while executing a tensor instruction, the tensor decimal position determination module 540 moves the decimal position for the output tensor for the tensor instruction to the right by N bits. The processing unit 100 executes the sequence of instructions again and during the next execution of the instruction, the tensor decimal position determination module 540 checks if an overflow occurs again during execution of the tensor instruction. If the tensor decimal position determination module 540 detects an overflow again, the tensor decimal position determination module 540 again moves the decimal position for the output tensor for the tensor instruction to the right by N bits. This process is repeated until the tensor decimal position determination module 540 does not detect an overflow during an execution of the tensor instruction” teaches determining radix points based on a preceding radix point for a preceding output tensor in an iterative manner).

Regarding Claim 9,
Koster et al. teaches the method of claim 1.
Koster et al. further teaches wherein the determining employs a greater of function, a max of function, or a sum of function on radix points from the first tensor for the operation to be performed when it is a mathematically determinative operation (pg. 5 [0053]: “the tensor decimal position determination module 540 selects a decimal position that provides the highest precision for the plurality of values of the tensor without causing a overflow during a subsequent execution of the tensor instruction. For example, assume that an example value of the metric representing the aggregate measure is 8.0. This example value may be represented as any one of the following binary representations: 00001000. (decimal position 0), 0001000.0 (decimal position 1), 000100.00 (decimal position 2), or 01000.000 (decimal position 3). In this example, the tensor decimal position determination module 540 selects the representation 01000.000 that has a decimal position value of 3 since that provides the highest precision without causing an overflow” teaches selecting the highest decimal position (corresponds to using a “greater of” or “max” function) from a group of positions for tensor instructions (operations); pg. 4 [0041]: “Examples of tensor computations performed by the tensor computation module 520 include matrix multiplication of tensors, dot product of tensors, multiplication of tensors, addition of tensors, multiplication of a tensor by a scalar, activation functions (sigmoid, rectification) and reductions (sum along an axis), convolution, maximum, minimum, logarithm, sine, cosine, tangent” teaches that the tensor instructions (operations) can be various operations, including rectification activation functions (correspond to a rectified linear unit (ReLU) operation)).
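For illustration only, the worked example quoted above from Koster et al. [0053] can be checked with a short Python sketch. The helper name `best_decimal_position` and the signed 8-bit word are assumptions of this sketch; the signed reading is merely consistent with the quoted example, in which position 3 is the highest position reported as not overflowing.

```python
def best_decimal_position(value, total_bits=8):
    """Return the largest number of fractional bits that holds `value` without overflow.

    Assumes a signed two's-complement word of `total_bits` bits; returns None
    when even zero fractional bits overflow.
    """
    max_code = (1 << (total_bits - 1)) - 1
    min_code = -(1 << (total_bits - 1))
    best = None
    for frac_bits in range(total_bits):
        code = round(value * (1 << frac_bits))  # integer code at this decimal position
        if min_code <= code <= max_code:
            best = frac_bits  # later (higher-precision) positions replace earlier ones
    return best


position_for_eight = best_decimal_position(8.0)
```

Under these assumptions the value 8.0 fits at positions 0 through 3 and overflows at position 4, so the sketch selects position 3, the same position selected in the quoted passage.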
Regarding Claim 10,
Koster et al. teaches the method of claim 9.
Koster et al. further teaches wherein the mathematically determinative operation includes one or more of a max pooling operation, an average pooling operation, a drop out operation, a concatenation operation, a square root operation, and a rectified linear unit (ReLU) operation (pg. 4 [0041]: “Examples of tensor computations performed by the tensor computation module 520 include matrix multiplication of tensors, dot product of tensors, multiplication of tensors, addition of tensors, multiplication of a tensor by a scalar, activation functions (sigmoid, rectification) and reductions (sum along an axis), convolution, maximum, minimum, logarithm, sine, cosine, tangent” teaches that the tensor instructions (operations) can be various operations, including rectification activation functions (correspond to a rectified linear unit (ReLU) operation)).
Regarding Claim 11,
Koster et al. teaches the method of claim 1.
Koster et al. further teaches wherein the determining employs a minimum function on radix points from the first tensor for the operation to be performed when it is a min pooling operation (Fig. 8 Step 840: “Determine a cutoff value C as a function of metric value M and standard deviation S; e.g., C = M + k * S, where k is a constant” and “Determine the smallest decimal position that will fit the cutoff value C” teaches determining the smallest decimal position based on a function (corresponds to employing minimum function on radix points); pg. 4 [0041]: “Examples of tensor computations performed by the tensor computation module 520 include matrix multiplication of tensors, dot product of tensors, multiplication of tensors, addition of tensors, multiplication of a tensor by a scalar, activation functions (sigmoid, rectification) and reductions (sum along an axis), convolution, maximum, minimum, logarithm, sine, cosine, tangent” teaches that the tensor instructions (operations) can be various operations, including a “minimum” operation (corresponds to min pooling operation); since min pooling refers to the process of finding the minimum number in a group of
Regarding Claim 12,
Koster et al. teaches the method of claim 1.
Koster et al. further teaches wherein the determining employs running sample data through the layer and setting the radix point at least one digit greater than the sample data result for the operation to be performed when it is a mathematically non-determinative operation (pg. 5 [0052]: “Details of the step determining 750 the decimal position are further described in FIG. 8. In an embodiment, the tensor decimal position determination module 540 determines 750 the decimal position for an output tensor based on an aggregate measure of the sizes of the values of the output tensor” and pg. 5 [0053]: “the tensor decimal position determination module 540 selects a decimal position that provides the highest precision for the plurality of values of the tensor without causing a overflow during a subsequent execution of the tensor instruction. For example, assume that an example value of the metric representing the aggregate measure is 8.0. This example value may be represented as any one of the following binary representations: 00001000. (decimal position 0), 0001000.0 (decimal position 1), 000100.00 (decimal position 2), or 01000.000 (decimal position 3). In this example, the tensor decimal position determination module 540 selects the representation 01000.000 that has a decimal position value of 3 since that provides the highest precision without causing an overflow” teach the decimal position (radix point) is set based on the output tensor’s aggregate measure (corresponds to a sample data result) in which the decimal position can be set at least one digit greater than the representation of the aggregate measure; for example, the decimal position can be set at 3, which is at least one digit greater than position 1 or position 2 in the example; Fig. 7 teaches processing sample data through the neural network using tensor instructions/computations; pg. 4 [0041]: “Examples of tensor computations performed by the tensor computation module 520 include matrix multiplication of tensors, dot product of tensors, multiplication of tensors, addition of tensors, multiplication of a tensor by a scalar, activation functions (sigmoid, rectification) and reductions (sum along an axis), convolution, maximum, minimum, logarithm, sine, cosine, tangent” teaches that the tensor instructions (operations) can be various operations, including addition, multiplication, and convolution).
Regarding Claim 13,
Koster et al. teaches the method of claim 12.
Koster et al. further teaches wherein the mathematically non-determinative operation includes one or more of an addition operation, a multiplication operation, a convolution operation, a batch norm operation, an exponential linear unit (ELU) operation, or a dense layer operation (pg. 4 [0041]: “Examples of tensor computations performed by the tensor computation module 520 include matrix multiplication of tensors, dot product of tensors, multiplication of tensors, addition of tensors, multiplication of a tensor by a scalar, activation functions (sigmoid, rectification) and reductions (sum along an axis), convolution, maximum, minimum, logarithm, sine, cosine, tangent” teaches that the tensor instructions (operations) can be various operations, including addition, multiplication, and convolution).
Regarding Claim 15,
Koster et al. teaches the method of claim 1.
Koster et al. further teaches wherein the set of output radix points is updated by deep neural network training (Fig. 6B teaches a set of decimal positions (radix points) is determined based on the values in the tensor; pg. 3 [0031] teaches that tensor values can represent weights; pg. 3 [0036]-[0037] teaches weights are updated based on neural network training; pg. 7 [0079] further teaches deep learning applications of neural networks; also see Fig. 4 and pg. 3 [0029]).

Regarding Claim 16,
Koster et al. teaches the method of claim 15.
Koster et al. further teaches wherein the deep neural network training includes forward propagation of the set of output radix points (Fig. 3 Step 320 teaches training a deep neural network using forward propagation; pg. 4 [0041]: “The tensor data store 545 may also stores a value representing the decimal position for the plurality of values of the tensor. In an embodiment, the tensor data store 545 stores multiple decimal positions, each decimal position for a subset of values of the tensor, as illustrated in FIG. 6B” teaches neural network data values and computations are represented by tensors with decimal positions (radix points)).
Regarding Claim 17,
Koster et al. teaches the method of claim 15.
Koster et al. further teaches wherein the deep neural network training includes backward propagation of error gradients for the set of output radix points (pg. 3 [0036] teaches backpropagation of error gradients of a deep neural network; pg. 4 [0041]: “The tensor data store 545 may also stores a value representing the decimal position for the plurality of values of the tensor. In an embodiment, the tensor data store 545 stores multiple decimal positions, each decimal position for a subset of values of the tensor, as illustrated in FIG. 6B” teaches neural network data values and computations are represented by tensors with decimal positions (radix points)).
Regarding Claim 22,
Koster et al. teaches the method of claim 1.
Koster et al. further teaches wherein the first tensor is a multidimensional matrix (Fig. 6B teaches first tensor is a multidimensional matrix).


Regarding Claim 31,
Koster et al. teaches A computer program product embodied in a non-transitory computer readable medium for computational manipulation, the computer program product comprising code which causes one or more processors to perform operations of: (pg. 1 [0003] teaches performing neural network computations (computational manipulation); pg. 8 [0082]-[0083] teaches machine-readable medium, instructions, and processor):
obtaining a first tensor (pg. 2 [0019]: “FIG. 1 illustrates a processing unit for executing instructions that process input tensors to generate output tensors, in accordance with an embodiment” teaches obtaining and processing a first tensor); 
generating a first set of weights for the first tensor (pg. 3 [0030]: “network characteristics include the weights of the connection between nodes of the neural network. The network characteristics may be any values or parameters associated with connections of nodes of the neural network” and pg. 3 [0031]: “receiving the node characteristics and network characteristics includes determining the node characteristics and network characteristics...the node characteristics and network characteristics are provided as one or more matrices or tensors” teach determining (generating) network characteristics, including the weights of a neural network, for a tensor);
evaluating an operation to be performed by a layer within a deep neural network on the first tensor using the first set of weights (pg. 4 [0041]: “The tensor computation module 520 performs tensor computations specified in the tensor instructions. Examples of tensor computations performed by the tensor computation module 520 include matrix multiplication of tensors, dot product of tensors, multiplication of tensors, addition of tensors, multiplication of a tensor by a scalar, activation functions (sigmoid, rectification) and reductions (sum along an axis), convolution, maximum, minimum, logarithm, sine, cosine, tangent, and so on” teaches various computational operations, such as convolution, that can be performed by a layer of the deep neural network on a first tensor using the weights (see pg. 3 [0031] about how tensors contain weights); pg. 5 [0050]-[0051]: “The tensor computation module 520 performs 730 the tensor computation specified in the tensor instruction...The tensor statistics module 530 collects 740 statistics describing the values of each output tensor. The tensor decimal position determination module 540 determines 750 a new value of the decimal position for each output tensor based on the statistics collected for the output tensor” teaches evaluating the tensor computations to be performed by a layer of the deep neural network; Fig. 4 and pg. 3 [0029]: “The nodes of the neural network may represent input, intermediate, and output data and may be organized as input nodes, hidden nodes, and output nodes. The nodes may also be grouped together in various hierarchy levels” teach a neural network with multiple levels of nodes in which each of the nodes can be “intermediate” or “hidden” nodes, thus rendering the neural network can be a deep neural network because it has multiple layers between the input and output layers (for example, Fig. 4 can contain L0 and L1 as intermediate layers); pg. 7 [0079] further teaches deep learning applications of neural networks);
determining a set of output radix points for the layer within the deep neural network based on the first tensor and the operation (Fig. 6B and pg. 4 [0047]: “the example shown in FIG. 6B shows a decimal position value 630a associated with one subset of the tensor 610b and another decimal position value 630b associated with another subset of the values of the tensor 610b” teach determining a set of output decimal position values (elements 630a and 630b), which corresponds to a set of output radix points, for the first tensor and the computation operation associated with a layer of the deep neural network; also see Fig. 4 and pg. 3 [0029]);
calculating an output tensor for the layer within the deep neural network using the set of output radix points, the first tensor, and the first set of weights (pg. 5 [0051]: “The tensor decimal position determination module 540 receives a previous value of the decimal position for the plurality of values of the output tensor, such that the previous value was determined prior to performing the tensor computation (for example, based on certain initialization procedure or based on a previous iteration that executes the sequence of tensor instructions.) The tensor decimal position determination module 540 determines whether to adjust the received value of the decimal position for the output tensor based on the collected statistics” teaches calculating an output tensor using determined decimal position values (radix points) from a previous iteration; pg. 3 [0036]- [0037] teaches generating an output, using forward and backward propagation, based on a set of weights and previous input (pg. 3 [0031] teaches values, including weights, of neural network are represented by tensors)); and
restarting the operation, when the layer reports a hardware overflow, using an updated set of output radix points (pg. 6 [0069]: “If the tensor decimal position determination module 540 detects that an overflow occurs while executing a tensor instruction, the tensor decimal position determination module 540 moves the decimal position for the output tensor for the tensor instruction to the right by N bits. The processing unit 100 executes the sequence of instructions again and during the next execution of the instruction, the tensor decimal position determination module 540 checks if an overflow occurs again during execution of the tensor instruction. If the tensor decimal position determination module 540 detects an overflow again, the tensor decimal position determination module 540 again moves the decimal position for the output tensor for the tensor instruction to the right by N bits. This process is repeated until the tensor decimal position determination module 540 does not detect an overflow during an execution of the tensor instruction” teaches restarting the tensor instruction (operation) using an updated set of decimal positions (radix points) when the neural network layer reports an overflow (also see pg. 6 [0067]); pg. 6 [0068] teaches an example of hardware overflow; Fig. 4 teaches weight values are established for each level (layer) of neural network).
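For illustration only, the overflow handling quoted from Koster et al. pg. 6 [0069] can be sketched as a retry loop. The sketch below is not taken from the reference: the function names, the 16-bit word size, the dot-product operation, and the modelling of "moving the decimal position to the right by N bits" as reducing the number of fractional bits are all assumptions made for the example.

```python
def quantize(value, frac_bits, word_bits=16):
    """Scale value by 2**frac_bits and report whether the integer code
    fits in a signed word of word_bits bits (hypothetical helper)."""
    code = round(value * 2.0 ** frac_bits)
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    return code, not (lo <= code <= hi)


def run_with_retry(inputs, weights, frac_bits, step=1, word_bits=16):
    """Execute a dot product and re-execute it with an updated radix point
    each time an overflow is detected (assumed reading of [0069])."""
    while True:
        result = sum(x * w for x, w in zip(inputs, weights))
        code, overflowed = quantize(result, frac_bits, word_bits)
        if not overflowed:
            return code, frac_bits
        # Overflow detected: give up `step` fractional bits, then run again.
        frac_bits -= step


code, frac_bits = run_with_retry([100.0, 50.0], [2.0, 3.0], frac_bits=10)
print(code, frac_bits)  # integer code and the radix point that finally fit
```

In this sketch the operation is repeated until no overflow is detected, which mirrors the repetition described in the quoted passage; a hardware implementation would detect the overflow in the arithmetic unit rather than by range-checking a floating-point result.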
Regarding Claim 32,
Koster et al. teaches A computer system for computational manipulation comprising: a memory which stores instructions; one or more processors attached to the memory wherein the one or more pg. 1 [0003] teaches performing neural network computations (computational manipulation); pg. 8 [0082]-[0084] teaches computer system, instructions, processor, and memory):
obtain a first tensor (pg. 2 [0019]: “FIG. 1 illustrates a processing unit for executing instructions that process input tensors to generate output tensors, in accordance with an embodiment” teaches obtaining and processing a first tensor); 
generate a first set of weights for the first tensor (pg. 3 [0030]: “network characteristics include the weights of the connection between nodes of the neural network. The network characteristics may be any values or parameters associated with connections of nodes of the neural network” and pg. 3 [0031]: “receiving the node characteristics and network characteristics includes determining the node characteristics and network characteristics...the node characteristics and network characteristics are provided as one or more matrices or tensors” teach determining (generating) network characteristics, including the weights of a neural network, for a tensor);
evaluate an operation to be performed by a layer within a deep neural network on the first tensor using the first set of weights (pg. 4 [0041]: “The tensor computation module 520 performs tensor computations specified in the tensor instructions. Examples of tensor computations performed by the tensor computation module 520 include matrix multiplication of tensors, dot product of tensors, multiplication of tensors, addition of tensors, multiplication of a tensor by a scalar, activation functions (sigmoid, rectification) and reductions (sum along an axis), convolution, maximum, minimum, logarithm, sine, cosine, tangent, and so on” teaches various computational operations, such as convolution, that can be performed by a layer of the deep neural network on a first tensor using the weights (see pg. 3 [0031] about how tensors contain weights); pg. 5 [0050]-[0051]: “The tensor computation module 520 performs 730 the tensor computation specified in the tensor instruction...The tensor statistics module 530 collects 740 statistics describing the values of each output tensor. The tensor decimal position determination module 540 determines 750 a new value of the decimal position for each output tensor based on the statistics collected for the output tensor” teaches evaluating the tensor computations to be performed by a layer of the deep neural network; Fig. 4 and pg. 3 [0029]: “The nodes of the neural network may represent input, intermediate, and output data and may be organized as input nodes, hidden nodes, and output nodes. The nodes may also be grouped together in various hierarchy levels” teach a neural network with multiple levels of nodes in which each of the nodes can be “intermediate” or “hidden” nodes, thus rendering the neural network can be a deep neural network because it has multiple layers between the input and output layers (for example, Fig. 4 can contain L0 and L1 as intermediate layers); pg. 7 [0079] further teaches deep learning applications of neural networks);
determine a set of output radix points for the layer within the deep neural network based on the first tensor and the operation (Fig. 6B and pg. 4 [0047]: “the example shown in FIG. 6B shows a decimal position value 630a associated with one subset of the tensor 610b and another decimal position value 630b associated with another subset of the values of the tensor 610b” teach determining a set of output decimal position values (elements 630a and 630b), which corresponds to a set of output radix points, for the first tensor and the computation operation associated with a layer of the deep neural network; also see Fig. 4 and pg. 3 [0029]);
calculate an output tensor for the layer within the deep neural network using the set of output radix points, the first tensor, and the first set of weights (pg. 5 [0051]: “The tensor decimal position determination module 540 receives a previous value of the decimal position for the plurality of values of the output tensor, such that the previous value was determined prior to performing the tensor computation (for example, based on certain initialization procedure or based on a previous iteration that executes the sequence of tensor instructions.) The tensor decimal position determination module 540 determines whether to adjust the received value of the decimal position for the output tensor based on the collected statistics” teaches calculating an output tensor using determined decimal position values (radix points) from a previous iteration; pg. 3 [0036]- [0037] teaches generating an output, using forward and backward propagation, based on a set of weights and previous input (pg. 3 [0031] teaches values, including weights, of neural network are represented by tensors)); and
restart the operation, when the layer reports a hardware overflow, using an updated set of output radix points (pg. 6 [0069]: “If the tensor decimal position determination module 540 detects that an overflow occurs while executing a tensor instruction, the tensor decimal position determination module 540 moves the decimal position for the output tensor for the tensor instruction to the right by N bits. The processing unit 100 executes the sequence of instructions again and during the next execution of the instruction, the tensor decimal position determination module 540 checks if an overflow occurs again during execution of the tensor instruction. If the tensor decimal position determination module 540 detects an overflow again, the tensor decimal position determination module 540 again moves the decimal position for the output tensor for the tensor instruction to the right by N bits. This process is repeated until the tensor decimal position determination module 540 does not detect an overflow during an execution of the tensor instruction” teaches restarting the tensor instruction (operation) using an updated set of decimal positions (radix points) when the neural network layer reports an overflow (also see pg. 6 [0067]); pg. 6 [0068] teaches an example of hardware overflow; Fig. 4 teaches weight values are established for each level (layer) of neural network).






Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 7-8 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Koster et al. (US 2017/0316307 A1) in view of Blaiech et al. (“Implementation of a Multi-Layer Perceptron Neural Networks in Multi-Width Fixed Point Coding”).
Regarding Claim 7,
Koster et al. teaches the method of claim 1.
Koster et al. does not appear to explicitly teach wherein the determining employs a fixed radix point for the operation to be performed when it has a fixed output range.
However, Blaiech et al. teaches wherein the determining employs a fixed radix point for the operation to be performed when it has a fixed output range (pg. 281 third full paragraph:
[Figure: media_image1.png, greyscale image reproducing the cited paragraph]
teaches determining a fixed decimal position (radix point) for an operation in processing a neural network based on the output’s fixed output range; also see Fig. 1 and pg. 280 Section A (teaches a neural network with hidden layer)).
Koster et al. and Blaiech et al. are analogous art to the claimed invention because they are directed to implementation of neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the feature wherein the determining employs a fixed radix point for the operation to be performed when it has a fixed output range, as taught by Blaiech et al., into the disclosed invention of Koster et al.
One of ordinary skill in the art would have been motivated to make this modification in order “to estimate in the first place the dynamic data enough to gain the size of the integer part and evaluate the accuracy” based on a fixed output range and a fixed decimal position, and to implement an “optimization methodology which aims to reduce the size of the information while retaining the performance of the MLP model,” in which the optimization methodology converts floating-point coding learning to fixed-point coding learning (Blaiech et al. pg. 281 first and third full paragraphs; pg. 280 third full paragraph; Fig. 1).

Regarding Claim 8,
Koster et al. in view of Blaiech et al. teaches the method of claim 7.
Blaiech et al. further teaches wherein the operation with a fixed output range includes one or more of a sine operation, a cosine operation, a hyperbolic tangent operation, a softmax operation, and a sigmoid operation (pg. 280 last full paragraph to pg. 281: “We have used the hyperbolic tangent as a transfer function in the hidden and output layers while a linear function is adopted in the input layer” teach a hyperbolic tangent operation, which is part of the neural network implementation/learning process with fixed output range, see Fig. 1).
Koster et al. and Blaiech et al. are analogous art to the claimed invention because they are directed to implementation of neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the feature wherein the operation with a fixed output range includes one or more of a sine operation, a cosine operation, a hyperbolic tangent operation, a softmax operation, and a sigmoid operation, as taught by Blaiech et al., into the disclosed invention of Koster et al.
One of ordinary skill in the art would have been motivated to make this modification in order “to estimate in the first place the dynamic data enough to gain the size of the integer part and evaluate the accuracy” based on a fixed output range and a fixed decimal position, and to implement an “optimization methodology which aims to reduce the size of the information while retaining the performance of the MLP model,” in which the optimization methodology converts floating-point coding learning to fixed-point coding learning (Blaiech et al. pg. 281 first and third full paragraphs; pg. 280 third full paragraph; Fig. 1).
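For illustration only, the idea of a fixed radix point for an operation with a fixed output range can be sketched for the hyperbolic tangent, whose output magnitude never exceeds 1. The sketch is not drawn from Blaiech et al. or Koster et al.; the 16-bit signed word and the names frac_bits_for_range and tanh_fixed are assumptions made for the example.

```python
import math


def frac_bits_for_range(max_abs, word_bits=16):
    """Pick the number of fractional bits for a signed word so that every
    magnitude up to max_abs is representable (hypothetical helper)."""
    int_bits = 0
    while (1 << int_bits) <= max_abs:
        int_bits += 1
    # One bit is reserved for the sign; the remainder holds the fraction.
    return word_bits - 1 - int_bits


# The output range of tanh is known in advance, so the radix point is
# computed once and never adjusted at run time.
TANH_FRAC_BITS = frac_bits_for_range(1.0)


def tanh_fixed(x):
    """Return tanh(x) as an integer code with TANH_FRAC_BITS fractional bits."""
    return round(math.tanh(x) * (1 << TANH_FRAC_BITS))


print(TANH_FRAC_BITS, tanh_fixed(0.5), tanh_fixed(100.0))
```

Because the bound on the output is known before execution, no statistics collection or overflow retry is needed for this operation in the sketch, which is the practical benefit of a fixed radix point.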
Regarding Claim 14,
Koster et al. teaches the method of claim 1.
Koster et al. does not appear to explicitly teach wherein the determining transposes floating-point operation radix points and fixed-point operation radix points.
However, Blaiech et al. teaches wherein the determining transposes floating-point operation radix points and fixed-point operation radix points (Fig. 1 teaches converting from floating-point operation radix points to fixed-point operation radix points). 
Koster et al. and Blaiech et al. are analogous art to the claimed invention because they are directed to implementation of neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the feature wherein the determining transposes floating-point operation radix points and fixed-point operation radix points, as taught by Blaiech et al., into the disclosed invention of Koster et al.
One of ordinary skill in the art would have been motivated to make this modification in order “to estimate in the first place the dynamic data enough to gain the size of the integer part and evaluate the accuracy” based on a fixed output range and a fixed decimal position, and to implement an “optimization methodology which aims to reduce the size of the information while retaining the performance of the MLP model,” in which the optimization methodology converts floating-point coding learning to fixed-point coding learning (Blaiech et al. pg. 281 first and third full paragraphs; pg. 280 third full paragraph; Fig. 1).

Claims 20, 21, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Koster et al. (US 2017/0316307 A1) in view of Gupta et al. (“Deep Learning with Limited Numerical Precision”).
Regarding Claim 20,
Koster et al. teaches the method of claim 1. 
Koster et al. does not appear to explicitly teach wherein the first tensor includes a fixed-point tensor.
However, Gupta et al. teaches wherein the first tensor includes a fixed-point tensor (pg. 3 Section 3.1: “As will be evident in the sections to follow, the rounding mode adopted while converting a number (presumably represented using the float or a higher precision...fixed-point format) into a lower precision fixed-point representation turns out to be a matter of important consideration while performing computations on fixed-point numbers” teaches representing deep neural network values using fixed-point numbers; pg. 8 first full paragraph teaches data are represented in matrix, which is a type of tensor).
Koster et al. and Gupta et al. are analogous art to the claimed invention because they are directed to implementation of neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the feature wherein the first tensor includes a fixed-point tensor, as taught by Gupta et al., into the disclosed invention of Koster et al.
One of ordinary skill in the art would have been motivated to make this modification because “the substitution of floating-point units with fixed-point arithmetic circuits comes with significant gains in the energy efficiency and computational throughput” and because “[f]or low-precision fixed-point computations, where conventional rounding schemes fail, adopting stochastic rounding during deep neural network training delivers results nearly identical as 32-bit floating-point computations” (Gupta et al. pg. 9 Section 6).
Regarding Claim 21,
Koster et al. in view of Gupta et al. teaches the method of claim 20.
Gupta et al. further teaches translating a floating-point input tensor into fixed-point values for use as the first tensor (pg. 3 Section 3.1: “As will be evident in the sections to follow, the rounding mode adopted while converting a number (presumably represented using the float or a higher precision...fixed-point format) into a lower precision fixed-point representation turns out to be a matter of important consideration while performing computations on fixed-point numbers” teaches converting (translating) a floating-point numeric representation to a fixed-point numeric representation for deep neural network values; pg. 8 first full paragraph teaches that data are represented in a matrix, which is a type of tensor).
Koster et al. and Gupta et al. are analogous art to the claimed invention because they are directed to implementation of neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate translating a floating-point input tensor into fixed-point values for use as the first tensor, as taught by Gupta et al., into the disclosed invention of Koster et al.
One of ordinary skill in the art would have been motivated to make this modification because “the substitution of floating-point units with fixed-point arithmetic circuits comes with significant gains in the energy efficiency and computational throughput” and because “[f]or low-precision fixed-point computations, where conventional rounding schemes fail, adopting stochastic rounding during deep neural network training delivers results nearly identical as 32-bit floating-point computations” (Gupta et al. pg. 9 Section 6).
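For illustration only, the translation of floating-point values into fixed-point values discussed above can be sketched with two rounding modes, round-to-nearest and stochastic rounding. The function names, the nested-list matrix representation, and the omission of saturation to a word size are assumptions made for the example and are not taken from Gupta et al.

```python
import math
import random


def to_fixed_nearest(x, frac_bits):
    """Convert x to an integer code with frac_bits fractional bits,
    rounding to the nearest representable value."""
    return math.floor(x * 2.0 ** frac_bits + 0.5)


def to_fixed_stochastic(x, frac_bits, rng):
    """Convert x to an integer code, rounding up with probability equal to
    the fractional remainder, so the result is unbiased on average."""
    scaled = x * 2.0 ** frac_bits
    low = math.floor(scaled)
    return low + (1 if rng.random() < scaled - low else 0)


def convert_tensor(matrix, frac_bits, rng=None):
    """Translate a floating-point matrix (list of rows) into fixed-point
    codes; stochastic rounding is used only when an rng is supplied."""
    if rng is None:
        return [[to_fixed_nearest(v, frac_bits) for v in row] for row in matrix]
    return [[to_fixed_stochastic(v, frac_bits, rng) for v in row] for row in matrix]


print(convert_tensor([[0.5, -0.25], [0.3, 1.0]], frac_bits=2))
```

With stochastic rounding, a value such as 0.3 at two fractional bits is stored as code 1 or code 2 on different calls, and the average of many conversions approaches the unrounded scaled value of 1.2.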
Regarding Claim 25,
Koster et al. teaches the method of claim 1. 
Koster et al. does not appear to explicitly teach wherein the first tensor comprises deep neural network user training data.
However, Gupta et al. teaches wherein the first tensor comprises deep neural network user training data (pg. 5 first full paragraph teaches user training data for a deep neural network; pg. 8 first full paragraph teaches that data are represented in a matrix, which is a type of tensor).
Koster et al. and Gupta et al. are analogous art to the claimed invention because they are directed to implementation of neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the feature wherein the first tensor comprises deep neural network user training data, as taught by Gupta et al., into the disclosed invention of Koster et al.
One of ordinary skill in the art would have been motivated to make this modification in order to “construct a fully connected neural network with 2 hidden layers, each containing 1000 units with ReLU activation function and train this network to recognize the handwritten digits from the MNIST dataset” (Gupta et al. pg. 5 first full paragraph) to evaluate the effectiveness of Gupta’s system, which has the following advantages: “the substitution of floating-point units with fixed-point arithmetic circuits comes with significant gains in the energy efficiency and computational throughput” and “[f]or low-precision fixed-point computations, where conventional rounding schemes fail, adopting stochastic rounding during deep neural network training delivers results nearly identical as 32-bit floating-point computations” (Gupta et al. pg. 9 Section 6).

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
Yang et al. (US 2017/0061279 A1) teaches flexible fixed point representation of neural network.
Burger et al. (US 2019/0057303 A1) teaches matrix vector unit for neural network model processing.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YING YU CHEN whose telephone number is (571)270-1484.  The examiner can normally be reached on Monday-Friday 7:30 am-5:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/YING YU CHEN/               Examiner, Art Unit 2125