DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claims 1 and 8 are objected to because of the following informalities: Both claim 1 (line 6) and claim 8 (line 8) recite “parameters are associated with a CONV layer of the corresponding NN payer” where “parameters are associated with a CONV layer of the corresponding NN layer” (examiner emphasis added) appears to be intended. Appropriate correction is required.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-6, 8-13, and 15-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Ren et al. (US 2020/0302288 A1, hereinafter Ren).
Regarding claims 1 and 8, taking claim 8 as exemplary:
Ren shows:
“A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor,” (Paragraph [0078]: “FIG. 9 depicts an example of internal hardware that may be included in any electronic device or computing system for implementing various methods in the embodiments described in FIGS. 1-8. An electrical bus 900 serves as an information highway interconnecting the other illustrated components of the hardware. Processor 905 is a central processing device of the system, configured to perform calculations and logic operations required to execute programming instructions. As used in this document and in the claims, the terms “processor” and “processing device” may refer to a single processor or any number of processors in a set of processors that collectively perform a process, whether a central processing unit (CPU) or a graphics processing unit (GPU) or a combination of the two. Read only memory (ROM), random access memory (RAM), flash memory, hard drives, and other devices capable of storing electronic data constitute examples of memory devices 925. A memory device, also referred to as a computer-readable medium, may include a single device or a collection of devices across which data and/or instructions are stored.” – The medium and processor of Ren is the medium and processor.)
“cause the processor to perform operations for training a neural network (NN),” (Paragraph [0020]: “FIG. 1 illustrates an example training system in accordance with various examples described herein. In some example, a training system 100 may include a training network 101 to train an AI model. The system 100 may upload the AI model 112 to an AI chip in an AI system 114. In some examples, an AI model may include a convolutional neural network (CNN) that is trained to perform AI tasks, e.g., voice or image recognition tasks.” And in paragraph [0026]: “the training system 100 may include a convolution quantization unit 106 and/or activation quantization unit 108, each of which may be configured to update the weights of a CNN model to adapt to an AI chip” – The training of a CNN model in Ren is the training of a neural network.)
“the operations comprising: for each of a plurality of NN layers of the NN, merging batch normalization (BN) layer parameters with convolutional (CONV) layer parameters,” (Paragraph [0020]: “FIG. 1 illustrates an example training system in accordance with various examples described herein. In some example, a training system 100 may include a training network 101 to train an AI model. The system 100 may upload the AI model 112 to an AI chip in an AI system 114. In some examples, an AI model may include a convolutional neural network (CNN) that is trained to perform AI tasks, e.g., voice or image recognition tasks.” And in paragraph [0058]: “performing batch normalization merge at 704 may include updating the weights and biases of the CNN model by merging the batch normalization into the convolution layers such that the input values of a convolution layer Y=W*X+b are effectively normalized” – The merging of batch normalization into the weights and biases of the convolution layers of the CNN of Ren is the merging, for each of a plurality of NN layers of the NN, of batch normalization (BN) layer parameters with convolutional (CONV) layer parameters.)
“wherein the BN layer parameters are associated with a BN layer of a corresponding NN layer and the NN layer parameters are associated with a CONV layer of the corresponding NN payer;” (Paragraph [0058]: “performing batch normalization merge at 704 may include updating the weights and biases of the CNN model by merging the batch normalization into the convolution layers such that the input values of a convolution layer Y=W*X+b are effectively normalized” – The updating of the weights and biases of a CNN through merging of Ren is the wherein the BN layer parameters are associated with a BN layer of a corresponding NN layer and the NN layer parameters are associated with a CONV layer of the corresponding NN payer.)
“and forming a merged BN and CONV (BN/CONV) layer to compute merged BN layer and CONV layer functions using the merged BN and CONV layer parameters.” (Paragraph [0058]: “performing batch normalization merge at 704 may include updating the weights and biases of the CNN model by merging the batch normalization into the convolution layers such that the input values of a convolution layer Y=W*X+b are effectively normalized” – The batch normalization of weights and biases of Ren is the merged BN and CONV layer parameters.)
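
As a minimal sketch of the kind of merge quoted from paragraph [0058] of Ren, the following folds batch normalization parameters into the weights and bias of a convolution layer Y=W*X+b. It assumes the common ordering in which batch normalization is applied to the convolution output, and it assumes a per-channel mean, variance, and a small constant eps. The function names (conv2d, batch_norm, fold_bn_into_conv) and eps are hypothetical and are not taken from Ren or from the claims.

```python
import numpy as np

def conv2d(x, w, b):
    """Naive stride-1 'valid' convolution (cross-correlation form).
    x: (C_in, H, W), w: (C_out, C_in, kH, kW), b: (C_out,)."""
    c_out, _, kh, kw = w.shape
    _, h, wd = x.shape
    out = np.empty((c_out, h - kh + 1, wd - kw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + kh, j:j + kw]
            # Contract over input channel and kernel axes, then add the bias.
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out

def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Per-channel batch normalization of y with fixed statistics (assumed)."""
    scale = gamma / np.sqrt(var + eps)
    return scale[:, None, None] * (y - mean[:, None, None]) + beta[:, None, None]

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Return merged weights and bias so that
    conv2d(x, w_m, b_m) equals batch_norm(conv2d(x, w, b), ...)."""
    scale = gamma / np.sqrt(var + eps)          # one constant per output channel
    w_merged = w * scale[:, None, None, None]   # scale each output-channel kernel
    b_merged = (b - mean) * scale + beta        # merge BN shift into the conv bias
    return w_merged, b_merged

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(3, 6, 6))
    w, b = rng.normal(size=(4, 3, 3, 3)), rng.normal(size=4)
    gamma, beta, mean = (rng.normal(size=4) for _ in range(3))
    var = rng.uniform(0.5, 2.0, size=4)
    w_m, b_m = fold_bn_into_conv(w, b, gamma, beta, mean, var)
    # The single merged layer reproduces the two-step computation.
    assert np.allclose(conv2d(x, w_m, b_m),
                       batch_norm(conv2d(x, w, b), gamma, beta, mean, var))
```

Under these assumptions the merged layer stores only one weight tensor and one bias vector per layer, which is consistent with the reduced memory usage described in paragraph [0059] of Ren.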

Regarding claims 2 and 9, taking claim 9 as exemplary:
Ren shows the method and the non-transitory machine-readable medium of claims 1 and 8 as claimed and specified above.
And Ren shows “wherein the operations further comprise: merging a batch normalization transform function with a convolutional kernel computation function.” (Paragraph [0021]: “In a non-limiting, example, in a CNN model, a computation in a given layer in the CNN may be expressed by Y=W* X+b, where X is input data. Y is output data, W is a kernel, and b is a bias; all variables are relative to the given layer. Both the input data and the output data may have a number of channels. Operation “*” is a convolution. Kernel W may include weights.” And in paragraph [0059]: “In some examples, the weights and biases may be updated per convolution layer. The updating of weights and biases may be performed independently between layers. A batch refers to a data batch, such as a plurality of images. Average values and standard deviations may be determined over the plurality of images in each batch. The values for γ and β are learned during the gradient descent training, independently from the weights and biases of the AI model. A batch normalization may normalize the inputs of each layer to the same range of values. This may help speed up the training process (to converge faster). For example, batch normalization may prevent early saturation of non-linear activation functions. The batch normalization merge at 704 essentially merges the batch normalization parameters into the convolution layer of an AI model. This reduces memory usage on the chip, and increases inference speed when running the AI model on the chip.” – The batch normalization merge of convolution layers that have kernels of Ren is the merging a batch normalization transform function with a convolutional kernel computation function.)

Regarding claims 3 and 10, taking claim 10 as exemplary:
Ren shows the method and the non-transitory machine-readable medium of claims 2 and 9 as claimed and specified above.
And Ren shows “The machine-readable medium of claim 9, wherein computing the merged BN layer and NN layer functions includes computing the merged batch normalization transform function and convolutional kernel computation function.” (Paragraph [0021]: “In a non-limiting, example, in a CNN model, a computation in a given layer in the CNN may be expressed by Y=W* X+b, where X is input data. Y is output data, W is a kernel, and b is a bias; all variables are relative to the given layer. Both the input data and the output data may have a number of channels. Operation “*” is a convolution. Kernel W may include weights.” And in paragraph [0059]: “In some examples, the weights and biases may be updated per convolution layer. The updating of weights and biases may be performed independently between layers. A batch refers to a data batch, such as a plurality of images. Average values and standard deviations may be determined over the plurality of images in each batch. The values for γ and β are learned during the gradient descent training, independently from the weights and biases of the AI model. A batch normalization may normalize the inputs of each layer to the same range of values. This may help speed up the training process (to converge faster). For example, batch normalization may prevent early saturation of non-linear activation functions. The batch normalization merge at 704 essentially merges the batch normalization parameters into the convolution layer of an AI model. This reduces memory usage on the chip, and increases inference speed when running the AI model on the chip.” – The batch normalization merge of convolution layers that have kernels of Ren is the computing the merged BN layer and NN layer functions includes computing the merged batch normalization transform function and convolutional kernel computation function.)

Regarding claims 4 and 11, taking claim 11 as exemplary:
Ren shows the method and the non-transitory machine-readable medium of claims 2 and 9 as claimed and specified above.
And Ren shows “wherein merging BN layer parameters with CONV layer parameters includes merging a batch normalization constant with weights of the convolutional kernel computation function.” (Paragraph [0021]: “In a non-limiting, example, in a CNN model, a computation in a given layer in the CNN may be expressed by Y=W* X+b, where X is input data. Y is output data, W is a kernel, and b is a bias; all variables are relative to the given layer. Both the input data and the output data may have a number of channels. Operation “*” is a convolution. Kernel W may include weights.” And in paragraph [0059]: “In some examples, the weights and biases may be updated per convolution layer. The updating of weights and biases may be performed independently between layers. A batch refers to a data batch, such as a plurality of images. Average values and standard deviations may be determined over the plurality of images in each batch. The values for γ and β are learned during the gradient descent training, independently from the weights and biases of the AI model. A batch normalization may normalize the inputs of each layer to the same range of values. This may help speed up the training process (to converge faster). For example, batch normalization may prevent early saturation of non-linear activation functions. The batch normalization merge at 704 essentially merges the batch normalization parameters into the convolution layer of an AI model. This reduces memory usage on the chip, and increases inference speed when running the AI model on the chip.” – The batch normalization of layers of a CNN that includes weights and kernel of Ren is the wherein merging BN layer parameters with CONV layer parameters includes merging a batch normalization constant with weights of the convolutional kernel computation function.)

Regarding claims 5 and 12, taking claim 12 as exemplary:
Ren shows the method and the non-transitory machine-readable medium of claims 2 and 9 as claimed and specified above.
And Ren shows “wherein merging BN layer parameters with CONV layer parameters includes merging batch normalization bias with a bias of the convolutional kernel computation function.” (Paragraph [0019]: “Examples of “AI model” include data containing one or more parameters that, when loaded inside an AI chip, are used for executing the AI chip. For example, an AI model for a given CNN may include the weights, biases, and other parameters for one or more convolutional layers of the CNN. Here, the weights and parameters of an AI model are interchangeable.” And in paragraph [0021]: “In a non-limiting, example, in a CNN model, a computation in a given layer in the CNN may be expressed by Y=W* X+b, where X is input data. Y is output data, W is a kernel, and b is a bias; all variables are relative to the given layer. Both the input data and the output data may have a number of channels. Operation “*” is a convolution. Kernel W may include weights.” And in paragraph [0059]: “In some examples, the weights and biases may be updated per convolution layer. The updating of weights and biases may be performed independently between layers. A batch refers to a data batch, such as a plurality of images. Average values and standard deviations may be determined over the plurality of images in each batch. The values for γ and β are learned during the gradient descent training, independently from the weights and biases of the AI model. A batch normalization may normalize the inputs of each layer to the same range of values. This may help speed up the training process (to converge faster). For example, batch normalization may prevent early saturation of non-linear activation functions. The batch normalization merge at 704 essentially merges the batch normalization parameters into the convolution layer of an AI model. 
This reduces memory usage on the chip, and increases inference speed when running the AI model on the chip.” – The use of kernels and bias with the batch normalization and merging of Ren is the merging of BN layer parameters with CONV layer parameters that includes merging a batch normalization bias with a bias of the convolutional kernel computation function.)
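
The weight and bias merges addressed for claims 4, 5, 11, and 12 can be written out explicitly. The following is the standard batch normalization folding identity, stated under the assumption that batch normalization is applied to the convolution output $Y = W * X + b$ of paragraph [0021]. The symbols $\mu$ and $\sigma^{2}$ stand for the per-batch average and squared standard deviation mentioned in paragraph [0059]; the stabilizing constant $\epsilon$ and the primed names $W'$ and $b'$ are assumptions introduced here for illustration and do not appear in the quoted passages of Ren.

```latex
\begin{aligned}
\mathrm{BN}(Y)
  &= \gamma \,\frac{Y - \mu}{\sqrt{\sigma^{2} + \epsilon}} + \beta
   = \gamma \,\frac{(W * X + b) - \mu}{\sqrt{\sigma^{2} + \epsilon}} + \beta \\
  &= \underbrace{\frac{\gamma}{\sqrt{\sigma^{2} + \epsilon}}\, W}_{W'} * X
   \;+\; \underbrace{\frac{\gamma\,(b - \mu)}{\sqrt{\sigma^{2} + \epsilon}} + \beta}_{b'}
\end{aligned}
```

Under these assumptions, the per-channel factor $\gamma / \sqrt{\sigma^{2} + \epsilon}$ is the constant merged into the kernel weights, and $b'$ combines the batch normalization shift $\beta$ with the convolution bias $b$.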

Regarding claims 6 and 13, taking claim 13 as exemplary:
Ren shows the method and the non-transitory machine-readable medium of claims 1 and 8 as claimed and specified above.
And Ren shows “wherein the operations further comprise: merging a rectified linear unit (RELU) layer function with the merged BN layer and CNN layer functions to form a merged BN/CONV/RELU layer.” (Paragraph [0022]: “In a non-limiting example, the AI chip may include convolutional, Pooling, and ReLU layers in a CNN model. In such case, the AI chip may perform all computations in an AI task. In other examples, the AI chip may include a subset of the convolutional, Pooling, and ReLU layers in a CNN model. In such case, the AI chip may perform certain computations in an AI task, leaving the remaining, computations in the AI task performed in a CPU/GPU or other host processors outside the AI chip.” And in paragraph [0047]: “With further reference to FIG. 4, the process 400 may further include quantizing the output of the CNN at 406. In some examples, quantizing the output of the CNN may include quantizing at least one activation layer. In some examples, an activation layer in an AI chip may include a rectified linear unit (ReLU) of a CNN. The quantization of the activation layer may be based on the hardware constraints of the AI chip so that the quantized output of the activation layer can mimic the characterization of the physical AI chip.” And in paragraph [0059]: “In some examples, the weights and biases may be updated per convolution layer. The updating of weights and biases may be performed independently between layers. A batch refers to a data batch, such as a plurality of images. Average values and standard deviations may be determined over the plurality of images in each batch. The values for γ and β are learned during the gradient descent training, independently from the weights and biases of the AI model. A batch normalization may normalize the inputs of each layer to the same range of values. This may help speed up the training process (to converge faster). For example, batch normalization may prevent early saturation of non-linear activation functions. 
The batch normalization merge at 704 essentially merges the batch normalization parameters into the convolution layer of an AI model. This reduces memory usage on the chip, and increases inference speed when running the AI model on the chip.” – The merging with batch normalization of a layer of an AI model with the ReLU of the CNN of the AI chip of Ren is merging a rectified linear unit (RELU) layer function with the merged BN layer and CNN layer functions to form a merged BN/CONV/RELU layer.)
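
As a hedged illustration of what a merged BN/CONV/RELU layer can look like in practice, the sketch below applies a ReLU directly to the output of a convolution whose weights and bias already contain the batch normalization parameters. It is a generic example, not Ren's implementation: it assumes a 1x1 convolution (so the convolution reduces to a matrix multiply over pixels), assumes batch normalization follows the convolution, and uses hypothetical names (fold, fused_layer, unfused_layer, eps).

```python
import numpy as np

def fold(w, b, gamma, beta, mean, var, eps=1e-5):
    """Merge per-channel BN parameters into 1x1 conv weights w (C_out, C_in) and bias b."""
    scale = gamma / np.sqrt(var + eps)
    return w * scale[:, None], (b - mean) * scale + beta

def fused_layer(x, w_merged, b_merged):
    """Single merged layer: 1x1 conv with folded BN, followed by ReLU.
    x: (C_in, N), one column per pixel."""
    return np.maximum(w_merged @ x + b_merged[:, None], 0.0)

def unfused_layer(x, w, b, gamma, beta, mean, var, eps=1e-5):
    """Reference: separate conv, batch normalization, and ReLU steps."""
    y = w @ x + b[:, None]
    z = gamma[:, None] * (y - mean[:, None]) / np.sqrt(var[:, None] + eps) + beta[:, None]
    return np.maximum(z, 0.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(3, 10))
    w, b = rng.normal(size=(4, 3)), rng.normal(size=4)
    gamma, beta, mean = (rng.normal(size=4) for _ in range(3))
    var = rng.uniform(0.5, 2.0, size=4)
    w_m, b_m = fold(w, b, gamma, beta, mean, var)
    # The fused layer matches the three-step reference computation.
    assert np.allclose(fused_layer(x, w_m, b_m),
                       unfused_layer(x, w, b, gamma, beta, mean, var))
```

Because the ReLU is applied elementwise to the merged convolution output, the three functions can be evaluated in a single pass without storing the intermediate normalized values.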

Regarding claim 15:
Ren shows:
“A data processor, comprising: one or more memories to receive and store input data; and a processing core coupled to the one or more memories to classify the input data using a neural network (NN) having a plurality of NN layers,” (Paragraph [0078]: “FIG. 9 depicts an example of internal hardware that may be included in any electronic device or computing system for implementing various methods in the embodiments described in FIGS. 1-8. An electrical bus 900 serves as an information highway interconnecting the other illustrated components of the hardware. Processor 905 is a central processing device of the system, configured to perform calculations and logic operations required to execute programming instructions. As used in this document and in the claims, the terms “processor” and “processing device” may refer to a single processor or any number of processors in a set of processors that collectively perform a process, whether a central processing unit (CPU) or a graphics processing unit (GPU) or a combination of the two. Read only memory (ROM), random access memory (RAM), flash memory, hard drives, and other devices capable of storing electronic data constitute examples of memory devices 925. A memory device, also referred to as a computer-readable medium, may include a single device or a collection of devices across which data and/or instructions are stored.” In paragraph [0020]: “FIG. 1 illustrates an example training system in accordance with various examples described herein. In some example, a training system 100 may include a training network 101 to train an AI model. The system 100 may upload the AI model 112 to an AI chip in an AI system 114. In some examples, an AI model may include a convolutional neural network (CNN) that is trained to perform AI tasks, e.g., voice or image recognition tasks.” In paragraph [0021]: “In a non-limiting, example, in a CNN model, a computation in a given layer in the CNN may be expressed by Y=W* X+b, where X is input data. 
Y is output data, W is a kernel, and b is a bias; all variables are relative to the given layer. Both the input data and the output data may have a number of channels. Operation “*” is a convolution. Kernel W may include weights.” In paragraph [0022]: “the AI chip in the AI system 114 may include an embedded cellular neural network that has memory containing the multiple parameters in the CNN. In some scenarios, the memory in an AI chip may be a one-time-programmable (OTP) memory that allows a user to load a CNN model into the physical AI chip,” and in paragraph [0026]: “the training system 100 may include a convolution quantization unit 106 and/or activation quantization unit 108, each of which may be configured to update the weights of a CNN model to adapt to an AI chip” – The processor of Ren is the processor. The memories, including the memory on the AI chip, of Ren are the memories. The CNN of Ren is the neural network with a plurality of network layers. The recognition tasks of Ren are the classification of input data.)
“wherein each of the NN layers includes a merged batch normalization (BN) transform and convolutional (CONV) kernel computation layer using a set of merged BN and CONV parameters.” (Paragraph [0019]: “Examples of “AI model” include data containing one or more parameters that, when loaded inside an AI chip, are used for executing the AI chip. For example, an AI model for a given CNN may include the weights, biases, and other parameters for one or more convolutional layers of the CNN. Here, the weights and parameters of an AI model are interchangeable.” Paragraph [0021]: “In a non-limiting, example, in a CNN model, a computation in a given layer in the CNN may be expressed by Y=W* X+b, where X is input data. Y is output data, W is a kernel, and b is a bias; all variables are relative to the given layer. Both the input data and the output data may have a number of channels. Operation “*” is a convolution. Kernel W may include weights.” And in paragraph [0059]: “In some examples, the weights and biases may be updated per convolution layer. The updating of weights and biases may be performed independently between layers. A batch refers to a data batch, such as a plurality of images. Average values and standard deviations may be determined over the plurality of images in each batch. The values for γ and β are learned during the gradient descent training, independently from the weights and biases of the AI model. A batch normalization may normalize the inputs of each layer to the same range of values. This may help speed up the training process (to converge faster). For example, batch normalization may prevent early saturation of non-linear activation functions. The batch normalization merge at 704 essentially merges the batch normalization parameters into the convolution layer of an AI model. 
This reduces memory usage on the chip, and increases inference speed when running the AI model on the chip.” – The batch normalization merge of convolution layers that have kernels of Ren is the merged batch normalization (BN) transform and convolutional (CONV) kernel computation layer using a set of merged BN and CONV parameters.)

Regarding claim 16:
Ren shows the system of claim 15 as claimed and specified above.
And Ren shows “wherein the one or more memories store the merged BN and CONV parameters.” (Paragraph [0021]: “In a non-limiting, example, in a CNN model, a computation in a given layer in the CNN may be expressed by Y=W* X+b, where X is input data. Y is output data, W is a kernel, and b is a bias; all variables are relative to the given layer. Both the input data and the output data may have a number of channels. Operation “*” is a convolution. Kernel W may include weights.” And in paragraph [0059]: “In some examples, the weights and biases may be updated per convolution layer. The updating of weights and biases may be performed independently between layers. A batch refers to a data batch, such as a plurality of images. Average values and standard deviations may be determined over the plurality of images in each batch. The values for γ and β are learned during the gradient descent training, independently from the weights and biases of the AI model. A batch normalization may normalize the inputs of each layer to the same range of values. This may help speed up the training process (to converge faster). For example, batch normalization may prevent early saturation of non-linear activation functions. The batch normalization merge at 704 essentially merges the batch normalization parameters into the convolution layer of an AI model. This reduces memory usage on the chip, and increases inference speed when running the AI model on the chip.” And in paragraph [0022]: “the AI chip in the AI system 114 may include an embedded cellular neural network that has memory containing the multiple parameters in the CNN. In some scenarios, the memory in an AI chip may be a one-time-programmable (OTP) memory that allows a user to load a CNN model into the physical AI chip” – The storing of the model on the AI chip is the one or more memories store the merged BN and CONV parameters.)

Regarding claim 17:
Ren shows the system of claim 15 as claimed and specified above.
And Ren shows “wherein the processing core is configured to compute merged BN transform and CONV kernel computation functions.” (Paragraph [0021]: “In a non-limiting, example, in a CNN model, a computation in a given layer in the CNN may be expressed by Y=W* X+b, where X is input data. Y is output data, W is a kernel, and b is a bias; all variables are relative to the given layer. Both the input data and the output data may have a number of channels. Operation “*” is a convolution. Kernel W may include weights.” And in paragraph [0059]: “In some examples, the weights and biases may be updated per convolution layer. The updating of weights and biases may be performed independently between layers. A batch refers to a data batch, such as a plurality of images. Average values and standard deviations may be determined over the plurality of images in each batch. The values for γ and β are learned during the gradient descent training, independently from the weights and biases of the AI model. A batch normalization may normalize the inputs of each layer to the same range of values. This may help speed up the training process (to converge faster). For example, batch normalization may prevent early saturation of non-linear activation functions. The batch normalization merge at 704 essentially merges the batch normalization parameters into the convolution layer of an AI model. This reduces memory usage on the chip, and increases inference speed when running the AI model on the chip.” – The batch normalization merge of convolution layers that have kernels of Ren is the computing the merged BN layer and NN layer functions includes computing the merged batch normalization transform function and convolutional kernel computation function.)

Regarding claim 18:
Ren shows the system of claim 17 as claimed and specified above.
And Ren shows “wherein the merged BN and CONV parameters include a merged BN constant with weights of a CONV kernel computation.” (Paragraph [0021]: “In a non-limiting, example, in a CNN model, a computation in a given layer in the CNN may be expressed by Y=W* X+b, where X is input data. Y is output data, W is a kernel, and b is a bias; all variables are relative to the given layer. Both the input data and the output data may have a number of channels. Operation “*” is a convolution. Kernel W may include weights.” And in paragraph [0059]: “In some examples, the weights and biases may be updated per convolution layer. The updating of weights and biases may be performed independently between layers. A batch refers to a data batch, such as a plurality of images. Average values and standard deviations may be determined over the plurality of images in each batch. The values for γ and β are learned during the gradient descent training, independently from the weights and biases of the AI model. A batch normalization may normalize the inputs of each layer to the same range of values. This may help speed up the training process (to converge faster). For example, batch normalization may prevent early saturation of non-linear activation functions. The batch normalization merge at 704 essentially merges the batch normalization parameters into the convolution layer of an AI model. This reduces memory usage on the chip, and increases inference speed when running the AI model on the chip.” – The batch normalization of layers of a CNN that includes weights and kernel of Ren is the wherein merging BN layer parameters with CONV layer parameters includes merging a batch normalization constant with weights of the convolutional kernel computation function.)

Regarding claim 19:
Ren shows the system of claim 15 as claimed and specified above.
And Ren shows “wherein the merged BN and CONV parameters include a merged BN bias with a bias of a CONV kernel computation.” (Paragraph [0019]: “Examples of “AI model” include data containing one or more parameters that, when loaded inside an AI chip, are used for executing the AI chip. For example, an AI model for a given CNN may include the weights, biases, and other parameters for one or more convolutional layers of the CNN. Here, the weights and parameters of an AI model are interchangeable.” And in paragraph [0021]: “In a non-limiting, example, in a CNN model, a computation in a given layer in the CNN may be expressed by Y=W* X+b, where X is input data. Y is output data, W is a kernel, and b is a bias; all variables are relative to the given layer. Both the input data and the output data may have a number of channels. Operation “*” is a convolution. Kernel W may include weights.” And in paragraph [0059]: “In some examples, the weights and biases may be updated per convolution layer. The updating of weights and biases may be performed independently between layers. A batch refers to a data batch, such as a plurality of images. Average values and standard deviations may be determined over the plurality of images in each batch. The values for γ and β are learned during the gradient descent training, independently from the weights and biases of the AI model. A batch normalization may normalize the inputs of each layer to the same range of values. This may help speed up the training process (to converge faster). For example, batch normalization may prevent early saturation of non-linear activation functions. The batch normalization merge at 704 essentially merges the batch normalization parameters into the convolution layer of an AI model. 
This reduces memory usage on the chip, and increases inference speed when running the AI model on the chip.” – The use of kernels and bias with the batch normalization and merging of Ren is the merged BN and CONV parameters that include a merged BN bias with a bias of a CONV kernel computation.)

Regarding claim 20:
Ren shows the system of claim 15 as claimed and specified above.
And Ren shows “wherein the processing core is trained to implement the NN with merged BN transform, CONV kernel computation, and rectified linear unit (RELU) layers.” (Paragraph [0022]: “In a non-limiting example, the AI chip may include convolutional, Pooling, and ReLU layers in a CNN model. In such case, the AI chip may perform all computations in an AI task. In other examples, the AI chip may include a subset of the convolutional, Pooling, and ReLU layers in a CNN model. In such case, the AI chip may perform certain computations in an AI task, leaving the remaining, computations in the AI task performed in a CPU/GPU or other host processors outside the AI chip.” And in paragraph [0047]: “With further reference to FIG. 4, the process 400 may further include quantizing the output of the CNN at 406. In some examples, quantizing the output of the CNN may include quantizing at least one activation layer. In some examples, an activation layer in an AI chip may include a rectified linear unit (ReLU) of a CNN. The quantization of the activation layer may be based on the hardware constraints of the AI chip so that the quantized output of the activation layer can mimic the characterization of the physical AI chip.” And in paragraph [0059]: “In some examples, the weights and biases may be updated per convolution layer. The updating of weights and biases may be performed independently between layers. A batch refers to a data batch, such as a plurality of images. Average values and standard deviations may be determined over the plurality of images in each batch. The values for γ and β are learned during the gradient descent training, independently from the weights and biases of the AI model. A batch normalization may normalize the inputs of each layer to the same range of values. This may help speed up the training process (to converge faster). For example, batch normalization may prevent early saturation of non-linear activation functions. 
The batch normalization merge at 704 essentially merges the batch normalization parameters into the convolution layer of an AI model. This reduces memory usage on the chip, and increases inference speed when running the AI model on the chip.” – The merging of batch normalization into a layer of an AI model, together with the ReLU of the CNN of the AI chip of Ren, is the implementation of the NN with merged BN transform, CONV kernel computation, and rectified linear unit (RELU) layers.)
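For illustration only, the batch normalization merge described in paragraph [0059] of Ren can be sketched as follows. This sketch is not taken from Ren or from the instant application. It assumes the standard per-channel folding, in which the scale γ/√(σ² + ε) is multiplied into each output channel's kernel and bias, and every function name and the ε value are hypothetical.

```python
import math

def fold_bn_into_conv(weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into conv weights and biases.

    weights: one flattened kernel (list of floats) per output channel.
    biases, gamma, beta, mean, var: one float per output channel.
    eps is an assumed stabilizer value, not one taken from the reference.
    """
    folded_w, folded_b = [], []
    for w, b, g, bt, mu, v in zip(weights, biases, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)          # gamma / sqrt(var + eps)
        folded_w.append([scale * wi for wi in w])
        folded_b.append(scale * (b - mu) + bt)  # shifted and rescaled bias
    return folded_w, folded_b

def conv_at_one_position(weights, biases, patch):
    """Apply each flattened kernel to a single flattened input patch."""
    return [sum(wi * xi for wi, xi in zip(w, patch)) + b
            for w, b in zip(weights, biases)]

def batch_norm(values, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization, one value per channel."""
    return [g * (y - mu) / math.sqrt(v + eps) + bt
            for y, g, bt, mu, v in zip(values, gamma, beta, mean, var)]

def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, y) for y in values]
```

Under these assumptions, relu(batch_norm(conv(x))) computed with the original parameters and relu(conv(x)) computed with the folded parameters agree to floating-point precision, so the separate batch normalization parameters need not be kept for inference.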

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 7 and 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ren in view of ANNAU et al., (US 2020/0218982 A1, hereinafter Annau).
Regarding claims 7 and 14, taking claim 14 as exemplary:
Ren shows the method and the non-transitory machine-readable medium of claims 1 and 8 as claimed and specified above.
And Ren shows “wherein the operations further comprise storing the merged BN and CONV layer parameters ..., which can be utilized subsequently by the merged BN/CONV layer during inference.” (Paragraph [0022]: “the AI chip in the AI system 114 may include an embedded cellular neural network that has memory containing the multiple parameters in the CNN. In some scenarios, the memory in an AI chip may be a one-time-programmable (OTP) memory that allows a user to load a CNN model into the physical AI chip,” and in paragraph [0059]: “the weights and biases may be updated per convolution layer. The updating of weights and biases may be performed independently between layers. A batch refers to a data batch, such as a plurality of images. Average values and standard deviations may be determined over the plurality of images in each batch. The values for γ and β are learned during the gradient descent training, independently from the weights and biases of the AI model. A batch normalization may normalize the inputs of each layer to the same range of values. This may help speed up the training process (to converge faster). For example, batch normalization may prevent early saturation of non-linear activation functions. The batch normalization merge at 704 essentially merges the batch normalization parameters into the convolution layer of an AI model. This reduces memory usage on the chip, and increases inference speed when running the AI model on the chip.” And in paragraph [0076]: “In an example application, an AI chip may be installed in a camera and store the trained weights and/or other parameters of the CNN model, such as those trained/quantized/updates weights generated in any of units in the training network 101 (in FIG. 1) or any of the processes 200 (FIG. 2), 400 (FIG. 4), 600 (FIG. 6A), 700 (FIG. 7) or 800 (FIG. 8). 
The AI chip may be configured to receive a captured image from the camera, perform an image recognition task based on the captured image and the stored CNN model, and present the recognition result on an output device, such as a display.” – The storing of the model on an AI chip and installing the trained weights and parameters of a model that has been normalized and merged onto a camera for image recognition tasks of Ren is the operations further comprise storing the merged BN and CONV layer parameters ..., which can be utilized subsequently by the merged BN/CONV layer during inference.)
But Ren does not appear to explicitly recite the use of parameters “as a metadata in a metafile.”
However, Annau teaches the use of parameters “as metadata in a metafile.” (Paragraph [0045]: “Parameters representing a machine learning model can be stored in memory/storage (212) and operated on by instructions executed by the CPU(s) (202) and GPU(s) (204). For a neural network, for example, the parameters representing the neural network can include descriptions of nodes, connections between nodes, groupings, weights for connections between nodes, biases for nodes, and/or activation functions for nodes. Parameters representing a machine learning model can be specified in source code, executable code, metadata, configuration data, data structures and/or files. The parameters representing the machine learning model can be stored in a regular-precision format or, as explained below, in a lower-precision format.” – The storing of the model as metadata including the parameters of Annau is the storing of parameters as metadata in a metafile.)
Ren and Annau are analogous art because both Ren and Annau describe implementing machine learning models.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application, having the teachings of Ren and Annau before him or her, to modify the teachings of Ren to include the teachings of Annau in order to specify a model in source code and/or as metadata in a file, thereby increasing the flexibility, usability, and marketability of Ren by allowing the model to be stored in a file for more flexible use.
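For illustration only, one way merged parameters could be stored as metadata in a file and read back for later inference is sketched below. Neither Ren nor Annau specifies a file format. The JSON layout, the field names, and the functions save_merged_params and load_merged_params are assumptions made for this sketch.

```python
import json
import os
import tempfile

def save_merged_params(path, layer_name, weights, biases):
    """Write merged BN/CONV parameters to a JSON metadata file (assumed layout)."""
    meta = {
        "layer": layer_name,        # hypothetical field names
        "type": "merged_bn_conv",
        "weights": weights,         # one flattened kernel per output channel
        "biases": biases,           # one bias per output channel
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(meta, f)

def load_merged_params(path):
    """Read the merged parameters back for use during inference."""
    with open(path, encoding="utf-8") as f:
        meta = json.load(f)
    return meta["weights"], meta["biases"]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "layer0.json")
        save_merged_params(path, "conv0", [[0.5, -1.0]], [0.25])
        print(load_merged_params(path))
```

A plain JSON file is used here only because it is human-readable. Any serialization that preserves the per-layer weights and biases would serve the same purpose.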

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
KARMATHA et al., (US 2020/0234447 A1), part of the prior art made of record, describes the use of layers with weights and biases and kernels and the use of an RELU of claims 1, 6, 8, 13, and 15 through the use of kernels with convolution layers and the use of batch normalization for merging with weights and biases of parent layers with an ReLU in paragraphs [0125] and [0148].
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHANE D WOOLWINE whose telephone number is (571)272-4138. The examiner can normally be reached M-F 9:30 AM-6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MIRANDA HUANG can be reached on (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

SHANE D. WOOLWINE
Primary Examiner
Art Unit 2124