DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 10/25/2018, 02/15/2019, 06/16/2020, 06/10/2021, 12/22/2021 and 03/18/2022 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-4, 6, 8-9, 11-17 and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Zhao et al. (U.S. 20180060580 A1; Zhao).

Regarding claim 1, Zhao discloses an apparatus, (Abstract: “a system for training a machine learning model to detect malicious container files.”) comprising:
 a memory (Paragraph 57: “a memory”) ; and 
a processor operatively coupled to the memory, the processor (Paragraph 57: “ computer systems are also described that can include one or more processors and one or more memories coupled to the one or more processors.”) configured to 
extract a set of features (Paragraph 27: “include one or more features of a corresponding container file including, for example, a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs), and/or any other type of data characterizing or associated with the container file.”, “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs),” read as a set of features.) from a potentially malicious file, (Fig.2B ; Paragraph 48: “the container file 100 may include the first file 122, the second file 124, the third file 126, and the fourth file 128. The training files may further be accompanied by a correct classification (e.g., as malicious or benign) for each training file.” And Paragraph 49: “ the convolutional neural network 200 may be adapted to process feature vectors corresponding to the files in each training file.”)
the processor configured to provide the set of features (Paragraph 27: “ a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”) as an input to a normalization layer  (Fig.2B , a pooling layer 220, a dense layer 230, a dropout layer 240) of a neural network used to classify the potentially malicious file, (Fig.2B ; Paragraph 48: “the container file 100 may include the first file 122, the second file 124, the third file 126, and the fourth file 128. The training files may further be accompanied by a correct classification (e.g., as malicious or benign) for each training file.”)
 the processor configured to implement the normalization layer (Fig.2B , a pooling layer 220, a dense layer 230, a dropout layer 240) by calculating a set of parameters (Paragraph 27: “ the features included in the feature vectors may include, for example, strings, n-grams, entropies, and file size.”, it shows “strings, n-grams, entropies, and file size” read as a set of parameters.) associated with the set of features (Paragraph 27: “ a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”) ; (Paragraph 31: “Each of the j number of kernels may include a combination of features. Applying a kernel to a group of feature vectors may include computing a dot product between the features included in the kernel and the features included in the group of feature vectors.” it shows “ computing a dot product”  read as “calculating”.)
and normalizing the set of features (Paragraph 27: “ a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”) based on the set of parameters (Paragraph 27: “strings, n-grams, entropies, and file size.”) to define a set of normalized features, ( Paragraph 36: “, the pooling layer 220 may identify the maximum features by at least applying the following maximum pooling function to each feature map (e.g., the first feature map 222, the second feature map 224) …”; Paragraph 37: “The maximum features identified by the pooling layer 220 may be further processed by the dense layer 230, which may be a fully connected layer. The output from the dense layer 230 may be further processed by the dropout layer 240. In some example embodiments, the dropout layer 240 may be configured to drop out (e.g., randomly) at least a portion (e.g., half) of the output from the dense layer 230 in order to remove sampling noise introduced by the previous layers of the convolution neural network 200 (e.g., the convolution layer 210, the pooling layer 220) and prevent overfitting at the activation layer 250.”, it shows the “output of drop out layer” is read as “ a set of normalized features”)
the processor configured to provide the set of normalized features (Paragraph 37: “output of drop out layer” is read as “a set of normalized features”) and the set of parameters (Paragraph 27: “strings, n-grams, entropies, and file size.”) as inputs to an activation layer (Fig. 2B, activation layer 250) of the neural network such that the activation layer (Fig. 2B, activation layer 250) produces an output (1) based on the set of normalized features (Paragraph 37: “output of drop out layer” is read as “a set of normalized features”) and the set of parameters (Paragraph 27: “strings, n-grams, entropies, and file size.”) and (2) used to produce a maliciousness classification of the potentially malicious file. (Fig. 2B and Paragraph 38: “the activation layer 250 may generate the output 204 by at least applying one or more activation functions to the output from the dropout layer 240. For instance, the activation layer 250 may apply the following rectifier or ramp function to the output from the dropout layer 240.”; Paragraph 39: “the output 204 of the convolutional neural network 200 may be a classification of the container file 100. For instance, the convolutional neural network 200 may provide a classification of the container file 100 (e.g., as malicious or benign) based on the prominent features that are identified from across the n number of files in the container file 100.”)
	
Regarding claim 2, Zhao discloses the set of parameters (Paragraph 27: “strings, n-grams, entropies, and file size.”) includes a mean of the set of values (Paragraph 27: “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”; it shows that “file name” is read as “mean” after being processed, computed, or calculated) and a standard deviation of the set of values. (Paragraph 27: “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”; “path or location” is read as “standard deviation” after being processed, computed, or calculated.)

Regarding claim 3, Zhao discloses  the maliciousness classification of the potentially malicious file indicates whether the potentially malicious file is malicious or benign. (Paragraph 39 : “ the output 204 of the convolutional neural network 200 may be a classification of the container file 100. For instance, the convolutional neural network 200 may provide a classification of the container file 100 (e.g., as malicious or benign) based on the prominent features that are identified from across the n number of files in the container file 100.”)

Regarding claim 4, Zhao discloses the maliciousness classification of the potentially malicious file classifies the potentially malicious file as a type of malware. (Paragraph 39: “GNU Privacy Guard (GnuPG), Roshal Archive (RAR), cURL, and SysInternals SDelete may all be individually benign files having legitimate utility. However, a container file that includes these files may be malicious as a whole (e.g., a ransomware package).”)
	
Regarding claim 6, Zhao discloses a non-transitory processor-readable medium storing code representing instructions to be executed by a processor, (Paragraph 8: “A memory, which can include a non-transitory computer-readable or machine-readable storage medium, may include, encode, store, or the like one or more programs that cause one or more processors to perform one or more of the operations described herein.”)
the code comprising code to cause the processor (Paragraph 59: “ These computer programs, which can also be referred to programs, software, software applications, applications, components, or code, include machine instructions for a programmable processor,”) to:
receive, at a normalization layer (Fig.2B , a pooling layer 220, a dense layer 230, a dropout layer 240)  of a neural network used to classify a file, a set of values (Paragraph 27: “ a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”) associated with the file; (Fig.2B ; Paragraph 48: “the container file 100 may include the first file 122, the second file 124, the third file 126, and the fourth file 128. The training files may further be accompanied by a correct classification (e.g., as malicious or benign) for each training file.” And Paragraph 49: “ the convolutional neural network 200 may be adapted to process feature vectors corresponding to the files in each training file.”)
calculate, at the normalization layer (Fig.2B , a pooling layer 220, a dense layer 230, a dropout layer 240), a mean of the set of values; (Paragraph 31: “Each of the j number of kernels may include a combination of features. Applying a kernel to a group of feature vectors may include computing a dot product between the features included in the kernel and the features included in the group of feature vectors.” it shows “ computing a dot product”  read as “calculating”.)
calculate, at the normalization layer, (Fig. 2B, a pooling layer 220, a dense layer 230, a dropout layer 240) a standard deviation of the set of values; (Paragraph 27: “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”, it shows “path or location” read as “standard deviation”; Paragraph 31: “Each of the j number of kernels may include a combination of features. Applying a kernel to a group of feature vectors may include computing a dot product between the features included in the kernel and the features included in the group of feature vectors.” it shows “computing a dot product” read as “calculating”.)
normalize, (Paragraph 36: “prevent overfitting”) at the normalization layer, the set of values based on the mean (Paragraph 27: “ a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”,it shows “file name” read as “a mean”)  and the standard deviation,(Paragraph 27: “ a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”, it shows “path or location” read as “standard deviation”)  to define a set of normalized values; (Paragraph 36: “In some example embodiments, the dropout layer 240 may be configured to drop out (e.g., randomly) at least a portion (e.g., half) of the output from the dense layer 230 in order to remove sampling noise introduced by the previous layers of the convolution neural network 200 (e.g., the convolution layer 210, the pooling layer 220) and prevent overfitting at the activation layer 250.”, it shows the “output of drop out layer” is read as “ a set of normalized features”)
provide to an activation layer (Fig. 2B, “activation layer 250”) of the neural network the set of normalized values, (Fig. 2B, “output from the dropout layer 240”) the mean, (Paragraph 27: “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”, it shows “file name” read as “a mean”) and the standard deviation; (Paragraph 27: “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”, it shows “path or location” read as “standard deviation”) and (Fig. 2B and Paragraph 38: “the activation layer 250 may generate the output 204 by at least applying one or more activation functions to the output from the dropout layer 240. For instance, the activation layer 250 may apply the following rectifier or ramp function to the output from the dropout layer 240.”)
calculate, at the activation layer (Fig. 2B, “activation layer 250”) and based on the set of normalized values, (Fig. 2B, “output from the dropout layer 240”) the mean, (Paragraph 27: “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”, it shows “file name” read as “a mean”) and the standard deviation, (Paragraph 27: “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”, it shows “path or location” read as “standard deviation”) a set of results used to classify the file. (Fig. 2B and Paragraph 38: “the activation layer 250 may generate the output 204 by at least applying one or more activation functions to the output from the dropout layer 240. For instance, the activation layer 250 may apply the following rectifier or ramp function to the output from the dropout layer 240.”; Paragraph 39: “the output 204 of the convolutional neural network 200 may be a classification of the container file 100. For instance, the convolutional neural network 200 may provide a classification of the container file 100 (e.g., as malicious or benign) based on the prominent features that are identified from across the n number of files in the container file 100.”; it shows “apply the following rectifier or ramp function” is read as “calculate”.)
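
For illustration, the sequence of steps recited in claim 6 (calculate a mean, calculate a standard deviation, normalize, and provide the normalized values together with the mean and the standard deviation to an activation layer) can be sketched as follows. This is a minimal sketch under stated assumptions and is neither the applicant's nor Zhao's implementation: the subtract-and-divide form of normalization, the epsilon guard, the weight matrix `weights`, and the use of a rectifier in `activation_layer` are all hypothetical choices made only so the example runs.

```python
import math

def normalization_layer(values, eps=1e-8):
    """Return (normalized_values, mean, std) for one set of numeric values."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of the set of values.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    # Assumed normalization: subtract the mean, divide by the standard deviation.
    # eps is a hypothetical guard against division by zero.
    normalized = [(v - mean) / (std + eps) for v in values]
    return normalized, mean, std

def activation_layer(normalized, mean, std, weights):
    """Hypothetical activation layer: weighted sum over n + 2 inputs, then a rectifier."""
    inputs = normalized + [mean, std]
    return [max(0.0, sum(w * x for w, x in zip(row, inputs))) for row in weights]

values = [2.0, 4.0, 6.0, 8.0]  # illustrative numeric values, not taken from any reference
normalized, mean, std = normalization_layer(values)

# Hypothetical weights: pass each normalized value through and add a small
# contribution from the mean and the standard deviation.
n = len(values)
weights = [[1.0 if i == j else 0.0 for j in range(n)] + [0.1, 0.1] for i in range(n)]
results = activation_layer(normalized, mean, std, weights)
```

In this sketch the activation layer receives six inputs (four normalized values plus the mean and the standard deviation) and produces four results.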

Regarding claim 8, Zhao discloses the set of normalized values includes a number of values equal to a number of results in the set of results. (Fig. 2B and Paragraph 38: “, the activation layer 250 may generate the output 204 by at least applying one or more activation functions to the output from the dropout layer 240.”, it shows that the number of values of the activation layer is equal to the number of outputs 204.)

Regarding claim 9,  Zhao discloses the code further comprising code to cause the processor to: 
extract the set of values from the file, (Fig.2B and Paragraph 27: “include one or more features of a corresponding container file including, for example, a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs), and/or any other type of data characterizing or associated with the container file.”) 
the set of results being used to produce a maliciousness classification of the file. (Paragraph 39 : “ the output 204 of the convolutional neural network 200 may be a classification of the container file 100. For instance, the convolutional neural network 200 may provide a classification of the container file 100 (e.g., as malicious or benign) based on the prominent features that are identified from across the n number of files in the container file 100.”)

Regarding claim 11,  Zhao discloses  the set of results is used to classify the file by producing a maliciousness classification of the file. (Paragraph 39 : “ the output 204 of the convolutional neural network 200 may be a classification of the container file 100. For instance, the convolutional neural network 200 may provide a classification of the container file 100 (e.g., as malicious or benign) based on the prominent features that are identified from across the n number of files in the container file 100.”)

Regarding claim 12,  Zhao discloses a method, (Paragraph 4: “Systems, methods, and articles of manufacture, including computer program products, are provided for training a machine learning model to detect malicious container files.”)  comprising: 
receiving a set of values (Fig. 2B and Paragraph 27: “include one or more features of a corresponding container file including, for example, a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs), and/or any other type of data characterizing or associated with the container file.”, “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs),” read as a set of values.) at a normalization layer (Fig. 2B, a pooling layer 220, a dense layer 230, a dropout layer 240) of a neural network implemented in a processor of a compute device; (Paragraph 57: “computer systems are also described that can include one or more processors and one or more memories coupled to the one or more processors.”)
calculating, at the processor, a set of parameters (Paragraph 27: “ the features included in the feature vectors may include, for example, strings, n-grams, entropies, and file size.”, it shows “strings, n-grams, entropies, and file size” read as a set of parameters.) associated with the set of values; (Paragraph 27: “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs), and/or any other type of data characterizing or associated with the container file.”, “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs),” read as a set of values.); (Paragraph 31: “Each of the j number of kernels may include a combination of features. Applying a kernel to a group of feature vectors may include computing a dot product between the features included in the kernel and the features included in the group of feature vectors.” it shows “ computing a dot product”  read as “calculating”.)
normalizing, (Paragraph 36: “prevent overfitting”) at the processor, the set of values (Paragraph 27: “ a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs) ”) based on the set of parameters (Paragraph 27: “strings, n-grams, entropies, and file size.”)   to define a set of normalized values;(Fig.2b and  Paragraph 36: “, the pooling layer 220 may identify the maximum features by at least applying the following maximum pooling function to each feature map (e.g., the first feature map 222, the second feature map 224) …”; Paragraph 37: “The maximum features identified by the pooling layer 220 may be further processed by the dense layer 230, which may be a fully connected layer. The output from the dense layer 230 may be further processed by the dropout layer 240. In some example embodiments, the dropout layer 240 may be configured to drop out (e.g., randomly) at least a portion (e.g., half) of the output from the dense layer 230 in order to remove sampling noise introduced by the previous layers of the convolution neural network 200 (e.g., the convolution layer 210, the pooling layer 220) and prevent overfitting at the activation layer 250.”, it shows the “output of drop out layer” is read as “ a set of normalized features”) and 
providing the set of normalized values (Paragraph 37: ““output of drop out layer” is read as “ a set of normalized features”)  and the set of parameters (Paragraph 27: “strings, n-grams, entropies, and file size.”)  to an activation layer (Fig.2B, activation layer 250)  of the neural network such that the activation layer (Fig.2B, activation layer 250)  identifies a set of results based on the set of normalized values and the set of parameters. (Fig. 2B and Paragraph 38: “the activation layer 250 may generate the output 204 by at least applying one or more activation functions to the output from the dropout layer 240. For instance, the activation layer 250 may apply the following rectifier or ramp function to the output from the dropout layer 240.”; Paragraph 39: “ the output 204 of the convolutional neural network 200 may be a classification of the container file 100. For instance, the convolutional neural network 200 may provide a classification of the container file 100 (e.g., as malicious or benign) based on the prominent features that are identified from across the n number of files in the container file 100.”)
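
For illustration, the layer sequence of Zhao's Fig. 2B as quoted above (pooling layer 220, dense layer 230, dropout layer 240, activation layer 250) can be sketched as follows. This is a simplified sketch under stated assumptions, not Zhao's implementation: the feature maps, the dense-layer `weights` and `biases`, the 0.5 dropout rate, and the fixed random seed are hypothetical, and the rescaling that many frameworks apply during dropout is omitted.

```python
import random

def max_pool(feature_maps):
    """Maximum pooling: keep the largest value from each feature map."""
    return [max(feature_map) for feature_map in feature_maps]

def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def dropout(values, rate, rng):
    """Randomly zero each value with probability `rate` (no rescaling)."""
    return [0.0 if rng.random() < rate else v for v in values]

def relu(values):
    """Rectifier f(x) = max(0, x), applied to each value."""
    return [max(0.0, v) for v in values]

# Hypothetical inputs and parameters, chosen only to make the sketch runnable.
feature_maps = [[0.2, 0.9, 0.4], [0.7, 0.1, 0.3]]
weights = [[1.0, -1.0], [0.5, 0.5]]
biases = [0.0, 0.0]
rng = random.Random(0)  # fixed seed so repeated runs drop the same units

pooled = max_pool(feature_maps)
hidden = dense(pooled, weights, biases)
dropped = dropout(hidden, 0.5, rng)
output = relu(dropped)
```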

Regarding claim 13, Zhao discloses the set of parameters (Paragraph 27: “strings, n-grams, entropies, and file size.”) includes a mean of the set of values (Paragraph 27: “ a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)” it shows that  “file name” is read as “ mean” after processed/computed/ calculated.) and a standard deviation of the set of values. (Paragraph 27: “ a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)” “path or location” is read as “standard deviation” after processed/computed/ calculated.)

Regarding claim 14, Zhao discloses each value from the set of values represents a feature from a set of features extracted from a file. (Paragraph 27: “include one or more features of a corresponding container file including, for example, a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs), and/or any other type of data characterizing or associated with the container file.”, “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs),” read as a set of features.)

Regarding claim 15, Zhao discloses  further comprising: 
extracting the set of values from a potentially malicious file, (Paragraph 27: “include one or more features of a corresponding container file including, for example, a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs), and/or any other type of data characterizing or associated with the container file.”, “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs),” read as a set of values.)
the set of results being used to produce a maliciousness classification of the potentially malicious file. (Paragraph 39 : “ the output 204 of the convolutional neural network 200 may be a classification of the container file 100. For instance, the convolutional neural network 200 may provide a classification of the container file 100 (e.g., as malicious or benign) based on the prominent features that are identified from across the n number of files in the container file 100.”)

Regarding claim 16, Zhao discloses the set of normalized values includes a number of values equal to a number of results in the set of results. (Fig. 2B and Paragraph 38: “, the activation layer 250 may generate the output 204 by at least applying one or more activation functions to the output from the dropout layer 240. f(x)=max(0, x), wherein x may be an output from the dropout layer 240 that is input into the activation layer 250.”, it shows that the number of values of the dropout layer is equal to the number of outputs 204.)

Regarding claim 17, Zhao discloses a number of inputs (Fig. 2B, dropout layer) to the activation layer (Fig. 2B, activation layer 250) is equal to a number of normalized values in the set of normalized values plus the number of parameters in the set of parameters, (Fig. 2B, dense layer 230; Paragraph 37: “The output from the dense layer 230 may be further processed by the dropout layer 240. In some example embodiments, the dropout layer 240 may be configured to drop out (e.g., randomly) at least a portion (e.g., half) of the output from the dense layer 230 in order to remove sampling noise introduced by the previous layers of the convolution neural network 200 (e.g., the convolution layer 210, the pooling layer 220) and prevent overfitting at the activation layer 250.”; Fig. 2B shows that the number of inputs to the dropout layer 240 is equal to the number of outputs from the dense layer 230.)
a number of outputs from the activation layer (Fig. 2B, activation layer 250) is equal to the number of normalized values in the set of normalized values. (Fig. 2B, dropout layer; Fig. 2B and Paragraph 38: “, the activation layer 250 may generate the output 204 by at least applying one or more activation functions to the output from the dropout layer 240.”, read as the output 204 being equal to the output of the dropout layer, as equal to 1.)
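
For illustration, the count relationship recited in claim 17 can be written out as simple arithmetic. This is a sketch only; the function name `activation_io_counts` and the example counts are hypothetical and are not drawn from the application or from Zhao.

```python
def activation_io_counts(num_normalized, num_parameters):
    """Return (number of inputs, number of outputs) of the activation layer
    under the relationship recited in claim 17."""
    num_inputs = num_normalized + num_parameters  # normalized values plus parameters
    num_outputs = num_normalized                  # one output per normalized value
    return num_inputs, num_outputs

# Example: four normalized values and two parameters (e.g., a mean and a
# standard deviation, as in claim 13).
counts = activation_io_counts(4, 2)
```

With four normalized values and two parameters, the relationship gives six inputs and four outputs.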

Regarding claim 20, Zhao discloses each value from the set of values is received sequentially at a node within the normalization layer, (Fig. 2B, a pooling layer 220, a dense layer 230, a dropout layer 240) each normalized value is provided sequentially to the activation layer. (Fig. 2B, activation layer 250; Fig. 2B shows that each node or value is provided sequentially between layers.)

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 5, 10 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (U.S. 20180060580 A1; Zhao), in view of Ioffe et al. (U.S. 20160217368 A1; Ioffe).
Regarding claims 5, 10 and 19, Zhao discloses the set of parameters (Paragraph 27: “strings, n-grams, entropies, and file size.”) includes a mean of the set of values (Paragraph 27: “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”; it shows that “file name” is read as “mean” after being processed, computed, or calculated) and a standard deviation of the set of values. (Paragraph 27: “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”; “path or location” is read as “standard deviation” after being processed, computed, or calculated.)
However, Zhao does not disclose normalizing each feature from the set of features by subtracting the mean from that feature and dividing by the standard deviation.
Ioffe discloses normalizing each feature from the set of features by subtracting the mean from that feature and dividing by the standard deviation. (Paragraph 39: “the batch normalization layer computes, for each dimension, the mean and the standard deviation of the components of the lower layer outputs that correspond to the dimension.”; Paragraph 42: “The batch normalization layer then normalizes each component of each of the lower level outputs using the average means and the average variances to generate a respective normalized output for each of the training examples in the batch.”, it shows “the average” read as “subtracting and dividing”.)
It would have been obvious to a person of ordinary skill in the art before the effective filing date to incorporate the “Batch normalization layer” of Ioffe into the “Training a machine learning model for container file analysis” of Zhao in order to allow higher learning rates to be used effectively during training and to reduce the impact of how parameters are initialized on the training process.
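
For illustration, the per-dimension normalization quoted from Ioffe (a mean and a standard deviation computed for each dimension across a batch, with each component then normalized) can be sketched as follows. This is a minimal sketch under stated assumptions, not Ioffe's implementation: the function name `batch_normalize`, the epsilon value, and the example batch are hypothetical, and any learned scale or shift parameters are omitted.

```python
import math

def batch_normalize(batch, eps=1e-5):
    """Normalize each dimension of a batch by subtracting that dimension's
    mean and dividing by its standard deviation."""
    num_examples = len(batch)
    num_dims = len(batch[0])
    means = [sum(example[d] for example in batch) / num_examples
             for d in range(num_dims)]
    variances = [sum((example[d] - means[d]) ** 2 for example in batch) / num_examples
                 for d in range(num_dims)]
    # eps is an assumed small constant that keeps the divisor away from zero.
    stds = [math.sqrt(v + eps) for v in variances]
    normalized = [[(example[d] - means[d]) / stds[d] for d in range(num_dims)]
                  for example in batch]
    return normalized, means, stds

# Hypothetical batch of three examples with two dimensions each.
batch = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]
normalized, means, stds = batch_normalize(batch)
```

After normalization, each dimension of the batch has a mean of approximately zero, regardless of the original scale of that dimension.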

Claims 7 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (U.S. 20180060580 A1; Zhao), in view of Ke Xu et al. (“Deep Refiner: Multi-layer Android Malware Detection System Applying Deep Neural Networks.”; Ke Xu).

Regarding claim 7, Zhao discloses  the normalization layer is a first normalization layer (Fig.2B , a pooling layer 220, a dense layer 230, a dropout layer 240)  and the activation layer is a first activation layer,(Fig 2B, “activation layer 250”) the code further comprising code to cause the processor to:
provide, to a normalization layer of the neural network, the set of results; (Paragraph 27: “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”)
calculate, at the normalization layer, a mean of the set of results; (Paragraph 27: “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”, it shows “file name” read as “a mean”; Paragraph 36: “The pooling layer 220 may identify maximum features from across all n number of feature vectors (e.g., across all files in the container file 100). For instance, the pooling layer 220 may identify the maximum features by at least applying the following maximum pooling function to each feature map (e.g., the first feature map 222, the second feature map 224)”)
calculate, at the normalization layer, (Fig. 2B, a pooling layer 220, a dense layer 230, a dropout layer 240) a standard deviation of the set of results; (Paragraph 27: “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”, it shows “path or location” read as “standard deviation”; Paragraph 31: “Each of the j number of kernels may include a combination of features. Applying a kernel to a group of feature vectors may include computing a dot product between the features included in the kernel and the features included in the group of feature vectors.” it shows “computing a dot product” read as “calculating”.)
normalize, (Paragraph 36: “prevent overfitting”) at the normalization layer, the set of results based on the mean of the set of results (Paragraph 27: “ a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”,it shows “file name” read as “a mean”)  and the standard deviation,(Paragraph 27: “ a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”, it shows “path or location” read as “standard deviation”)  to define a set of normalized results; (Paragraph 36: “In some example embodiments, the dropout layer 240 may be configured to drop out (e.g., randomly) at least a portion (e.g., half) of the output from the dense layer 230 in order to remove sampling noise introduced by the previous layers of the convolution neural network 200 (e.g., the convolution layer 210, the pooling layer 220) and prevent overfitting at the activation layer 250.”, it shows the “output of drop out layer” is read as “ a set of normalized features”) and 
provide to an activation layer (Fig. 2B, “activation layer 250”) of the neural network the set of normalized results, (Fig. 2B, “output from the dropout layer 240”) the mean of the set of results, (Paragraph 27: “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”, it shows “file name” read as “a mean”) and the standard deviation of the set of results; (Paragraph 27: “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”, it shows “path or location” read as “standard deviation”) and (Fig. 2B and Paragraph 38: “the activation layer 250 may generate the output 204 by at least applying one or more activation functions to the output from the dropout layer 240. For instance, the activation layer 250 may apply the following rectifier or ramp function to the output from the dropout layer 240.”)
	However, Zhao does not disclose a second normalization layer and a second activation layer.
	Ke Xu discloses a second normalization layer (Fig. 6, LSTM hidden layer n) and a second activation layer (Fig. 6, SoftMax layer). (Fig. 1, first detection layer and second detection layer; Fig. 6, Classification Phase in the Second Detection Layer of Deep Refiner.)
	It would have been obvious to a person of ordinary skill in the art before the effective filing date to incorporate the “Deep Refiner: Multi-layer Android Malware Detection System Applying Deep Neural Networks.” of Ke Xu into the “Training a machine learning model for container file analysis.” of Zhao in order to improve refined detection and keep up with the rapid evolution of both the Android system and malware, since Deep Refiner removes laborious human feature engineering and complicated feature extraction from the process.
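For reference, the normalization limitations mapped above (calculating a mean and a standard deviation of a set of results, normalizing on that basis, and providing the normalized results together with both statistics to an activation layer) can be illustrated with a minimal Python sketch. The function names, the `eps` stabilizer, the sample values, and the choice to append the two statistics to the activation input are assumptions made for illustration only; they are not taken from the application, from Zhao, or from Ke Xu. The rectifier is used as the activation only because Zhao, Paragraph 38, mentions a rectifier or ramp function.

```python
import math


def normalization_layer(results, eps=1e-8):
    """Return (normalized results, mean, standard deviation).

    Hypothetical helper: `eps` is an assumed stabilizer that avoids
    division by zero when all results are equal.
    """
    n = len(results)
    mean = sum(results) / n
    # Population variance of the set of results.
    variance = sum((r - mean) ** 2 for r in results) / n
    std = math.sqrt(variance)
    normalized = [(r - mean) / (std + eps) for r in results]
    return normalized, mean, std


def activation_layer(normalized, mean, std):
    """Apply a rectifier to the normalized results plus the two statistics.

    Assumption: the mean and standard deviation are simply appended to the
    activation input; other ways of providing them are equally possible.
    """
    inputs = normalized + [mean, std]
    return [max(0.0, x) for x in inputs]


# Illustrative values only.
results = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
normalized, mean, std = normalization_layer(results)
activated = activation_layer(normalized, mean, std)
```

In this sketch the activation layer receives ten inputs for eight results, because the two statistics travel alongside the normalized values rather than being discarded after normalization.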

Regarding claim 18, Zhao discloses the normalization layer is a first normalization layer (Fig. 2B, a pooling layer 220, a dense layer 230, a dropout layer 240), the activation layer is a first activation layer (Fig. 2B, “activation layer 250”), and the set of parameters is a first set of parameters (Paragraph 27: “the features included in the feature vectors may include, for example, strings, n-grams, entropies, and file size.”, which shows “strings, n-grams, entropies, and file size” read as a set of parameters), the method further comprising:
providing, to a normalization layer of the neural network, the set of results; (Paragraph 27: “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”)
calculating, at the normalization layer (Fig. 2B, a pooling layer 220, a dense layer 230, a dropout layer 240), a set of parameters associated with the set of results;
normalizing, at the normalization layer (Fig. 2B, a pooling layer 220, a dense layer 230, a dropout layer 240), the set of results (Paragraph 27: “a filename, path or location, size, owner, creator, embedded Universal Resource Locators (URLs)”) based on the set of parameters (Paragraph 27: “the features included in the feature vectors may include, for example, strings, n-grams, entropies, and file size.”, which shows “strings, n-grams, entropies, and file size” read as a set of parameters) to define a set of normalized results; (Paragraph 36: “the pooling layer 220 may identify the maximum features by at least applying the following maximum pooling function to each feature map (e.g., the first feature map 222, the second feature map 224) …”; Paragraph 37: “The maximum features identified by the pooling layer 220 may be further processed by the dense layer 230, which may be a fully connected layer. The output from the dense layer 230 may be further processed by the dropout layer 240. In some example embodiments, the dropout layer 240 may be configured to drop out (e.g., randomly) at least a portion (e.g., half) of the output from the dense layer 230 in order to remove sampling noise introduced by the previous layers of the convolution neural network 200 (e.g., the convolution layer 210, the pooling layer 220) and prevent overfitting at the activation layer 250.”, which shows the “output of drop out layer” read as “a set of normalized features”) and
providing to an activation layer (Fig. 2B, “activation layer 250”) of the neural network the set of normalized results (Paragraph 37: “output of drop out layer” is read as “a set of normalized features”) and the set of parameters. (Paragraph 27: “the features included in the feature vectors may include, for example, strings, n-grams, entropies, and file size.”, which shows “strings, n-grams, entropies, and file size” read as a set of parameters.)
However, Zhao does not disclose a second normalization layer, a second activation layer, and a second set of parameters.
	Ke Xu discloses a second normalization layer (Fig. 6, LSTM hidden layer n), a second activation layer (Fig. 6, SoftMax layer), and a second set of parameters (Section 2.3.2: Bytecode Vector Representation). (Fig. 1, first detection layer and second detection layer; Fig. 6, Classification Phase in the Second Detection Layer of Deep Refiner; and Section 2.3.3: “Deep Refiner builds the second detection model composing of multiple stacked LSTM hidden layers followed by a Max pooling layer and a SoftMax layer as illustrated in Figure 6.”)
	It would have been obvious to a person of ordinary skill in the art before the effective filing date to incorporate the “Deep Refiner: Multi-layer Android Malware Detection System Applying Deep Neural Networks.” of Ke Xu into the “Training a machine learning model for container file analysis.” of Zhao in order to improve refined detection and keep up with the rapid evolution of both the Android system and malware, since Deep Refiner removes laborious human feature engineering and complicated feature extraction from the process.
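For reference, the Zhao processing chain quoted above for claim 18 (maximum pooling over each feature map, dropout of a portion of the output, and a rectifier at the activation layer) can be illustrated with a minimal Python sketch. All function names, the seeded random generator, and the sample feature maps are assumptions made for illustration; the dense layer 230 is omitted for brevity, and no rescaling of the surviving values is applied after dropout. This is not code from Zhao or from the application.

```python
import random


def max_pool(feature_maps):
    """Keep the maximum feature of each feature map."""
    return [max(feature_map) for feature_map in feature_maps]


def dropout(values, rate=0.5, rng=None):
    """Zero out each value independently with probability `rate`.

    Assumption: a seeded generator is used so the sketch is repeatable.
    """
    if rng is None:
        rng = random.Random(0)
    return [0.0 if rng.random() < rate else v for v in values]


def rectifier(values):
    """Rectifier (ramp) function: negative inputs become zero."""
    return [max(0.0, v) for v in values]


# Illustrative feature maps only.
feature_maps = [
    [0.2, -1.0, 0.7],
    [-0.3, -0.1, -0.9],
    [1.5, 0.4, 0.0],
    [0.0, 2.0, -2.0],
]
pooled = max_pool(feature_maps)
dropped = dropout(pooled, rate=0.5, rng=random.Random(0))
output = rectifier(dropped)
```

Each stage here returns one value per feature map, so the activation layer sees an input of the same length as the pooled output, with some entries zeroed by dropout and any negative entries zeroed by the rectifier.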

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Ruckauer et al (U.S. 20180121802 A1), “METHOD OF CONVERTING NEURAL NETWORK AND RECOGNITION APPARATUS USING THE SAME”, teaches a method of converting an analog neural network (ANN) to a spiking neural network (SNN) that normalizes first parameters of a trained ANN based on a reference activation that is set to be proximate to a maximum activation of artificial neurons included in the ANN, and determines second parameters of an SNN based on the normalized first parameters.
Saxe (U.S. 20170372071 A1), “METHODS AND APPARATUS FOR DETECTING WHETHER A STRING OF CHARACTERS REPRESENTS MALICIOUS ACTIVITY USING MACHINE LEARNING.”, teaches methods and apparatus that can process strings related to artifacts without the use of multiple resource-intensive models and without manual coding of malicious indicators. Also, a processor can receive an input string associated with a potentially malicious artifact and convert each character in the input string into a vector of values to define a character matrix.
Philomin et al (U.S. 20060013475 A1), “Computer Vision System And Method Employing Illumination Invariant Neural Networks.”, teaches classification using a normalized cross correlation (NCC) measure to compare two images acquired under non-uniform illumination conditions. An input pattern is classified to assign a tentative classification label and value. The input pattern is assigned to an output node in the radial basis function network having the largest classification value.
Socher et al (U.S. 20170046616 A1), “Three-dimensional (3D) convolution with 3D batch normalization.”, teaches using a 3D deep convolutional neural network architecture (DCNNA) equipped with so-called subnetwork modules which perform dimensionality reduction operations on a 3D radiological volume before the 3D radiological volume is subjected to computationally expensive operations.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Duy A Tran whose telephone number is (571)272-4887. The examiner can normally be reached Monday-Friday 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Edward F Urban, can be reached at (571) 272-7899. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DUY TRAN/Examiner, Art Unit 2665                  

/BOBBAK SAFAIPOUR/Primary Examiner, Art Unit 2665