Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Amendments
This action is in response to amendments filed 1/28/2021. As per applicant’s request, claims 1, 4, 5, 12, and 15 have been amended. Claims 1-20 are currently pending in the application.
Response to Arguments
Applicant's arguments filed 1/28/2021 have been fully considered but they are not fully persuasive.
Applicant’s arguments regarding the claim objection of claim 5 of the previous office action have been fully considered, and due to the amendments to the claims filed 1/28/2021 are persuasive. The claim objection has been withdrawn.
Applicant’s arguments regarding the abstract and specification objections of the previous office action have been fully considered, and due to the substitute abstract and specification filed 1/28/2021 are persuasive. The abstract and specification objections have been withdrawn.
Applicant’s arguments regarding the 35 U.S.C. 112(b) claim rejections of claims 4 and 15 of the previous office action have been fully considered, and due to the amendments to the claims are persuasive. The claim rejections have been withdrawn.
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection relies on new prior art in combination with references applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Specifically, “Deep Feature Selection: Theory and Application to Identify Enhancers and Promoters” (2016) to Li et al. teaches amended claims 1-20 in combination with other prior art references. “Tikhonov, Ivanov and 
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
             The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: “filter layer” in claim 1 lns. 4, 6 and claim 3 ln. 1; and “unit” in claim 1 ln. 4, claim 2 ln. 3 (first instance “each unit”), claim 9 lns. 1, 3, claim 12 ln. 5, and claim 18 ln. 3. 
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.

3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 3, 5-6, 8-12, 14, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over “Group sparse regularization for deep neural networks” (Feb. 2017) to Scardapane et al., hereinafter Scardapane, in view of “Deep Feature Selection: Theory and Application to Identify Enhancers and Promoters” (2016) to Li et al., hereinafter Li.
Regarding claim 1, Scardapane teaches: A neural network, comprising: an input layer; an output layer; and a filter layer, (Scardapane depicts a neural network in Fig. 1 on page 83, reproduced/annotated below, comprising an input layer, an output layer, and a filter layer. Throughout claim 1, Examiner is interpreting Fig. 1’s hidden layer as a filter layer, and the hidden layer’s connections as a filter layer’s connections.)

[Figure image: media_image1.png]

Fig. 1, Scardapane

each unit thereof configured to receive… a filter layer input… from a single preceding unit via a respective filter layer input connection, (Under the broadest reasonable interpretation, each unit h1 and h2 receives an input from a single preceding unit (e.g., x1) in addition to receiving an input from a second preceding unit (e.g., x2) via respective filter layer input connections.)
each filter layer input connection coupled to a different single preceding unit, (Under the broadest reasonable interpretation, each filter layer input connection to h1 is coupled to the different single preceding units x1 and x2. The same is true for filter layer input connection to h2.)
the filter layer configured to incentivize the neural network to learn to produce a target output from the output layer for a given input to the input layer while simultaneously learning weights for each filter layer input connection, (This limitation essentially describes training a neural network through backpropagation. Scardapane teaches such training on p. 82, col. 2, before Equation 2: “the network is trained by minimizing a standard regularized cost function:

[Equation image: media_image2.png]”
W* denotes the learned weights and d_i denotes the target output.)
the weights learned causing the filter layer to reduce a number of filter layer units that pass respective filter layer inputs as non-zero values. (Scardapane teaches on p. 83, col. 2, §3 ¶2: “The basic idea of this paper is to consider group-level sparsity, in order to force all outgoing connections from a single neuron (corresponding to a group) to be either simultaneously zero, or not.” Scardapane further states on p. 83, col. 2, para. 1: “If the variables of an input group are set to zero, the corresponding feature can be neglected during the prediction phase, effectively corresponding to a feature selection procedure. Then, if the variables in a hidden group are set to zero, we can remove the corresponding neuron, thereby obtaining a pruning effect and a thinner hidden layer.”)
	However, Scardapane does not explicitly teach: … [configured to receive] no more than a single filter layer input, the single filter layer input being [a filter layer input] received…
	But Li teaches: …[configured to receive] no more than a single filter layer input, the single filter layer input being [a filter layer input] received… (Li, Fig. 1(B) reproduced below, where the figure’s weighted input layer is interpreted as a filter layer)

[Figure image: media_image3.png]

Li, Fig. 1(B) – Shallow DFS
Li is in the same field of endeavor as Scardapane, namely sparse regularization for deep neural networks. Therefore, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to have modified Scardapane’s filter layer input connections to be one-to-one connections as taught by Li’s Fig. 1(B), with a motivation to enhance feature selection (Li, Abstract: “[Sparse linear models] are simple, fast, and able to select features”).
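For illustration only, and not drawn from either reference's code, the structural difference relied on in this combination can be sketched in Python. In a densely connected hidden layer, each unit receives every input of the preceding layer; in a one-to-one weighted layer of the kind depicted in Li's Fig. 1(B), each unit receives exactly one input through its own weight. The function names, shapes, and values below are hypothetical.

```python
def dense_layer(x, W):
    """Densely connected layer: every unit (row of W) receives all inputs in x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def one_to_one_layer(x, w):
    """One-to-one layer: unit i receives only input x[i], scaled by weight w[i]."""
    if len(x) != len(w):
        raise ValueError("a one-to-one layer needs exactly one weight per input")
    return [w_i * x_i for w_i, x_i in zip(w, x)]


x = [1.0, 2.0, 3.0]                      # hypothetical input features
W = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]   # hypothetical dense weights (2 units)
w = [0.5, 0.0, 2.0]                      # hypothetical one-to-one weights

dense_out = dense_layer(x, W)       # each output mixes all three inputs
filtered = one_to_one_layer(x, w)   # the second feature is dropped because w[1] == 0
```

In this sketch a zero weight in the one-to-one layer removes a single feature without affecting the others, which is the feature-selection behavior the rejection attributes to the combination.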

Regarding claim 3, Scardapane in view of Li teaches: The neural network of Claim 1, 
Further, Scardapane teaches: wherein the filter layer is further configured to incentivize the neural network to minimize a regularized loss function that is expressed on the filter layer that combines an underlying loss function with a penalty function imposed on the weights learned. (Scardapane Eq. 2 on p. 82 teaches minimizing a regularized loss function that is expressed as an underlying loss function L(d,f) combined with a penalty function, R(w).)
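As a minimal sketch of the general form described above (an underlying loss combined with a penalty on the learned weights), the following Python fragment may be helpful. It is not Scardapane's implementation: the squared-error loss, the sum-of-group-norms penalty, the weighting factor `lam`, and all names and values are assumptions chosen only to make the structure concrete.

```python
import math


def squared_error(d, y):
    """Assumed underlying loss L(d, f) for one target d and one network output y."""
    return (d - y) ** 2


def group_penalty(groups):
    """Assumed penalty R(w): sum of Euclidean norms, one per group of weights."""
    return sum(math.sqrt(sum(w * w for w in group)) for group in groups)


def regularized_loss(targets, outputs, groups, lam):
    """Mean underlying loss plus lam times the penalty on the grouped weights."""
    data_term = sum(squared_error(d, y) for d, y in zip(targets, outputs)) / len(targets)
    return data_term + lam * group_penalty(groups)


targets = [1.0, 0.0]                 # hypothetical target outputs d_i
outputs = [0.5, 0.5]                 # hypothetical network outputs
groups = [[3.0, 4.0], [0.0, 0.0]]    # hypothetical weight groups; the second is fully zeroed
loss = regularized_loss(targets, outputs, groups, lam=0.1)
```

A group whose weights are all zero contributes nothing to the penalty, so minimizing a loss of this shape rewards switching off whole groups at once.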

Regarding claim 5, Scardapane in view of Li teaches: The neural network of Claim 1, 
Further, Scardapane teaches: wherein the filter layer is a feature selection layer that is an initial hidden layer of the neural network and each filter layer input is an input feature to the neural network (Scardapane Fig. 1 depicts the filter layer as an initial layer. Scardapane also teaches that the filter layer is a feature selection layer on p. 83, col. 2, para. 1: “If the variables of an input group are set to zero, the corresponding feature can be neglected during the prediction phase, effectively corresponding to a feature selection procedure. Then, if the variables in a hidden group are set to zero, we can remove the corresponding neuron, thereby obtaining a pruning effect and a thinner hidden layer.”)

Regarding claim 6, Scardapane in view of Li teaches: The neural network of Claim 1, 
Further, Scardapane teaches: wherein the filter layer is a last hidden layer of the neural network. (In Scardapane Fig. 1, the one and only filter layer is the last hidden layer.)

Regarding claim 8, Scardapane in view of Li teaches: The neural network of Claim 1, 
Further, Scardapane teaches: wherein each single preceding unit is located in a common preceding layer that precedes the filter layer in the neural network. (In Scardapane Fig. 1, each preceding unit is located in a common preceding layer (i.e., the input layer) that precedes the filter layer (i.e., the hidden layer) in the neural network.)

Regarding claim 9, Scardapane in view of Li teaches: The neural network of Claim 1, 
Further, Scardapane teaches: wherein at least two units of the filter layer are configured to receive respective filter layer inputs from corresponding single preceding units located in different preceding layers that precede the filter layer in the neural network. (Scardapane teaches multiple hidden layers on p. 83, col. 2, in list item 2, Hidden Groups Gh. In a network with at least two hidden layers, the final hidden layer is a filter layer that receives input from both the input layer and the first hidden layer.)

Regarding claim 10, Scardapane in view of Li teaches: The neural network of Claim 1, 
Further, Scardapane teaches: wherein the filter layer is a first filter layer and the neural network further comprises at least one other filter layer. (Scardapane teaches in §4.2 ¶1: “In all cases, we use a simple network with two hidden layers having, respectively, 40 and 20 neurons. We run the optimization algorithm for 200 epochs, with mini-batches of 300 elements.” Examiner is interpreting the two hidden layers to be filter layers and the hidden layer connections to be filter layer connections.)

Regarding claim 11, Scardapane in view of Li teaches: The neural network of Claim 1, 
Further, Scardapane teaches: wherein the neural network is a densely connected neural network that includes the filter layer integrated therein. (In Scardapane Fig. 1, the neural network is a densely connected neural network.)

Regarding claim 12, Scardapane teaches: A method for filtering in a neural network, the method comprising: incentivizing the neural network, via a filter layer integrated within the neural network, to learn to produce a target output from an output layer for a given input to an input layer while simultaneously learning weights for each filter layer input connection, (Scardapane depicts a neural network in Fig. 1 on page 83 (reproduced below with annotations) comprising an input layer, an output layer, and a filter layer. Throughout claim 12, Examiner is interpreting Fig. 1’s hidden layer as a filter layer, and the hidden layer’s connections as a filter layer’s connections.

[Figure image: media_image1.png]

Fig. 1, Scardapane
The limitation “to learn to produce a target output from an output layer for a given input to an input layer while simultaneously learning weights for each filter layer input connection” essentially describes training a neural network through backpropagation. Scardapane teaches such training on p. 82, col. 2, before Equation 2: “the network is trained by minimizing a standard regularized cost function:

[Equation image: media_image2.png]”
W* denotes the updated weight matrix and d_i denotes the target output.)
each unit of the filter layer configured to receive… a filter layer input… from a single preceding unit via a respective filter layer input connection, (Under the broadest reasonable interpretation, each unit h1 and h2 receives an input from a single preceding unit (e.g., x1) in addition to receiving an input from a second preceding unit (e.g., x2) via respective filter layer input connections.)
each filter layer input connection coupled to a different single preceding unit; (Under the broadest reasonable interpretation, each filter layer input connection to h1 is coupled to the different single preceding units x1 and x2. The same is true for filter layer input connection to h2.)
and learning the weights for each filter layer input connection to the filter layer, the weights learned causing the filter layer to reduce a number of filter layer units of the filter layer that pass respective filter layer inputs as non-zero values. (Scardapane teaches on p. 83, col. 2, section 3, para. 2: “The basic idea of this paper is to consider group-level sparsity, in order to force all outgoing connections from a single neuron (corresponding to a group) to be either simultaneously zero, or not.” Scardapane further states on p. 83, col. 2, para. 1: “If the variables of an input group are set to zero, the corresponding feature can be neglected during the prediction phase, effectively corresponding to a feature selection procedure. Then, if the variables in a hidden group are set to zero, we can remove the corresponding neuron, thereby obtaining a pruning effect and a thinner hidden layer.”)
However, Scardapane does not explicitly teach: … [configured to receive] no more than a single filter layer input, the single filter layer input being [a filter layer input] received…
	But Li teaches: … [configured to receive] no more than a single filter layer input, the single filter layer input being [a filter layer input] received… (Li, Fig. 1(B) reproduced below, where the figure’s weighted input layer is interpreted as a filter layer)

[Figure image: media_image3.png]

Li, Fig. 1(B) – Shallow DFS
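Li is in the same field of endeavor as Scardapane, namely sparse regularization for deep neural networks. Therefore, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to have modified Scardapane’s filter layer input connections to be one-to-one connections as taught by Li’s Fig. 1(B), with a motivation to enhance feature selection (Li, Abstract: “[Sparse linear models] are simple, fast, and able to select features”).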

Regarding claim 14, Scardapane in view of Li teaches: The method of Claim 12, 
Further, Scardapane teaches: further comprising expressing a regularized loss function on the filter layer that combines an underlying loss function with a penalty function imposed on the weights learned, wherein incentivizing the neural network includes incentivizing the neural network to minimize the regularized loss function. (Scardapane Eq. 2 on p. 82 teaches minimizing a regularized loss function that is expressed as an underlying loss function L(d,f) combined with a penalty function, R(w).)

Regarding claim 16, Scardapane in view of Li teaches: The method of Claim 12, 
Further, Scardapane teaches: wherein the filter layer is a feature selection layer that is an initial hidden layer of the neural network and each filter layer input is an input feature to the neural network. (Scardapane Fig. 1 depicts the filter layer as an initial layer. Scardapane also teaches that the filter layer is a feature selection layer on p. 83, col. 2, para. 1: “If the variables of an input group are set to zero, the corresponding feature can be neglected during the prediction phase, effectively corresponding to a feature selection procedure. Then, if the variables in a hidden group are set to zero, we can remove the corresponding neuron, thereby obtaining a pruning effect and a thinner hidden layer.”)

Regarding claim 17, Scardapane in view of Li teaches: The method of Claim 12, 
Further, Scardapane teaches: wherein the filter layer is a last hidden layer of the neural network (In Scardapane Fig. 1, the one and only filter layer is the last hidden layer.) or wherein the filter layer has input connections and output connections to respective units of internal layers that are neither input layers nor output layers.


Regarding claim 18, Scardapane in view of Li teaches: The method of Claim 12, 
Further, Scardapane teaches: further comprising including each single preceding unit in a common preceding layer that precedes the filter layer in the neural network or configuring at least two units of the filter layer to receive respective filter layer inputs from corresponding single preceding units included in different preceding layers that precede the filter layer in the neural network. (In Scardapane Fig. 1, each preceding unit is located in a common preceding layer (i.e., the input layer) that precedes the filter layer (i.e., the hidden layer) in the neural network.)

Regarding claim 19, Scardapane in view of Li teaches: The method of Claim 12, 
Further, Scardapane teaches: wherein the filter layer is a first filter layer and the method further comprises including at least one other filter layer in the neural network. (Scardapane teaches in §4.2 ¶1: “In all cases, we use a simple network with two hidden layers having, respectively, 40 and 20 neurons. We run the optimization algorithm for 200 epochs, with mini-batches of 300 elements.” Examiner is interpreting the two hidden layers to be filter layers and the hidden layer connections to be filter layer connections.)

Regarding claim 20, Scardapane in view of Li teaches: The method of Claim 12, 
Further, Scardapane teaches: wherein the neural network is a densely connected neural network that includes the filter layer integrated therein. (In Scardapane Fig. 1, the neural network is a densely connected neural network.)

Claims 2, 7, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Scardapane in view of Li, and further in view of U.S. Patent Application No. 20180285734 to Chen et al., hereinafter Chen.
Regarding claim 2, Scardapane in view of Li teaches: The neural network of Claim 1, 
Further, Scardapane teaches: wherein at least one of the weights learned has a negative value; (Equation 4 of Scardapane on p. 83, col. 1 takes the absolute magnitude of the weights, implying that some weights are negative.)  
wherein each unit of the filter layer is configured to apply a rectified linear unit (ReLU) activation function to force all outputs from the unit to zero in an event a weight learned for its respective filter layer input connection is negative (Scardapane teaches using a ReLU activation function on p. 84, col. 2, §4 ¶1: “In all cases, we use ReLu activation functions for the hidden layers of the network: g_k(s) = max(0, s), 1 ≤ k ≤ H.” If the input to a ReLU is negative, the output will be zero).
However, Scardapane in view of Li does not explicitly teach: a mean of the weights learned is a given average value; and
But Chen teaches: a mean of the weights learned is a given average value; and (Chen teaches averaging weights in ¶197: “Parameter averaging trains each node on a subset of the training data and sets the global parameters (e.g., weights, biases) to the average of the parameters from each node.”)
Chen is in the same field of endeavor as Scardapane and Li, namely training artificial neural networks. Therefore, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to have incorporated the teachings of Chen’s system into Scardapane and Li’s system by averaging the weights to yield predictable results in a neural network having a ReLU activation function (¶184). (See KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).)
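The two mechanisms discussed for claim 2 can be sketched as follows. This is a hypothetical illustration, not code from Scardapane or Chen: `filter_unit` applies a ReLU to a single weighted input, and `average_parameters` takes the element-wise mean of per-node weight vectors in the manner Chen's ¶197 describes for parameter averaging. The sketch assumes non-negative inputs, since a negative weight drives the ReLU output to zero only when the input it multiplies is not negative.

```python
def relu(s):
    """Rectified linear unit: g(s) = max(0, s)."""
    return max(0.0, s)


def filter_unit(x, w):
    """Hypothetical one-input unit: ReLU applied to the single weighted input."""
    return relu(w * x)


def average_parameters(node_weights):
    """Element-wise mean of the weight vectors held by each training node."""
    n = len(node_weights)
    return [sum(column) / n for column in zip(*node_weights)]


weights = [-0.5, 1.5, 2.0]   # hypothetical learned weights, one of them negative
inputs = [4.0, 4.0, 4.0]     # assumed non-negative inputs

outputs = [filter_unit(x, w) for x, w in zip(inputs, weights)]
mean_weight = sum(weights) / len(weights)
global_weights = average_parameters([[1.0, 2.0], [3.0, 4.0]])
```

Here the unit with the negative weight outputs zero while the mean of the three weights is 1.0, showing that a fixed average does not prevent individual weights from being negative.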

Regarding claim 7, Scardapane in view of Li teaches: The neural network of Claim 1, 
However, Scardapane in view of Li does not explicitly teach: wherein the filter layer has input connections and output connections to respective units of internal layers that are neither input layers nor output layers.
But Chen teaches: wherein the filter layer has input connections and output connections to respective units of internal layers that are neither input layers nor output layers. (Chen Fig. 11A shows a CNN at least 5 layers deep)
Therefore, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to have incorporated the teachings of Chen’s system into the combination of Scardapane and Li’s system by making the neural network at least 5 layers deep (Chen Fig. 11A shows CNN at least 5 layers deep) with a motivation to reduce the dimensionality in a sparsely connected neural network (Chen ¶181 teaches sparsely connected convolution layers performing dimensionality reduction.)

Regarding claim 13, Scardapane in view of Li teaches: The method of Claim 12, 
Further, Scardapane teaches: further comprising applying a rectified linear unit (ReLU) activation function by each unit to force all outputs from the unit to zero in an event a weight learned for its respective filter layer input connection is negative, (Scardapane teaches using a ReLU activation function on p. 84, col. 2, first paragraph of section 4: “In all cases, we use ReLu activation functions for the hidden layers of the network: g_k(s) = max(0, s), 1 ≤ k ≤ H.” If the input to a ReLU is negative, the output will be zero)
wherein at least one of the weights learned has a negative value and (Equation 4 of Scardapane on p. 83, col. 1 takes the absolute magnitude of the weights, implying that some weights are negative) 
However, Scardapane in view of Li does not explicitly teach: wherein a mean of the weights learned is a given average value.
But Chen teaches: wherein a mean of the weights learned is a given average value. (Chen teaches averaging weights in ¶197: “Parameter averaging trains each node on a subset of the training data and sets the global parameters (e.g., weights, biases) to the average of the parameters from each node.”)
Therefore, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to have incorporated the teachings of Chen’s system into the combination of Scardapane and Li’s system by averaging the weights to yield predictable results in a neural network having a ReLU activation function (¶184).

Claims 4 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Scardapane in view of Li, and further in view of “Tikhonov, Ivanov and Morozov regularization for support vector machine learning” (2016) to Oneto et al., hereinafter Oneto. 

Regarding claim 4, Scardapane in view of Li teaches: The neural network of Claim 3,
However, Scardapane in view of Li does not explicitly teach: wherein the regularized loss function is deemed minimized in an event (i) a difference between an actual output from the output layer and the target output is less than or equal to a given acceptable difference and (ii) the weights learned have an average value that matches a given value causing the penalty function to achieve the penalty function's minimum.
But Oneto teaches, by Eq. 4 and 8 shown below: wherein the regularized loss function is deemed minimized in an event (i) a difference between an actual output from the output layer and the target output is less than or equal to a given acceptable difference and (Eq. 8 bottom line, where $\hat{L}(h)$ is the difference and $\hat{L}_{MAX}$ is a given difference.)
(ii) the weights learned have an average value that matches a given value causing the penalty function to achieve the penalty function's minimum.  (Eq. 8 top line, where finding the minimum of w and b achieves the penalty function’s minimum)

[Images: Eq. 4 and Eq. 8 of Oneto (media_image4.png, media_image5.png)]

Oneto is in the same field of endeavor as Scardapane and Li, namely, solving machine learning regularization problems. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Oneto’s system into the combination of Scardapane and Li’s system by solving the regularization in the form of Eq. 8 with a motivation to choose the simplest function (Oneto p. 107, second-to-last paragraph).
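The two conditions recited in claim 4, as mapped above to the bottom and top lines of Oneto's Eq. 8, can be illustrated with the following sketch. It is not taken from Oneto or from the application: the function name `deemed_minimized`, the use of mean absolute error as the "difference", the tolerance on the weight mean, and all numeric values are assumptions chosen for the example.

```python
def deemed_minimized(actual, target, weights, max_difference, target_mean,
                     mean_tol=1e-9):
    """Return True when both illustrative conditions hold.

    (i)  the difference between actual and target output is at most
         max_difference (here measured as mean absolute error, an assumption);
    (ii) the average of the learned weights matches target_mean within
         mean_tol (the tolerance is also an assumption).
    """
    difference = sum(abs(a - t) for a, t in zip(actual, target)) / len(target)
    within_bound = difference <= max_difference

    mean_weight = sum(weights) / len(weights)
    mean_matches = abs(mean_weight - target_mean) <= mean_tol

    return within_bound and mean_matches


# Hypothetical outputs and weights; one weight is negative.
actual = [0.9, 0.1]
target = [1.0, 0.0]
weights = [-0.5, 0.5, 1.5]  # average is 0.5

print(deemed_minimized(actual, target, weights, max_difference=0.2, target_mean=0.5))
print(deemed_minimized(actual, target, weights, max_difference=0.05, target_mean=0.5))
```

The first call satisfies both conditions; the second fails condition (i) because the bound on the difference is tighter than the observed difference.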

Regarding claim 15, Scardapane in view of Li teaches: The method of Claim 14,
However, Scardapane in view of Li does not explicitly teach: further comprising deeming the regularized loss function to be minimized in an event (i) a difference between an actual output from the output layer and the target output is less than or equal to a given difference and (ii) the weights learned have an average value that matches a given value causing the penalty function to achieve the penalty function's minimum.
But Oneto teaches, by Eq. 4 and 8 shown above: further comprising deeming the regularized loss function to be minimized in an event (i) a difference between an actual output from the output layer and the target output is less than or equal to a given difference and (Eq. 8 bottom line, where $\hat{L}(h)$ is the difference and $\hat{L}_{MAX}$ is a given difference.)
(ii) the weights learned have an average value that matches a given value causing the penalty function to achieve the penalty function's minimum. (Eq. 8 top line, where finding the minimum of w and b achieves the penalty function’s minimum)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Oneto’s system into the combination of Scardapane and Li’s system by solving the regularization in the form of Eq. 8 with a motivation to choose the simplest function (Oneto p. 107, second-to-last paragraph).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
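The timing rule in the preceding paragraph can be worked through with a short sketch. The mailing, reply, and advisory dates below are hypothetical, the function names `add_months` and `reply_period_end` are assumptions, and adjustments for weekends, holidays, and month-end conventions other than simple day clamping are not modeled.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift d by whole calendar months, clamping the day to the month's end."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def reply_period_end(mailing, first_reply=None, advisory_mailed=None):
    """Return the end of the shortened statutory period as described above."""
    three_months = add_months(mailing, 3)
    six_months = add_months(mailing, 6)
    end = three_months
    # First reply within two months and advisory action mailed after the
    # three-month date: the period expires on the advisory mailing date.
    if (first_reply is not None and advisory_mailed is not None
            and first_reply <= add_months(mailing, 2)
            and advisory_mailed > three_months):
        end = advisory_mailed
    # In no event later than six months from the date of the final action.
    return min(end, six_months)


mailing = date(2021, 3, 1)  # hypothetical mailing date
print(reply_period_end(mailing))                                      # 2021-06-01
print(reply_period_end(mailing, date(2021, 4, 15), date(2021, 6, 10)))  # 2021-06-10
print(reply_period_end(mailing, date(2021, 4, 15), date(2021, 10, 1)))  # 2021-09-01
```

The last call shows the six-month cap: even with a late advisory action, the computed date does not pass six months from the hypothetical mailing date.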
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher H. Jablon whose telephone number is (571)270-7648.  The examiner can normally be reached on Monday - Friday, 8:30 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ASHER H. JABLON/ Examiner, Art Unit 2122
/ERIC NILSSON/ Primary Examiner, Art Unit 2122