DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.  

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1:
According to the first part of the analysis, in the instant case, claims 1-7 are directed to a method claim, claims 8-14 are directed to an apparatus claim comprising one or more processors and memory, and claims 15-20 are directed to program products.  Thus, each of the claims falls within one of the four statutory categories (i.e. process, machine, manufacture, or composition of matter).

Step 2A Prong 1:
Regarding claim 1, this claim recites: 
constructing a training function with a constraint for a neural network (This is a mental step of creating a training algorithm for training a neural network.  Note: the limitation “for a neural network” describes the intended use of the training function.  The claim does not require processing by the neural network.); and
finding a solution of a constrained optimization solution based on the training function to obtain connection weights of the neural network (This is a mental step of determining a solution based on the training function.  The solution includes connection weights/parameters of the neural network).



Step 2A Prong 2:
Regarding claim 1, the additional element of “one or more computing devices” recited in the preamble does not integrate the judicial exception into a practical application.  This additional element is merely using a computer as a tool to perform an abstract idea (see MPEP 2106.05(f)).
Step 2B:
Regarding claim 1, the additional element of “one or more computing devices” recited in the preamble does not amount to significantly more than the judicial exception in the claim.  This additional element merely uses a computer as a tool to perform an abstract idea (see MPEP 2106.05(f)).


Step 2A Prong 1:
Regarding claim 8, this claim recites: 
a construction module stored in the memory and executed by the one or more processors that is configured to construct a training function with a constraint for a neural network (This is a mental step of creating a training algorithm for training a neural network.  Note: the limitation “for a neural network” describes the intended use of the training function.  The claim does not require processing by the neural network.); and
a solving module stored in the memory and executed by the one or more processors that is configured to solve constrained optimization solution based on the training function to obtain connection weights of the neural network (This is a mental step of determining a solution based on the training function.  The solution includes connection weights/parameters of the neural network.  Note: the underlined limitations are additional elements.).
Step 2A Prong 2:
Regarding claim 8, the additional elements of “one or more computing devices comprising one or more processors and memory,” “a construction module stored in the memory and executed by the one or more processors,” and “a solving module stored in the memory and executed by the one or more processors” do not integrate the judicial exception into a practical application.  These additional elements are merely directed to using a computer as a tool to perform an abstract idea.  The modules are directed to software modules performing the abstract idea (see MPEP 2106.05(f)).


Step 2B:
Regarding claim 8, the additional elements of “one or more computing devices comprising one or more processors and memory,” “a construction module stored in the memory and executed by the one or more processors,” and “a solving module stored in the memory and executed by the one or more processors” do not amount to significantly more than the judicial exception in the claim.  These additional elements are merely directed to using a computer as a tool to perform an abstract idea.  The modules are directed to software modules performing the abstract idea (see MPEP 2106.05(f)).

Step 2A Prong 1:
Regarding claim 15, this claim recites: 
constructing a training function with a constraint for a neural network (This is a mental step of creating a training algorithm for training a neural network.  Note: the limitation “for a neural network” describes the intended use of the training function.  The claim does not require processing by the neural network.); and
finding a solution of a constrained optimization solution based on the training function to obtain connection weights of the neural network (This is a mental step of determining a solution based on the training function.  The solution includes connection weights/parameters of the neural network).
Step 2A Prong 2:
Regarding claim 15, the additional element of “one or more computer readable media storing executions” recited in the preamble does not integrate the judicial exception into a practical application.  This additional element is merely directed to using a computer as a tool to perform an abstract idea (see MPEP 2106.05(f)).
Step 2B:
Regarding claim 15, the additional element of “one or more computer readable media storing executions” recited in the preamble does not amount to significantly more than the judicial exception in the claim.  The additional element is merely directed to using a computer as a tool to perform an abstract idea (see MPEP 2106.05(f)).



Step 2A Prong 1:
Regarding claim 2, the limitation “wherein a solving algorithm used for solving the constrained optimization comprises one of a penalty function method, a multiplier method, a projected gradient method, a reduced gradient method, or a constrained variable-scale method” is directed to different types of optimization methods, which are mental processes.
Step 2A Prong 2: this claim does not include any additional elements.
Step 2B: this claim does not include any additional elements.
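
For reference only, the first optimization method named in claim 2 (a penalty function method) can be illustrated with a minimal sketch. The objective, the equality constraint, the penalty schedule, and the function name penalty_method below are hypothetical choices made for illustration; they are not taken from the application or from the cited art.

```python
def penalty_method(mus=(1.0, 10.0, 100.0, 1000.0, 10000.0), inner_steps=200):
    """Quadratic penalty method on a toy problem (illustrative assumption).

    Minimize (w0 - 2)^2 + (w1 - 1)^2 subject to w0 + w1 = 1.
    The constraint is replaced by the penalty (mu / 2) * (w0 + w1 - 1)^2,
    and mu is increased stage by stage with a warm start.
    """
    w0, w1 = 0.0, 0.0
    for mu in mus:
        # Step size 1 / L, where L = 2 + 2 * mu is the largest curvature
        # of the penalized quadratic objective.
        lr = 1.0 / (2.0 + 2.0 * mu)
        for _ in range(inner_steps):
            violation = w0 + w1 - 1.0
            g0 = 2.0 * (w0 - 2.0) + mu * violation  # d/dw0
            g1 = 2.0 * (w1 - 1.0) + mu * violation  # d/dw1
            w0 -= lr * g0
            w1 -= lr * g1
    return w0, w1


w0, w1 = penalty_method()
print(w0, w1)  # approaches the constrained minimizer (1, 0) as mu grows
```

The sketch only shows the general idea of trading a hard constraint for an increasingly weighted cost term; the other methods listed in the claim handle the constraint differently.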

Step 2A Prong 1:
Regarding claim 3, “further comprising: determining a solving algorithm used for solving the constrained optimization based on the constraint of the training function before finding the solution of the constrained optimization solution based on the training function to obtain the connection weights of the neural network.” is a mental process of performing mental steps in a sequence.
Step 2A Prong 2: this claim does not include any additional elements.
Step 2B: this claim does not include any additional elements.

Step 2A Prong 1:
Regarding claim 4, “performing equivalent transformation for the training function based on an indication function and a consistency constraint; decomposing the equivalently transformed training function; and solving the connection weights of the neural network for sub-problems obtained after the decomposing”, is a mental process of performing mental steps to transform a function with the aid of pen and paper.
Step 2A Prong 2: this claim does not include any additional elements.
Step 2B: this claim does not include any additional elements.

Step 2A Prong 1:
Regarding claim 5, “wherein performing the equivalent transformation for the training function based on the indication function and the consistency constraint comprises decoupling the training function” is a mental process of performing mental steps to decouple a function with the aid of pen and paper.  

Step 2A Prong 2: this claim does not include any additional elements.
Step 2B: this claim does not include any additional elements.

Step 2A Prong 1:
Regarding claim 6, “performing iterative computations of the sub-problems obtained after the decomposing to obtain the connection weights of the neural network” is a mental process of performing computations to obtain connection weights with the aid of pen and paper.  
Step 2A Prong 2: this claim does not include any additional elements.
Step 2B: this claim does not include any additional elements.

Step 2A Prong 1:
Regarding claim 7, “wherein the equivalently transformed training function is decomposed using an alternating direction method of multipliers (ADMM)” is a mental process of performing the ADMM algorithm to solve an optimization problem with the aid of pen and paper.
Step 2A Prong 2: this claim does not include any additional elements.
Step 2B: this claim does not include any additional elements.
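
For reference only, the sequence recited in claims 4-7 (equivalent transformation using an indication function and a consistency constraint, decomposition, and iterative solution of the sub-problems) can be illustrated with a generic textbook ADMM sketch. The toy objective, the nonnegativity constraint, the parameter rho, and the function name admm_nonneg are assumptions made for illustration; this is not asserted to be the applicant's method or the message-passing variant described in Yedidia.

```python
def admm_nonneg(a, rho=1.0, iters=100):
    """Scaled-form ADMM on a toy problem (illustrative assumption).

    Original problem:   minimize 0.5 * ||w - a||^2  subject to w >= 0.
    Transformed:        minimize 0.5 * ||w - a||^2 + I(z)  subject to w = z,
    where I is the indicator function of {z : z >= 0} and w = z is the
    consistency constraint that decouples the two terms.
    """
    n = len(a)
    z = [0.0] * n  # copy of the variables that carries the constraint
    u = [0.0] * n  # scaled dual variable for w = z
    for _ in range(iters):
        # Sub-problem 1: smooth term only, closed-form minimizer.
        w = [(a[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # Sub-problem 2: indicator term only, projection onto z >= 0.
        z = [max(0.0, w[i] + u[i]) for i in range(n)]
        # Dual update: accumulate the remaining disagreement w - z.
        u = [u[i] + w[i] - z[i] for i in range(n)]
    return z


print(admm_nonneg([1.5, -2.0, 0.3]))  # tends toward [1.5, 0.0, 0.3]
```

Each iteration solves two simpler sub-problems in turn and then updates the multiplier, which is the sense in which the transformed function is decomposed and solved iteratively.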

Step 2A Prong 1:
Regarding claim 9, the limitation “wherein a solving algorithm used for solving the constrained optimization is any one of the following algorithms: a penalty function method, a multiplier method, a projected gradient method, a reduced gradient method, or a constrained variable-scale method” is directed to different types of optimization methods, which are mental processes.
Step 2A Prong 2: this claim does not include any additional elements.
Step 2B: this claim does not include any additional elements.

Step 2A Prong 1:
Regarding claim 10, “a determination module configured to determine a solving algorithm used for solving the constrained optimization based on the constraint of the training function before finding the solution of the constrained optimization solution based on the training function to obtain the connection weights of the neural network”, wherein “determine a solving algorithm used for solving the constrained optimization based on the constraint of the training function before finding the solution of the constrained optimization solution based on the training function to obtain the connection weights of the neural network” is a mental process of performing mental steps in a sequence.
Step 2A Prong 2:
Regarding claim 10, the additional element of “a determination module” does not integrate the judicial exception into a practical application.  This additional element is merely using a computer as a tool to perform an abstract idea (see MPEP 2106.05(f)).
Step 2B: 
Regarding claim 10, the additional element of “a determination module” does not amount to significantly more than the judicial exception in the claim.  The additional element is merely directed to using a computer as a tool to perform an abstract idea (see MPEP 2106.05(f)).

Step 2A Prong 1:
Regarding claim 11, “a transformation unit configured to perform an equivalent transformation on the training function based on an indication function and a consistency constraint”, wherein “perform an equivalent transformation on the training function based on an indication function and a consistency constraint” is a mental process of performing mental steps to transform a training function with the aid of pen and paper.
Step 2A Prong 2:
Regarding claim 11, the additional element of “a transformation unit” does not integrate the judicial exception into a practical application.  This additional element merely uses a computer as a tool to perform an abstract idea (see MPEP 2106.05(f)).
Step 2B: 
Regarding claim 11, the additional element of “a transformation unit” does not amount to significantly more than the judicial exception in the claim.  The additional element is merely directed to using a computer as a tool to perform an abstract idea (see MPEP 2106.05(f)).




Step 2A Prong 1:
Regarding claim 11, “a decomposition unit configured to decompose the equivalently transformed training function”, wherein, “decompose the equivalently transformed training function” is a mental process of performing mental steps to decompose the transformed training function with the aid of pen and paper.
Step 2A Prong 2:
Regarding claim 11, the additional element of “a decomposition unit” does not integrate the judicial exception into a practical application.  This additional element merely uses a computer as a tool to perform an abstract idea (see MPEP 2106.05(f)).
Step 2B: 
Regarding claim 11, the additional element of “a decomposition unit” does not amount to significantly more than the judicial exception in the claim.  The additional element is merely directed to using a computer as a tool to perform an abstract idea (see MPEP 2106.05(f)).

Step 2A Prong 1:
Regarding claim 11, “a solving unit configured to solve the connection weights of the neural network for sub-problems obtained after the decomposition”, wherein “solve the connection weights of the neural network for sub-problems obtained after the decomposition” is a mental process of performing mental steps to determine the solution of the training function with the aid of pen and paper.
Step 2A Prong 2: 
Regarding claim 11, the additional element of “a solving unit” does not integrate the judicial exception into a practical application.  This additional element merely uses a computer as a tool to perform an abstract idea (see MPEP 2106.05(f)).
Step 2B: 
Regarding claim 11, the additional element of “a solving unit” does not amount to significantly more than the judicial exception in the claim.  The additional element is merely directed to using a computer as a tool to perform an abstract idea (see MPEP 2106.05(f)).





Step 2A Prong 1:
Regarding claim 12, “wherein performing the equivalent transformation for the training function based on the indication function and the consistency constraint comprises decoupling the training function” is a mental process of performing mental steps to decouple a function with the aid of pen and paper.  
Step 2A Prong 2: this claim does not include any additional elements.
Step 2B: this claim does not include any additional elements.

Step 2A Prong 1:
Regarding claim 13, “performing iterative computations of the sub-problems obtained after the decomposing to obtain the connection weights of the neural network” is a mental process of performing computations to obtain connection weights with the aid of pen and paper.  
Step 2A Prong 2: this claim does not include any additional elements.
Step 2B: this claim does not include any additional elements.

Step 2A Prong 1:
Regarding claim 14, “the equivalently transformed training function is decomposed using an alternating direction method of multipliers (ADMM)” is a mental process of performing the ADMM algorithm to solve an optimization problem with the aid of pen and paper.
Step 2A Prong 2: this claim does not include any additional elements.
Step 2B: this claim does not include any additional elements.

Step 2A Prong 1:
Regarding claim 16, the limitation “wherein a solving algorithm used for solving the constrained optimization comprises one of a penalty function method, a multiplier method, a projected gradient method, a reduced gradient method, or a constrained variable-scale method” is directed to different types of optimization methods, which are mental processes.
Step 2A Prong 2: this claim does not include any additional elements.
Step 2B: this claim does not include any additional elements.



Step 2A Prong 1:
Regarding claim 17, “acts further comprising determining a solving algorithm used for solving the constrained optimization based on the constraint of the training function before finding the solution of the constrained optimization solution based on the training function to obtain the connection weights of the neural network.” is a mental process of performing mental steps in a sequence.
Step 2A Prong 2: this claim does not include any additional elements.
Step 2B: this claim does not include any additional elements.

Step 2A Prong 1:
Regarding claim 18, “finding the solution of the constrained optimization solution based on the training function to obtain the connection weights of the neural network comprises: performing equivalent transformation for the training function based on an indication function and a consistency constraint” is a mental process of performing mental steps to transform a function with the aid of pen and paper, and “decomposing the equivalently transformed training function using an alternating direction method of multipliers (ADMM); and solving the connection weights of the neural network for sub-problems obtained after the decomposing” is a mental process of performing the ADMM algorithm to solve an optimization problem with the aid of pen and paper.
Step 2A Prong 2: this claim does not include any additional elements.
Step 2B: this claim does not include any additional elements.

Step 2A Prong 1:
Regarding claim 19, “performing the equivalent transformation for the training function based on the indication function and the consistency constraint comprises decoupling the training function” is a mental process of performing mental steps to decouple a function with the aid of pen and paper.  
Step 2A Prong 2: this claim does not include any additional elements.
Step 2B: this claim does not include any additional elements.



Step 2A Prong 1:
Regarding claim 20, “performing iterative computations of the sub-problems obtained after the decomposing to obtain the connection weights of the neural network” is a mental process of performing computations to obtain connection weights with the aid of pen and paper.  
Step 2A Prong 2: this claim does not include any additional elements.
Step 2B: this claim does not include any additional elements.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sheng et al. (hereinafter Sheng) US 20180068216 A1, in view of Yedidia et al. (hereinafter Yedidia) US 20160217380 A1.
In regard to claim 1, Sheng discloses:
“constructing a training function” (Sheng in at least ¶ [Abstract] “A big data processing method based on a deep learning model satisfying K-degree sparse constraints comprises: step 1), constructing a deep learning model satisfying K-degree sparse constraints using an un-marked training sample via a gradient pruning method, wherein the K-degree sparse constraints comprise a node K-degree sparse constraint and a level K-degree sparse constraint”).
Sheng further discloses: 
“obtain connection weights of the neural network” (Sheng in at least ¶ [0017] “step 103) inputting an unmarked training sample set Y = {x_i^t} into the h-th layer, and adjusting a connection weight between the h-th layer and the (h+1)-th layer and an offset weight of nodes in the (h+1)-th layer during minimizing a cost function of the h-th layer and the (h+1)-th layer”).
Sheng does not explicitly disclose:
“A method implemented by one or more computing devices”
“training function with a constraint for a neural network”
However, Yedidia discloses:
“A method implemented by one or more computing devices” (Yedidia in at least ¶ [0026] “The processor 105 may further execute a plurality of engines that are configured to perform a functionality associated with generating the optimization solution. As illustrated, the processor 105 may execute a variable engine 125, a cost engine 130, a factor graph engine 135, a constraint engine 140, an optimization engine 145, and a signal generating engine 150. The engines will be described in further detail below”, and in at least ¶ [0028] “The factor graph engine 135 may generate the graphical model for the variables and the cost functions”).
Yedidia further discloses:
“training function with a constraint for a neural network” (Yedidia in at least ¶ [0029] “Initially, a message-passing version for the ADMM algorithm is derived. The message-passing algorithm used for the ADMM algorithm as described below may be applied to a completely general optimization problem and not only to convex optimization problems. The general optimization problem may be finding a configuration the values of some variables that minimize some objective function subject to some constraints”);
“finding a solution of a constrained optimization solution based on the training function” (Yedidia in at least ¶ [0030] “The processor 105 may receive N continuous variables via the variable engine 125 and represent these variables as a vector r ∈ R^N. The general optimization problem becomes one of minimizing an objective function E(r) subject to some constraints on r. The objective function may be, for example, a cost function. All the constraints are considered to be part of the objective function by introducing a cost function”).
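
For reference only, the formulation quoted from Yedidia ¶ [0030], in which constraints on r are treated as part of the objective function E(r), can be illustrated with a minimal sketch. The convention that a violated constraint contributes an infinite cost, as well as the names total_objective, soft_costs, and hard_constraints and the example values, are assumptions made for illustration and are not quoted from the reference.

```python
import math


def total_objective(r, soft_costs, hard_constraints):
    """Evaluate E(r) with constraints folded into the objective.

    Assumed convention: a violated hard constraint contributes an
    infinite cost, and a satisfied one contributes zero.
    """
    for satisfied in hard_constraints:
        if not satisfied(r):
            return math.inf
    return sum(cost(r) for cost in soft_costs)


# Hypothetical example: one quadratic cost and one inequality constraint.
soft_costs = [lambda r: (r[0] - 3.0) ** 2 + (r[1] + 1.0) ** 2]
hard_constraints = [lambda r: r[0] + r[1] <= 1.0]

print(total_objective((3.0, -1.0), soft_costs, hard_constraints))  # infeasible point
print(total_objective((2.5, -1.5), soft_costs, hard_constraints))  # feasible point
```

Under this convention, minimizing E(r) without constraints yields the same minimizer as the constrained problem, since any infeasible r is never preferred.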

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sheng and Yedidia. Sheng teaches constructing a training function for a neural network using a gradient pruning method and obtaining a connection weight of the neural network. Yedidia teaches placing a constraint on a training function and finding an optimum solution for a constrained problem. One of ordinary skill would have been motivated to combine Sheng and Yedidia to provide a constrained training function and the ability to solve constraint satisfaction problems applied to a broad range of neural network applications (Yedidia ¶ [0023]). Further, the ADMM algorithm may consider a variety of factors to further improve generating an optimization solution. Specifically, Yedidia provides an approach to generating the optimization solution for quickly and flexibly developing hybrid cognitive capabilities that are efficient, scalable, and exploit knowledge, which may improve the speed and quality at which the optimization solution can be found for neural network applications with constrained training functions (Yedidia ¶ [0081]).
In regard to claim 8, Sheng discloses:
“constructing a training function” (Sheng in at least ¶ [Abstract] “A big data processing method based on a deep learning model satisfying K-degree sparse constraints comprises: step 1), constructing a deep learning model satisfying K-degree sparse constraints using an un-marked training sample via a gradient pruning method, wherein the K-degree sparse constraints comprise a node K-degree sparse constraint and a level K-degree sparse constraint”). 
Sheng further discloses: 
“obtain connection weights of the neural network” (Sheng in at least ¶ [0017] “step 103) inputting an unmarked training sample set Y = {x_i^t} into the h-th layer, and adjusting a connection weight between the h-th layer and the (h+1)-th layer and an offset weight of nodes in the (h+1)-th layer during minimizing a cost function of the h-th layer and the (h+1)-th layer”).
Sheng does not explicitly disclose:
“An apparatus implemented by one or more computing devices comprising one or more processors and memory, the apparatus comprising:”
“a construction module stored in the memory and executed by the one or more processors that is configured to”
“a solving module stored in the memory and executed by the one or more processors that is configured to”
“training function with a constraint for a neural network”
“finding a solution of a constrained optimization solution based on the training function”
However, Yedidia discloses:
“An apparatus implemented by one or more computing devices comprising one or more processors and memory, the apparatus comprising:” (Yedidia in at least ¶ [0026] “The processor 105 may further execute a plurality of engines that are configured to perform a functionality associated with generating the optimization solution. As illustrated, the processor 105 may execute a variable engine 125, a cost engine 130, a factor graph engine 135, a constraint engine 140, an optimization engine 145, and a signal generating engine 150. The engines will be described in further detail below”, in at least ¶ [0028] “The factor graph engine 135 may generate the graphical model for the variables and the cost functions”, and in at least ¶ [0025] “FIG. 1 shows a device 100 for generating an optimization solution according to an exemplary embodiment. The device 100 may be any electronic component that is configured to receive and process data such that the generating is performed. The device 100 may include a processor 105, a memory arrangement 110, a transceiver 115, and an input/output (I/O) device 120. The processor 105 may be configured to execute an optimization application. The memory 110 may store data that is received as well as data that is determined. The transceiver 115 and the I/O device 120 may provide a communicative connection to other electronic devices or users such that data may be received for processing. For example, a user may manually enter constraints involved in a particular optimization scenario or problem”);
“a construction module stored in the memory and executed by the one or more processors” (Yedidia in at least ¶ [0025] “The device 100 may include a processor 105, a memory arrangement 110, a transceiver 115, and an input/output (I/O) device 120. The memory 110 may store data that is received as well as data that is determined”, and in at least ¶ [0028] “The factor graph engine 135 may generate the graphical model for the variables and the cost functions”);
“a solving module stored in the memory and executed by the one or more processors” (Yedidia in at least ¶ [0025] “The device 100 may include a processor 105, a memory arrangement 110, a transceiver 115, and an input/output (I/O) device 120. The processor 105 may be configured to execute an optimization application. The memory 110 may store data that is received as well as data that is determined”);
“training function with a constraint for a neural network” (Yedidia in at least ¶ [0029] “Initially, a message-passing version for the ADMM algorithm is derived. The message-passing algorithm used for the ADMM algorithm as described below may be applied to a completely general optimization problem and not only to convex optimization problems. The general optimization problem may be finding a configuration the values of some variables that minimize some objective function subject to some constraints”);
“finding a solution of a constrained optimization solution based on the training function” (Yedidia in at least ¶ [0030] “The processor 105 may receive N continuous variables via the variable engine 125 and represent these variables as a vector $r \in \mathbb{R}^{N}$. The general optimization problem becomes one of minimizing an objective function E(r) subject to some constraints on r. The objective function may be, for example, a cost function. All the constraints are considered to be part of the objective function by introducing a cost function”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sheng and Yedidia. Sheng teaches constructing a training function for a neural network using a gradient pruning method and obtaining a connection weight of the neural network. Yedidia teaches placing a constraint on a training function and finding an optimum solution for a constrained problem. One of ordinary skill would have been motivated to combine Sheng and Yedidia to provide a constrained training function and the ability to solve constraint satisfaction problems applied to a broad range of neural network applications (Yedidia ¶ [0023]). Further, the ADMM algorithm may consider a variety of factors to improve generation of an optimization solution. Specifically, Yedidia's approach to generating the optimization solution supports quickly and flexibly developing hybrid cognitive capabilities that are efficient, scalable, and exploit knowledge, which may improve the speed and quality with which the optimization solution can be found for neural network applications with constrained training functions (Yedidia ¶ [0081]).




In regard to claim 15, Sheng discloses:
“constructing a training function” (Sheng in at least ¶ [Abstract] “A big data processing method based on a deep learning model satisfying K-degree sparse constraints comprises: step 1), constructing a deep learning model satisfying K-degree sparse constraints using an un-marked training sample via a gradient pruning method, wherein the K-degree sparse constraints comprise a node K-degree sparse constraint and a level K-degree sparse constraint”)
Sheng further discloses: 
“obtain connection weights of the neural network” (Sheng in at least ¶ [0017] “step 103) inputting an unmarked training sample set $Y = \{x_{i}^{t}\}$ into the $h^{\mathrm{th}}$ layer, and adjusting a connection weight between the $h^{\mathrm{th}}$ layer and the $(h+1)^{\mathrm{th}}$ layer and an offset weight of nodes in the $(h+1)^{\mathrm{th}}$ layer during minimizing a cost function of the $h^{\mathrm{th}}$ layer and the $(h+1)^{\mathrm{th}}$ layer”).
Sheng does not explicitly disclose:
“One or more computer readable media storing executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising:”
“constructing a training function with a constraint for a neural network; and finding a solution of a constrained optimization solution based on the training function”
However, Yedidia discloses:
“One or more computer readable media storing executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising:” (Yedidia in at least ¶ [0026] “The processor 105 may further execute a plurality of engines that are configured to perform a functionality associated with generating the optimization solution. As illustrated, the processor 105 may execute a variable engine 125, a cost engine 130, a factor graph engine 135, a constraint engine 140, an optimization engine 145, and a signal generating engine 150. The engines will be described in further detail below”, in at least ¶ [0028] “The factor graph engine 135 may generate the graphical model for the variables and the cost functions”, and in at least ¶ [0178] “In a further example, the exemplary embodiments of the above described method may be embodied as a program containing lines of code stored on a non-transitory computer readable storage medium that, when compiled, may be executed on a processor or microprocessor”);
“constructing a training function with a constraint for a neural network” (Yedidia in at least ¶ [0029] “Initially, a message-passing version for the ADMM algorithm is derived. The message-passing algorithm used for the ADMM algorithm as described below may be applied to a completely general optimization problem and not only to convex optimization problems. The general optimization problem may be finding a configuration the values of some variables that minimize some objective function subject to some constraints”);
“finding a solution of a constrained optimization solution based on the training function” (Yedidia in at least ¶ [0030] “The processor 105 may receive N continuous variables via the variable engine 125 and represent these variables as a vector $r \in \mathbb{R}^{N}$. The general optimization problem becomes one of minimizing an objective function E(r) subject to some constraints on r. The objective function may be, for example, a cost function. All the constraints are considered to be part of the objective function by introducing a cost function”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sheng and Yedidia. Sheng teaches constructing a training function for a neural network using a gradient pruning method and obtaining a connection weight of the neural network. Yedidia teaches placing a constraint on a training function and finding an optimum solution for a constrained problem. One of ordinary skill would have been motivated to combine Sheng and Yedidia to provide a constrained training function and the ability to solve constraint satisfaction problems applied to a broad range of neural network applications (Yedidia ¶ [0023]). Further, the ADMM algorithm may consider a variety of factors to improve generation of an optimization solution. Specifically, Yedidia's approach to generating the optimization solution supports quickly and flexibly developing hybrid cognitive capabilities that are efficient, scalable, and exploit knowledge, which may improve the speed and quality with which the optimization solution can be found for neural network applications with constrained training functions (Yedidia ¶ [0081]).



In regards to claim 2. Sheng and Yedidia disclose a method of claim 1 (as mentioned above) wherein:
Yedidia further discloses:
“ a solving algorithm used for solving the constrained optimization comprises one of a penalty function method, a multiplier method, a projected gradient method, a reduced gradient method, or a constrained variable-scale method” (Yedidia in at least ¶ [0035] “Accordingly, solving constrained optimization problems is equivalent to a minimization problem that naturally splits into three pieces: (1) minimizing the original soft and hard cost functions f(x) on the left side of the graphical model 200, (2) minimizing the equality cost functions g(z) on the right side of the graphical model 200, and (3) ensuring that the x=z with the Lagrange multipliers y.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sheng and Yedidia. Sheng teaches constructing a training function for a neural network using a gradient pruning method and obtaining a connection weight of the neural network. Yedidia teaches placing a constraint on a training function and finding an optimum solution for a constrained problem, and further teaches a solving algorithm used for solving the constrained optimization. One of ordinary skill would have been motivated to combine Sheng and Yedidia to provide a constrained training function and the ability to solve constraint satisfaction problems applied to a broad range of neural network applications, using standard solving algorithms to solve problems in an efficient manner (Yedidia ¶ [0140]).
In regards to claim 3. Sheng and Yedidia disclose a method of claim 1 (as mentioned above) wherein:
Yedidia further discloses:
“determining a solving algorithm used for solving the constrained optimization based on the constraint of the training function before finding the solution of the constrained optimization solution based on the training function to obtain the connection weights of the neural network ” (Yedidia in at least ¶ [0002] “ An optimization algorithm may be used to determine a solution for a problem in which the solution considers all variables and constraints related to the problem and provides a lowest cost configuration of the values for all the variables”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sheng and Yedidia. Sheng teaches constructing a training function for a neural network using a gradient pruning method and obtaining a connection weight of the neural network. Yedidia teaches finding an optimum solution for a constrained problem and further teaches determining a solving algorithm used for solving the constrained optimization. One of ordinary skill would have been motivated to combine Sheng and Yedidia to provide a constrained training function and the ability to solve constraint satisfaction problems applied to a broad range of neural network applications by selecting among a plurality of standard solving algorithms to solve problems in an efficient manner. As such, a selection process that considers all constraints and variables can provide the lowest-cost solution to the problem (Yedidia ¶ [0002]).
In regards to claim 4. Sheng and Yedidia disclose a method of claim 1 (as mentioned above) wherein:
Yedidia further discloses:
“performing equivalent transformation for the training function based on an indication function and a consistency constraint” (Yedidia in at least ¶ [0032] “The graphical model 200 includes a set of hard or soft cost functions E1, E2, E3, and E4 as well as a set of variables r1, r2, r3, r4, r5, and r6. When a line connects a cost function node with a variable node, this indicates that the cost function depends upon the corresponding variable. In order to derive the message-passing algorithm for the ADMM algorithm, the problem may be manipulated into a series of equivalent forms before actually minimizing the objective. The first manipulation is a conversion of the problem over the variables $r_{j}$ into an equivalent problem that depends on variables $x_{ij}$ that sit on the edges of a “normalized” Forney-style factor graph. The variables in the standard factor graph are replaced with equality constraints and each of the edge variables is a copy of the corresponding variable that was on its right”);
“decomposing the equivalently transformed training function” (Yedidia in at least ¶ [0030] “The general optimization problem becomes one of minimizing an objective function E(r) subject to some constraints on r. The objective function may be, for example, a cost function. All the constraints are considered to be part of the objective function by introducing a cost function $E_{a}(r)$ for each constraint such that $E_{a}(r) = 0$ if the constraint is satisfied and $E_{a}(r) = \infty$ if the constraint is not satisfied. The original objective function E(r) may also be decomposable into a collection of local cost values”);
“solving the connection weights of the neural network for sub-problems obtained after the decomposing” (Yedidia in at least ¶ [0028] “In view of the manner in which the optimization functionality is utilized, the variable engine 125 may receive the set of variables while the cost engine 130 may receive the set of cost functions. The factor graph engine 135 may generate the graphical model for the variables and the cost functions. Through the connections between the cost functions and the variables, the constraint engine 140 may provide the lines of the graphical model to the factor graph engine 135. All this data may be received by the optimization engine 145 that generates the optimization solution”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sheng and Yedidia. Sheng teaches constructing a training function for a neural network using a gradient pruning method and obtaining a connection weight of the neural network. Yedidia teaches placing a constraint on a training function and further teaches performing an equivalent transformation of the training function and solving the connection weights after decomposing the transformed function. One of ordinary skill would have been motivated to combine Sheng and Yedidia to provide a constrained training function and the ability to solve constraint satisfaction problems applied to a broad range of neural network applications by selecting among a plurality of standard solving algorithms to solve problems in an efficient manner, decomposing the training function into sub-problems by way of the equivalent transformation. The sub-problem approach may guarantee that the constraints are satisfied, and the algorithm typically converges faster (Yedidia ¶ [0139]).
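For illustration only, and not drawn verbatim from either reference, the equivalent transformation and decomposition discussed for claim 4 can be written in a common textbook form. The symbols $f$, $x$, $z$, and $y$ follow the quoted Yedidia ¶ [0035]; the feasible set $C$, the indication function $I_{C}$, and the penalty parameter $\rho$ are assumptions introduced here for the sketch.

```latex
% Constrained problem rewritten with an indication function and a consistency constraint
\min_{r \in C} E(r)
\;\Longleftrightarrow\;
\min_{x,\,z} \; f(x) + I_{C}(z) \quad \text{subject to } x = z,
\qquad
I_{C}(z) =
\begin{cases}
0, & z \in C \\
\infty, & z \notin C
\end{cases}

% Augmented Lagrangian, with multipliers y enforcing x = z
L_{\rho}(x, z, y) = f(x) + I_{C}(z) + y^{\top}(x - z) + \tfrac{\rho}{2}\,\lVert x - z \rVert_{2}^{2}

% Decomposition into sub-problems, iterated over t
x^{t+1} = \arg\min_{x} L_{\rho}(x, z^{t}, y^{t}), \quad
z^{t+1} = \arg\min_{z} L_{\rho}(x^{t+1}, z, y^{t}), \quad
y^{t+1} = y^{t} + \rho\,(x^{t+1} - z^{t+1})
```

In this form the cost term and the constraint term are each minimized separately, and only the multiplier update ties them together through the consistency constraint $x = z$.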
In regards to claim 5. Sheng and Yedidia disclose a method of claim 4 (as mentioned above) wherein:
Yedidia further discloses:
“the equivalent transformation for the training function based on the indication function and the consistency constraint comprises decoupling the training function” (Yedidia in at least ¶ [0032] “Thus, the edge variables attached to the same equality constraint must ultimately equal each other but they may temporarily be unequal while they separately attempt to satisfy different cost functions on the left”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sheng and Yedidia. Sheng teaches constructing a training function for a neural network using a gradient pruning method and obtaining a connection weight of the neural network. Yedidia teaches performing an equivalent transformation of the training function and further teaches a decoupling operation on the transformed training function. One of ordinary skill would have been motivated to combine Sheng and Yedidia to provide a constrained training function and the ability to solve constraint satisfaction problems applied to a broad range of neural network applications by selecting among a plurality of standard solving algorithms to solve problems in an efficient manner, decomposing the training function into sub-problems by way of an equivalent transformation that includes the decoupling operation. The sub-problem approach may guarantee that the constraints are satisfied, and the algorithm typically converges faster. Further, the decoupling operation (separately attempting to satisfy each cost function) may use fine-grained parallelized routines, leading to additional performance gains (Yedidia ¶ [0062]).
In regards to claim 6. Sheng and Yedidia disclose a method of claim 4 (as mentioned above) wherein:
Yedidia further discloses:
“solving the connection weights of the neural network for the sub-problems obtained after the decomposing comprises performing iterative computations of the sub-problems obtained after the decomposing to obtain the connection weights of the neural network” (Yedidia in at least ¶ [0035] “In summary, the original problem of minimizing E(r) has become equivalent to finding the minimum of the augmented Lagrangian”, in at least ¶ [0036] “Therefore, the minimum of the Lagrangian may be determined through maximizing the dual function of the following:
$h(y) = L(x^{*}, y, z)$ (Equation 2)
where $(x^{*}, z^{*})$ are the values of $x$ and $z$ that minimize $L$ for a particular choice of $y$:”, and in at least ¶ [0037] “In order to maximize $h(y)$, a gradient ascent algorithm may be used. Thus, given values of $y^{t}$ at some iteration $t$, an iterative computation may be performed”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sheng and Yedidia. Sheng teaches constructing a training function for a neural network using a gradient pruning method and obtaining a connection weight of the neural network. Yedidia teaches performing an equivalent transformation of the training function that includes the decoupling operation, and further uses iterative computations when decomposing the transformed function. One of ordinary skill would have been motivated to combine Sheng and Yedidia to provide a constrained training function and the ability to solve constraint satisfaction problems applied to a broad range of neural network applications by selecting among a plurality of standard solving algorithms to solve problems in an efficient manner using iterative computational routines. Such iterative computations are naturally amenable to concurrent processing at multiple levels of execution using well-known multi-core and GPU processors and concurrent computational frameworks such as MapReduce, leading to highly efficient computations (Yedidia ¶ [0058]).


In regards to claim 7. Sheng and Yedidia disclose a method of claim 4 (as mentioned above) wherein:
Yedidia further discloses:
“the equivalently transformed training function is decomposed using an alternating direction method of multipliers (ADMM)” (Yedidia in at least ¶ [0003] “A conventional algorithm used for convex optimization is the Alternating Direction Method of Multipliers (ADMM)”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sheng and Yedidia. Sheng teaches constructing a training function for a neural network using a gradient pruning method and obtaining a connection weight of the neural network. Yedidia teaches performing an equivalent transformation of the training function that includes the decoupling operation, and further uses the ADMM method for performing the decomposition. One of ordinary skill would have been motivated to combine Sheng and Yedidia to provide a constrained training function and the ability to solve constraint satisfaction problems applied to a broad range of neural network applications. ADMM is well known for solving distributed applications (Yedidia ¶ [0003]) and is known to converge faster when the problem is broken down into relatively large sub-problems that can be solved efficiently (Yedidia ¶ [0139]).
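As a purely illustrative aid, the following minimal sketch shows the general shape of an ADMM iteration on a small constrained least-squares problem. It is a textbook scaled-form ADMM, not the message-passing variant of Yedidia and not the applicant's claimed method; the function name `admm_nonneg_least_squares`, the non-negativity constraint, and the parameters `rho` and `iters` are all assumptions chosen for the example.

```python
import numpy as np


def admm_nonneg_least_squares(A, b, rho=1.0, iters=200):
    """Minimize 0.5 * ||A w - b||^2 subject to w >= 0 (hypothetical example).

    The problem is split as f(x) + I_C(z) with the consistency constraint
    x = z, where I_C is the indication function of the set {z : z >= 0}.
    """
    n = A.shape[1]
    x = np.zeros(n)  # copy of the weights seen by the cost term
    z = np.zeros(n)  # copy of the weights seen by the constraint term
    u = np.zeros(n)  # scaled multiplier for the constraint x = z
    lhs = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(iters):
        # x-update: unconstrained quadratic sub-problem
        x = np.linalg.solve(lhs, Atb + rho * (z - u))
        # z-update: projection onto the feasible set (non-negative orthant)
        z = np.maximum(0.0, x + u)
        # multiplier update: accumulate the remaining disagreement x - z
        u = u + x - z
    return z


if __name__ == "__main__":
    A = np.eye(2)
    b = np.array([1.0, -1.0])
    print(admm_nonneg_least_squares(A, b))
```

Each pass solves two small sub-problems and then updates the multiplier, which is the iterative, decomposed structure the rejection relies on; the returned `z` always satisfies the constraint because it is produced by the projection step.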
In regards to claim 9. Sheng and Yedidia disclose “An apparatus implemented by one or more computing devices” of claim 8 (as mentioned above) wherein:
Yedidia further discloses:
“a solving algorithm used for solving the constrained optimization is any one of the following algorithms: a penalty function method, a multiplier method, a projected gradient method, a reduced gradient method, or a constrained variable-scale method.”(Yedidia in at least ¶ [0035] “ Accordingly, solving constrained optimization problems is equivalent to a minimization problem that naturally splits into three pieces: (1) minimizing the original soft and hard cost functions f(x) on the left side of the graphical model 200, (2) minimizing the equality cost functions g(z) on the right side of the graphical model 200, and (3) ensuring that the x=z with the Lagrange multipliers y”). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sheng and Yedidia. Sheng teaches constructing a training function for a neural network using a gradient pruning method and obtaining a connection weight of the neural network. Yedidia teaches placing a constraint on a training function and finding an optimum solution for a constrained problem, and further teaches solving the optimization using any one of the standard algorithms such as a penalty function method, a multiplier method, a projected gradient method, a reduced gradient method, or a constrained variable-scale method. One of ordinary skill would have been motivated to combine Sheng and Yedidia to provide a constrained training function and the ability to solve constraint satisfaction problems applied to a broad range of neural network applications. The Lagrange multiplier method can be applied to generate iterative equations, leading to iterative computations that achieve higher computational efficiency (Yedidia ¶¶ [0036], [0038]).
In regards to claim 10. Sheng and Yedidia disclose “An apparatus implemented by one or more computing devices” of claim 8 (as mentioned above) wherein:
Yedidia further discloses:
“comprising a determination module configured to determine a solving algorithm used for solving the constrained optimization according to the constraint of the training function” (Yedidia in at least ¶ [0025] “The device 100 may include a processor 105, a memory arrangement 110, a transceiver 115, and an input/output (I/O) device 120. The memory 110 may store data that is received as well as data that is determined”, and in at least ¶ [0028] “The factor graph engine 135 may generate the graphical model for the variables and the cost functions”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sheng and Yedidia. Sheng teaches constructing a training function for a neural network using a gradient pruning method and obtaining a connection weight of the neural network. Yedidia teaches finding an optimum solution for a constrained problem and further teaches determining a solving algorithm used for solving the constrained optimization. One of ordinary skill would have been motivated to combine Sheng and Yedidia to provide a constrained training function and the ability to solve constraint satisfaction problems applied to a broad range of neural network applications by selecting among a plurality of standard solving algorithms to solve problems in an efficient manner. As such, a selection process that considers all constraints and variables can provide the lowest-cost solution to the problem (Yedidia ¶ [0002]).




In regards to claim 11. Sheng and Yedidia disclose “An apparatus implemented by one or more computing devices” of claim 8 (as mentioned above) wherein: 
Yedidia further discloses:
“a transformation unit configured to perform an equivalent transformation on the training function based on an indication function and a consistency constraint” (Yedidia in at least ¶ [0025] “The device 100 may include a processor 105, a memory arrangement 110, a transceiver 115, and an input/output (I/O) device 120. The memory 110 may store data that is received as well as data that is determined”, in at least ¶ [0028] “The factor graph engine 135 may generate the graphical model for the variables and the cost functions”, and in at least ¶ [0032] “The graphical model 200 includes a set of hard or soft cost functions E1, E2, E3, and E4 as well as a set of variables r1, r2, r3, r4, r5, and r6. When a line connects a cost function node with a variable node, this indicates that the cost function depends upon the corresponding variable. In order to derive the message-passing algorithm for the ADMM algorithm, the problem may be manipulated into a series of equivalent forms before actually minimizing the objective. The first manipulation is a conversion of the problem over the variables $r_{j}$ into an equivalent problem that depends on variables $x_{ij}$ that sit on the edges of a “normalized” Forney-style factor graph. The variables in the standard factor graph are replaced with equality constraints and each of the edge variables is a copy of the corresponding variable that was on its right”);
“a decomposition unit configured to decompose the equivalently transformed training function” (Yedidia in at least ¶ [0025] “The device 100 may include a processor 105, a memory arrangement 110, a transceiver 115, and an input/output (I/O) device 120. The memory 110 may store data that is received as well as data that is determined”, in at least ¶ [0028] “The factor graph engine 135 may generate the graphical model for the variables and the cost functions”, and in at least ¶ [0030] “The general optimization problem becomes one of minimizing an objective function E(r) subject to some constraints on r. The objective function may be, for example, a cost function. All the constraints are considered to be part of the objective function by introducing a cost function $E_{a}(r)$ for each constraint such that $E_{a}(r) = 0$ if the constraint is satisfied and $E_{a}(r) = \infty$ if the constraint is not satisfied. The original objective function E(r) may also be decomposable into a collection of local cost values”);
“a solving unit configured to solve the connection weights of the neural network for sub-problems obtained after the decomposition”
(Yedidia in at least ¶ [0025] “The device 100 may include a processor 105, a memory arrangement 110, a transceiver 115, and an input/output (I/O) device 120. The memory 110 may store data that is received as well as data that is determined”, in at least ¶ [0028] “The factor graph engine 135 may generate the graphical model for the variables and the cost functions”, and in at least ¶ [0028] “In view of the manner in which the optimization functionality is utilized, the variable engine 125 may receive the set of variables while the cost engine 130 may receive the set of cost functions. The factor graph engine 135 may generate the graphical model for the variables and the cost functions. Through the connections between the cost functions and the variables, the constraint engine 140 may provide the lines of the graphical model to the factor graph engine 135. All this data may be received by the optimization engine 145 that generates the optimization solution”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sheng and Yedidia. Sheng teaches constructing a training function for a neural network using a gradient pruning method and obtaining a connection weight of the neural network. Yedidia teaches having a constraint on a training function, and further teaches performing an equivalent transformation of the training function and solving the connection weights after decomposing the transformed function. One of ordinary skill would have been motivated to combine Sheng and Yedidia to provide a constrained training function and the ability to solve constraint satisfaction problems applied to a broad range of neural network applications, selecting among a plurality of standard solving algorithms to solve problems efficiently by decomposing the training function into sub-problems using an equivalent transformation. The sub-problem approach may guarantee that the constraints are satisfied, and the algorithm typically converges faster (Yedidia [0139]).
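For orientation only, the style of reformulation described in the passages quoted for claim 11 (a constraint expressed as a cost that is 0 when satisfied and ∞ otherwise, together with copied variables tied by an equality constraint) is commonly written in the generic form sketched below. The symbols $f$, $C$, $I_C$, $w$, and $z$ are illustrative assumptions; they are not notation taken from Sheng, Yedidia, or the claims.

```latex
\min_{w}\; f(w) \quad \text{s.t. } w \in C
\qquad\Longleftrightarrow\qquad
\min_{w,\,z}\; f(w) + I_C(z) \quad \text{s.t. } w = z,
\qquad
I_C(z) =
\begin{cases}
0, & z \in C,\\
\infty, & z \notin C.
\end{cases}
```

In this generic sketch, $I_C$ acts as an indicator function and $w = z$ as a consistency (equality) constraint, so that the terms $f(w)$ and $I_C(z)$ can be treated as separate sub-problems.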
Regarding claim 12, Sheng and Yedidia disclose “An apparatus implemented by one or more computing devices” of claim 11 (as mentioned above), wherein:
Yedidia further discloses:
“performing the equivalent transformation on the training function by the transformation unit comprising decoupling the training function” (Yedidia in at least ¶ [0025] “The device 100 may include a processor 105, a memory arrangement 110, a transceiver 115, and an input/output (I/O) device 120. The memory 110 may store data that is received as well as data that is determined”, in at least ¶ [0028] “The factor graph engine 135 may generate the graphical model for the variables and the cost functions”, and in at least ¶ [0032] “Thus, the edge variables attached to the same equality constraint must ultimately equal each other but they may temporarily be unequal while they separately attempt to satisfy different cost functions on the left”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sheng and Yedidia. Sheng teaches constructing a training function for a neural network using a gradient pruning method and obtaining a connection weight of the neural network. Yedidia teaches performing an equivalent transformation of the training function and further teaches a decoupling operation on the transformed training function. One of ordinary skill would have been motivated to combine Sheng and Yedidia to provide a constrained training function and the ability to solve constraint satisfaction problems applied to a broad range of neural network applications, selecting among a plurality of standard solving algorithms to solve problems efficiently by decomposing the training function into sub-problems using an equivalent transformation that includes the decoupling operation of the training function. The sub-problem approach may guarantee that the constraints are satisfied, and the algorithm typically converges faster; further, the decoupling operation (separately attempting to satisfy the cost functions) may use fine-grained parallelized routines, leading to additional performance gains (Yedidia [0062]).
Regarding claim 13, Sheng and Yedidia disclose “An apparatus implemented by one or more computing devices” of claim 11 (as mentioned above), wherein:
Yedidia further discloses:
“solving the connection weights of the neural network by the solving unit comprises performing iterative computations of the sub-problems obtained after the decomposing to obtain the connection weights of the neural network” (Yedidia in at least ¶ [0025] “The device 100 may include a processor 105, a memory arrangement 110, a transceiver 115, and an input/output (I/O) device 120. The memory 110 may store data that is received as well as data that is determined”, in at least ¶ [0035] “In summary, the original problem of minimizing E(r) has become equivalent to finding the minimum of the augmented Lagrangian”, in at least ¶ [0036] “Therefore, the minimum of the Lagrangian may be determined through maximizing the dual function of the following:
$h(y) = L(x^{*}, y, z)$   (Equation 2)
where $(x^{*}, z^{*})$ are the values of $x$ and $z$ that minimize L for a particular choice of $y$:”, and in at least ¶ [0037] “In order to maximize $h(y)$, a gradient ascent algorithm may be used. Thus, given values of $y^{t}$ at some iteration $t$, an iterative computation may be performed”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sheng and Yedidia. Sheng teaches constructing a training function for a neural network using a gradient pruning method and obtaining a connection weight of the neural network. Yedidia teaches performing an equivalent transformation of the training function that includes the decoupling operation of the transformed training function, and further uses iterative computations for performing decomposition of the transformed function. One of ordinary skill would have been motivated to combine Sheng and Yedidia to provide a constrained training function and the ability to solve constraint satisfaction problems applied to a broad range of neural network applications, selecting among a plurality of standard solving algorithms to solve problems efficiently. Such iterative computations may be naturally amenable to concurrent processing at multiple levels of execution using well-known multi-core and GPU processors and concurrent computational frameworks such as MapReduce, leading to highly efficient computations (Yedidia [0058]).
Regarding claim 14, Sheng and Yedidia disclose “An apparatus implemented by one or more computing devices” of claim 11 (as mentioned above), wherein:
“decomposition unit configured to decompose the equivalently transformed training function using an alternating direction method of multipliers (ADMM)” (Yedidia in at least ¶ [0025] “The device 100 may include a processor 105, a memory arrangement 110, a transceiver 115, and an input/output (I/O) device 120. The memory 110 may store data that is received as well as data that is determined”, and in at least ¶ [0003] “A conventional algorithm used for convex optimization is the Alternating Direction Method of Multipliers (ADMM). Conventionally, the ADMM algorithm is a variant of an augmented Lagrangian scheme that uses partial updates for dual variables”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sheng and Yedidia. Sheng teaches constructing a training function for a neural network using a gradient pruning method and obtaining a connection weight of the neural network. Yedidia teaches performing an equivalent transformation of the training function that includes the decoupling operation of the transformed training function, and further uses the ADMM method for performing decomposition. One of ordinary skill would have been motivated to combine Sheng and Yedidia to provide a constrained training function and the ability to solve constraint satisfaction problems applied to a broad range of neural network applications. ADMM is well known for solving distributed applications (Yedidia [0003]) and is known to converge faster when the problem is broken down into relatively large sub-problems that can be solved efficiently (Yedidia [0139]).
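As a rough illustration of the alternating sub-problem structure that the term ADMM refers to, the following minimal sketch applies textbook scaled-form ADMM to a toy problem: projecting a vector onto a box. It is not the message-passing algorithm of Yedidia and not the claimed training method; the function name `admm_box_projection`, the toy objective, and the parameters `rho` and `iters` are assumptions chosen purely for illustration.

```python
def admm_box_projection(a, lo=0.0, hi=1.0, rho=1.0, iters=200):
    """Toy problem: minimize 0.5 * ||x - a||^2 subject to lo <= x <= hi.

    Split as f(x) = 0.5 * ||x - a||^2 and g(z) = indicator of the box,
    tied together by the equality (consistency) constraint x = z.
    """
    n = len(a)
    x = [0.0] * n
    z = [0.0] * n
    u = [0.0] * n  # scaled dual variable (multiplier divided by rho)

    for _ in range(iters):
        # x-update: unconstrained quadratic sub-problem, solved in closed form.
        x = [(a[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # z-update: the indicator sub-problem reduces to projection onto the box.
        z = [min(hi, max(lo, x[i] + u[i])) for i in range(n)]
        # Dual update: accumulate the remaining disagreement between x and z.
        u = [u[i] + x[i] - z[i] for i in range(n)]

    return z


if __name__ == "__main__":
    print(admm_box_projection([-0.5, 0.3, 1.7]))
```

Each pass alternates between two small sub-problems and a multiplier update, which is the decompose-then-iterate pattern the cited passages describe in general terms.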
Regarding claim 16, Sheng and Yedidia disclose the “One or more computer readable media” of claim 15 (as mentioned above), wherein:
Yedidia further discloses:
“a solving algorithm used for solving the constrained optimization comprises one of a penalty function method, a multiplier method, a projected gradient method, a reduced gradient method, or a constrained variable-scale method” (Yedidia in at least ¶ [0035] “Accordingly, solving constrained optimization problems is equivalent to a minimization problem that naturally splits into three pieces: (1) minimizing the original soft and hard cost functions f(x) on the left side of the graphical model 200, (2) minimizing the equality cost functions g(z) on the right side of the graphical model 200, and (3) ensuring that the x=z with the Lagrange multipliers y.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sheng and Yedidia. Sheng teaches constructing a training function for a neural network using a gradient pruning method and obtaining a connection weight of the neural network. Yedidia teaches having a constraint on a training function and finding an optimum solution for a constrained problem, and further teaches a solving algorithm used for solving the constrained optimization. One of ordinary skill would have been motivated to combine Sheng and Yedidia to provide a constrained training function and the ability to solve constraint satisfaction problems applied to a broad range of neural network applications, using standard solving algorithms to solve problems efficiently (Yedidia [0140]).
Regarding claim 17, Sheng and Yedidia disclose “One or more computer readable media” of claim 15 (as mentioned above), wherein:
Yedidia further discloses:
“determining a solving algorithm used for solving the constrained optimization based on the constraint of the training function before finding the solution of the constrained optimization solution based on the training function to obtain the connection weights of the neural network” (Yedidia in at least ¶ [0002] “An optimization algorithm may be used to determine a solution for a problem in which the solution considers all variables and constraints related to the problem and provides a lowest cost configuration of the values for all the variables”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sheng and Yedidia. Sheng teaches constructing a training function for a neural network using a gradient pruning method and obtaining a connection weight of the neural network. Yedidia teaches finding an optimum solution for a constrained problem and further teaches determining a solving algorithm used for solving the constrained optimization. One of ordinary skill would have been motivated to combine Sheng and Yedidia to provide a constrained training function and the ability to solve constraint satisfaction problems applied to a broad range of neural network applications, selecting among a plurality of standard solving algorithms to solve problems efficiently. As such, a selection process that considers all constraints and variables can provide the lowest cost solution to the problem (Yedidia [0002]).
Regarding claim 18, Sheng and Yedidia disclose “One or more computer readable media” of claim 15 (as mentioned above), wherein:
Yedidia further discloses:
“performing equivalent transformation for the training function based on an indication function and a consistency constraint” (Yedidia in at least ¶ [0032] “The graphical model 200 includes a set of hard or soft cost functions E1, E2, E3, and E4 as well as a set of variables r1, r2, r3, r4, r5, and r6. When a line connects a cost function node with a variable node, this indicates that the cost function depends upon the corresponding variable. In order to derive the message-passing algorithm for the ADMM algorithm, the problem may be manipulated into a series of equivalent forms before actually minimizing the objective. The first manipulation is a conversion of the problem over the variables $r_j$ into an equivalent problem that depends on variables $x_{ij}$ that sit on the edges of a “normalized” Forney-style factor graph. The variables in the standard factor graph are replaced with equality constraints and each of the edge variables is a copy of the corresponding variable that was on its right”);
“decomposing the equivalently transformed training function using an alternating direction method of multipliers (ADMM)” (Yedidia in at least ¶ [0003] “A conventional algorithm used for convex optimization is the Alternating Direction Method of Multipliers (ADMM). Conventionally, the ADMM algorithm is a variant of an augmented Lagrangian scheme that uses partial updates for dual variables”, and in at least ¶ [0030] “The general optimization problem becomes one of minimizing an objective function E(r) subject to some constraints on r. The objective function may be, for example, a cost function. All the constraints are considered to be part of the objective function by introducing a cost function $E_a(r)$ for each constraint such that $E_a(r) = 0$ if the constraint is satisfied and $E_a(r) = \infty$ if the constraint is not satisfied. The original objective function E(r) may also be decomposable into a collection of local cost values”);
“solving the connection weights of the neural network for sub-problems obtained after the decomposing” (Yedidia in at least ¶ [0028] “In view of the manner in which the optimization functionality is utilized, the variable engine 125 may receive the set of variables while the cost engine 130 may receive the set of cost functions. The factor graph engine 135 may generate the graphical model for the variables and the cost functions. Through the connections between the cost functions and the variables, the constraint engine 140 may provide the lines of the graphical model to the factor graph engine 135. All this data may be received by the optimization engine 145 that generates the optimization solution”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sheng and Yedidia. Sheng teaches constructing a training function for a neural network using a gradient pruning method and obtaining a connection weight of the neural network. Yedidia teaches having a constraint on a training function, and further teaches performing an equivalent transformation of the training function and solving the connection weights after decomposing the transformed function. One of ordinary skill would have been motivated to combine Sheng and Yedidia to provide a constrained training function and the ability to solve constraint satisfaction problems applied to a broad range of neural network applications, selecting among a plurality of standard solving algorithms to solve problems efficiently by decomposing the training function into sub-problems using an equivalent transformation. The sub-problem approach may guarantee that the constraints are satisfied, and the algorithm typically converges faster (Yedidia [0139]).
Regarding claim 19, Sheng and Yedidia disclose “One or more computer readable media” of claim 18 (as mentioned above), wherein:
Yedidia further discloses:
“decoupling the training function” (Yedidia in at least ¶ [0026] “The processor 105 may further execute a plurality of engines that are configured to perform a functionality associated with generating the optimization solution. As illustrated, the processor 105 may execute a variable engine 125, a cost engine 130, a factor graph engine 135, a constraint engine 140, an optimization engine 145, and a signal generating engine 150. The engines will be described in further detail below”, in at least ¶ [0028] “The factor graph engine 135 may generate the graphical model for the variables and the cost functions”, in at least ¶ [0178] “In a further example, the exemplary embodiments of the above described method may be embodied as a program containing lines of code stored on a non-transitory computer readable storage medium that, when compiled, may be executed on a processor or microprocessor”, and in at least ¶ [0032] “Thus, the edge variables attached to the same equality constraint must ultimately equal each other but they may temporarily be unequal while they separately attempt to satisfy different cost functions on the left”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sheng and Yedidia. Sheng teaches constructing a training function for a neural network using a gradient pruning method and obtaining a connection weight of the neural network. Yedidia teaches performing an equivalent transformation of the training function and further teaches a decoupling operation on the transformed training function. One of ordinary skill would have been motivated to combine Sheng and Yedidia to provide a constrained training function and the ability to solve constraint satisfaction problems applied to a broad range of neural network applications, selecting among a plurality of standard solving algorithms to solve problems efficiently by decomposing the training function into sub-problems using an equivalent transformation that includes the decoupling operation of the training function. The sub-problem approach may guarantee that the constraints are satisfied, and the algorithm typically converges faster; further, the decoupling operation (separately attempting to satisfy the cost functions) may use fine-grained parallelized routines, leading to additional performance gains (Yedidia [0062]).
Regarding claim 20, Sheng and Yedidia disclose “One or more computer readable media” of claim 18 (as mentioned above), wherein:
“solving the connection weights of the neural network for the sub-problems obtained after the decomposing comprises”
Yedidia further discloses:
“performing iterative computations of the sub-problems obtained after the decomposing to obtain the connection weights of the neural network” (Yedidia in at least ¶ [0026] “The processor 105 may further execute a plurality of engines that are configured to perform a functionality associated with generating the optimization solution. As illustrated, the processor 105 may execute a variable engine 125, a cost engine 130, a factor graph engine 135, a constraint engine 140, an optimization engine 145, and a signal generating engine 150. The engines will be described in further detail below”, in at least ¶ [0028] “The factor graph engine 135 may generate the graphical model for the variables and the cost functions”, in at least ¶ [0178] “In a further example, the exemplary embodiments of the above described method may be embodied as a program containing lines of code stored on a non-transitory computer readable storage medium that, when compiled, may be executed on a processor or microprocessor”, in at least ¶ [0035] “In summary, the original problem of minimizing E(r) has become equivalent to finding the minimum of the augmented Lagrangian”, in at least ¶ [0036] “Therefore, the minimum of the Lagrangian may be determined through maximizing the dual function of the following:
$h(y) = L(x^{*}, y, z)$   (Equation 2)
where $(x^{*}, z^{*})$ are the values of $x$ and $z$ that minimize L for a particular choice of $y$:”, and in at least ¶ [0037] “In order to maximize $h(y)$, a gradient ascent algorithm may be used. Thus, given values of $y^{t}$ at some iteration $t$, an iterative computation may be performed”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sheng and Yedidia. Sheng teaches constructing a training function for a neural network using a gradient pruning method and obtaining a connection weight of the neural network. Yedidia teaches performing an equivalent transformation of the training function that includes the decoupling operation of the transformed training function, and further uses iterative computations for performing decomposition of the transformed function. One of ordinary skill would have been motivated to combine Sheng and Yedidia to provide a constrained training function and the ability to solve constraint satisfaction problems applied to a broad range of neural network applications, selecting among a plurality of standard solving algorithms to solve problems efficiently using iterative computational routines. Such iterative computations may be naturally amenable to concurrent processing at multiple levels of execution using well-known multi-core and GPU processors and concurrent computational frameworks such as MapReduce, leading to highly efficient computations (Yedidia [0058]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TIRUMALE KRISHNASWAMY RAMESH whose telephone number is (571)272-4605. The examiner can normally be reached by phone.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached by phone at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/TIRUMALE K RAMESH/Examiner, Art Unit 2121

/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121