DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1, 12 and 18 have been amended. Claims 2, 13 and 19 have been canceled. Claims 1, 3-12, 14-18 and 20 have been examined.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 8-9 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
Claim 8 recites: “layer i of the k-th subsequent DNN” in line 6. However, claim 8 is dependent upon claims 7, 6, 3, and 1. Claim 6 recites: “layer i ≤ L;” claim 3 recites “the first plurality of indexed layers comprises L hidden layers;” and claim 1 recites “the first DNN comprises a first plurality of indexed layers.” Thus, the term “i” of claim 6 relates to a layer of the first DNN. However, the term “i” of claim 8 relates to a layer of a subsequent DNN. It is not clear if the term “i” in claim 8 should refer to a layer in the first DNN, or to a layer in a subsequent DNN.
Claim 8 recites: “the j-th subsequent DNN” in line 7. As such, the term “j” refers to a subsequent DNN. However, claim 8 is dependent upon claims 7, 6, 3, and 1. Claim 7 recites: “layer j ≤ Ms;” and claim 3 recites “each respective plurality of indexed layers s comprises Ms hidden layers;” and claim 1 recites “each subsequent DNN comprises a respective plurality of indexed layers.” Thus, the term “layer j” of claim 7 relates to a layer of a subsequent DNN. However, as noted, the term “j” of claim 8 relates to a subsequent DNN.  It is not clear if the term “j” in claim 8 should refer to a layer in a subsequent DNN (claim 7), or if it should provide a designation of a subsequent DNN (claim 8). 
Claim 8 provides a description for the term $h_i^{(k)}$. However, no similar description is provided for the terms $h_{i-1}^{(k)}$ and $h_{i-1}^{(j)}$. As such, definitive interpretation of these terms cannot be made.
Claim 8 provides a description of the set membership terms $W_i^{(k)} \in \mathbb{R}^{n_1 \times n_{i-1}}$ and $U_{ij}^{(k)} \in \mathbb{R}^{n_1 \times n_j}$. These set membership terms share the terms $W_i^{(k)}$ and $U_{ij}^{(k)}$ with the activation $h_i^{(k)} = \sigma\left( W_i^{(k)} h_{i-1}^{(k)} + \sum_{j<k} U_{ij}^{(k)} h_{i-1}^{(j)} \right)$. However, the set membership terms are not clearly set forth as being part of the activation itself. The language of the claim does not clearly set forth whether the set membership terms limit the activation.
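For illustration only, the activation recited in claim 8 can be sketched in plain Python under one possible reading of the indices: i indexes a layer, while k and j index DNNs in the sequence (with j < k). That reading is an assumption made for this sketch and does not resolve the ambiguity noted above. The function and parameter names (`lateral_activation`, `U_by_dnn`, `h_prev_by_dnn`) are hypothetical, and σ is assumed to be the logistic sigmoid. The shape check in `matvec` indicates how the set membership terms would constrain the computation if they were read as limiting the activation.

```python
import math


def sigmoid(values):
    # Element-wise logistic sigmoid (an assumed choice for sigma).
    return [1.0 / (1.0 + math.exp(-x)) for x in values]


def matvec(matrix, vector):
    # Matrix-vector product; rejects shapes that do not conform.
    for row in matrix:
        if len(row) != len(vector):
            raise ValueError("matrix columns must match vector length")
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lateral_activation(W, h_prev_own, U_by_dnn, h_prev_by_dnn):
    """Compute h_i^(k) = sigma(W h_{i-1}^(k) + sum_{j<k} U_ij h_{i-1}^(j)).

    W and h_prev_own belong to the k-th DNN; U_by_dnn and h_prev_by_dnn
    hold one entry per preceding DNN j < k (assumed reading of j).
    """
    pre = matvec(W, h_prev_own)
    for U, h_prev in zip(U_by_dnn, h_prev_by_dnn):
        lateral = matvec(U, h_prev)
        if len(lateral) != len(pre):
            raise ValueError("W and U must have the same number of rows")
        pre = [p + l for p, l in zip(pre, lateral)]
    return sigmoid(pre)
```

Under this reading, a weight matrix whose column count does not match the size of the activation it multiplies cannot be used, which is one way the dimensions stated in the set membership terms could bear on the activation.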
Claim 9 recites: “layer i of the k-th subsequent DNN” in line 5. However, claim 9 is dependent upon claims 7, 6, 3, and 1. Claim 6 recites: “layer i ≤ L;” claim 3 recites “the first plurality of indexed layers comprises L hidden layers;” and claim 1 recites “the first DNN comprises a first plurality of indexed layers.” Thus, the term “i” of claim 6 relates to a layer of the first DNN. However, the term “i” of claim 9 relates to a layer of a subsequent DNN. It is not clear if the term “i” in claim 9 should refer to a layer in the first DNN, or to a layer in a subsequent DNN.
Claim 9 recites: “the j-th subsequent DNN” in line 7. As such, the term “j” refers to a subsequent DNN. However, claim 9 is dependent upon claims 7, 6, 3, and 1. Claim 7 recites: “layer j ≤ Ms;” claim 3 recites “each respective plurality of indexed layers s comprises Ms hidden layers;” and claim 1 recites “each subsequent DNN comprises a respective plurality of indexed layers.” Thus, the term “layer j” of claim 7 relates to a layer of a subsequent DNN. However, as noted, the term “j” of claim 9 relates to a subsequent DNN.  It is not clear if the term “j” in claim 9 should refer to a layer in a subsequent DNN (claim 7), or if it should provide a designation of a subsequent DNN (claim 9). 
Claim 9 provides a description for the term $h_i^{(k)}$. However, no similar description is provided for the terms $h_{i-1}^{(k)}$ and $h_{i-1}^{(<k)}$. As such, definitive interpretation of these terms cannot be made.
Claim 9 provides a description of the set membership terms $W_i^{(k)} \in \mathbb{R}^{n_1 \times n_{i-1}}$, $U_{ij}^{(k)} \in \mathbb{R}^{n_1 \times n_j}$, and $V_{ij}^{(k)} \in \mathbb{R}^{n_{1-1} \times n_{i-1}^{(<k)}}$. These set membership terms share the terms $W_i^{(k)}$, $U_{ij}^{(k)}$, and $V_{ij}^{(k)}$ with the activation $h_i^{(k)} = \sigma\left( W_i^{(k)} h_{i-1}^{(k)} + U_{ij}^{(k)} \sigma\left( V_{ij}^{(k)} \alpha_{i-1}^{(<k)} h_{i-1}^{(<k)} \right) \right)$. However, the set membership terms are not clearly set forth as being part of the activation itself. The language of the claim does not clearly set forth whether the set membership terms limit the activation.
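As a hedged illustration, the activation recited in claim 9 can be sketched in plain Python. Because the claim does not describe $h_{i-1}^{(<k)}$ or $\alpha_{i-1}^{(<k)}$, the sketch assumes that $h_{i-1}^{(<k)}$ is the concatenation of the layer i-1 activations of all DNNs preceding the k-th DNN, that α is a scalar, and that both σ functions are the logistic sigmoid. These are assumptions for illustration only, and all function and parameter names are hypothetical.

```python
import math


def sigmoid(values):
    # Element-wise logistic sigmoid (an assumed choice for sigma).
    return [1.0 / (1.0 + math.exp(-x)) for x in values]


def matvec(matrix, vector):
    # Matrix-vector product; rejects shapes that do not conform.
    for row in matrix:
        if len(row) != len(vector):
            raise ValueError("matrix columns must match vector length")
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def adapter_activation(W, h_prev_own, U, V, alpha, h_prev_by_dnn):
    """Compute h_i^(k) = sigma(W h_{i-1}^(k) + U sigma(V alpha h_{i-1}^(<k)))."""
    # Assumed reading: h_{i-1}^(<k) concatenates the preceding DNNs' activations.
    h_prev_all = [x for h in h_prev_by_dnn for x in h]
    adapted = sigmoid(matvec(V, [alpha * x for x in h_prev_all]))
    own = matvec(W, h_prev_own)
    lateral = matvec(U, adapted)
    if len(own) != len(lateral):
        raise ValueError("W and U must have the same number of rows")
    return sigmoid([a + b for a, b in zip(own, lateral)])
```

Under these assumptions the column count of V must equal the total size of the concatenated activations, and the column count of U must equal the row count of V. A different reading of the undescribed terms would change those constraints.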

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1, 3-7, 9-12, 14-18 and 20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 10,949,734 in view of “Knowledge Transfer in Deep Block-Modular Neural Networks” by Terekhov et al. (“Terekhov”).

	In regard to claim 1, 10,949,734 claims:
1. A neural network system implemented by one or more computers, the neural network system comprising a sequence of deep neural networks (DNNs), See Claim 1, “1. A neural network system implemented by one or more computers, the neural network system comprising a sequence of deep neural networks (DNNs),”
wherein each DNN in the sequence of DNNs has been trained to perform a respective machine learning task, and wherein the sequence of DNN comprises: See Claim 1, “wherein each DNN in the sequence of DNNs has been trained to perform a respective machine learning task, and wherein the sequence of DNN comprises:” 
a first DNN that corresponds to a first machine learning task, wherein (i) the first DNN comprises a first plurality of indexed layers, and (ii) each layer in the first plurality of indexed layers is configured to receive a respective layer input and process the layer input to generate a respective layer output; and See Claim 1, “a first DNN that corresponds to a first machine learning task, wherein (i) the first DNN comprises a first plurality of indexed layers, and (ii) each layer in the first plurality of indexed layers is configured to receive a respective layer input and process the layer input to generate a respective layer output; and”
one or more subsequent DNNs corresponding to one or more respective machine learning tasks, wherein (i) each subsequent DNN comprises a respective plurality of indexed layers, and (ii) each layer in a respective plurality of indexed layers with index greater than one receives input from (i) a preceding layer of the respective subsequent DNN, and (ii) one or more preceding layers of respective preceding DNNs, wherein a preceding layer is a layer whose index is one less than the current index. See Claim 1, “one or more subsequent DNNs corresponding to one or more respective machine learning tasks, wherein (i) each subsequent DNN comprises a respective plurality of indexed layers, and (ii) each layer in a respective plurality of indexed layers with index greater than one receives input from (i) a preceding layer of the respective subsequent DNN, and (ii) one or more preceding layers of respective preceding DNNs, wherein a preceding layer is a layer whose index is one less than a current index.” 
and wherein, each layer with index equal to one, of the first DNN and one or more subsequent DNNs receives a respective DNN input …, the respective DNN input comprising an input associated with the respective machine learning task of the DNN. See Claim 2, “2. The system of claim 1, wherein each layer with index equal to one in a respective plurality of indexed layers receives a respective subsequent DNN input.” Also see claim 1, “one or more subsequent DNNs corresponding to one or more respective machine learning tasks.”
10,949,734 fails to claim receiving a respective DNN input only. However, this is taught by Terekhov. See Terekhov, Fig. 1(b-d), depicting layers with index equal to one that receive a respective subsequent DNN input only associated with the respective machine learning task. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the DNN of 10,949,734 with Terekhov’s DNN input in order to efficiently reuse an associated DNN task as suggested by Terekhov (see section 2.1).

In regard to claims 3-7, 9-12, 14-18 and 20, these are similar to 10,949,734 claims 3-10, 15-23, respectively.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 3-7, 10-12, 14-18 and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by “Knowledge Transfer in Deep Block-Modular Neural Networks” by Terekhov et al. (“Terekhov”).

	In regard to claim 1, Terekhov discloses:
1. A neural network system implemented by one or more computers, the neural network system comprising a sequence of deep neural networks (DNNs), See Terekhov, p. 268, e.g. “deep neural networks (DNN).”
wherein each DNN in the sequence of DNNs has been trained to perform a respective machine learning task, and wherein the sequence of DNN comprises: 
a first DNN that corresponds to a first machine learning task, wherein (i) the first DNN comprises a first plurality of indexed layers, and (ii) each layer in the first plurality of indexed layers is configured to receive a respective layer input and process the layer input to generate a respective layer output; and  See Terekhov, Fig. 1(a), depicting a DNN with a plurality of layers which process inputs to generate outputs.
one or more subsequent DNNs corresponding to one or more respective machine learning tasks, wherein (i) each subsequent DNN comprises a respective plurality of indexed layers, and (ii) each layer in a respective plurality of indexed layers with index greater than one receives input from (i) a preceding layer of the respective subsequent DNN, and (ii) one or more preceding layers of respective preceding DNNs, wherein a preceding layer is a layer whose index is one less than the current index.  See Terekhov, Fig. 1(b), depicting a subsequent “block” DNN receiving inputs from a preceding layer as well as a preceding “original” DNN.
and wherein, each layer with index equal to one, of the first DNN and one or more subsequent DNNs receives a respective DNN input only, the respective DNN input comprising an input associated with the respective machine learning task of the DNN. See Terekhov, Fig. 1(b-d), depicting layers with index equal to one that receive a respective subsequent DNN input only associated with the respective machine learning task.
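
For illustration only, the connectivity recited in claim 1 may be sketched as follows. The function name layer_inputs and the tuple labels are hypothetical and appear in neither the claims nor Terekhov, and the sketch assumes that every preceding DNN has a layer at the preceding index.

```python
def layer_inputs(dnn, layer):
    """List the sources feeding layer `layer` of DNN number `dnn` (both 1-indexed).

    Hypothetical helper: it only names the sources, it does not compute anything.
    """
    if layer == 1:
        # A layer with index equal to one receives the respective DNN input only.
        return [("task_input", dnn)]
    # (i) the preceding layer (index one less) of the same DNN ...
    sources = [("layer", dnn, layer - 1)]
    # (ii) ... and the preceding layers of the preceding DNNs in the sequence.
    sources += [("layer", prev, layer - 1) for prev in range(1, dnn)]
    return sources


# Layer 2 of the third DNN in a sequence of three DNNs.
example = layer_inputs(3, 2)
```

Under these assumptions, the first DNN never receives lateral input, which is consistent with the depiction of the “original” network relied upon above.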

	In regard to claim 3, Terekhov discloses:
3. The system of claim 1, wherein (i) the first plurality of indexed layers comprises L hidden layers, and (ii) each respective plurality of indexed layers s comprises Ms hidden layers. See Terekhov, Fig. 1(b-d). Note that the terms L, s, and Ms are understood to be numeric variables, but have not been particularly defined and could mean anything.

	In regard to claim 4, Terekhov discloses:
4. The system of claim 3, wherein L is not equal to Ms for each s. See Terekhov, Fig. 1(c-d).

	In regard to claim 5, Terekhov discloses:
5. The system of claim 3, wherein L is equal to Ms for one or more s. See Terekhov, Fig. 1(b).

	In regard to claim 6, Terekhov discloses:
6. The system of claim 3, wherein each layer in the first plurality of indexed layers comprises a hidden activation $h_i^{(1)} \in \mathbb{R}^{n_i}$, where $n_i$ represents a number of hidden units at layer i ≤ L. See Terekhov, top of p. 6, e.g. “The activation of the i-th neuron at the k-th level (with m(k) neurons) was computed as a weighted sum of the outputs of neurons from the previous layer.” This applies to Fig. 1(a) in addition to Figs. 1(b-d).

	In regard to claim 7, Terekhov discloses:
7. The system of claim 6, wherein each layer in a respective plurality of indexed layers s comprises a hidden activation $h_j^{(1)} \in \mathbb{R}^{n_j}$, where $n_j$ represents a number of hidden units at layer j ≤ Ms. See Terekhov, top of p. 6, e.g. “The activation of the i-th neuron at the k-th level (with m(k) neurons) was computed as a weighted sum of the outputs of neurons from the previous layer.” This applies to Fig. 1(b-d).

	In regard to claim 10, Terekhov discloses:
10. The system of claim 1, wherein the sequence of machine learning tasks comprises independent machine learning tasks. See Terekhov, section 1.2, ¶ 3, e.g. “We train the original network on a task T1 and then train the weights of the block neurons on a task T2. The resulting network is then able to perform both tasks T1 and T2. Note that such a network has two classifiers: one for T1 (original classification neurons) and one for T2 (block classification neurons).”

	In regard to claim 11, Terekhov discloses:
11. The system of claim 1, wherein the sequence of machine learning tasks comprises one or more of (i) adversarial machine learning tasks, (ii) classification tasks, (iii) robot learning tasks, or (iv) generative modeling tasks. See Terekhov, section 1.1, ¶ 3, e.g. “classification tasks.”

	In regard to claim 12, Terekhov discloses:
12. A method for sequentially training a sequence of deep neural networks (DNNs) to perform a sequence of machine learning tasks, each DNN in the sequence corresponding to a respective machine learning task, and the method comprising: See Terekhov, p. 2, e.g. “In the current paper we explore an alternative approach to training DNNs, partly inspired by modular NNs. This approach will allow the network to learn a new task by exploiting previously learned features.” Also see section 2, e.g. “Methods.”
for a first machine learning task in the sequence: training a first DNN in the sequence that corresponds to the first machine learning task to perform the first machine learning task, wherein (i) the first DNN comprises a first plurality of indexed layers, and (ii) each layer in the first plurality of indexed layers is configured to receive a respective layer input and process the layer input to generate a respective layer output; See Terekhov, Fig. 1(a), depicting a DNN with a plurality of layers which process inputs to generate outputs.
for each subsequent machine learning task in the sequence: training a subsequent DNN corresponding to the machine learning task to perform the machine learning task, wherein (i) the subsequent DNN comprises a subsequent plurality of indexed layers, and (ii) each layer with index greater than one in the subsequent indexed plurality of layers receives input from (i) a preceding layer of the subsequent DNN, and (ii) one or more preceding layers of respective preceding DNNs, wherein a preceding layer is a layer whose index is one less than the current index. See Terekhov, Fig. 1(b), depicting a subsequent “block” DNN receiving inputs from a preceding layer as well as a preceding “original” DNN.

	In regard to claims 14-15, the rejection of parent claim 12 is addressed above. All further limitations have been addressed in the above rejections of claims 10-11, respectively. 

	In regard to claim 16, Terekhov discloses:
16. The method of claim 12, wherein (i) the first DNN comprises one or more respective DNN parameters, and (ii) each subsequent DNN comprises one or more respective subsequent DNN parameters, and See Terekhov, top of p. 10, e.g. “Each original network had approximately 10^5 parameters, while each block network had about 2·10^4 which were not shared with original networks (for 0-50-50 blocks).” … wherein training each subsequent DNN comprises setting preceding DNN parameters of preceding DNNs to constant values. See Terekhov, top of p. 2, e.g. “Rather, we would like to have a system that re-uses features learned in task T1 to solve task T2, and eventually, after having learned a number of tasks, would re-use relevant features from all (or more likely some) previous tasks to solve a new task.” Also see section 1.1 at the top of p. 3, e.g. “Such a network, after being trained on a certain task, T1, has definite values of weights and biases.”
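
By way of a hedged illustration of the limitation “setting preceding DNN parameters of preceding DNNs to constant values,” the following sketch applies a training update only to the parameters of the subsequent DNN. The function sgd_step, the parameter-group names, and the use of a plain gradient-descent update are assumptions made for the example; neither the claims nor the cited passages of Terekhov specify them.

```python
def sgd_step(params, grads, trainable, lr):
    """Return updated parameters; only groups named in `trainable` are changed.

    Hypothetical helper: `params` and `grads` map a DNN name to a flat list
    of parameter values and gradients, respectively.
    """
    updated = {}
    for name, values in params.items():
        if name in trainable:
            # Parameters of the DNN currently being trained are adjusted.
            updated[name] = [v - lr * g for v, g in zip(values, grads[name])]
        else:
            # Parameters of preceding DNNs are held at constant values.
            updated[name] = list(values)
    return updated


params = {"dnn1": [0.5, -0.25], "dnn2": [0.0, 1.0]}
grads = {"dnn1": [1.0, 1.0], "dnn2": [2.0, -2.0]}
after = sgd_step(params, grads, trainable={"dnn2"}, lr=0.5)
```

Here the values stored under "dnn1" are unchanged by the step even though a gradient is available for them, while the values under "dnn2" move against their gradient.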

	In regard to claim 17, Terekhov discloses:
17. The method of claim 12, wherein training each subsequent DNN further comprises adjusting values of the respective subsequent DNN parameters using a machine learning training technique. See Terekhov, p. 2, 2nd full paragraph, e.g. “We repeat this procedure on multiple tasks, showing that the final architecture is able to learn a new task by adding a rather small number of blocks of neurons and connection weights to the original network, when compared with a number of neurons and connection weights in a network which must learn the new task from scratch.”

	In regard to claim 18, Terekhov discloses:
18. A method for processing an input using a sequence of deep neural networks (DNNs), wherein each DNN in the sequence of DNNs has been trained to perform a respective machine learning task, the sequence of DNN comprising:  See Terekhov, p. 2, e.g. “In the current paper we explore an alternative approach to training DNNs, partly inspired by modular NNs. This approach will allow the network to learn a new task by exploiting previously learned features.” Also see section 2, e.g. “Methods.”
… the method comprising: 
receiving an input as part of a machine learning task corresponding to a last subsequent DNN in the sequence of DNNs; and processing the input using the last subsequent DNN in the sequence to generate a last subsequent DNN output for the machine learning task. See Terekhov, at least Fig. 1(b), depicting input for a machine learning task of a block DNN for processing and generation of an output.
All further limitations have been addressed in the above rejection of claim 1.

	In regard to claim 20, parent claim 18 is addressed above. All further limitations have been addressed in the above rejection of claim 11. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over Terekhov as applied above, and further in view of United States Patent 5,748,847 to Lo (“Lo”).

	In regard to claim 8, Terekhov discloses:
8. The system of claim 7, wherein the sequence of machine learning tasks comprises k+1 machine learning tasks, and wherein an activation of the k-th subsequent DNN is given by
$$h_i^{(k)} = \sigma\left( W_i^{(k)} h_{i-1}^{(k)} + \sum_{j<k} U_{ij}^{(k)} h_{i-1}^{(j)} \right)$$
wherein $h_i^{(k)}$ represents an activation of the k-th subsequent DNN, $W_i^{(k)} \in \mathbb{R}^{n_1 \times n_{i-1}}$ represents a weight matrix of layer i of the k-th subsequent DNN, $U_{ij}^{(k)} \in \mathbb{R}^{n_1 \times n_j}$ represents lateral connections from layer i of the k-th subsequent DNN to layer i-1 of the j-th subsequent DNN and
$\sigma$ represents an element-wise … [function]. See Terekhov, top of p. 6, e.g. “The activation of the i-th neuron at the k-th level (with m(k) neurons) was computed as a weighted sum of the outputs of neurons from the previous layer:
$$x_i^{(k)} = \sum_{j=1}^{m^{(k)}} w_{ji}^{(k)} y_j^{(k-1)} + b_i^{(k)}$$”
Note that this essentially provides a weight matrix as applied to previous layers as depicted in the networks of Figure 1. Also note the network depictions in Figs. 1(b-d), showing the lateral connections from previous layers. Terekhov does not expressly disclose that $\sigma$ represents an element-wise non-linearity. However, Lo teaches this. Lo teaches an element-wise non-linear sigma function (i.e. sigmoid) based on a summation and a bias. See Lo, Fig. 2 and associated text at col. 7, lines 10-18, e.g. “The activation function is a sigmoid function such as the hyperbolic tangent function.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Lo’s non-linear function with Terekhov’s activation functions in order to provide non-linear processing capability as suggested by Lo (see at least column 5, lines 48-51).
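
As a non-authoritative sketch of the activation recited in claim 8, the following applies an element-wise non-linearity to the sum of the within-DNN term and the lateral terms from preceding DNNs. The helper names matvec and lateral_activation are hypothetical, the non-linearity is taken to be the hyperbolic tangent mentioned in Lo, and the bias term of Terekhov's weighted sum is omitted because the claimed formula recites none.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lateral_activation(W, h_prev_own, U_list, h_prev_others, sigma=math.tanh):
    """Compute sigma(W h_own + sum over preceding DNNs of U_j h_j).

    W             : weight matrix of the current layer of the k-th DNN
    h_prev_own    : activation of the preceding layer of the k-th DNN
    U_list        : one lateral matrix per preceding DNN (j < k)
    h_prev_others : preceding-layer activation of each preceding DNN
    """
    total = matvec(W, h_prev_own)
    for U, h in zip(U_list, h_prev_others):
        lateral = matvec(U, h)
        total = [t + l for t, l in zip(total, lateral)]
    # The non-linearity is applied element by element.
    return [sigma(t) for t in total]


W = [[1.0, 0.0], [0.0, 1.0]]
U_list = [[[1.0, 1.0], [0.0, 0.0]]]  # a single preceding DNN
out = lateral_activation(W, [0.5, -0.5], U_list, [[0.25, 0.25]])
```

With an empty U_list the function reduces to an ordinary feed-forward layer, which corresponds to the first DNN having no preceding DNNs from which to receive lateral input.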

	In regard to claim 9, Terekhov discloses:
9. The system of claim 7, wherein the sequence of machine learning tasks comprises k machine learning tasks, and wherein an activation of the k-th subsequent DNN is given by
$$h_i^{(k)} = \sigma\left( W_i^{(k)} h_{i-1}^{(k)} + U_{ij}^{(k)} \sigma\left( V_{ij}^{(k)} \alpha_{i-1}^{(<k)} h_{i-1}^{(<k)} \right) \right)$$
wherein $h_i^{(k)}$ represents an activation of the k-th subsequent DNN, $W_i^{(k)} \in \mathbb{R}^{n_1 \times n_{i-1}}$ represents a weight matrix of layer i of the k-th subsequent DNN, $U_{ij}^{(k)} \in \mathbb{R}^{n_1 \times n_j}$ represents lateral connections from layer i of the k-th subsequent DNN to layer i-1 of the j-th subsequent DNN, $\sigma$ represents an element-wise … [function], $V_{ij}^{(k)} \in \mathbb{R}^{n_{1-1} \times n_{i-1}^{(<k)}}$ represents a projection matrix and $\alpha_{i-1}^{(<k)}$ is a learned scalar. See Terekhov, top of p. 6, e.g. “The activation of the i-th neuron at the k-th level (with m(k) neurons) was computed as a weighted sum of the outputs of neurons from the previous layer:
                
                    
                        
$$x_i^{(k)} = \sum_{j=1}^{m^{(k)}} w_{ji}^{(k)} \, y_j^{(k-1)} + b_i^{(k)}$$”
Note that this essentially provides a weight matrix as applied to previous layers as depicted in the networks of Figure 1. Also note the network depictions in Figs. 1(b-d), showing the lateral connections from previous layers. Also see Fig. 1(c) and related text in section 1.1, e.g. “Such a network, after being trained on a certain task, T1, has definite values of weights and biases. … The first layer block neurons receive projections from the input only.” Thus, the network is based upon learned values and, as depicted in the figure, multiple projections are provided, which falls within a broad interpretation of a projection matrix. Terekhov does not expressly disclose that σ represents an element-wise non-linearity. However, Lo teaches this. Lo teaches an element-wise non-linear sigma function (i.e. sigmoid) based on a summation and a bias. See Lo, Fig. 2 and associated text at col. 7, lines 10-18, e.g. “The activation function is a sigmoid function such as the hyperbolic tangent function.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Lo’s non-linear function with Terekhov’s activation functions in order to provide non-linear processing capability as suggested by Lo (see at least column 5, lines 48-51).
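For illustration only, the combination discussed above can be sketched as a weighted sum of previous-layer outputs plus a bias (the form quoted from Terekhov), followed by a hyperbolic tangent applied to each element (the sigmoid example quoted from Lo). This is a minimal sketch and is not taken from either reference: the function names, the weight layout `w[j][i]`, and all numeric values are hypothetical.

```python
import math


def weighted_sum(weights, prev_outputs, biases):
    """Return x_i = sum_j weights[j][i] * prev_outputs[j] + biases[i] for each neuron i."""
    return [
        sum(weights[j][i] * prev_outputs[j] for j in range(len(prev_outputs))) + biases[i]
        for i in range(len(biases))
    ]


def elementwise_tanh(values):
    """Apply the hyperbolic tangent to each element independently."""
    return [math.tanh(v) for v in values]


# Hypothetical values, chosen only to make the arithmetic easy to follow.
y_prev = [1.0, -2.0]            # outputs y_j of the previous layer
w = [[0.5, 0.0], [0.25, 1.0]]   # w[j][i]: weight from neuron j to neuron i
b = [0.5, 2.0]                  # biases b_i

x = weighted_sum(w, y_prev, b)  # pre-activations (weighted sum plus bias)
h = elementwise_tanh(x)         # element-wise non-linearity applied to x
print(x, h)
```

Because the non-linearity is applied to each entry separately, every output depends only on its own pre-activation, which is the sense of "element-wise" used in the sketch.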

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to James D Rutten whose telephone number is (571)272-3703.  The examiner can normally be reached on M-F 9:00-5:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on (571)272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/James D. Rutten/Primary Examiner, Art Unit 2121