DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The disclosure is objected to because of the following informalities:
In [0016], line 5, $I^{P}$, $I^{Q}$ are not defined.
In [0022], line 2, RL is not defined.
In [0033], line 3, $KL\,P_{XY}\,P_{X} \otimes P_{Y}$ is not a conventional expression of KL divergence between $P_{XY}$ and $P_{X} \otimes P_{Y}$. It is also not consistent with the expression in [00111], equation (4).
In [00124], [00163], “Eq. Error Reference source not found” is not meaningful.
In [00125], no definition for $E_{P_{XY}}$.
In [00126], no definition for $E_{P_{X} \otimes P_{Y}}$.
In [00111] and [001130], two different expressions for a mutual information: $I(X, Y)$, $I(X; Y)$.
Appropriate correction is required.
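For reference, the conventional notation for the quantities discussed in the objections above is commonly written as sketched below. This is an illustrative textbook form only and is not taken from the application; the double-bar separator and the symbol $D_{KL}$ are notational assumptions.

```latex
% Illustrative textbook notation (assumed, not quoted from the application).
% KL divergence between distributions P and Q, with P absolutely continuous w.r.t. Q:
D_{KL}\left( P \,\|\, Q \right) = E_{P}\left[ \log \frac{dP}{dQ} \right]

% Mutual information of X and Y as the KL divergence between the joint
% distribution and the product of the marginals:
I(X;Y) = D_{KL}\left( P_{XY} \,\|\, P_{X} \otimes P_{Y} \right)
```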
Claim Objections
Claims 4 and 14 are objected to because of the following informalities:  not introducing the expanded text version of the term "MINE".  Appropriate correction is required.

Claims 4 and 14 are objected to because they include reference characters which are not enclosed within parentheses.
Reference characters corresponding to elements recited in the detailed description of the drawings and used in conjunction with the recitation of the same element or group of elements in the claims should be enclosed within parentheses so as to avoid confusion with other numbers or characters which may appear in the claims.  See MPEP § 608.01(m).
Claims 4 and 14 recite the abbreviation "MINE" without its expanded text. The claims should be rewritten so as to recite "mutual information neural estimation (MINE)" in order to introduce the abbreviation along with its corresponding expanded text. Note the order of the abbreviation with respect to its corresponding expanded text. Additionally, it is the abbreviation that should be enclosed by parentheses.
Claims 4 and 14 are objected to because of the following informalities: $E_{P_{XY}}$ and $E_{P_{X} \otimes P_{Y}}$ are not defined.  Appropriate correction is required.
Claims 4 and 14 are objected to because of the following informalities: $I^{P}_{\zeta}(X, Y)$ is not defined.  Appropriate correction is required.
Claims 2 and 12 are objected to because of the following informalities:  Claims 2 and 12 recite the limitation $I(X;Y) = KL\,P_{XY}\,P_{X} \otimes P_{Y}$ in line 4. It is also not consistent with the specification (see [00111], equation (4)).
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1, 11 and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claims 1, 11 and 20 recite the limitation "the data model Q's hidden features" in lines 7, 5 and 6 respectively.  There is insufficient antecedent basis for this limitation in the claim.
Claims 1, 11 and 20 recite the limitation "the l network" in line, 14-15, and 15-16 respectively. It is not clear whether "the l network" is the "discriminator neural network" or a neural network including the discriminator neural network.
Claims 1, 11 and 20 recite the limitation "reward" in line, 15 and 16 respectively. There is no special definition of “reward” in the specification. The specification does provide some examples of “reward”, but they are not consistent. For examination purposes, the examiner has interpreted it as anything that helps mutual information.
Claims 1, 11 and 20 recite the limitation “$I^{P}$” in lines 13, 11 and 12 respectively. It is not clear whether “$I^{P}$” is a mutual information under distribution $P$. For examination purposes, the examiner has interpreted it as a mutual information under a data distribution $P$.
Claims 1, 11 and 20 recite the limitation “$I^{Q}_{\Theta,\omega}$” in lines 14 & 17, 12 & 15, and 13 & 16 respectively. It is not clear whether $I^{Q}_{\Theta,\omega}$ is a mutual information or a neural network. For examination purposes, the examiner has interpreted it as a mutual information.
Claims 1, 11 and 20 recite the limitation “$I^{P}_{\Theta,\omega}$” in lines 16, 14, and 15 respectively. It is not clear whether $I^{P}_{\Theta,\omega}$ is a mutual information under distribution $P$. For examination purposes, the examiner has interpreted it as a mutual information under a data distribution $P$.
Claims 1, 11 and 20 recite the limitation "the one or more mutual information parameters" in lines 16, 14 and 15 respectively.  There is insufficient antecedent basis for this limitation in the claim.
Claims 2 and 12 recite the limitation "the mutual information $I(X;Y)$" in line 1.  There is insufficient antecedent basis for this limitation in the claim.
Claims 3-10 and 13-19 are also rejected under 35 U.S.C. 112(b) as being dependent upon a rejected base claim. 
Claims 5 and 15 recite the limitation " the parametrized test function" in line 3.  There is insufficient antecedent basis for this limitation in the claim.

The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA  35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA  35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

Claims 5-6 and 15-16 are rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends.  Claims 5-6 and 15-16 recite claim limitations associated with $T_{\zeta}(X, Y)$. However, $T_{\zeta}(X, Y)$ is not recited in claims 1-2 or 11-12, but in claims 4 and 14. Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.
For examination purposes, the examiner considers claims 5-6 and 15-16 to depend upon claims 4 and 14 respectively.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 9-14, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Olabiyi et al. (U.S. PG-PUB No. 20190244609 A1) in view of Belghazi et al. (arXiv:1801.04062v1, 2018).
-Regarding claim 1, Olabiyi discloses a computer implemented system for training a neural network representing data model Q, the system comprising (Abstract; FIGS. 1-10): a computer processor (FIG. 2 device 201; [0053]) operating in conjunction with computer memory and a data storage (FIG. 2 memory 205, device 209; [0053]) maintaining one or more interconnected computing nodes having adaptive interconnections which represent the neural network (FIGS. 4-8; [0070]), the computer processor configured to (FIG. 2 device 201; [0053]): initialize the neural network by providing a discriminator neural network parametrized by θ (Abstract; FIG. 3B, 303C; FIGS. 4-5, 111B; FIG. 6, 605; FIGS. 8A-8B; [0092]; [0095]) for the data model Q's hidden features parametrized by ω (Abstract; FIG. 3A; FIG. 3B, 303B; FIGS. 4-5, 111A; FIG. 6, 603; FIG. 7; [0070]; [0079]-[0080]; [0087]), the discriminator neural network observing pairs of segments or sequence in an input data set (FIG. 4, 403; FIG. 6, 601, 605; [0079]); conduct a next token prediction training process of the data model Q (FIG. 4, 405; FIGS. 5-6; [0079], “generator G 603 is trained”; [0037]; [0072]; [0075]-[0076]), the next token prediction training process adapted for learning to classify a correct next token from a randomly sampled token until a switching condition is satisfied to provide parameters θ, ω ([0081]-[0084]; [0095], “training … end to end”; FIGS. 5, 10; [0098]-[0101]; [0032]-[0033]), the next token prediction training process establishing a lower bound of mutual information between sampled elements in the series of elements $I^{P}$; establish a lower bound of mutual information $I^{Q}_{\Theta,\omega}$, in the model Q based on the parameters θ, ω ([0107], “variation lower bound … log-likelihood”); and train the neural network to optimize $I^{P}_{\Theta,\omega}$ ([0033]; [0083], “cross-entropy”; equations (6)-(8)) and to use the one or more mutual information parameters of the neural network $I^{Q}_{\Theta,\omega}$ as a reward in the data model Q to optimize the mutual information in the model Q between two random variables $X$ and $Y$, $I^{Q}(X, Y)$ ([0033]; [0083], “cross-entropy”; equations (6)-(8)), the training causing updates to the adaptive interconnections of the one or more interconnected computing nodes of the neural network (FIGS. 5, 10; [0098]-[0101]).
Olabiyi does teach using a maximum mutual information (MMI) criterion ([0033]). Olabiyi also teaches a loss function related to cross-entropy ([0083], equation (7)) and combining maximum likelihood estimation ([0076]). A person skilled in the art would understand that a loss function related to cross-entropy is the same as a loss function related to mutual information (see reference “Kullback–Leibler divergence – Wikipedia”). Olabiyi is silent as to establishing a lower bound of mutual information $I^{Q}_{\Theta,\omega}$.
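The standard textbook identity linking cross-entropy and KL divergence, to which the cited Wikipedia reference relates, can be sketched as follows. The symbols $H(P)$ for entropy and $H(P, Q)$ for cross-entropy are notational assumptions and are not quoted from Olabiyi.

```latex
% Illustrative textbook identity (notation assumed, not quoted from the references).
% Cross-entropy of Q relative to P decomposes into the entropy of P plus a KL term:
H(P, Q) = H(P) + D_{KL}\left( P \,\|\, Q \right)

% For a fixed data distribution P, H(P) does not depend on Q, so minimizing the
% cross-entropy over Q is equivalent to minimizing D_{KL}(P \| Q) over Q.
```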
In the same field of endeavor, Belghazi teaches a neural estimator for mutual information based on KL-divergence and using the estimator to train adversarial models (Belghazi: Abstract; Page 2, 3rd paragraph; Page 6, Section 4.2, “GANs”; equations (1)-(20)). Belghazi teaches to establish a lower bound of mutual information between sampled elements in the series of elements $I^{P}$ (Belghazi: equations (1), (11)-(12)); establish a lower bound of mutual information $I^{Q}_{\Theta,\omega}$, in the model Q based on the parameters θ, ω (Belghazi: Theorem 1) and train the neural network to optimize $I^{P}_{\Theta,\omega}$ (Belghazi: algorithm 1) and use the one or more mutual information parameters of the neural network $I^{Q}_{\Theta,\omega}$ as a reward (Belghazi: Page 7, 2nd paragraph, “encourage the generator to maximize the entropy … by modifying the GAN objective … a mutual information term”) in the data model Q to optimize the mutual information in the model Q between two random variables $X$ and $Y$, $I^{Q}(X, Y)$ (Belghazi: equation (20)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Olabiyi with the teaching of Belghazi by using mutual information neural estimation in order to enhance the properties of generative models in both unsupervised and supervised settings.
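As an illustration of the kind of mutual information lower bound discussed above, the following is a minimal sketch of the Donsker-Varadhan bound that underlies mutual information neural estimation, evaluated on a small discrete joint distribution. It is not the applicant's or Belghazi's implementation: the test function is a plain lookup table rather than a neural network, and the names `marginals`, `mutual_information`, `dv_lower_bound`, and the example table `joint` are hypothetical.

```python
import math


def marginals(joint):
    """Return the X (row) and Y (column) marginals of a joint probability table."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return p_x, p_y


def mutual_information(joint):
    """Exact I(X;Y) in nats: KL divergence between the joint and the product of marginals."""
    p_x, p_y = marginals(joint)
    return sum(
        p * math.log(p / (p_x[i] * p_y[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )


def dv_lower_bound(joint, t):
    """Donsker-Varadhan bound: E_{P_XY}[T] - log E_{P_X (x) P_Y}[exp(T)].

    `t` is a table of test-function values T(x, y); in a neural estimator it
    would be the output of a parametrized network (assumption for illustration).
    """
    p_x, p_y = marginals(joint)
    cells = [(i, j) for i in range(len(p_x)) for j in range(len(p_y))]
    e_joint = sum(joint[i][j] * t[i][j] for i, j in cells)
    e_product = sum(p_x[i] * p_y[j] * math.exp(t[i][j]) for i, j in cells)
    return e_joint - math.log(e_product)


# Hypothetical 2x2 joint distribution of two correlated binary variables.
joint = [[0.4, 0.1], [0.1, 0.4]]
p_x, p_y = marginals(joint)

# The bound is tight when T is the log density ratio; any other T gives a lower value.
t_optimal = [[math.log(joint[i][j] / (p_x[i] * p_y[j])) for j in range(2)] for i in range(2)]
t_zero = [[0.0, 0.0], [0.0, 0.0]]

mi = mutual_information(joint)
print("exact MI:", mi)
print("bound with optimal T:", dv_lower_bound(joint, t_optimal))
print("bound with T = 0:", dv_lower_bound(joint, t_zero))
```

In this sketch, maximizing the bound over the entries of the table plays the role that training the parameters of the test function plays in a neural estimator.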
-Regarding claim 11, Olabiyi discloses a computer implemented method for training a neural network representing data model Q (Abstract; FIGS. 1-10) maintained on one or more interconnected computing nodes having adaptive interconnections which represent the neural network (FIGS. 4-8; [0070]), the method comprising: initializing the neural network by providing a discriminator neural network parametrized by θ (Abstract; FIG. 3B, 303C; FIGS. 4-5, 111B; FIG. 6, 605; FIGS. 8A-8B; [0092]; [0095]) for the data model Q's hidden features parametrized by ω (Abstract; FIG. 3A; FIG. 3B, 303B; FIGS. 4-5, 111A; FIG. 6, 603; FIG. 7; [0070]; [0079]-[0080]; [0087]), the discriminator neural network observing pairs of segments or sequence in an input data set (FIG. 4, 403; FIG. 6, 601, 605; [0079]); conducting a next token prediction training process of the data model Q (FIG. 4, 405; FIGS. 5-6; [0079], “generator G 603 is trained”; [0037]; [0072]; [0075]-[0076]), the next token prediction training process adapted for learning to classify a correct next token from a randomly sampled token until a switching condition is satisfied to provide parameters θ, ω ([0081]-[0084]; [0095], “training … end to end”; FIGS. 5, 10; [0098]-[0101]; [0032]-[0033]), the next token prediction training process establishing a lower bound of mutual information between sampled elements in the series of elements $I^{P}$; establishing a lower bound of mutual information $I^{Q}_{\Theta,\omega}$, in the model Q based on the parameters θ, ω ([0107], “variation lower bound … log-likelihood”); and training the neural network to optimize $I^{P}_{\Theta,\omega}$
                     ([0033]; [0083], “cross-entropy”; equations (6)-(8)) and to use the one or more mutual information parameters of the neural network                         
                            
                                
                                    I
                                
                                
                                    Θ
                                    ,
                                     
                                    ω
                                
                                
                                    Q
                                
                            
                        
                     as a reward in the data model Q to optimize the mutual information in the model Q between two random variables                         
                            X
                             
                        
                    and                         
                            Y
                        
                    ,                         
                            
                                
                                    I
                                
                                
                                    Q
                                
                            
                            (
                            X
                            ,
                             
                            Y
                            )
                        
                     ([0033]; [0083], “cross-entropy”; equations (6)-(8)), the training causing updates to the adaptive interconnections of the one or more interconnected computing nodes of the neural network (FIGS. 5, 10; [0098]-[0101]).
Olabiyi does teach using a maximum mutual information (MMI) criterion ([0033]). Olabiyi teaches a loss function related to cross-entropy ([0083], equation (7)) and combining maximum likelihood estimation ([0076]). A person skilled in the art would understand that a loss function related to cross-entropy is the same as a loss function related to mutual information (see reference “Kullback–Leibler divergence – Wikipedia”). Olabiyi is silent as to establishing a lower bound of mutual information $I^{Q}_{\Theta,\omega}$.
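The quantities named in the preceding paragraph (cross-entropy, KL divergence, and mutual information) are connected by standard identities, which can be checked numerically. The following is a minimal sketch only: the 2x2 joint distribution and all function names are hypothetical and chosen for illustration, and nothing here is taken from Olabiyi or Belghazi.

```python
import numpy as np

# Hypothetical 2x2 joint distribution over (X, Y); illustration only.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)          # marginal of X
p_y = p_xy.sum(axis=0)          # marginal of Y
q_xy = np.outer(p_x, p_y)       # product of marginals P_X (x) P_Y


def entropy(p):
    """Shannon entropy H(p) in nats (all entries assumed positive)."""
    p = np.asarray(p, dtype=float).ravel()
    return -np.sum(p * np.log(p))


def cross_entropy(p, q):
    """Cross-entropy H(p, q) in nats."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    return -np.sum(p * np.log(q))


def kl(p, q):
    """KL divergence D_KL(p || q) in nats."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    return np.sum(p * np.log(p / q))


# Mutual information is the KL divergence between the joint
# distribution and the product of its marginals.
mutual_info = kl(p_xy, q_xy)

# Cross-entropy decomposes as H(p, q) = H(p) + D_KL(p || q),
# so this gap should be zero up to floating-point error.
identity_gap = cross_entropy(p_xy, q_xy) - (entropy(p_xy) + mutual_info)
```

Under these assumptions, the cross-entropy between the joint distribution and the product of marginals differs from the mutual information only by the entropy of the joint distribution.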
In the same field of endeavor, Belghazi teaches a neural estimator for mutual information based on KL-divergence and using the estimator to train adversarial models (Belghazi: Abstract; Page 2, 3rd paragraph; Page 6, Section 4.2, “GANs”; equations (1)-(20)). Belghazi teaches to establish a lower bound of mutual information between sampled elements in the series of elements $I_P$ (Belghazi: equations (1), (11)-(12)); establish a lower bound of mutual information $I^{Q}_{\Theta,\omega}$, in the model Q based on the parameters θ, ω, (Belghazi: Theorem 1) and train the neural network to optimize $I^{P}_{\Theta,\omega}$ (Belghazi: algorithm 1) and use the one or more mutual information parameters of the neural network $I^{Q}_{\Theta,\omega}$ as a reward (Belghazi: Page 7, 2nd paragraph, “encourage the generator to maximize the entropy … by modifying the GAN objective … a mutual information term”) in the data model Q to optimize the mutual information in the model Q between two random variables $X$ and $Y$, $I_Q(X, Y)$ (Belghazi: equation (20)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Olabiyi with the teaching of Belghazi by using mutual information neural estimation in order to enhance the property of generative models in both unsupervised and supervised settings.
-Regarding claims 2 and 12, the relation is either a definition of mutual information or a basic formulation of information theory (see references “Mutual information – Wikipedia” and Belghazi et al (arXiv:1801.04062v1 2018)).
-Regarding claims 3 and 13, the relation is a basic formulation of information theory (see references “Mutual information – Wikipedia” and Belghazi et al (arXiv:1801.04062v1 2018)).
-Regarding claims 4 and 14, Olabiyi in view of Belghazi discloses the system of claim 1 and the method of claim 11.
Olabiyi is silent as to wherein $I_P$ is optimized using a MINE lower bound in accordance with a relation:
$$I_P(X, Y) \ge I^{P}_{\zeta}(X, Y): \quad I^{P}_{\zeta}(X, Y) = \mathbb{E}_{P_{XY}}\left(T^{P}_{\zeta}(X, Y)\right) - \log \mathbb{E}_{P_X \otimes P_Y}\left(e^{T^{P}_{\zeta}(X, Y)}\right);$$
wherein $T_{\zeta}(X, Y)$ is a parametrized test function adapted to distinguish samples of a joint distribution from those from a product of marginals (Belghazi: Theorem 1; equations (1), (11)-(12)).
In the same field of endeavor, Belghazi teaches a neural estimator for mutual information based on KL-divergence and using the estimator to train adversarial models (Belghazi: Abstract; Page 2, 3rd paragraph; Page 6, Section 4.2, “GANs”; equations (1)-(20)). Belghazi teaches wherein $I_P$ is optimized using a MINE lower bound in accordance with a relation:
$$I_P(X, Y) \ge I^{P}_{\zeta}(X, Y): \quad I^{P}_{\zeta}(X, Y) = \mathbb{E}_{P_{XY}}\left(T^{P}_{\zeta}(X, Y)\right) - \log \mathbb{E}_{P_X \otimes P_Y}\left(e^{T^{P}_{\zeta}(X, Y)}\right);$$
wherein $T_{\zeta}(X, Y)$ is a parametrized test function adapted to distinguish samples of a joint distribution from those from a product of marginals.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Olabiyi with the teaching of Belghazi by using mutual information neural estimation in order to enhance the property of generative models in both unsupervised and supervised settings.
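The lower bound recited for claims 4 and 14 has the form of an expectation of a test function under the joint distribution minus the log of an expectation of its exponential under the product of marginals. The sketch below evaluates that expression from samples. It rests on stated assumptions: the data are a hypothetical correlated Gaussian pair, the test function is the closed-form log density ratio for that pair rather than a trained network, and the names `dv_lower_bound` and `log_density_ratio` are illustrative, not taken from Belghazi or the application.

```python
import numpy as np


def dv_lower_bound(t_joint, t_marginal):
    """Sample estimate of E_joint[T] - log E_marginals[exp(T)]."""
    t_joint = np.asarray(t_joint, dtype=float)
    t_marginal = np.asarray(t_marginal, dtype=float)
    # Stable log-mean-exp: subtract the maximum before exponentiating.
    m = t_marginal.max()
    log_mean_exp = m + np.log(np.mean(np.exp(t_marginal - m)))
    return t_joint.mean() - log_mean_exp


def log_density_ratio(x, y, rho):
    """log p(x, y) / (p(x) p(y)) for a standard bivariate Gaussian
    with correlation rho (assumed setup; optimal test function here)."""
    return (-0.5 * np.log(1.0 - rho**2)
            - (rho**2 * (x**2 + y**2) - 2.0 * rho * x * y)
            / (2.0 * (1.0 - rho**2)))


rng = np.random.default_rng(0)
rho, n = 0.5, 100_000

# Samples from the joint distribution P_XY.
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)

# Approximate samples from P_X (x) P_Y by shuffling y against x.
y_shuffled = rng.permutation(y)

# Closed-form mutual information of the assumed Gaussian pair.
true_mi = -0.5 * np.log(1.0 - rho**2)

# Bound evaluated with the optimal test function.
tight = dv_lower_bound(log_density_ratio(x, y, rho),
                       log_density_ratio(x, y_shuffled, rho))

# Bound evaluated with a deliberately suboptimal test function (half scale).
weak = dv_lower_bound(0.5 * log_density_ratio(x, y, rho),
                      0.5 * log_density_ratio(x, y_shuffled, rho))
```

In this setup the bound with the optimal test function lands close to the closed-form mutual information, while the scaled-down test function gives a smaller value. A neural estimator would instead search over a parametrized family of test functions to push the bound upward.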
-Regarding claims 9 and 19, the combination further discloses wherein the trained neural network is utilized to receive new input data sets and to generate output data sets by processing the new input data sets through the adaptive interconnections of the one or more interconnected computing nodes of the neural network (Olabiyi: Abstract; FIGS. 1, 3-4, 6-8; [0107]; [0112]).
-Regarding claim 10, the combination further discloses wherein the new input data sets and the output data sets each include at least one of natural language text strings and structured query language (SQL) text tokens, and the output data sets are representative of a next token predicted based on a new input data set of the new input data sets (Olabiyi: Abstract; [0003]; [0032]; [0048], “user query”; [0078], “word tokens”; [0079], “generated tokens … output tokens”; FIGS. 3, 6).
-Regarding claim 20, Olabiyi discloses a non-transitory computer readable medium (FIG. 2 memory 205, device 209; [0053]), storing machine interpretable instructions, which when executed by a processor (FIG. 2 device 201; [0053]), cause the processor to perform a computer implemented method for training a neural network representing data model Q maintained on one or more interconnected computing nodes having adaptive interconnections which represent the neural network (FIGS. 4-8; [0070]), the method comprising (Abstract; FIGS. 1-10): initializing the neural network by providing a discriminator neural network parametrized by θ (Abstract; FIG. 3B, 303C; FIGS. 4-5, 111B; FIG. 6, 605; FIGS. 8A-8B; [0092]; [0095]) for the data model Q's hidden features parametrized by ω (Abstract; FIG. 3A; FIG. 3B, 303B; FIGS. 4-5, 111A; FIG. 6, 603; FIG. 7; [0070]; [0079]-[0080]; [0087]), the discriminator neural network observing pairs of segments or sequence in an input data set (FIG. 4, 403; FIG. 6, 601, 605; [0079]); conducting a next token prediction training process of the data model Q (FIG. 4, 405; FIGS. 5-6; [0079], “generator G 603 is trained”; [0037]; [0072]; [0075]-[0076]), the next token prediction training process adapted for learning to classify a correct next token from a randomly sampled token until a switching condition is satisfied to provide parameters θ, ω ([0081]-[0084]; [0095], “training … end to end”; FIGS. 5, 10; [0098]-[0101]; [0032]-[0033]), the next token prediction training process establishing a lower bound of mutual information between sampled elements in the series of elements $I_P$; establishing a lower bound of mutual information $I^{Q}_{\Theta,\omega}$, in the model Q based on the parameters θ, ω ([0107], “variation lower bound … log-likelihood”); and training the neural network to optimize $I^{P}_{\Theta,\omega}$ ([0033]; [0083], “cross-entropy”; equations (6)-(8)) and to use the one or more mutual information parameters of the neural network $I^{Q}_{\Theta,\omega}$ as a reward in the data model Q to optimize the mutual information in the model Q between two random variables $X$ and $Y$, $I_Q(X, Y)$ ([0033]; [0083], “cross-entropy”; equations (6)-(8)), the training causing updates to the adaptive interconnections of the one or more interconnected computing nodes of the neural network (FIGS. 5, 10; [0098]-[0101]).
Olabiyi does teach using a maximum mutual information (MMI) criterion ([0033]). Olabiyi teaches a loss function related to cross-entropy ([0083], equation (7)) and combining maximum likelihood estimation ([0076]). A person skilled in the art would understand that a loss function related to cross-entropy is the same as a loss function related to mutual information (see reference “Kullback–Leibler divergence – Wikipedia”). Olabiyi is silent as to establishing a lower bound of mutual information $I^{Q}_{\Theta,\omega}$.
In the same field of endeavor, Belghazi teaches a neural estimator for mutual information based on KL-divergence and using the estimator to train adversarial models (Belghazi: Abstract; Page 2, 3rd paragraph; Page 6, Section 4.2, “GANs”; equations (1)-(20)). Belghazi teaches to establish a lower bound of mutual information between sampled elements in the series of elements $I_P$ (Belghazi: equations (1), (11)-(12)); establish a lower bound of mutual information $I^{Q}_{\Theta,\omega}$, in the model Q based on the parameters θ, ω, (Belghazi: Theorem 1) and train the neural network to optimize $I^{P}_{\Theta,\omega}$ (Belghazi: algorithm 1) and use the one or more mutual information parameters of the neural network $I^{Q}_{\Theta,\omega}$ as a reward (Belghazi: Page 7, 2nd paragraph, “encourage the generator to maximize the entropy … by modifying the GAN objective … a mutual information term”) in the data model Q to optimize the mutual information in the model Q between two random variables $X$ and $Y$, $I_Q(X, Y)$ (Belghazi: equation (20)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Olabiyi with the teaching of Belghazi by using mutual information neural estimation in order to enhance the property of generative models in both unsupervised and supervised settings.
Claims 5-6 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Olabiyi et al (U.S PG-PUB NO. 20190244609 A1) in view of Belghazi et al (arXiv:1801.04062v1 2018), and further in view of Hjelm et al (arXiv 1808.06670v1 2018).
-Regarding claims 5 and 15, Olabiyi in view of Belghazi discloses the system of claim 4 and the method of claim 14.
Olabiyi in view of Belghazi teaches an intermediary hidden layer of the neural network (Olabiyi: FIG. 6, RNN 603). Olabiyi in view of Belghazi is silent as to wherein the parametrized test function is provided in accordance with $T_{\zeta}(X, Y) = T_{\theta,\omega}(X, Y)$: $T_{\theta,\omega}(X, Y) = D_{\theta}(\phi_{\omega}(X), \phi_{\omega}(Y))$.
However, Hjelm is analogous art pertinent to the problem to be solved in this application and further discloses an intermediary hidden layer representation $\phi_{\omega}(\cdot)$ of the neural network with a discriminator $D_{\theta} : \Phi \to \mathbb{R}$; and wherein the parametrized test function is provided in accordance with $T_{\zeta}(X, Y) = T_{\theta,\omega}(X, Y)$: $T_{\theta,\omega}(X, Y) = D_{\theta}(\phi_{\omega}(X), \phi_{\omega}(Y))$ (Hjelm: Page 6, section 3.3, 3rd paragraph).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Olabiyi in view of Belghazi with the teaching of Hjelm by explicitly maximizing mutual information between input data and high-level representations in order to provide flexible formulations of representation-learning objectives catered towards specific end-goals.
-Regarding claims 6 and 16, Olabiyi in view of Belghazi discloses the system of claim 4 and the method of claim 14.
Olabiyi in view of Belghazi is silent as to wherein the relation $I^{P}(X, Y) \geq I_{\zeta}^{P}(X, Y)$: $I_{\zeta}^{P}(X, Y) = E_{P_{XY}}\left(T_{\zeta}^{P}(X, Y)\right) - \log E_{P_{X} \otimes P_{Y}}\left(e^{T_{\zeta}^{P}(X, Y)}\right)$ is optimized using noise contrastive estimation to convert the relation into a binary classification problem.
However, Hjelm is analogous art pertinent to the problem to be solved in this application and further discloses wherein the relation $I^{P}(X, Y) \geq I_{\zeta}^{P}(X, Y)$: $I_{\zeta}^{P}(X, Y) = E_{P_{XY}}\left(T_{\zeta}^{P}(X, Y)\right) - \log E_{P_{X} \otimes P_{Y}}\left(e^{T_{\zeta}^{P}(X, Y)}\right)$ is optimized using noise contrastive estimation to convert the relation into a binary classification problem (Hjelm: Page 6, section 3.3).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Olabiyi in view of Belghazi with the teaching of Hjelm by explicitly maximizing mutual information between input data and high-level representations in order to provide flexible formulations of representation-learning objectives catered towards specific end-goals.
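For illustration only, the relation recited in claims 6 and 16 has the form of a Donsker-Varadhan-type lower bound, $E_{P_{XY}}(T) - \log E_{P_{X} \otimes P_{Y}}(e^{T})$, and can be evaluated numerically as in the sketch below. The sketch only evaluates the bound for a fixed test function; it does not perform noise contrastive estimation or train a network. The correlated Gaussian data, the function names, and the closed-form test function are assumptions made for this example and do not come from the cited references.

```python
import math
import random


def dv_lower_bound(test_fn, xs, ys, rng):
    """Estimate E_joint[T] - log E_product[exp(T)] from paired samples."""
    n = len(xs)
    # Paired samples (x_i, y_i) stand in for the joint distribution P_XY.
    joint_term = sum(test_fn(x, y) for x, y in zip(xs, ys)) / n
    # Shuffling y breaks the pairing and approximates the product P_X (x) P_Y.
    shuffled = list(ys)
    rng.shuffle(shuffled)
    mean_exp = sum(math.exp(test_fn(x, y)) for x, y in zip(xs, shuffled)) / n
    return joint_term - math.log(mean_exp)


def make_gaussian_pairs(n, rho, rng):
    """Draw n pairs from a standard bivariate Gaussian with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    return xs, ys


def optimal_test_fn(rho):
    """Log density ratio log(p(x, y) / (p(x) p(y))) for the Gaussian pair."""
    const = -0.5 * math.log(1.0 - rho * rho)
    denom = 2.0 * (1.0 - rho * rho)

    def t(x, y):
        return const - (rho * rho * x * x - 2.0 * rho * x * y + rho * rho * y * y) / denom

    return t


RHO = 0.5  # assumed correlation for the toy data
rng = random.Random(0)
xs, ys = make_gaussian_pairs(20000, RHO, rng)

true_mi = -0.5 * math.log(1.0 - RHO * RHO)  # closed-form MI for this Gaussian
trivial_bound = dv_lower_bound(lambda x, y: 0.0, xs, ys, rng)
tight_bound = dv_lower_bound(optimal_test_fn(RHO), xs, ys, rng)
print(true_mi, trivial_bound, tight_bound)
```

A constant test function gives a bound of zero, while the log density ratio brings the estimate close to the closed-form mutual information, which is why the bound is maximized over a parametrized family of test functions.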
Claims 7-8 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Olabiyi et al (U.S PG-PUB NO. 20190244609 A1) in view of Belghazi et al (arXiv:1801.04062v1 2018), and further in view of Norouzi et al (NIPS 2016).
-Regarding claims 7 and 17, Olabiyi in view of Belghazi discloses the system of claim 1 and the method of claim 11.
Olabiyi in view of Belghazi is silent as to wherein the one or more mutual information parameters of the neural network $I_{\Theta,\omega}^{Q}$ are directly optimized using a reward augmented maximum likelihood approach (RAML) whereby a reverse direction of KL divergence is optimized compared to an entropy-regularized policy gradient RL objective.
However, Norouzi is analogous art pertinent to the problem to be solved in this application and further discloses wherein the one or more mutual information parameters of the neural network $I_{\Theta,\omega}^{Q}$ are directly optimized using a reward augmented maximum likelihood approach (RAML) whereby a reverse direction of KL divergence is optimized compared to an entropy-regularized policy gradient RL objective (Norouzi: Abstract; Page 2, 4th paragraph, “entropy regularized expected reward”; Page 4, 1st paragraph, “optimize a KL divergence in opposite directions”, section 2.1; equations (3), (7)-(10)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Olabiyi in view of Belghazi with the teaching of Norouzi by using a reward augmented maximum likelihood approach (RAML) in order to achieve direct optimization of a task reward metric with computationally efficient and simple implementation.
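For illustration only, the "reverse direction of KL divergence" recited in claims 7 and 17 can be written out as follows. The notation is assumed for this sketch: $p_{\theta}(y \mid x)$ is the model distribution, $r(y, y^{*})$ is a task reward, $\tau$ is a temperature, and $q(y \mid y^{*}; \tau)$ is the exponentiated-reward distribution; additive constants and the scaling by $\tau$ are omitted.

```latex
% Exponentiated payoff distribution (assumed notation)
q(y \mid y^{*}; \tau) = \frac{\exp\{ r(y, y^{*}) / \tau \}}{Z(y^{*}, \tau)}

% RAML: the model appears as the second argument of the KL divergence
\min_{\theta} \; D_{\mathrm{KL}}\big( q(y \mid y^{*}; \tau) \,\|\, p_{\theta}(y \mid x) \big)

% Entropy-regularized expected reward: the model appears as the first argument
\min_{\theta} \; D_{\mathrm{KL}}\big( p_{\theta}(y \mid x) \,\|\, q(y \mid y^{*}; \tau) \big)
```

The two objectives differ only in the order of the arguments of the KL divergence, which is the sense in which one optimizes the reverse direction of the other.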
-Regarding claims 8 and 18, Olabiyi in view of Belghazi discloses the system of claim 7 and the method of claim 17.
Olabiyi in view of Belghazi is silent as to wherein the reward augmented maximum likelihood approach utilizes an importance sampling approach whereby a geometric distribution based at the index of Y* as a proposal distribution is used, where Y* is a token following X in a corpus of data.
However, Norouzi is analogous art pertinent to the problem to be solved in this application and further discloses wherein the reward augmented maximum likelihood approach utilizes an importance sampling approach whereby a geometric distribution based at the index of Y* as a proposal distribution is used, where Y* is a token following X in a corpus of data (Norouzi: Page 4, section 2.2; Page 2, 4th paragraph).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Olabiyi in view of Belghazi with the teaching of Norouzi by using a reward augmented maximum likelihood approach (RAML) in order to achieve direct optimization of a task reward metric with computationally efficient and simple implementation.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIAO LIU whose telephone number is (571)272-4539. The examiner can normally be reached Monday-Thursday and Alternate Fridays 8:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nay Maung can be reached on (571) 272-7882. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/XIAO LIU/Examiner, Art Unit 2664
/NANCY BITAR/Primary Examiner, Art Unit 2664