Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Response to Arguments
The amendment filed 7/22/2022 has been entered. Claims 1, 6-8, 12-13, and 17-19 remain pending in the application. Applicant’s amendments to the specification and claims overcome each and every objection set forth in the previous action. Applicant’s amendments overcome each and every 35 U.S.C. 112 and 35 U.S.C. 101 rejection provided. 
Regarding the prior art rejections, Examiner notes that Applicant did not provide arguments or amend the inventive entity to overcome the reference “Estimation Method Of Meteorological Sensitive Load Power Based On Correlation Analysis And Stacked Auto-Encoder” (Chen). Rather, Applicant amended to include dependent claims that were not specifically taught by Chen. Given that Chen is good art, the backup rejections using the reference “Load Forecasting using Deep Neural Networks” (Hosein) are no longer necessary.
In the Response, Applicant argues that limitations of former dependent claim 5, incorporated into claim 1 as amended, are not taught by Goodfellow (see Response filed 7/22/2022, [pages 11-13]). Examiner respectfully disagrees. Examiner interprets the limitation as applying the mean squared error (MSE) algorithm using a norm operator. To distinguish the mean squared error algorithm taught by Goodfellow, Applicant uses Wikipedia rather than the definition of MSE provided in Goodfellow (see eq’s. (5.4)-(5.7) of Goodfellow, [page 106]). Examiner would like to emphasize that although Wikipedia provides a definition, under a broadest reasonable interpretation standard (see MPEP 2111), arguments regarding Wikipedia definitions do nothing to disparage the definition provided by Goodfellow.
The definition of Goodfellow is provided in pages 106-107. To minimize the distance between $\hat{y}^{(test)}$ and $y^{(test)}$, the function uses the L2-norm. Examiner thus responds to Applicant’s first argument, that Goodfellow does not teach minimizing using the norm operator, by stating that Goodfellow does teach minimizing on the L2-norm using mean squared error (MSE) in eq’s (5.4)-(5.12), and the accompanying explanation “To make a machine learning algorithm, we need to design an algorithm that improves the weights w in a way that reduces MSE_test… one intuitive way is just to minimize the mean squared error on the training set”, (Goodfellow [page 106 paragraph 5]). Second, the constant ½ is sometimes included so that the derivative has a coefficient of 1 rather than 2 (i.e., the derivative with respect to x of x^2/2 is x), but multiplying any formula by any positive constant does not change the location of the minimum, as described by Goodfellow. Here is the derivation using Applicant’s limitation:
$$\theta^* = \arg\min \frac{1}{2N}\left(\sum_{i=1}^{N}\lVert \hat{x}_i - x_i \rVert\right)^2$$
Take the derivative and set it equal to 0:
$$0 = \frac{1}{N}\left(\sum_{i=1}^{N}\lVert \hat{x}_i - x_i \rVert\right)$$
Multiply both sides by the constant N (or by 2N if the 2 is not included); the left side stays 0 because both 2N*0 = 0 and N*0 = 0:
$$0 = \sum_{i=1}^{N}\lVert \hat{x}_i - x_i \rVert$$
The above equation gives the roots of the derivative of the function inside the argmin, which are used to find the minimum or a local minimum. See the explanation in Goodfellow as to why to take the derivative: (eq. (5.4)-eq. (5.7) of Goodfellow, [page 106]; and see the algebra step from eq. (5.11) to (5.12) showing Goodfellow canceling out the two by dividing by two on both sides, [page 107]; this is equivalent to multiplying the original equation by 1/2).
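The point that a positive constant factor does not move the minimizer can be checked numerically. The following is a minimal Python sketch and is not taken from Goodfellow or from the application: the sample values, the grid of candidate parameters, and the names `scaled_squared_error` and `grid_argmin` are assumptions chosen for illustration, and a single constant predictor stands in for the model.

```python
def scaled_squared_error(theta, samples, scale):
    """Sum of squared errors of a constant predictor theta, times a positive scale."""
    return scale * sum((theta - x) ** 2 for x in samples)


def grid_argmin(objective, candidates):
    """Return the candidate value that gives the smallest objective value."""
    return min(candidates, key=objective)


# Hypothetical data: four sample values whose mean is 3.0.
samples = [1.0, 2.0, 4.0, 5.0]
n = len(samples)

# Hypothetical search grid: theta from 0.00 to 6.00 in steps of 0.01.
candidates = [i / 100 for i in range(0, 601)]

# Objective scaled by 1/N (plain mean squared error).
theta_without_half = grid_argmin(
    lambda t: scaled_squared_error(t, samples, 1.0 / n), candidates
)

# Objective scaled by 1/(2N) (mean squared error with the extra factor of 1/2).
theta_with_half = grid_argmin(
    lambda t: scaled_squared_error(t, samples, 1.0 / (2 * n)), candidates
)

# Both searches select the same parameter value.
print(theta_without_half, theta_with_half)
```

Under these assumptions both objectives select the same candidate, because halving every objective value preserves their ordering.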
Third, the summation operator is actually already included in the norm operator as described by Goodfellow. In Goodfellow, notice the summation gets removed after inclusion of the norm operator from equations (5.4) to (5.5). Examiner has updated the citation of Goodfellow and excerpts from the Goodfellow reference to include an explanation of norm operators (Goodfellow [pages 37-38]). As can be seen from equation 2.31, the norm operator as described by Goodfellow includes a summation, (Goodfellow [page 38]). On page 37, Goodfellow further teaches:
The L^2 norm is used so frequently in machine learning that it is often denoted simply as ||x||, with the subscript 2 omitted. It is also common to measure the size of a vector using the squared L^2 norm, which can be calculated simply as x^T * x.
(Goodfellow, [page 37 paragraph 4 lines 1-5]). This explanation provides detail of the step between eq. (5.8) and eq. (5.10), (Goodfellow, [page 107]). To be clear, the formula of former claim 5 is shorthand, and Goodfellow teaches the calculations in equations (5.7)-(5.12) to execute that formula. Goodfellow also teaches various norm operations that require a summation, but former claim 5 does not include any details of what norm operation is being used, and instead relies on shorthand. Goodfellow explains this shorthand is often used for the L2-norm, (Goodfellow [pages 37-38]).
In general, the argument that the inventors’ formulas are directed to a novel definition of mean squared error, as indicated by Applicant using words like “traditional” and “usually”, is not supported by the claims or specification (see Response filed 7/22/2022, [page 11 paragraph 4 line 1]-[page 12 paragraph 4 line 4]). Therefore, Applicant's arguments filed 7/22/2022 with respect to the prior art rejections have been fully considered but they are not persuasive.
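The relationship between the explicit summation and the norm shorthand can be illustrated with a short Python sketch. It is offered only as an illustration under stated assumptions: the vectors are made-up values, and the function names (`l2_norm`, `squared_l2_norm`, `mse_by_summation`, `mse_by_norm`) are hypothetical and do not appear in Goodfellow.

```python
import math


def l2_norm(v):
    """L2 norm: square root of the sum of squared components."""
    return math.sqrt(sum(c * c for c in v))


def squared_l2_norm(v):
    """Squared L2 norm computed as the dot product of v with itself (x^T x)."""
    return sum(a * b for a, b in zip(v, v))


def mse_by_summation(y_hat, y):
    """Mean squared error written with an explicit summation over components."""
    m = len(y)
    return sum((a - b) ** 2 for a, b in zip(y_hat, y)) / m


def mse_by_norm(y_hat, y):
    """Mean squared error written as the squared L2 norm of the difference, over m."""
    m = len(y)
    diff = [a - b for a, b in zip(y_hat, y)]
    return squared_l2_norm(diff) / m


# Hypothetical vectors used only to exercise the functions.
v = [3.0, 4.0]
y_hat = [1.0, 2.0, 3.0]
y = [1.0, 4.0, 1.0]

print(l2_norm(v), squared_l2_norm(v))
print(mse_by_summation(y_hat, y), mse_by_norm(y_hat, y))
```

The two mean squared error functions agree because the summation is carried inside the norm, which is the step from the summation form to the norm form discussed above.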

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 6-7, 12-13, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over “Estimation Method Of Meteorological Sensitive Load Power Based On Correlation Analysis And Stacked Auto-Encoder” (Chen) in view of “Deep Learning” (Goodfellow).
Note, although the inventive entity seems similar, JIN Yuqing is given as an author in the above art reference, and is NOT listed as an inventor of the instant application. Under MPEP 2153.01(a), the exception provided by 35 U.S.C. 102(b)(1)(A) is not applicable.
If, however, the application names fewer joint inventors than a publication (e.g., the application names as joint inventors A and B, and the publication names as authors A, B and C), it would not be readily apparent from the publication that it is by the inventor (i.e., the inventive entity) or a joint inventor and the publication would be treated as prior art under AIA 35 U.S.C. 102(a)(1).
(See MPEP 2153.01(a)). An affidavit under 37 CFR 1.130(a) or a change of inventorship may be filed to overcome this rejection. See MPEP 717:
Where the authorship of the prior art disclosure includes the inventor or a joint inventor named in the application, an "unequivocal" statement from the inventor or a joint inventor that he/she (or some specific combination of named joint inventors) invented the subject matter of the disclosure, accompanied by a reasonable explanation of the presence of additional authors, may be acceptable in the absence of evidence to the contrary. 
(See MPEP 717.01(a)(1)).


With respect to claim 1, Chen teaches A method for estimating a meteorology sensitive load power, comprising ([Title]): obtaining a meteorology sensitive load power estimation model (establish a load-meteorology nonlinear correlation model, [Abstract] line 4), inputting a daily load curve of a date to be estimated to the meteorology sensitive load power estimation model (input daily load curve of single day, [page 17 paragraph 1 lines 3-6]) and extracting a daily load curve dimension reduction feature of the date to be estimated (extract the dimension reduction characteristics of daily load curve, [Abstract] lines 6-7); and outputting the meteorology sensitive load power based on the daily load curve dimension reduction feature of the date to be estimated (output dimension of SAE is input dimension of fully connected layers, and the output is the daily weather-sensitive load power of 144 points, [page 17 paragraph 1 lines 10-13]) and mapping relationships from the daily load curve dimension reduction features onto meteorology sensitive load powers (forming a mapping from the deep features of daily load curve extracted by the SAE to the weather-sensitive load power curve, [page 17 paragraph 1 lines 14-15]), wherein obtaining the meteorology sensitive load power estimation model comprises: obtaining the meteorology sensitive load power estimation model by training, and testing the meteorology sensitive load power estimation model (training and testing, [page 17 paragraph 1 line 10]); wherein the meteorology sensitive load power estimation model comprises a stacked auto-encoder (SAE) model and a fully-connected layer; wherein obtaining the meteorology sensitive load power estimation model by training comprises: training the SAE model and the fully-connected layer; wherein inputting the daily load curve of the date to be estimated to the meteorology sensitive load power estimation model and extracting the daily load curve dimension reduction feature of the date to be estimated comprises: inputting the daily load curve of the date to be estimated to the SAE model and extracting the daily load curve dimension reduction feature of the date to be estimated; and wherein outputting the meteorology sensitive load power based on the daily load curve dimension reduction feature of the date to be estimated and the mapping relationships from the daily load curve dimension reduction features onto the meteorology sensitive load powers comprises: outputting, by the fully-connected layer, the meteorology sensitive load power based on the daily load curve dimension reduction feature of the date to be estimated extracted by the SAE model and the mapping relationships from the daily load curve dimension reduction features onto the meteorology sensitive load powers (see FIG. 2, [page 18]); wherein training the SAE model comprises: taking a historical data sample as input and output labels of the SAE model to train a first AE of the SAE model; taking output of an encoding layer of the first AE as an input label to train a next AE of the SAE model until all AEs of the SAE model have been trained; wherein a target function for the training is that a relative mean absolute percentage error (MAPE) of an output of the SAE model with respect to a daily load curve of a corresponding historical data sample is the minimum,
$$\mathrm{MAPE} = \sum_{i=1}^{n}\left|\frac{x_i - x_i'}{x_i}\right| \cdot \frac{100}{n},$$
where $x_i$ is an actual daily load power, $x_i'$ is the output of the SAE model, and n is the total number of sample points ([page 19 paragraph 3 lines 1-11]).
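For illustration, the MAPE target function recited above can be evaluated as in the following minimal Python sketch. The function name `mape` and the sample load values are assumptions made for this example and are not taken from Chen or from the application; the sketch also assumes that every actual value is non-zero, since the formula divides by it.

```python
def mape(actual, predicted):
    """Mean absolute percentage error between actual and predicted values.

    Implements sum(|(x_i - x_i') / x_i|) * 100 / n, assuming no actual
    value is zero and both sequences have the same length n.
    """
    n = len(actual)
    total = sum(abs((x - x_prime) / x) for x, x_prime in zip(actual, predicted))
    return total * 100 / n


# Hypothetical load values: each prediction is off by 10% of the actual value.
actual_load = [100.0, 200.0]
reconstructed_load = [110.0, 180.0]

error_percent = mape(actual_load, reconstructed_load)
print(error_percent)
```

A training procedure that uses this target function would adjust the model parameters to make this value as small as possible over the historical samples.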
Chen does not specifically teach wherein training the first AE of the SAE model satisfies the following formula:
$$h^{(1)}_i = s_f(W_1 x_i + b_1),$$
where $x_i$ is an input of the first AE of the SAE model, $h^{(1)}_i$ is the output of the encoding layer of the first AE, $W_1$ and $b_1$ are respectively a weight matrix and a bias matrix, and $s_f$ is an activate function;
$$\hat{x}_i = s_g(W_1' h^{(1)}_i + b_1'),$$
where $\hat{x}_i$ is an output of the decoding layer of the first AE of the SAE model, $W_1'$ and $b_1'$ are respectively a weight matrix and a bias matrix in reconstruction, and $s_g$ is an activate function;
$$\theta^* = \arg\min \frac{1}{2N}\left(\sum_{i=1}^{N}\lVert \hat{x}_i - x_i \rVert\right)^2,$$
where $\hat{x}_i$ and $x_i$ have a minimum mean squared error, $\theta^*$ is an optimal fully-connected layer parameter of the encoding layer and decoding layer of the first AE, and N is a number of historical data samples.
However, Goodfellow teaches wherein training the first AE of the SAE model satisfies the following formula:
                
$$h^{(1)}_i = s_f\left(W_1 x_i + b_1\right),$$
where $x_i$ is an input of the first AE of the SAE model, $h^{(1)}_i$ is the output of the encoding layer of the first AE, $W_1$ and $b_1$ are respectively a weight matrix and a bias matrix, and $s_f$ is an activate function (see equation $h = g(x^{\top}W + c)$, where $g$ is the activation function, [page 171 paragraph 1 lines 3-4]);
                
                    
                        
                            
                                
$$\hat{x}_i = s_g\left(W_1' h^{(1)}_i + b_1'\right),$$
where $\hat{x}_i$ is an output of the decoding layer of the first AE of the SAE model, $W_1'$ and $b_1'$ are respectively a weight matrix and a bias matrix in reconstruction, and $s_g$ is an activate function (see definition of an autoencoder, where h(x) is as defined above, and g(h) is the reconstruction such that g(f(x)) = x, [page 499 paragraph 1 lines 1-6]; and FIG. 14.1, [page 500]);
                
                    
                        
$$\theta^{*} = \arg\min \frac{1}{2N}\left(\sum_{i=1}^{N} \lVert \hat{x}_i - x_i \rVert\right)^{2},$$
where $\hat{x}_i$ and $x_i$ have a minimum mean squared error, $\theta^{*}$ is an optimal fully-connected layer parameter of the encoding layer and decoding layer of the first AE, and $N$ is a number of historical data samples (learning process is executed by minimizing the loss function on the mean squared error, [page 500 paragraph 5 lines 1-4]; minimizing mean squared error of the norm solved in eqs. (5.4)-(5.12), with removal of the constant 2 taught between eqs. (5.11) and (5.12), [pages 106-107]; description of norm including summation operator, [pages 37-38]).
It would have been obvious to one skilled in the art before the effective filing date to combine Chen with Goodfellow because this is applying a known technique (solving for the minimum mean squared error) to a known method and device (Chen) ready for improvement to yield predictable results. Chen is the base reference that teaches all limitations except for the exact function for minimizing the mean squared error as an objective function. Chen is ready for improvement because it teaches some objective functions but not a mean squared error (MSE) objective function. Goodfellow teaches a known technique of minimizing the mean squared error to train a machine learning function (Goodfellow [page 106 paragraph 5 lines 1-5]), and how to solve said function by hand (see equations 5.6-5.12, Goodfellow [pages 106-107]). One having ordinary skill in the art would have recognized that applying the known technique in Goodfellow of minimizing the MSE function would yield the predictable result of allowing one to solve for the weights and biases of Chen. Therefore, it would have been obvious to a person having ordinary skill in the art to combine Chen with Goodfellow, and this claim is rejected under 35 U.S.C. 103.
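For illustration only, the single-AE computation discussed above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not code from Chen, Goodfellow, or the application: the sigmoid activations, the array sizes, the random data, and the names `encode`, `decode`, and `reconstruction_objective` are all hypothetical, and the objective is written in the conventional form (a sum of squared Euclidean norms scaled by 1/(2N)), which may not match the exact grouping of the claimed formula.

```python
import numpy as np

def sigmoid(z):
    # Assumed activation; the action does not say which activation is used.
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, W1, b1):
    # Encoding layer: h_i = s_f(W1 x_i + b1), applied to every row x_i of x.
    return sigmoid(x @ W1.T + b1)

def decode(h, W1p, b1p):
    # Decoding layer: x_hat_i = s_g(W1' h_i + b1'), applied to every row h_i of h.
    return sigmoid(h @ W1p.T + b1p)

def reconstruction_objective(x, x_hat):
    # Conventional form: (1 / 2N) * sum_i ||x_hat_i - x_i||^2 (an assumption, see lead-in).
    n = x.shape[0]
    return np.sum((x_hat - x) ** 2) / (2.0 * n)

rng = np.random.default_rng(0)
x = rng.random((8, 24))                       # 8 hypothetical daily load curves, 24 points each
W1 = rng.normal(scale=0.1, size=(6, 24))      # encoder weights (hypothetical sizes)
b1 = np.zeros(6)
W1p = rng.normal(scale=0.1, size=(24, 6))     # decoder weights
b1p = np.zeros(24)

h = encode(x, W1, b1)                         # 6-dimensional code per sample
x_hat = decode(h, W1p, b1p)                   # reconstruction of each input
loss = reconstruction_objective(x, x_hat)     # quantity a training procedure would minimize
```

Training would adjust `W1`, `b1`, `W1p`, and `b1p` to reduce `loss`; the sketch only evaluates the objective once.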

With respect to claim 6, Chen teaches all of the limitations of claim 1, as noted above. Chen further teaches wherein training the fully-connected layer comprises (train the full connection layers, [Abstract] line 8): taking the daily load curve dimension reduction feature of a historical data sample (historical load data, [Abstract] line 5) as an input label of the fully connected layer (extract the dimension reduction characteristics of daily load curve, and the calculative results of the correlation model is used as a label sample to train, [Abstract] lines 6-8), and a meteorology sensitive load power curve as an output label of the fully connected layer (the meteorological sensitive load power curve can be obtained, [Abstract] lines 8-9), wherein a corresponding date of the daily load curve dimension reduction feature of the historical data sample is same as that of the meteorology sensitive load power curve (obtained directly from the daily load curve, [Abstract] line 9).
Chen does not teach
$$\theta^{\prime *} = \arg\min \frac{1}{2N'}\left(\sum_{i=1}^{N'} \lVert O_i - P_W^i \rVert\right)^{2},$$
where $O_i$ is output of a last fully connected layer of an ith sample, $P_W^i$ is a meteorology sensitive load power of the ith sample, and $N'$ is a number of dates of fully connected layer training samples.
However, Goodfellow teaches
$$\theta^{\prime *} = \arg\min \frac{1}{2N'}\left(\sum_{i=1}^{N'} \lVert O_i - P_W^i \rVert\right)^{2},$$
where $O_i$ is output of a last fully connected layer of an ith sample, $P_W^i$ is a meteorology sensitive load power of the ith sample, and $N'$ is a number of dates of fully connected layer training samples (minimize the mean square error, i.e. equation 5.4, on the training set, [page 106 paragraph 5 line 5]; note that multiplying by the constant ½ does not change the parameters $\theta^{*}$ at which the right hand side of the equation would be a minimum).
It would have been obvious to one skilled in the art before the effective filing date to combine Chen with Goodfellow because this is applying a known technique (solving for the minimum mean squared error) to a known method and device (Chen) ready for improvement to yield predictable results. Chen is the base reference that teaches all limitations except for the exact function for the mean squared error. Chen teaches using a MAPE objective function (Chen [page 2 paragraph 2 lines 6-7]). Chen is ready for improvement because it does not teach a mean squared error objective function. Goodfellow teaches a known technique of minimizing the mean squared error to train a machine learning function (Goodfellow [page 106 paragraph 5 lines 1-5]), and how to solve said function by hand (see equations 5.6-5.12, Goodfellow [pages 106-107]). One having ordinary skill in the art would have recognized that applying the known technique in Goodfellow of minimizing the MSE function would yield the predictable result of allowing one to solve for the weights and biases of the fully connected layer. Therefore, it would have been obvious to a person having ordinary skill in the art to combine Chen with Goodfellow, and this claim is rejected under 35 U.S.C. 103.
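The observation that the constant ½ does not change the minimizing parameters can be written out as a short derivation. The notation is shorthand introduced here only for illustration and is not taken from the cited references: $J(\theta)$ stands for any real-valued objective, $c$ for a positive constant, $O_i(\theta)$ for the model output on sample $i$, and $P_i$ for the corresponding target, both treated as scalars for simplicity.

```latex
% Shorthand (assumed for illustration): J(\theta) is a real-valued objective, c > 0 a constant.
\begin{aligned}
c\,J(\theta_1) \le c\,J(\theta_2) &\iff J(\theta_1) \le J(\theta_2) \qquad (c > 0) \\
\Longrightarrow\quad \arg\min_{\theta}\; c\,J(\theta) &= \arg\min_{\theta}\; J(\theta) \\
\text{with } c = \tfrac{1}{2}:\quad
\arg\min_{\theta}\; \frac{1}{2N'} \sum_{i=1}^{N'} \bigl(O_i(\theta) - P_i\bigr)^{2}
&= \arg\min_{\theta}\; \frac{1}{N'} \sum_{i=1}^{N'} \bigl(O_i(\theta) - P_i\bigr)^{2}
\end{aligned}
```

Scaling by a positive constant preserves the ordering of objective values, so the minimizer is the same even though the minimum value itself is halved.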

With respect to claim 7, Chen in view of Goodfellow teaches all of the limitations of claim 6, as noted above. Chen further teaches wherein a computation formula of the fully connected layer satisfies
                
$$O = R(WI + b);$$
where $I$ and $O$ are respectively an input vector and an output vector of the fully connected layer, $W$ and $b$ are respectively a weight matrix and a bias matrix of the fully connected layer, and $R$ is an activate function of the fully connected layer (equation 14, [page 18]).
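As a small worked instance of the fully connected layer formula recited in claim 7, the following Python sketch evaluates O = R(WI + b) for one input vector. The choice of ReLU for R, the numeric values, and the function name `fully_connected` are assumptions made for illustration; the action does not identify which activation Chen's equation 14 uses.

```python
import numpy as np

def relu(z):
    # Assumed activation R; chosen only to make the example concrete.
    return np.maximum(z, 0.0)

def fully_connected(I, W, b, R=relu):
    # O = R(W I + b): affine map followed by an element-wise activation.
    return R(W @ I + b)

W = np.array([[1.0, -1.0],
              [0.5,  0.5]])      # hypothetical 2x2 weight matrix
b = np.array([0.0, -2.0])        # hypothetical bias
I = np.array([3.0, 1.0])         # hypothetical input vector

O = fully_connected(I, W, b)     # W @ I = [2, 2]; adding b gives [2, 0]; ReLU leaves [2, 0]
```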


With respect to claim 12, Chen teaches A method for estimating a meteorology sensitive load power based on a stacked auto-encoder, the method comprising ([Abstract]): adding a multilayer fully-connected layer in an output end of a stacked auto-encoder (SAE) (fully connected layers added, [page 16 paragraph 3 lines 4-5]), and establishing a meteorology sensitive load power estimation model based on the SAE (An estimation model of meteorological sensitive load based on SAE is established, [Abstract] lines 5-6); extracting a daily load curve dimension reduction feature by using an unsupervised training method of the SAE (unsupervised learning ability of SAE utilized to extract dimension reduction characteristics, [page 5 paragraph 1 lines 7-8]), using a meteorology sensitive load power curve as a labeled sample to train the fully-connected layer ([page 16 paragraph 3 lines 6-7]), to form mapping relationships from daily load curve dimension reduction features onto meteorology sensitive load powers at the fully-connected layer (forming a mapping from deep features to weather-sensitive load power curve, [page 17 paragraph 1 lines 14-15]); wherein an unsupervised training method of the SAE is as follows: training the SAE by using a historical daily load curve data sample as input and output labels of the SAE (daily load curves from April to October, [page 19 paragraph 3 lines 1-2]), and reserving $h^{(1)}_i$, using $h^{(1)}_i$ as input and output labels of a next AE, continuing to train the next AE in the above manner, with input of the next AE being $h^{(1)}_i$ and so on, where the final SAE is stacked by a plurality of AEs (FIG. 2, [page 18]).
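The layer-by-layer unsupervised training recited above (train one AE on its input, keep its code, and use that code as the input and target of the next AE) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than Chen's implementation: the sigmoid encoder, linear decoder, plain gradient descent, learning rate, layer sizes, random data, and the names `train_autoencoder` and `train_stack` are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(x, hidden, epochs=200, lr=0.1, seed=0):
    """Fit one AE (sigmoid encoder, linear decoder) to reconstruct x; return the encoder."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    W = rng.normal(scale=0.1, size=(d, hidden))
    b = np.zeros(hidden)
    Wp = rng.normal(scale=0.1, size=(hidden, d))
    bp = np.zeros(d)
    for _ in range(epochs):
        h = sigmoid(x @ W + b)              # encoding layer output
        x_hat = h @ Wp + bp                 # reconstruction (linear decoder, an assumption)
        err = (x_hat - x) / n               # gradient of (1/2N) * sum ||x_hat - x||^2 w.r.t. x_hat
        grad_Wp = h.T @ err
        grad_bp = err.sum(axis=0)
        dz = (err @ Wp.T) * h * (1.0 - h)   # backpropagate through the sigmoid
        grad_W = x.T @ dz
        grad_b = dz.sum(axis=0)
        W -= lr * grad_W
        b -= lr * grad_b
        Wp -= lr * grad_Wp
        bp -= lr * grad_bp
    return W, b

def train_stack(x, hidden_sizes):
    """Greedy stacking: each AE is trained on the code produced by the previous one."""
    layers, inp = [], x
    for k, hidden in enumerate(hidden_sizes):
        W, b = train_autoencoder(inp, hidden, seed=k)
        layers.append((W, b))
        inp = sigmoid(inp @ W + b)          # retained code becomes the next AE's input and target
    return layers, inp

rng = np.random.default_rng(1)
curves = rng.random((30, 24))               # 30 hypothetical daily load curves, 24 points each
layers, features = train_stack(curves, [12, 4])
```

Here `features` plays the role of the dimension-reduced daily load curve feature that would then be passed to the fully connected layers; the decoders are discarded after each AE is trained.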
Chen does not teach wherein a forward propagation computation formula of the SAE is as follows: input of a first layer of the SAE being X, calculating output of an encoding layer of a first AE:
                
$$h^{(1)}_i = s_f\left(W_1 x_i + b_1\right),$$
where $W_1$ and $b_1$ are respectively a weight matrix and a bias matrix, and $s_f$ is an activate function (see equation $h = g(x^{\top}W + c)$, where $g$ is the activation function, [page 171 paragraph 1 lines 3-4]); outputting by the encoding layer of the SE, and reconstructing an input vector through a decoding layer according to the following formula:
                
                    
                        
                            
                                
$$\hat{x}_i = s_g\left(W_1' h^{(1)}_i + b_1'\right),$$
where $W_1'$ and $b_1'$ are respectively a weight matrix and a bias matrix in reconstruction, and $s_g$ is an activate function in reconstruction, and $h^{(1)}_i$ is the output of the encoding layer of the first AE; calculating an optimal fully-connected-layer parameter $\theta^{*}$ of the encoding layer and decoding layer of the AE according to the following formula:

[Formula image: media_image1.png]

wherein N is a number of training samples.
However, Goodfellow teaches wherein a forward propagation computation formula of the SAE is as follows: input of a first layer of the SAE being X, calculating output of an encoding layer of a first AE:

$$h^{(1)}_i = s_f\left(W_1 x_i + b_1\right),$$

where $W_1$ and $b_1$ are respectively a weight matrix and a bias matrix, and $s_f$ is an activate function (see equation 1 [page 2 col 1]); outputting by the encoding layer of the SE, and reconstructing an input vector through a decoding layer according to the following formula:

$$\hat{x}_i = s_g\left(W_1' h^{(1)}_i + b_1'\right),$$

where $W_1'$ and $b_1'$ are respectively a weight matrix and a bias matrix in reconstruction, and $s_g$ is an activate function in reconstruction, and $h^{(1)}_i$ is the output of the encoding layer of the first AE (see equation h = g(x^tW+c), where g is the activation function, [page 171 paragraph 1 lines 3-4]); and calculating an optimal fully-connected-layer parameter θ* of the encoding layer and the decoding layer of the AE (see definition of an autoencoder, where trained using reconstruction such that g(f(x)) = x, [page 499 paragraph 1 lines 1-6]; and FIG. 14.1, [page 500]) according to the following formula:

[Formula image: media_image1.png]

wherein N is a number of training samples (minimize the mean square error, i.e. equation 5.4, on the training set, [page 106 paragraph 5 line 5]; minimizing mean squared error of the norm solved in eq’s (5.4)-(5.12), with removing the 2 constant taught between eq. (5.11) and (5.12), [pages 106-107]; description of norm including summation operator, [pages 37-38]).
It would have been obvious to one skilled in the art before the effective filing date to combine Chen with Goodfellow because this is applying a known technique (minimizing the mean squared error) to a known method and device (Chen) ready for improvement to yield predictable results. Chen is the base reference that teaches all limitations except for the exact function for minimizing the mean squared error as an objective function. Chen is ready for improvement because it teaches some objective functions but not the mean squared error (MSE). Goodfellow teaches a known technique of minimizing the mean squared error to train a machine learning function (Goodfellow [page 106 paragraph 5 lines 1-5]), and how to solve said function by hand (see equations 5.6-5.12, Goodfellow [pages 106-107]). One having ordinary skill in the art would have recognized that applying the known technique in Goodfellow of minimizing the MSE function would yield the predictable result of allowing one to solve for the weights and biases of Chen. Therefore, it would have been obvious to combine Chen with Goodfellow to a person having ordinary skill in the art, and this claim is rejected under 35 U.S.C. 103.
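For illustration only, the encoding and decoding formulas recited above can be sketched in a few lines of Python. The sketch is not taken from Chen or Goodfellow: the sigmoid activation standing in for s_f and s_g, the helper names, and the toy weights are all assumptions. The claimed objective appears only as a formula image, so the loss below assumes a mean squared reconstruction error of the form (1/N) Σ ||x_i − x̂_i||².

```python
import math


def sigmoid(z):
    """Logistic activation, used here as an assumed stand-in for s_f and s_g."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Compute W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


def encode(W1, b1, x):
    """Encoding layer: h = s_f(W1 x + b1)."""
    return [sigmoid(z) for z in affine(W1, x, b1)]


def decode(W1p, b1p, h):
    """Decoding layer: x_hat = s_g(W1' h + b1')."""
    return [sigmoid(z) for z in affine(W1p, h, b1p)]


def reconstruction_mse(W1, b1, W1p, b1p, samples):
    """Assumed objective: (1/N) * sum_i ||x_i - x_hat_i||^2."""
    total = 0.0
    for x in samples:
        x_hat = decode(W1p, b1p, encode(W1, b1, x))
        total += sum((a - c) ** 2 for a, c in zip(x, x_hat))
    return total / len(samples)


# Hypothetical toy data: N = 2 samples, 3 inputs, 2 hidden units.
samples = [[0.2, 0.5, 0.9], [0.7, 0.1, 0.4]]
W1 = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]]
b1 = [0.0, 0.1]
W1p = [[0.2, -0.3], [0.5, 0.1], [-0.4, 0.6]]
b1p = [0.0, 0.0, 0.0]
loss = reconstruction_mse(W1, b1, W1p, b1p, samples)
```

Training would then search for the parameters (W1, b1, W1', b1') that make `loss` as small as possible over the N training samples; the optimizer itself is omitted from this sketch.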

With respect to claim 13, Chen teaches all of the limitations of claim 12, as noted above. Chen further teaches wherein input of the estimation model is a daily load curve, a number of input dimensions is a number of sample points of the daily load curve ([page 17 paragraph 1 lines 3-4]); output of the estimation model is the meteorology sensitive load power ([page 17 paragraph 1 lines 12-13]), and a number of output dimensions is the number of sample points of the daily load curve (input 144 points, [page 17 paragraph 1 lines 4-5]; and output 144 points, [page 17 paragraph 1 line 13]).

With respect to claim 17, Chen teaches all of the limitations of claim 12, as noted above. Chen further teaches wherein a forward propagation computation formula of the fully-connected layer is as follows:

$$O = R\left(WI + b\right);$$

where $I$ and $O$ are respectively an input vector and an output vector of the fully connected layer, $W$ and $b$ are respectively a weight matrix and a bias matrix of the fully connected layer, and $R$ is an activate function of the fully connected layer (equation 14, [page 18]).
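As a purely illustrative sketch of the forward propagation formula O = R(WI + b), the Python below evaluates one fully connected layer. The choice of ReLU for R, the function names, and the numeric values are assumptions made for the example and are not drawn from Chen.

```python
def relu(z):
    """Assumed activation R; the formula only requires some activation function."""
    return max(0.0, z)


def fully_connected(W, b, I_vec, R=relu):
    """Compute O = R(W I + b) for a weight matrix W given as a list of rows."""
    return [
        R(sum(w * x for w, x in zip(row, I_vec)) + b_k)
        for row, b_k in zip(W, b)
    ]


# Hypothetical layer with 2 inputs and 2 outputs.
W = [[1.0, -1.0], [0.5, 0.5]]
b = [0.0, -3.0]
I_vec = [3.0, 1.0]
O = fully_connected(W, b, I_vec)  # [2.0, 0.0]
```

The first output is 3.0 − 1.0 + 0.0 = 2.0, and the second pre-activation value is 1.5 + 0.5 − 3.0 = −1.0, which the assumed ReLU clips to 0.0.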

With respect to claim 18, Chen teaches all of the limitations of claim 17, as noted above. Chen does not teach wherein a supervised training method of the fully-connected layer is as follows: training by taking a deep layer feature of the daily load curve of a certain date after SAE dimension reduction as input of the fully connected layer, and taking the meteorology sensitive load power curve as the output label of the fully connected layer in a corresponding date, and calculating an optimal fully-connected-layer parameter :  

[Formula image: media_image2.png]

where Oi is output of a last layer of the fully connected layer of an ith sample, PWi is a meteorology sensitive load power of an ith sample, and N' is a number of dates of the meteorology sensitive load power.
However, Goodfellow teaches wherein a supervised training method of the fully-connected layer is as follows: training by taking a deep layer feature of the daily load curve of a certain date after SAE dimension reduction as input of the fully connected layer, and taking the meteorology sensitive load power curve as the output label of the fully connected layer in a corresponding date, and calculating an optimal fully-connected-layer parameter :

[Formula image: media_image2.png]

where Oi is output of a last layer of the fully connected layer of an ith sample, PWi is a meteorology sensitive load power of an ith sample, and N' is a number of dates of the meteorology sensitive load power (minimize the mean square error, i.e. equation 5.4, on the training set, [page 106 paragraph 5 line 5]; note that multiplying by the constant ½ does not change the parameters $\theta^*$ at which the right hand side of the equation would be a minimum).
It would have been obvious to one skilled in the art before the effective filing date to combine Chen with Goodfellow because this is applying a known technique (minimizing the mean squared error) to a known method and device (Chen) ready for improvement to yield predictable results. Chen is the base reference that teaches all limitations except for the exact function for the mean squared error. Chen is ready for improvement because it does not express its objective function as minimizing the mean squared error. Goodfellow teaches a known technique of minimizing the mean squared error to train a machine learning function (Goodfellow [page 106 paragraph 5 lines 1-5]), and how to solve said function by hand (see equations 5.6-5.12, Goodfellow [pages 106-107]). One having ordinary skill in the art would have recognized that applying the known technique in Goodfellow of minimizing the MSE function would yield the predictable result of allowing one to solve for the weights and biases. Therefore, it would have been obvious to combine Chen with Goodfellow to a person having ordinary skill in the art, and this claim is rejected under 35 U.S.C. 103.
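The observation that a constant factor of ½ does not move the minimizer can be checked numerically. The following Python sketch is illustrative only: the one-parameter model o = θ·x, the toy data, and the grid search are assumptions chosen for brevity, not a method disclosed in either reference.

```python
def mse(theta, xs, ps):
    """Mean squared error of the hypothetical model o_i = theta * x_i against labels p_i."""
    return sum((theta * x - p) ** 2 for x, p in zip(xs, ps)) / len(xs)


def argmin_on_grid(loss, grid):
    """Return the grid point with the smallest loss value."""
    return min(grid, key=loss)


# Toy data generated by theta = 2 (an assumption for the example).
xs = [1.0, 2.0, 3.0]
ps = [2.0, 4.0, 6.0]
grid = [k / 100.0 for k in range(0, 401)]  # candidate thetas from 0.00 to 4.00

theta_full = argmin_on_grid(lambda t: mse(t, xs, ps), grid)
theta_half = argmin_on_grid(lambda t: 0.5 * mse(t, xs, ps), grid)
```

Both searches select the same θ, because scaling an objective by any positive constant preserves the ordering of its values and therefore the location of its minimum.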

With respect to claim 19, Chen in view of Goodfellow teaches all of the limitations of claim 17, as noted above. Chen further teaches wherein the meteorology sensitive load power curve for a supervised training of the fully-connected layer is computed by the following steps: performing data processing on a total load power and meteorology data of a certain region or a certain transformer station (substation in a city of Jiangsu Province, [page 21 paragraph 1 lines 1-2]), and reordering to obtain a vertical data sample composed of a total load power and meteorology data at the same time on the same date in different months (1,420 valid "vertical" samples were finally obtained as shown in Table 1, [page 21 paragraph 2 lines 1-3]; see also Table 1, [page 8]); establishing a load-meteorology nonlinear association model between the total load power, the meteorology sensitive load power, and various pieces of meteorological information ([page 29 paragraph 2 lines 1-4]), and identifying model parameters by using a gradient method ([page 22 paragraph 3 lines 1-2]); and substituting the identified model parameters, longitudinal historical meteorology data, and total load power data into the association model, calculating a longitudinal meteorology sensitive load power curve, and arranging according to a normal time sequence to obtain a historical daily meteorology sensitive load power curve (Based on the correlation model parameters in Section 3.1, a total of 70 "horizontal" daily weather-sensitive load power curves could be calculated, [page 26 paragraph 2 lines 1-3]; see section 3.1 which gives the “basis power” or total power, [page 22 paragraph 1 line 4] and Table 2, which gives the meteorological sensitive parameters, [page 23]).


Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over “Estimation Method Of Meteorological Sensitive Load Power Based On Correlation Analysis And Stacked Auto-Encoder” (Chen) in view of “Deep Learning” (Goodfellow) in further view of “Data Normalization and Standardization: A technical Report” (Ali).
With respect to claim 8, Chen teaches all of the limitations of claim 1, as noted above. Chen further teaches performing a normalization process on a historical data sample before training the SAE model (each variable in each sample was normalized, [page 14 paragraph 2 lines 6-7]).
Chen does not teach restoring a normalization calculation result of each sample output by the fully-connected layer after training the fully connected layer.
However, Ali teaches restoring a normalization calculation result of each sample output by the fully-connected layer after training the fully connected layer (normalization is used in the data preprocessing stage in which the data is prepared to be processed later by one of the data mining and machine learning techniques like neural network, [Abstract lines 1-4]; with a basic formula for normalization provided as x′ = (x − min)/(max − min), [page 2]; and denormalization should be done if normalization is applied, [page 2 paragraph 1 line 1], with a basic formula for denormalization provided as x = [x′ * (max − min)] + min, [page 2]).
It would have been obvious to one skilled in the art before the effective filing date to combine Chen with Ali because this is applying a known technique (normalization and denormalization) to a known method and device (Chen) ready for improvement to yield predictable results. Chen is the base reference that teaches all limitations except for the exact function for normalization and denormalization. Chen is ready for improvement because although it describes normalization, it does not describe the necessary opposite process of denormalization. Ali cures this issue by giving a clear and concise definition of how such data is normalized during preprocessing, and how such normalized data is denormalized to give actual results, (Ali [pages 1-3]). One having ordinary skill in the art would have recognized that applying the known technique in Ali of using a normalization step and a denormalization step would yield the predictable result of allowing one to use the SAE-fully connected layer architecture described in Chen in Fig. 2 [page 18] on data that has been normalized (Chen, [page 14 paragraph 2 lines 6-7]) and then denormalized (Ali [pages 1-3]). Therefore, it would have been obvious to combine Chen with Ali to a person having ordinary skill in the art, and this claim is rejected under 35 U.S.C. 103.
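The two formulas quoted from Ali, x′ = (x − min)/(max − min) and x = [x′ * (max − min)] + min, form a round trip. The Python sketch below illustrates this under stated assumptions: the function names and the sample load values are hypothetical and appear in neither Chen nor Ali.

```python
def normalize(values):
    """Min-max normalization: x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization is undefined when max equals min")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def denormalize(scaled, lo, hi):
    """Denormalization: x = x' * (max - min) + min."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical load samples in arbitrary units.
load = [120.0, 150.0, 180.0, 240.0]
scaled, lo, hi = normalize(load)        # [0.0, 0.25, 0.5, 1.0]
restored = denormalize(scaled, lo, hi)  # [120.0, 150.0, 180.0, 240.0]
```

The same stored `lo` and `hi` from the preprocessing stage are reused for restoring the outputs, which is what allows values produced on the normalized scale to be read back in the original units.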







Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Deep Learning for Passive Synthetic Aperture Radar (Yonel) – eq. (5) includes the ½ with the L2-norm. To support this proposition, Yonel cites Goodfellow, as reference [30], [page 92].
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL EMERSON MILLER whose telephone number is (408)918-7548. The examiner can normally be reached Monday - Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Rehana Perveen, can be reached at 571-272-3676. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/D.M./Examiner, Art Unit 2148                                                                                                                                                                                                        

/REHANA PERVEEN/Supervisory Patent Examiner, Art Unit 2148