Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP §2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;

(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and

(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.


This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are:

“an automatic compression component configured to ……” in claim 5. 

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. 
In paragraph 0026 of applicant’s specification, it is clearly described that the multi-task language model includes a compression model that compresses data in the platform.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in a telephone interview with Mike Wiersch  (Reg. No. 55996) on 04/20/2022. Please see attached interview summary for details. 
The claims have been amended as follows:
Claim 1, (Cancelled) 
Claim 2, (Currently Amended)  A meta-knowledge fine tuning method for a multi-task language model, comprising the following stages:	
a first stage, calculating the prototypes of cross-domain data sets of tasks of the same category: embedded features of the prototypes of the corresponding domains of the tasks of the category are intensively learned from the data sets of different domains of the tasks of the same category, and the average embedded feature of all input texts of the tasks of the same category in different domains is taken as a corresponding multi-domain category prototype of the tasks of the same category;
a second stage, calculating typical scores of instances: where $d_{self}$ represents the distance between the embedded feature of each instance and $d_{others}$ represents the distance between the embedded feature of each instance and other domain prototypes; and the typical score of each instance is defined as a linear combination of $d_{self}$ and $d_{others}$;
and a third stage, a meta-knowledge fine tuning network based on typical scores: the typical scores obtained in the second stage are used as weight coefficients of the meta-knowledge fine tuning network, and a multi-task typical sensitive label classification loss function is designed as a learning objective function of meta-knowledge fine tuning; and the loss function penalizes the labels of the instances of all domains that the language model predicts incorrectly;
wherein in the first stage, $D_m^k$ represents a set of input texts $x_i^k$ with a category label m in a kth domain $D^k$ of the data set:
$$D_m^k = \{\, x_i^k \mid (x_i^k, y_i^k) \in D^k,\ y_i^k = m \,\}$$
where $m \in M$, M represents a set of all category labels in the data set; and $(x_i^k, y_i^k)$ represents an ith instance in the kth domain;
the category prototype $c_m^k$ represents the average embedded feature of all input texts with the category label m in the kth domain:
$$c_m^k = \frac{1}{|D_m^k|} \sum_{x_i^k \in D_m^k} \mathcal{E}(x_i^k)$$
wherein, $\mathcal{E}(\cdot)$ represents an embedded expression of $x_i^k$ output by a BERT model; and for the BERT model, the average embedded feature is the average pooling of the last layer of Transformer encoder corresponding to the input $x_i^k$.
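The prototype computation recited in claim 2 can be sketched as follows. This is a minimal illustration only, assuming the embedded features $\mathcal{E}(x_i^k)$ have already been computed (for example by average pooling the last Transformer encoder layer of a BERT model) and are supplied as plain lists of floats. The function name `category_prototypes`, its argument names, and the toy data are hypothetical and do not appear in the application.

```python
from collections import defaultdict


def category_prototypes(embeddings, labels, domains):
    """Return {(k, m): c_m^k}, the mean embedding of all inputs
    with category label m in domain k.

    embeddings: list of equal-length float vectors (assumed precomputed)
    labels:     category label m of each input
    domains:    domain index k of each input
    """
    sums = {}
    counts = defaultdict(int)
    for vec, m, k in zip(embeddings, labels, domains):
        key = (k, m)
        if key not in sums:
            sums[key] = [0.0] * len(vec)
        # Accumulate the element-wise sum over the set D_m^k.
        sums[key] = [s + v for s, v in zip(sums[key], vec)]
        counts[key] += 1
    # Divide by |D_m^k| to obtain the average embedded feature.
    return {key: [s / counts[key] for s in total] for key, total in sums.items()}


# Hypothetical toy data: two "pos" inputs in domain 0, one "neg" input in domain 1.
embeddings = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]
labels = ["pos", "pos", "neg"]
domains = [0, 0, 1]
prototypes = category_prototypes(embeddings, labels, domains)
print(prototypes[(0, "pos")])  # [2.0, 1.0]
```

Keying the result by the pair (domain, label) mirrors the indices k and m of $c_m^k$; a label that never occurs in a domain simply has no entry.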
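Claim 3. The meta-knowledge fine tuning method for the multi-task language model according to claim 2, wherein in the second stage, the typical score $t_i^k$ of the instance $(x_i^k, y_i^k)$ is expressed as:
$$t_i^k = \alpha \, \frac{\sum_{m \in M} \beta_m \cos\left(\mathcal{E}(x_i^k), c_m^k\right)}{\sum_{m \in M} \beta_m} + \frac{1-\alpha}{K-1} \cdot \sum_{\tilde{k}=1}^{K} \frac{\mathbf{1}_{(\tilde{k} \neq k)} \sum_{m \in M} \beta_m \cos\left(\mathcal{E}(x_i^k), c_m^{\tilde{k}}\right)}{\sum_{m \in M} \beta_m}$$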
where α represents a predefined balance factor, and 0<α<1; $\cos(\cdot,\cdot)$ represents a cosine similarity measurement function; K represents the number of domains; $\mathbf{1}_{(\tilde{k} \neq k)}$ represents an indication function, if $\tilde{k} \neq k$, 1 is returned, and if $\tilde{k} = k$, 0 is returned, the index $\tilde{k}$ is used for summation; and $\beta_m > 0$ represents a weight of $x_i^k$, and the weight of $x_i^k$ of the same category is the same.
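The typical score of claim 3 can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the applicant's implementation: embeddings and prototypes are plain lists of floats, domains are indexed 0 to K-1 rather than 1 to K, K is at least 2, and a prototype exists for every (domain, label) pair. The names `cosine`, `typical_score`, and the toy values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def typical_score(embedding, k, prototypes, beta, num_domains, alpha=0.5):
    """Typical score t_i^k of one instance from domain k.

    prototypes:  dict mapping (domain, label) -> prototype vector c_m^k
    beta:        dict mapping label m -> positive weight beta_m
    num_domains: K, assumed >= 2
    alpha:       balance factor, 0 < alpha < 1
    """
    beta_total = sum(beta.values())

    def weighted_similarity(domain):
        # Beta-weighted average of cos(E(x), c_m^domain) over all labels m.
        return sum(
            beta[m] * cosine(embedding, prototypes[(domain, m)]) for m in beta
        ) / beta_total

    self_term = weighted_similarity(k)
    # Skipping k_tilde == k plays the role of the indication function.
    others_term = sum(
        weighted_similarity(k_tilde)
        for k_tilde in range(num_domains)
        if k_tilde != k
    )
    return alpha * self_term + (1 - alpha) / (num_domains - 1) * others_term


# Hypothetical toy data: two domains, one label, orthogonal prototypes.
toy_prototypes = {(0, "a"): [1.0, 0.0], (1, "a"): [0.0, 1.0]}
toy_beta = {"a": 1.0}
score = typical_score([1.0, 0.0], 0, toy_prototypes, toy_beta, num_domains=2)
print(score)  # 0.5
```

In the toy case the instance coincides with its own domain prototype (similarity 1) and is orthogonal to the other domain's prototype (similarity 0), so with α = 0.5 the score is 0.5.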
Claim 4, The meta-knowledge fine tuning method for the multi-task language model according to claim 3, wherein in the third stage, the multi-task typical sensitive label classification loss function $L_T$ is expressed as:
                
                    
                        
                            L
                        
                        
                            T
                        
                    
                    =
                    
                        
                            -
                            1
                        
                        
                            K
                        
                    
                    
                        
                            ∑
                            
                                
                                    
                                        
                                            
                                                x
                                            
                                            
                                                i
                                            
                                            
                                                k
                                            
                                        
                                        ，
                                        
                                            
                                                y
                                            
                                            
                                                i
                                            
                                            
                                                k
                                            
                                        
                                    
                                
                                ∈
                                D
                            
                        
                        
                            
                                
                                    ∑
                                    
                                        m
                                        ∈
                                        M
                                    
                                
                                
                                    
                                        
                                            1
                                        
                                        
                                            
                                                
                                                    
                                                        
                                                            y
                                                        
                                                        
                                                            i
                                                        
                                                        
                                                            k
                                                        
                                                    
                                                    =
                                                    m
                                                
                                            
                                        
                                    
                                    
                                        
                                            t
                                        
                                        
                                            i
                                        
                                        
                                            k
                                        
                                    
                                
                            
                            ∙
                            l
                            o
                            g
                            
                                
                                    τ
                                
                                
                                    m
                                
                            
                            
                                
                                    f
                                    
                                        
                                            
                                                
                                                    x
                                                
                                                
                                                    i
                                                
                                                
                                                    k
                                                
                                            
                                        
                                    
                                
                            
                        
                    
                
            
where D represents a set of all domains; $\mathbf{1}_{(y_i^k = m)}$ represents an indication function, if $y_i^k = m$, 1 is returned, and if $y_i^k \neq m$, 0 is returned; $\tau_m(f(x_i^k))$ represents the probability that the category label $x_i^k$ is predicted as m; and $f(x_i^k)$ represents an embedded layer feature of the token of "[CLS]" output by the last layer of the BERT model.
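For illustration only, the loss recited in claim 4 can be sketched in plain Python as follows. This is a minimal sketch, not the applicant's implementation: the function name `typical_sensitive_loss`, the tuple layout of `instances`, and the reading of K as the number of domains are assumptions made here, and the supplied probabilities stand in for the model output $\tau_m(f(x_i^k))$.

```python
import math


def typical_sensitive_loss(instances, num_domains):
    """Compute L_T = -(1/K) * sum_i sum_m 1[y_i == m] * t_i * log(tau_m(f(x_i))).

    instances   -- list of (probs, label, typical_score) tuples (hypothetical layout),
                   where probs maps each category label m to a predicted probability
    num_domains -- K, assumed here to be the number of domains
    """
    total = 0.0
    for probs, label, typical_score in instances:
        for m, prob in probs.items():
            # Indicator function: only the term for the true label is non-zero.
            if label == m:
                total += typical_score * math.log(prob)
    return -total / num_domains
```

Because the indicator function zeroes every term except the one for the true label, each instance contributes its typical score multiplied by the log-probability of its correct category, so instances with higher typical scores weigh more heavily in the loss.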
Claim 5, (Currently Amended) A meta-knowledge fine tuning method for a multi-task language model, comprising the following stages:	
a first stage, calculating the prototypes of cross-domain data sets of tasks of the same category: embedded features of the prototypes of the corresponding domains of the tasks of the category is intensively learned from the data sets of different domains of the tasks of the same category, and the average embedded feature of all input texts of the tasks of the same category in different domains is taken as a corresponding multi-domain category prototype of the tasks of the same category;
a second stage, calculating typical scores of instances: where $d_{self}$ represents the distance between the embedded feature of each instance and its own domain prototype, $d_{others}$ represents the distance between the embedded feature of each instance and other domain prototypes; and the typical score of each instance is defined as a linear combination of $d_{self}$ and $d_{others}$; and
a third stage, a meta-knowledge fine tuning network based on typical scores: the typical scores obtained in the second stage is used as weight coefficients of the meta-knowledge fine tuning network, and a multi-task typical sensitive label classification loss function is designed as a learning objective function of meta-knowledge fine tuning; and the loss function penalizes the labels of the instances of all domains that the language model predicts incorrectly
a data loading component configured to obtain a training sample of a multi-task-oriented pre-training language model, wherein the training sample is a labeled text sample that satisfies a supervised learning task;
an automatic compression component configured to automatically compress the multi-task-oriented pre-training language model, comprising a pre-training language model and a meta-knowledge fine tuning module,
 wherein the meta-knowledge fine tuning module is used for constructing a downstream task network on the pre-training language model generated by the automatic compression component, performing fine tuning on a downstream task scenario by using the meta-knowledge of a typical score, outputting a final fine-tuned student model, that is, a compression model of the pre-training language model which is required by a logged-in user and comprises a downstream task; outputting the compression model to a designated container for the logged-in user to download, and presenting the comparison information of model size before and after the compression; and
an inference component: the logged-in user obtains the compression model of the pre-training language model from the platform, and the user uses the compression model output by the automatic compression component to infer the new data of a natural language processing downstream task uploaded by the logged-in user on the data set of the actual scenario, and presents the comparison information of the inference speed before and after the compression.
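For illustration only, the second-stage typical score recited in claim 5, a linear combination of d_self and d_others, can be sketched as follows. The claim does not fix the distance measure or the combination weights, so the cosine similarity, the averaging over the other domain prototypes, the weight `alpha`, and all function names below are assumptions chosen for this sketch rather than details taken from the application.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors (assumed distance measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def typical_score(feature, own_prototype, other_prototypes, alpha=0.5):
    """Linear combination of d_self and d_others for one instance.

    alpha is a hypothetical weight; the claim only requires a linear combination.
    """
    # d_self: similarity between the instance and its own domain prototype.
    d_self = cosine(feature, own_prototype)
    # d_others: mean similarity to the prototypes of the other domains (assumed).
    d_others = sum(cosine(feature, p) for p in other_prototypes) / len(other_prototypes)
    return alpha * d_self + (1.0 - alpha) * d_others
```

Under these assumptions, an instance that lies close to its own domain prototype and also close to the prototypes of other domains receives a high score, which then serves as its weight coefficient in the third stage.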


Allowable Subject Matter
Examiner’s reason for Allowance
Claims 2-5 are allowed. Renumbered as 1-4.
 Claim 2, A meta-knowledge fine tuning method for a multi-task language model, comprising the following stages:	
a first stage, calculating the prototypes of cross-domain data sets of tasks of the same category: embedded features of the prototypes of the corresponding domains of the tasks of the category is intensively learned from the data sets of different domains of the tasks of the same category, and the average embedded feature of all input texts of the tasks of the same category in different domains is taken as a corresponding multi-domain category prototype of the tasks of the same category;
a second stage, calculating typical scores of instances: where $d_{self}$ represents the distance between the embedded feature of each instance and its own domain prototype, $d_{others}$ represents the distance between the embedded feature of each instance and other domain prototypes; and the typical score of each instance is defined as a linear combination of $d_{self}$ and $d_{others}$;
and a third stage, a meta-knowledge fine tuning network based on typical scores: the typical scores obtained in the second stage is used as weight coefficients of the meta-knowledge fine tuning network, and a multi-task typical sensitive label classification loss function is designed as a learning objective function of meta-knowledge fine tuning; and the loss function penalizes the labels of the instances of all domains that the language model predicts incorrectly;
wherein in the first stage, $D_m^k$ represents a set of input texts $x_i^k$ with a category label m in a kth domain $D^k$ of the data set:
$$D_m^k = \left\{ x_i^k \mid (x_i^k, y_i^k) \in D^k,\ y_i^k = m \right\}$$
where $m \in M$, M represents a set of all category labels in the data set; and $(x_i^k, y_i^k)$ represents an ith instance in the kth domain;
the category prototype $c_m^k$ represents the average embedded feature of all input texts with the category label m in the kth domain:
$$c_m^k = \frac{1}{\left| D_m^k \right|} \sum_{x_i^k \in D_m^k} \mathcal{E}(x_i^k)$$
wherein, $\mathcal{E}(\cdot)$ represents an embedded expression of $x_i^k$ output by a BERT model; and for the BERT model, the average embedded feature is the average pooling of the last layer of Transformer encoder corresponding to the input $x_i^k$.
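For illustration only, the first-stage computation recited in claim 2 (grouping the input texts of the kth domain by category label m, then averaging their embeddings) can be sketched as follows. The function name `category_prototypes`, the dictionary layout of the data, and the `embed` callable are assumptions; `embed` stands in for the average-pooled last-layer output of a BERT model, which is not reproduced here.

```python
from collections import defaultdict


def category_prototypes(domain_instances, embed):
    """Return {(m, k): c_m^k}, the mean embedding per category label m and domain k.

    domain_instances -- dict mapping a domain k to a list of (text, label) pairs
    embed            -- callable mapping a text to a list of floats (stand-in for
                        the average-pooled BERT output)
    """
    # Build D_m^k: the embeddings of all texts in domain k whose label equals m.
    groups = defaultdict(list)
    for k, instances in domain_instances.items():
        for text, label in instances:
            groups[(label, k)].append(embed(text))

    # c_m^k = (1 / |D_m^k|) * sum of the embeddings in D_m^k, taken per dimension.
    prototypes = {}
    for key, vectors in groups.items():
        count = len(vectors)
        prototypes[key] = [sum(column) / count for column in zip(*vectors)]
    return prototypes
```

Keying the result by the pair (m, k) mirrors the two indices of the prototype, so the prototypes of the same category in different domains remain separately addressable for the later stages.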

The following is an examiner's statement of reasons for allowance: Regarding claim 2, the prior art of record, specifically LIU et al. (US Patent Application Publication #20210142181), teaches a neural network structure that has different layers that perform different specific functions, such as pooling, encoding, or convolution operations. For the purposes of this document, the term “layer” refers to a group of nodes that share inputs and outputs, e.g., to or from external sources or other layers in the network. The trained models can share the same model structure and yet have different values for the parameters, e.g., if the two models are trained on different training data or if there are underlying stochastic processes in the training process (Paragraph 0019). Larson et al. (US 20200320982) teaches speech processing that provides input data that may be used as a basis for training a model for speech processing, validating a model for speech processing, testing a model for speech processing, or determining a classification for speech data (Paragraph 0003). Larson continues to teach that, after a model is trained based on the determined data, the model's performance may exhibit more resilience to a wider range of speech properties. Based on validating a model based on the determined data, the model's performance may indicate the model's resilience to a wider range of speech properties. Further, the performance of a classification task may exhibit more resilience to error or noise (Paragraph 0008).

However, none of the cited prior art, alone or in combination, provides the motivation to teach wherein in the first stage, $D_m^k$ represents a set of input texts $x_i^k$ with a category label m in a kth domain $D^k$ of the data set:
$$D_m^k = \left\{ x_i^k \mid (x_i^k, y_i^k) \in D^k,\ y_i^k = m \right\}$$
where $m \in M$, M represents a set of all category labels in the data set; and $(x_i^k, y_i^k)$ represents an ith instance in the kth domain;
the category prototype $c_m^k$ represents the average embedded feature of all input texts with the category label m in the kth domain:
$$c_m^k = \frac{1}{\left| D_m^k \right|} \sum_{x_i^k \in D_m^k} \mathcal{E}(x_i^k)$$
wherein, $\mathcal{E}(\cdot)$ represents an embedded expression of $x_i^k$ output by a BERT model; and for the BERT model, the average embedded feature is the average pooling of the last layer of Transformer encoder corresponding to the input $x_i^k$.
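
For illustration only, the prototype calculation recited above can be sketched as follows. This is a minimal sketch and not the applicant's implementation: the function names `mean_pool` and `class_prototypes`, the tuple layout of the samples, and the use of plain Python lists in place of actual BERT outputs are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Vector = List[float]


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> Vector:
    """Average the last-layer token vectors of one input into a single embedding.

    Stands in for "average pooling of the last layer of Transformer encoder";
    the token vectors would come from a BERT model in practice (assumption).
    """
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(tok[j] for tok in token_vectors) / n for j in range(dim)]


def class_prototypes(
    samples: Iterable[Tuple[str, str, Vector]]
) -> Dict[Tuple[str, str], Vector]:
    """Compute c_m^k, the mean embedding of all inputs with label m in domain k.

    Each sample is a hypothetical (domain, label, embedding) triple.
    """
    groups: Dict[Tuple[str, str], List[Vector]] = defaultdict(list)
    for domain, label, embedding in samples:
        groups[(domain, label)].append(embedding)

    prototypes: Dict[Tuple[str, str], Vector] = {}
    for key, embeddings in groups.items():
        dim = len(embeddings[0])
        count = len(embeddings)  # |D_m^k|
        prototypes[key] = [sum(e[j] for e in embeddings) / count for j in range(dim)]
    return prototypes
```

Under these assumptions, each key of the returned dictionary is a (domain, label) pair and each value is the corresponding prototype vector.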

Claim 5. A meta-knowledge fine tuning method for a multi-task language model, comprising the following stages:
a first stage, calculating the prototypes of cross-domain data sets of tasks of the same category: embedded features of the prototypes of the corresponding domains of the tasks of the category are intensively learned from the data sets of different domains of the tasks of the same category, and the average embedded feature of all input texts of the tasks of the same category in different domains is taken as a corresponding multi-domain category prototype of the tasks of the same category;
a second stage, calculating typical scores of instances: where $d_{self}$ represents the distance between the embedded feature of each instance and its own domain prototype, $d_{others}$ represents the distance between the embedded feature of each instance and other domain prototypes; and the typical score of each instance is defined as a linear combination of $d_{self}$ and $d_{others}$; and
a third stage, a meta-knowledge fine tuning network based on typical scores: the typical scores obtained in the second stage are used as weight coefficients of the meta-knowledge fine tuning network, and a multi-task typical sensitive label classification loss function is designed as a learning objective function of meta-knowledge fine tuning; and the loss function penalizes the labels of the instances of all domains that the language model predicts incorrectly;
a data loading component configured to obtain a training sample of a multi-task-oriented pre-training language model, wherein the training sample is a labeled text sample that satisfies a supervised learning task;
an automatic compression component configured to automatically compress the multi-task-oriented pre-training language model, comprising a pre-training language model and a meta-knowledge fine tuning module,
 wherein the meta-knowledge fine tuning module is used for constructing a downstream task network on the pre-training language model generated by the automatic compression component, performing fine tuning on a downstream task scenario by using the meta-knowledge of a typical score, outputting a final fine-tuned student model, that is, a compression model of the pre-training language model which is required by a logged-in user and comprises a downstream task; outputting the compression model to a designated container for the logged-in user to download, and presenting the comparison information of model size before and after the compression; and
an inference component: the logged-in user obtains the compression model of the pre-training language model from the platform, and the user uses the compression model output by the automatic compression component to infer the new data of a natural language processing downstream task uploaded by the logged-in user on the data set of the actual scenario, and presents the comparison information of the inference speed before and after the compression.
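
For illustration only, the second and third stages recited in the claim above (a typical score formed as a linear combination of $d_{self}$ and $d_{others}$, then used as a weight coefficient in a classification loss) might be sketched as below. The claim language does not fix the distance measure, the combination coefficients, or the exact loss, so the following are assumptions of this sketch: cosine similarity as the measure, a single hypothetical coefficient `alpha`, averaging over the other domains, and a score-weighted cross-entropy. The names `typical_score` and `weighted_cross_entropy` are likewise hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, used here as an assumed stand-in for the recited distance."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def typical_score(
    embedding: Vector,
    label: str,
    domain: str,
    prototypes: Dict[Tuple[str, str], Vector],
    alpha: float = 0.5,  # assumed mixing coefficient, not taken from the claim
) -> float:
    """Linear combination of d_self and d_others for one instance."""
    d_self = cosine(embedding, prototypes[(domain, label)])
    others = [p for (k, m), p in prototypes.items() if m == label and k != domain]
    # Average over the other domains' prototypes of the same label (assumption).
    d_others = sum(cosine(embedding, p) for p in others) / len(others) if others else 0.0
    return alpha * d_self + (1.0 - alpha) * d_others


def weighted_cross_entropy(scores: Sequence[float], true_label_probs: Sequence[float]) -> float:
    """Cross-entropy in which each instance is weighted by its typical score.

    true_label_probs[i] is the model's predicted probability of the correct label
    for instance i, so low-probability (incorrect) predictions are penalized more.
    """
    total = sum(t * -math.log(p) for t, p in zip(scores, true_label_probs))
    return total / len(scores)
```

With this weighting, an instance whose embedding lies close to the prototypes of several domains contributes more to the objective than a domain-specific outlier, which is one plausible reading of using the typical scores as weight coefficients.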

The following is an examiner's statement of reasons for allowance:
Regarding claim 5, the prior art of record, specifically LIU et al. (US Patent Application Publication #20210142181), teaches a neural network structure that has different layers that perform different specific functions, such as pooling, encoding, or convolution operations. For the purposes of this document, the term “layer” refers to a group of nodes that share inputs and outputs, e.g., to or from external sources or other layers in the network. The trained models can share the same model structure and yet have different values for the parameters, e.g., if the two models are trained on different training data or if there are underlying stochastic processes in the training process (Paragraph 0019). Larson et al. (US 20200320982) teaches speech processing that provides input data that may be used as a basis for training a model for speech processing, validating a model for speech processing, testing a model for speech processing, or determining a classification for speech data (Paragraph 0003). Larson continues to teach that, after a model is trained based on the determined data, the model's performance may exhibit more resilience to a wider range of speech properties. Based on validating a model based on the determined data, the model's performance may indicate the model's resilience to a wider range of speech properties. Further, the performance of a classification task may exhibit more resilience to error or noise (Paragraph 0008).
However, none of the prior art cited, alone or in combination, provides the motivation to teach wherein the meta-knowledge fine tuning module is used for constructing a downstream task network on the pre-training language model generated by the automatic compression component, performing fine tuning on a downstream task scenario by using the meta-knowledge of a typical score, outputting a final fine-tuned student model, that is, a compression model of the pre-training language model which is required by a logged-in user and comprises a downstream task; outputting the compression model to a designated container for the logged-in user to download, and presenting the comparison information of model size before and after the compression.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Akwasi M Sarpong whose telephone number is (571)270-3438. The examiner can normally be reached Mon-Fri. 8:00am-4:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KING D POON, can be reached on 571-272-7440. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/AKWASI M SARPONG/
Primary Examiner, Art Unit 2675
04/22/2022