DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claim 1 is objected to because of the following informalities: claim 1, line 21 recites “BERT”; however, BERT is an acronym that needs to be written out at least once in the claim. Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-5 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 1 recites “in order that a certain kind of classifier is driven to be of…”. It is not clear whether “certain kind of classifier” refers to the adversarial domain classifier or to a separate classifier. For the purpose of examining, ‘certain kind of classifier’ is interpreted as ‘the adversarial domain classifier’. Appropriate clarification/correction is required.

Claim 1 recites “a loss function needs to be maximized . . .”; however, this is not a step of the method. It says that something needs to be done, but it does not state that the method does it. The claim asserts that the loss function “needs” to be maximized, but it does not in fact perform the step of maximizing the loss function, nor does it state what entity or process maximizes the loss function. The claim also does not describe what the loss function applies to (e.g., whether it is the loss of the “certain kind of classifier” or of the “adversarial domain classifier”, etc.). For the purpose of examining, ‘a loss function needs to be maximized’ is interpreted as ‘loss optimization’. Appropriate clarification/correction is required.

The term “always tends to predict a label of a wrong domain” in claim 1 is a relative term which renders the claim indefinite. The term “tends” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. Also, does the invention always “predict …”, or does the invention just tend to “predict …”? For the purpose of examining, ‘always tends to predict a label of a wrong domain’ is interpreted as ‘any degree of one or more predicted wrong labels’. Appropriate clarification/correction is required.

Claim 1 recites “proposing the classifier to predict…”. It is not clear what triggers the classifier to predict, since the prediction is based on a proposal. For the purpose of examining, ‘proposing the classifier to predict …’ is interpreted to mean ‘the classifier performs a prediction’. Appropriate clarification/correction is required.

Claim 1 recites “to hide and reveal any domain information…”. It is not clear what the method is hiding and what is being revealed. Also, how can something be simultaneously hidden and revealed? For the purpose of examining, ‘to hide and reveal any domain information …’ is interpreted to mean ‘hidden domain information’. Appropriate clarification/correction is required.

Claim 1 recites “to ensure that even if the classifier learns … a damaged output will be generated.” It is not clear whether this means that the damaged output will always be generated. If so, why “even if”? Or, if the condition is “if the classifier …”, then why “even”? For the purpose of examining, the term “to ensure that even if the classifier learns … a damaged output will be generated” is interpreted as ‘damaged output will always be generated’. Appropriate clarification/correction is required.

Claim 1 recites “the theory of adversarial machine learning”. There is insufficient antecedent basis for this limitation in the claim. Appropriate clarification/correction is required.
Claim 1 recites “wherein a loss function needs to be maximized according to the theory of adversarial machine learning”. It is not clear what theory is being used to maximize the loss function. Appropriate clarification/correction is required.




Claims should be proofread for antecedent issues. For example, Claim 1 recites the limitation “the categories of different domains”. There is insufficient antecedent basis for this limitation in the claim. Appropriate clarification/correction is required.

Claims should also be proofread for clarity issues beyond those mentioned above. Appropriate clarification/correction is required.

Claims 2-5 are rejected because they depend directly or indirectly on rejected claim 1.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim 1 is rejected under 35 USC 103 as being unpatentable over Li et al. (“MetaNER: Named Entity Recognition with Meta-Learning”, 2020 IW3C2) in view of Luo et al. (“CAPT: Contrastive Pre-Training for Learning Denoised Sequence Representations”, arXiv, 2020).

Regarding claim 1. 
Li teaches a method for meta-knowledge fine-tuning based on domain-invariant features (see figure 2, MetaNER including meta-knowledge and fine-tuning based on domain features), comprising the following stages: 
a first stage of constructing an adversarial domain classifier (see page 433, “the meta-learning strategy aims to encourage the model to learn good parameters that can be adapted to a new domains with as little data as possible. We also adopt an adversarial training strategy to improve model generalization. The adversarial network ensures that the intermediate representations from the sequence encoder can mislead the domain discriminator and correctly guide the tag decoder prediction, while the domain discriminator tries its best to correctly determine the domain class of each training instance.”, i.e. adversarial training strategy to correctly determine domain class corresponds to adversarial domain classifier): 
adding the adversarial domain classifier by meta-knowledge fine-tuning to optimize downstream tasks (see figure 2, MetaNER including meta-knowledge and fine-tuning based on domain features with the adversarial training strategy as shown in figure 2), in order that a certain kind of classifier is driven to be capable of distinguishing the categories of different domains (see figure 2, also see page 432, under section 4.1, “Our ultimate goal is to learn a meta-knowledge learner for the sequence encoder by leveraging sufficient source data Ds . Given a new unseen domain from Dnew (which can be either homogeneous or heterogeneous), the new learning task of NER can be solved by fine-tuning the learned sequence encoder (domain-invariant parameters) and a new tag decoder (domain specific parameters) with only a small number of training samples.”, also page 432 going into page 433, section 4.2, “The validation errors on Dval should be considered to improve the transferability of the model. In short, the meta-learning strategy aims to encourage the model to learn good parameters that can be adapted to a new domains with as little data as possible. We also adopt an adversarial training strategy to improve model generalization. The adversarial network ensures that the intermediate representations from the sequence encoder can mislead the domain discriminator and correctly guide the tag decoder prediction, while the domain discriminator tries its best to correctly determine the domain class of each training instance”, i.e. MetaNER including meta-knowledge and fine-tuning with adversarial training strategy to correctly determine domain class corresponds to distinguishing the categories of different domains), 
constructing the adversarial domain classifier, wherein a loss function needs to be maximized according to the theory of adversarial machine learning, so that the domain classifier is capable of predicting real domain labels (see figure 2, using inner optimization and outer optimization maximizes the loss function using adversarial training, to learn new domains to use for predicting labels. [Image: media_image1.png]); 
in order that a prediction probability of the adversarial domain classifier always tends to predict a label of a wrong domain when the loss function of an exchange domain is minimized (see figure 2, Temporal Model corresponds to exchange domain, page 430, “the temporary model is evaluated on the meta validation sets to minimize the domain divergence, enabling metaknowledge transfer across different domains”), proposing the classifier to predict a label of a direct exchange domain to minimize the loss function of the exchange domain, so that the learned features are independent of the domain (see figure 2 and figure 3, also see page 433 going into page 434, “At learning time, in order to encourage domain invariant features, we seek the parameters θ that maximize the loss of the domain discriminator (by making the two feature distributions as indistinguishable as possible), while simultaneously seeking the parameters θ and γ that minimize the loss of the domain discriminator. In addition, we seek the parameters ϕ that minimize the loss of the tag decoder. Thus, the optimization problem involves a minimization with respect to some parameters and a maximization with respect to others”); 
a second stage of constructing an input feature, wherein the input feature is composed of word embedding representation and domain embedding representation (see page 433, “The input representation in our study consists of character-level and word-level representations. Given an input sentence W = (W1,W2, . . . ,WL) of length L, W ∈ D, let Wl denote its l-th word.”, i.e. W ∈ D is word embedding and domain, also see figures 2 and 3, i.e. input representations corresponds to word embedding and domain 1-N are the domain embeddings); 
a third stage of learning domain-invariant features: constructing a domain damage objective function (see figure 2, unseen domain Dnew corresponds to domain damage) based on the adversarial domain classifier (see figure 2, “final evaluation phase”, where a new domain is tested using the MetaNER adversarial training); 
inputting the domain embedding representation of a real domain into the classifier to ensure that even if the classifier learns real domain information from the domain embedding representation of the real domain, a damaged output will still be generated (see figure 2, final evaluation phase and page 433, “The adversarial network ensures that the intermediate representations from the sequence encoder can mislead the domain discriminator and correctly guide the tag decoder prediction, while the domain discriminator tries its best to correctly determine the domain class of each training instance. In the final evaluation phase, the meta-knowledge learned by the sequence encoder can be applied to new domains. Given a new domain Dnew = {Ttr, Tte}, the learned sequence encoder and a new tag decoder are fine-tuned on Ttr and finally tested on Tte. Next, we briefly introduce the sequence labeling model (i.e., “sequence encoder + tag decoder”). Then, we describe the adversarial training strategies and meta-learning strategy in detail.”, i.e. unseen domains are still output and then used to fine-tune and test the sequence encoder, which corresponds to a damaged output still being generated);
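
The min-max optimization quoted from Li above (encoder parameters θ maximizing the domain discriminator loss while the discriminator parameters minimize it) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the scalar encoder `theta`, the logistic discriminator `(w, b)`, the toy data, and the function names are hypothetical and are not taken from Li or from the claims.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(xs, theta):
    """Hypothetical one-parameter 'sequence encoder': scales each input."""
    return [theta * x for x in xs]

def domain_loss(features, domains, w, b):
    """Mean binary cross-entropy of a logistic domain discriminator."""
    total = 0.0
    for f, d in zip(features, domains):
        p = sigmoid(w * f + b)
        total -= d * math.log(p) + (1 - d) * math.log(1 - p)
    return total / len(features)

def adversarial_step(xs, domains, theta, w, b, lr=0.1):
    """One update: discriminator descends the loss, encoder ascends it."""
    feats = encode(xs, theta)
    n = len(xs)
    # d(loss)/d(logit) for cross-entropy with a sigmoid output
    err = [sigmoid(w * f + b) - d for f, d in zip(feats, domains)]
    grad_w = sum(e * f for e, f in zip(err, feats)) / n
    grad_b = sum(err) / n
    grad_theta = sum(e * w * x for e, x in zip(err, xs)) / n
    w_new = w - lr * grad_w                 # discriminator: minimize
    b_new = b - lr * grad_b
    theta_new = theta + lr * grad_theta     # encoder: maximize (reversed sign)
    return theta_new, w_new, b_new

# Toy data: two domains (labels 1 and 0) that the raw inputs separate well.
xs = [1.0, 2.0, -1.0, -2.0]
domains = [1, 1, 0, 0]
theta, w, b = 1.0, 0.5, 0.0
base = domain_loss(encode(xs, theta), domains, w, b)
theta2, w2, b2 = adversarial_step(xs, domains, theta, w, b)
```

In this toy setting the discriminator update lowers the domain loss and the encoder update raises it, which is the opposing-objective structure the quoted passage describes; how Li actually parameterizes and schedules these updates is not reproduced here.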

Li does not teach forcing the word embedding representation of BERT to hide and reveal any domain information, and ensuring the domain-invariance of the features of an input text.
Luo teaches forcing the word embedding representation of BERT to hide and reveal any domain information, and ensuring the domain-invariance of the features of an input text (see page 2, “CAPT aims at aiding E in learning noise invariant sequence representations by enhancing the consistency between representations of the original sequence and its corrupted version. Specifically, for pre-trained model E and any sequence x, the model-specific noise (e.g. masking in BERT) can be added to x to construct its corrupted version x̂. Then, the pre-trained model E encodes x or x̂ with self-attention mechanism [32] to obtain hidden representations h(x) = E(x) or h(x̂) = E(x̂). Both h(x) and h(x̂) belong to the representation space R^(m×d), where m denotes the length of the input sequence and d is the dimension of hidden representation.”, i.e. Masking BERT can be constructed for corrupted (damaged) domains to obtain hidden representation which corresponds to hiding domain information).
Both Li and Luo pertain to the problem of learning sequence representations, thus being analogous. It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Li and Luo to force the word representation of BERT to hide and reveal hidden representations to learn invariant representations. The motivation for doing so would be “The proposed CAPT encourages the consistency between representations of the original sequence and its corrupted version via unsupervised instance-wise training signals. In this way, it not only alleviates the pretrain-finetune discrepancy induced by the noise of pre-training, but also aids the pre-trained model in better capturing global semantics of the input via more effective sentence-level supervision.” (See Luo Abstract).
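
The consistency between representations of an original sequence and its corrupted version, as described in the Luo passages quoted above, can be sketched as follows. This is a toy illustration under stated assumptions: the representations are taken as given vectors rather than produced by a pre-trained encoder, and the InfoNCE-style form of the loss, the temperature value, and the helper names `mask_tokens`, `cosine`, and `contrastive_loss` are hypothetical and are not taken from Luo.

```python
import math

def mask_tokens(tokens, positions, mask="[MASK]"):
    """Build a corrupted copy of a token sequence by masking given positions."""
    return [mask if i in positions else t for i, t in enumerate(tokens)]

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """Assumed InfoNCE-style loss: low when anchor is closest to positive."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Corrupting a sequence by masking one token.
corrupted = mask_tokens(["the", "model", "learns"], {1})

# Stand-in representations: h_x for the original, h_x_hat for its corrupted
# version, h_other for an unrelated sequence (all values are made up).
h_x = [1.0, 0.0]
h_x_hat = [0.9, 0.1]
h_other = [0.0, 1.0]

aligned = contrastive_loss(h_x, h_x_hat, [h_other])
misaligned = contrastive_loss(h_x, h_other, [h_x_hat])
```

Here `aligned` is smaller than `misaligned`, reflecting that the objective rewards a corrupted sequence whose representation stays close to that of its original.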

Claims 2-4 are rejected under 35 USC 103 as being unpatentable over Li et al. (“MetaNER: Named Entity Recognition with Meta-Learning”, 2020 IW3C2) in view of Luo et al. (“CAPT: Contrastive Pre-Training for Learning Denoised Sequence Representations”, arXiv, 2020) in further view of Wang et al. (“Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining”, 2020).

Regarding claim 2. 
Li and Luo teach the method for meta-knowledge fine-tuning based on domain-invariant features according to claim 1. However, Li and Luo do not teach the equations presented in claim 2.
	Wang teaches wherein in the first stage, the step of constructing the adversarial domain classifier comprises: step 1.1: defining the adversarial domain classifier; taking two different domains k1 and k2 into consideration, in order to drive a certain classifier to be capable of distinguishing the categories of different domains, constructing an adversarial domain classifier, and defining the loss function LAD of the adversarial domain classifier as: (Examiner notes that the rest of the claim is provided as an image for clarity of the equations.)

[Image: media_image2.png]


(Examiner notes that the rejection by Wang is also provided as an image for clarity; please see pages 4 and 5 for both the equations and explanations: loss function LAD of the adversarial domain classifier (see figure below with the definition of variables) and loss function LFAD of exchange domain minimization (see below for Flip domain loss with variable definitions), respectively).

[Image: media_image3.png]

[Image: media_image4.png]


Li, Luo and Wang pertain to the problem of learning sequence representations, thus being analogous. It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Li, Luo and Wang to use the equations in the limitations above. The motivation for doing so would be “It further encourages the language model to encode domain invariant representations by optimizing a series of novel domain corruption loss functions. After MFT, the model can be fine-tuned for each domain with better parameter initialization and higher generalization ability. We implement MFT upon BERT to solve several multi-domain text mining tasks. Experimental results confirm the effectiveness of MFT and its usefulness for few-shot learning.” (See Wang Abstract).

Regarding claim 3. 
Li, Luo and Wang teach the method for meta-knowledge fine-tuning based on domain-invariant features according to claim 2,
Wang further teaches (examiner notes images of the claim and rejection will be provided for clarity).

[Image: media_image5.png]

[Image: media_image6.png]
 

(please see page 5, (image is provided for clarity))

[Image: media_image7.png]

[Image: media_image8.png]


The motivation utilized in the combination of claim 2 applies equally to claim 3.

Regarding claim 4. 
Li, Luo and Wang teach the method for meta-knowledge fine-tuning based on domain-invariant features according to claim 3,
Wang further teaches (examiner notes images of the claim and rejection will be provided for clarity).

[Image: media_image9.png]


(please see page 5 (image is provided for clarity), wherein [Image: media_image10.png] is the predictable probability of the input feature, which is [Image: media_image11.png])

[Image: media_image12.png]


The motivation utilized in the combination of claim 2 applies equally to claim 4.

Allowable Subject Matter
Claim 5 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and overcoming the 35 U.S.C. 112 rejections above.

Below are the closest cited references, each of which discloses various aspects of the claimed invention:
· Sun et al. (“Patient Knowledge Distillation for BERT Model Compression”) discloses, inter alia, model compression including fine-tuning BERT representation.
           However, none of the prior art references of record—alone or in combination—disclose or suggest the combined features recited in dependent claim 5, including specifically: “an automatic compression component configured to automatically compress the pre-trained language model, comprising the pre-trained language model and a meta-knowledge fine-tuning module; wherein the meta-knowledge fine-tuning module is configured to construct a downstream task network on the pre-trained language model generated by the automatic compression component, fine-tune a downstream task scene by using meta-knowledge of the domain-invariant features, and output a finally fine-tuned compression model; the compression model is output to a designated container for a login user to download, and comparison information about model sizes before and after compression is presented on a page of the output compression model of the platform; a reasoning component, wherein the login user obtains a pre-trained compression model from the platform, and the user uses the compression model output by the automatic compression component to reason new data of a natural language processing downstream task uploaded by the login user on a data set of an actual scene; and the comparison information about reasoning speeds before and after compression is presented on a page of the compression model reasoning of the platform.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to IMAD M KASSIM whose telephone number is (571)272-2958. The examiner can normally be reached Monday through Friday, 7:30-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J. Huntley, can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/IMAD KASSIM/Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129