DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Introduction
This office action is in response to communications filed on 08/26/2020. Claims 1-20 are pending, and as such, Claims 1-20 have been examined.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-3, 6-10 and 13-18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 

Independent Claim 1 recites the limitations “A language processing method, carried out by at least one processor having access to at least one computer storage device”, “the method comprising: forming or accessing a classification and regression model”, “receiving training data, and adjusting the classification and regression model according to the training data;”, “and employing prior knowledge to optimize the classification and regression model”, “including applying feature weights to one or more features of the classification and regression model, to form an optimized classification and regression model”.
The limitations “the method comprising: forming or accessing a classification and regression model”, “receiving training data, and adjusting the classification and regression model according to the training data;”, “and employing prior knowledge to optimize the classification and regression model”, “including applying feature weights to one or more features of the classification and regression model, to form an optimized classification and regression model”, as drafted, cover a mental process, as these steps could be performed mentally or by hand with pen and paper.
This judicial exception is not integrated into a practical application. Claim 1 recites “A language processing method, carried out by at least one processor having access to at least one computer storage device”. This limitation is directed to using a computer to perform the method and does not impose any meaningful limits on practicing the abstract idea. Claim 1 does not contain any additional limitations.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The addition of the generic computer components recited above with regard to claim 1 amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Claim 1 does not recite any additional limitations. The claim, as drafted, is not patent eligible.

Dependent Claim 2 recites the additional limitations of “The method of claim 1, wherein the optimization includes the at least one processor regularizing the classification and regression model.” These limitations cover mental processes, as they could be performed mentally or by hand with pen and paper.
This judicial exception is not integrated into a practical application, as Claim 2 does not contain any additional limitations.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as Claim 2 does not contain any additional limitations. The claim, as drafted, is not patent eligible.

Dependent Claim 3 recites the additional limitations of “The method of claim 2, wherein the regularizing includes the at least one processor adjusting the feature weights according to at least some of the prior knowledge”. These limitations cover mental processes, as they could be performed mentally or by hand with pen and paper.
This judicial exception is not integrated into a practical application, as Claim 3 does not contain any additional limitations.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as Claim 3 does not contain any additional limitations. The claim, as drafted, is not patent eligible.

Dependent Claim 6 recites the additional limitations of “The method of claim 2, wherein the regularizing includes the at least one processor adjusting a cost function to incorporate at least some of the prior knowledge”. These limitations cover mental processes, as they could be performed mentally or by hand with pen and paper.
This judicial exception is not integrated into a practical application, as Claim 6 does not contain any additional limitations.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as Claim 6 does not contain any additional limitations. The claim, as drafted, is not patent eligible.

Dependent Claim 7 recites the additional limitations of “The method of claim 6, further comprising the at least one processor adjusting the cost function to give higher priority to a feature that is expected to be more general within training data”. These limitations cover mental processes, as they could be performed mentally or by hand with pen and paper.
This judicial exception is not integrated into a practical application, as Claim 7 does not contain any additional limitations.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as Claim 7 does not contain any additional limitations. The claim, as drafted, is not patent eligible.

Independent Claim 8 recites the limitations “A language processing system, comprising: at least one processor”, “configured to form or access a classification and regression model and to receive training data and to adjust the classification and regression model according to the training data;”, “and the at least one processor further configured”, “to employ prior knowledge and to apply feature weights to one or more features of the classification and regression model to form an optimized classification and regression model”.
The limitations “configured to form or access a classification and regression model and to receive training data and to adjust the classification and regression model according to the training data;”, “to employ prior knowledge and to apply feature weights to one or more features of the classification and regression model to form an optimized classification and regression model”, as drafted, cover a mental process, as these steps could be performed mentally or by hand with pen and paper.
This judicial exception is not integrated into a practical application. Claim 8 recites “A language processing system, comprising: at least one processor”, “and the at least one processor further configured”. These limitations are directed to using a computer and do not impose any meaningful limits on practicing the abstract idea. Claim 8 does not contain any additional limitations.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The addition of the generic computer components recited above with regard to claim 8 amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Claim 8 does not recite any additional limitations. The claim, as drafted, is not patent eligible.

Dependent Claim 9 recites the additional limitations of “The system of claim 8, wherein the at least one processor is configured to optimize the classification and regression model through regularization”. These limitations cover mental processes, as they could be performed mentally or by hand with pen and paper.
This judicial exception is not integrated into a practical application, as Claim 9 does not contain any additional limitations.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as Claim 9 does not contain any additional limitations. The claim, as drafted, is not patent eligible.

Dependent Claim 10 recites the additional limitations of “The system of claim 9, wherein the at least one processor is configured to adjust feature weights according to at least some of the prior knowledge during the regularization”. These limitations cover mental processes, as they could be performed mentally or by hand with pen and paper.
This judicial exception is not integrated into a practical application, as Claim 10 does not contain any additional limitations.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as Claim 10 does not contain any additional limitations. The claim, as drafted, is not patent eligible.

Dependent Claim 13 recites the additional limitations of “The system of claim 9, wherein the at least one processor is configured to adjust a cost function to incorporate at least some of the prior knowledge during regularization”. These limitations cover mental processes, as they could be performed mentally or by hand with pen and paper.
This judicial exception is not integrated into a practical application, as Claim 13 does not contain any additional limitations.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as Claim 13 does not contain any additional limitations. The claim, as drafted, is not patent eligible.

Dependent Claim 14 recites the additional limitations of “The system of claim 13, wherein the at least one processor is configured to adjust the cost function to give higher priority to a feature that is expected to be more general within training data”. These limitations cover mental processes, as they could be performed mentally or by hand with pen and paper.
This judicial exception is not integrated into a practical application, as Claim 14 does not contain any additional limitations.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as Claim 14 does not contain any additional limitations. The claim, as drafted, is not patent eligible.

Independent Claim 15 recites the limitations of “A method employing natural language understanding, carried out by at least one processor having access to at least one computer storage device”, “the method comprising: forming or accessing a classification and regression model, receiving training data, and adjusting the classification and regression model according to the training data;”, “testing the classification and regression model;”, “optimizing, employing prior knowledge, the classification and regression model”, “including applying feature weights to one or more features of the classification and regression model, to form an optimized classification and regression model;”, “and receiving operational data and employing the optimized classification and regression model to classify the operational data to adapt the feature weights”.
The limitations “the method comprising: forming or accessing a classification and regression model, receiving training data, and adjusting the classification and regression model according to the training data;”, “testing the classification and regression model;”, “optimizing, employing prior knowledge, the classification and regression model”, “including applying feature weights to one or more features of the classification and regression model, to form an optimized classification and regression model;”, “and receiving operational data and employing the optimized classification and regression model to classify the operational data to adapt the feature weights”, as drafted, cover a mental process, as these steps could be performed mentally or by hand with pen and paper.
This judicial exception is not integrated into a practical application. Claim 15 recites “A method employing natural language understanding, carried out by at least one processor having access to at least one computer storage device”. This limitation is directed to using a computer to perform the method and does not impose any meaningful limits on practicing the abstract idea. Claim 15 does not contain any additional limitations.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The addition of the generic computer components recited above with regard to claim 15 amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Claim 15 does not recite any additional limitations. The claim, as drafted, is not patent eligible.

Dependent Claim 16 recites the additional limitations of “The method of claim 15, wherein the classification of the operational data is related to the intent of the operational data”. These limitations cover mental processes, as they could be performed mentally or by hand with pen and paper.
This judicial exception is not integrated into a practical application, as Claim 16 does not contain any additional limitations.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as Claim 16 does not contain any additional limitations. The claim, as drafted, is not patent eligible.

Dependent Claim 17 recites the additional limitations of “The method of claim 16, wherein optimizing includes the at least one processor regularizing the classification and regression model”. These limitations cover mental processes, as they could be performed mentally or by hand with pen and paper.
This judicial exception is not integrated into a practical application, as Claim 17 does not contain any additional limitations.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as Claim 17 does not contain any additional limitations. The claim, as drafted, is not patent eligible.

Dependent Claim 18 recites the additional limitations of “The method of claim 17, wherein the regularizing includes the at least one processor adjusting the feature weights according to at least some of the prior knowledge”. These limitations cover mental processes, as they could be performed mentally or by hand with pen and paper.
This judicial exception is not integrated into a practical application, as Claim 18 does not contain any additional limitations.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as Claim 18 does not contain any additional limitations. The claim, as drafted, is not patent eligible.

The following explains why Claims 4, 5, 11, 12, 19 and 20 were not rejected under 35 U.S.C. 101.

Claims 4, 11, 19 and 20 (which depends from claim 19) recite “Natural Language Understanding feature weights” as the weights of the system. “Natural Language Understanding”, as understood as a term of art when referring to weights, implies that the model is complex enough that executing it with pen and paper would not be realistically feasible.
Claims 5 and 12 both recite “machine learning feature weights” as the weights of the system. This limitation excludes pen-and-paper models, as machine learning models would be too complex to execute feasibly with pen and paper.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-6, 8-13 and 15-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Fei et al. (US 11354506 B2).

Regarding Claim 1:
	Fei teaches a language processing method, carried out by at least one processor having access to at least one computer storage device, the method comprising: forming or accessing a classification and regression model(Abstract Ln 5-8, Presented herein are novel approaches to learn coreference-aware word representations for the NER task. In one or more embodiments, a “CNN-BiLSTM-CRF” neural architecture. Col 26, Ln 36-37, central processing units. Col 16, Ln 43-44, system memory), 
	receiving training data, and adjusting the classification and regression model according to the training data(Col 8, Ln 57-62, a method for training a coreference-aware NER model,…. a document comprising words (or a batch comprising a plurality of documents) is input (405) into a coreference-aware named entity recognition (NER) network. Col 9, Ln 10-12, Finally, an objective function is minimized (430) to update parameters of the coreference-aware NER network. Col 9, Table 1 shows data sets for NER); 
	and employing prior knowledge to optimize the classification and regression model(Col 4, Ln 7-9, a coreference component is added on top of the BiLSTM layer to incorporate prior knowledge about coreferential relations among entity mentions), 
	including applying feature weights to one or more features of the classification and regression model, to form an optimized classification and regression model (Col 6, Ln 17-23, word sequence … input, for each word x_i, … combining its word embedding w_i^word from a word embedding component 210 with its character-level features … and with one or more extra word-level features. Col 6, Ln 35-36, to obtain the word x_i's hidden representation h_i. Col 7, Ln 50-60, Eq 1 shows weight W_coref being applied to h_i).

	Regarding Claim 2:
	Fei teaches the method of claim 1, wherein the optimization includes the at least one processor regularizing the classification and regression model(Abstract, Ln 11-12, a coreference regularization is added during training).

	Regarding Claim 3:
	Fei teaches the method of claim 2, wherein the regularizing includes the at least one processor adjusting the feature weights according to at least some of the prior knowledge (Col 7, Ln 50-58, f_coref(h_i) = … W_coref[…, h_{C_k}] … where W_coref … are weight … h_{C_k} is the coreference vector. Col 8, Ln 33-40, The coreference regularization term may take the following form: R_coref = … f_coref(h_i)).

	Regarding Claim 4:
	Fei teaches the method of claim 2, wherein the classification and regression model is a natural language understanding model and the features weights include natural language understanding feature weights (Abstract, Ln 5-10, Presented herein are novel approaches to learn coreference-aware word representations for the NER task. In one or more embodiments, a “CNN-BiLSTM-CRF” neural architecture is modified to include a coreference layer component on top of the BiLSTM layer to incorporate coreferential relations. Named entity recognition requires the model to learn an understanding of the language, as it learns the context of the sentence through the BiLSTM and coreference. Col 7, Ln 57, where W_coref and b_coref are the weight and bias parameters).

	Regarding Claim 5:
	Fei teaches the method of claim 2, wherein the feature weights include machine learning feature weights (Abstract, Ln 7-8, In one or more embodiments, a “CNN-BiLSTM-CRF” neural architecture. Col 7, Ln 57, where W_coref and b_coref are the weight and bias parameters).

	Regarding Claim 6:
	Fei teaches the method of claim 2, wherein the regularizing includes the at least one processor adjusting a cost function to incorporate at least some of the prior knowledge (Col 4, Ln 7-9, a coreference component is added on top of the BiLSTM layer to incorporate prior knowledge about coreferential relations among entity mentions. Col 8, Ln 21-29, the CRF layer 330 can make consistent predictions across different coreferential mentions … to guide the word representation learning of the coreference layer, a regularization may be applied to the output word vectors of the coreference layer component 325. The resulting regularization term may also be minimized as a part of the final objective function during model training. The objective function is a cost/loss function; see Col 19, Ln 10-15, and a parameterized coreference regularization to penalize difference between coreference representations for different words of the document that are members of the same coreference cluster; and using the loss to update parameters of the coreference-aware NER network. Or, Col 8, Ln 50-53, objective function for full model).
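	For illustration only, the general idea described in the cited passages (a regularization term that penalizes differences between representations of words in the same coreference cluster, added to the objective function) may be sketched as follows. This is a minimal sketch under stated assumptions and is not code or an equation from Fei: the squared-distance form of the penalty, the function names coref_penalty and objective, the weight lam, and the toy vectors are all hypothetical, and Fei's actual R_coref term is not reproduced here.

```python
from itertools import combinations


def coref_penalty(reps, clusters):
    """Sum of squared distances between representations in the same cluster.

    Assumed form, for illustration only; not the R_coref equation of Fei.
    """
    total = 0.0
    for cluster in clusters:
        for i, j in combinations(cluster, 2):
            total += sum((a - b) ** 2 for a, b in zip(reps[i], reps[j]))
    return total


def objective(task_loss, reps, clusters, lam):
    """Cost function = task loss + lam * coreference penalty (assumed form)."""
    return task_loss + lam * coref_penalty(reps, clusters)


# Hypothetical word representations, indexed by word position.
reps = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}

# Words 0 and 1 have identical vectors, so the penalty adds nothing.
same = objective(1.0, reps, clusters=[[0, 1]], lam=0.5)
# Words 0 and 2 differ, so the cost function increases.
diff = objective(1.0, reps, clusters=[[0, 2]], lam=0.5)
print(same, diff)
```

	Under these assumptions, the cost is unchanged when clustered words already share a representation and grows when they differ, which is the sense in which the prior knowledge (the coreference clusters) is incorporated into the cost function.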

	Regarding Claim 8:
Fei teaches a language processing system, comprising: at least one processor configured to form or access a classification and regression model(Abstract Ln 5-8, Presented herein are novel approaches to learn coreference-aware word representations for the NER task. In one or more embodiments, a “CNN-BiLSTM-CRF” neural architecture. Col 26, Ln 36-37, central processing units. Col 16, Ln 43-44, system memory)
and to receive training data and to adjust the classification and regression model according to the training data(Col 8, Ln 57-62, a method for training a coreference-aware NER model,…. a document comprising words (or a batch comprising a plurality of documents) is input (405) into a coreference-aware named entity recognition (NER) network. Col 9, Ln 10-12, Finally, an objective function is minimized (430) to update parameters of the coreference-aware NER network. Col 9, Table 1 shows data sets for NER); 
and the at least one processor further configured to employ prior knowledge(Col 4, Ln 7-9, a coreference component is added on top of the BiLSTM layer to incorporate prior knowledge about coreferential relations among entity mentions) 
and to apply feature weights to one or more features of the classification and regression model to form an optimized classification and regression model (Col 6, Ln 17-23, word sequence … input, for each word x_i, … combining its word embedding w_i^word from a word embedding component 210 with its character-level features … and with one or more extra word-level features. Col 6, Ln 35-36, to obtain the word x_i's hidden representation h_i. Col 7, Ln 50-60, Eq 1 shows weight W_coref being applied to h_i).

Regarding Claim 9:
Fei teaches the system of claim 8, wherein the at least one processor is configured to optimize the classification and regression model through regularization(Abstract, Ln 11-12, a coreference regularization is added during training).

Regarding Claim 10:
Fei teaches the system of claim 9, wherein the at least one processor is configured to adjust feature weights according to at least some of the prior knowledge during the regularization (Col 7, Ln 50-58, f_coref(h_i) = … W_coref[…, h_{C_k}] … where W_coref … are weight … h_{C_k} is the coreference vector. Col 8, Ln 33-40, The coreference regularization term may take the following form: R_coref = … f_coref(h_i)).

Regarding Claim 11:
Fei teaches the system of claim 9, wherein the at least one processor is configured to employ natural language understanding feature weights when adjusting the feature weights (Abstract, Ln 5-10, Presented herein are novel approaches to learn coreference-aware word representations for the NER task. In one or more embodiments, a “CNN-BiLSTM-CRF” neural architecture is modified to include a coreference layer component on top of the BiLSTM layer to incorporate coreferential relations. Named entity recognition requires the model to learn an understanding of the language, as it learns the context of the sentence through the BiLSTM and coreference. Col 7, Ln 57, where W_coref and b_coref are the weight and bias parameters. Col 8, Ln 27-29, The resulting regularization term may also be minimized as a part of the final objective function during model training).

Regarding Claim 12:
Fei teaches the system of claim 9, wherein the at least one processor is configured to employ machine learning feature weights when adjusting the feature weights (Abstract, Ln 7-8, In one or more embodiments, a “CNN-BiLSTM-CRF” neural architecture. Col 7, Ln 57, where W_coref and b_coref are the weight and bias parameters. Col 8, Ln 27-29, The resulting regularization term may also be minimized as a part of the final objective function during model training).

Regarding Claim 13:
Fei teaches the system of claim 9, wherein the at least one processor is configured to adjust a cost function to incorporate at least some of the prior knowledge during regularization (Col 4, Ln 7-9, a coreference component is added on top of the BiLSTM layer to incorporate prior knowledge about coreferential relations among entity mentions. Col 8, Ln 21-29, the CRF layer 330 can make consistent predictions across different coreferential mentions … to guide the word representation learning of the coreference layer, a regularization may be applied to the output word vectors of the coreference layer component 325. The resulting regularization term may also be minimized as a part of the final objective function during model training. The objective function is a cost/loss function; see Col 19, Ln 10-15, and a parameterized coreference regularization to penalize difference between coreference representations for different words of the document that are members of the same coreference cluster; and using the loss to update parameters of the coreference-aware NER network. Or, Col 8, Ln 50-53, objective function for full model).

Regarding Claim 15:
Fei teaches a method employing natural language understanding(Abstract, Ln 5-7, Presented herein are novel approaches to learn coreference-aware word representations for the NER task. Named entity recognition requires the model to learn an understanding of the language), 
carried out by at least one processor having access to at least one computer storage device(Col 26, Ln 36-37, central processing units. Col 16, Ln 43-44, system memory), 
the method comprising: forming or accessing a classification and regression model(Abstract Ln 5-8, Presented herein are novel approaches to learn coreference-aware word representations for the NER task. In one or more embodiments, a “CNN-BiLSTM-CRF” neural architecture), 
receiving training data, and adjusting the classification and regression model according to the training data(Col 8, Ln 57-62, a method for training a coreference-aware NER model,…. a document comprising words (or a batch comprising a plurality of documents) is input (405) into a coreference-aware named entity recognition (NER) network. Col 9, Ln 10-12, Finally, an objective function is minimized (430) to update parameters of the coreference-aware NER network. Col 9, Table 1 shows data sets for NER);
testing the classification and regression model(Col 9, Table 1, Test…Dataset 1); 
optimizing, employing prior knowledge, the classification and regression model(Col 4, Ln 7-9, a coreference component is added on top of the BiLSTM layer to incorporate prior knowledge about coreferential relations among entity mentions), 
including applying feature weights to one or more features of the classification and regression model, to form an optimized classification and regression model (Col 6, Ln 17-23, word sequence … input, for each word x_i, … combining its word embedding w_i^word from a word embedding component 210 with its character-level features … and with one or more extra word-level features. Col 6, Ln 35-36, to obtain the word x_i's hidden representation h_i. Col 7, Ln 50-60, Eq 1 shows weight W_coref being applied to h_i);
and receiving operational data and employing the optimized classification and regression model to classify the operational data to adapt the feature weights (Col 9, Train … Dataset 1 … Dataset 2. Col 7, Ln 57, where W_coref and b_coref are the weight and bias parameters. Col 8, Ln 57-62, a method for training a coreference-aware NER model, … a document comprising words (or a batch comprising a plurality of documents) is input (405) into a coreference-aware named entity recognition (NER) network. Col 9, Ln 10-12, Finally, an objective function is minimized (430) to update parameters of the coreference-aware NER network).

Regarding Claim 16:
	Fei teaches the method of claim 15, wherein the classification of the operational data is related to the intent of the operational data (Fei's named entity recognition (NER) model learns sentence context because it uses a bidirectional LSTM (Col 4, Ln 7-9, a coreference component is added on top of the BiLSTM; Col 6, Ln 17-37 explains the bidirectional LSTM and context) and because it uses coreference data (Col 4, Ln 7-9, a coreference component is added on top of the BiLSTM layer to incorporate prior knowledge about coreferential relations among entity mentions). NER with context learns the meaning of each word and its surrounding context in order to predict the entity. The meaning of the sentence is the intent, so the classification is related to the intent).

Regarding Claim 17:
Fei teaches the method of claim 16, wherein optimizing includes the at least one processor regularizing the classification and regression model(Abstract, Ln 11-12, a coreference regularization is added during training).  

Regarding Claim 18:
Fei teaches the method of claim 17, wherein the regularizing includes the at least one processor adjusting the feature weights according to at least some of the prior knowledge (Col 7, Ln 50-58, f_coref(h_i) = … W_coref[…, h_{C_k}] … where W_coref … are weight … h_{C_k} is the coreference vector. Col 8, Ln 33-40, The coreference regularization term may take the following form: R_coref = … f_coref(h_i)).

	Regarding Claim 19:
Fei teaches the method of claim 18, wherein the classification and regression model is a natural language understanding model and the features weights include natural language understanding feature weights (Abstract, Ln 5-10, Presented herein are novel approaches to learn coreference-aware word representations for the NER task. In one or more embodiments, a “CNN-BiLSTM-CRF” neural architecture is modified to include a coreference layer component on top of the BiLSTM layer to incorporate coreferential relations. Named entity recognition requires the model to learn an understanding of the language, as it learns the context of the sentence through the BiLSTM and coreference. Col 7, Ln 57, where W_coref and b_coref are the weight and bias parameters).

Regarding Claim 20:
Fei teaches the method of claim 19, wherein the regularizing includes adjusting a cost function to incorporate at least some of the prior knowledge and/or to give higher priority to a feature that is expected to be more general within training data (Optional limitation, prior knowledge: Col 4, Ln 7-9, a coreference component is added on top of the BiLSTM layer to incorporate prior knowledge about coreferential relations among entity mentions. Col 8, Ln 21-29, the CRF layer 330 can make consistent predictions across different coreferential mentions … to guide the word representation learning of the coreference layer, a regularization may be applied to the output word vectors of the coreference layer component 325. The resulting regularization term may also be minimized as a part of the final objective function during model training. The objective function is a cost/loss function; see Col 19, Ln 10-15, and a parameterized coreference regularization to penalize difference between coreference representations for different words of the document that are members of the same coreference cluster; and using the loss to update parameters of the coreference-aware NER network. Or, Col 8, Ln 50-53, objective function for full model).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Fei as applied to claim 6 above, and further in view of Blog.datadive.net, “Selecting good features – Part II: linear models and regularization,” hereinafter DataDiveBlog.

Regarding Claim 7:
Fei teaches the method of claim 6, but Fei does not explicitly teach further comprising the at least one processor adjusting the cost function to give higher priority to a feature that is expected to be more general within training data.
In the same field of Machine Learning Regularization, DataDiveBlog teaches further comprising the at least one processor adjusting the cost function to give higher priority to a feature that is expected to be more general within training data (Pg 2, Regularized Models, Para 1, Ln 1-3, Regularization is a method for adding additional constraints or penalty to a model, with the goal of preventing overfitting and improving generalization. Instead of minimizing a loss function E(X,Y), the loss function to minimize becomes E(X,Y)+α∥w∥. Pg 3, L1 Regularization/Lasso, Para 1, Ln 1-3, L1 regularization … forces weak features to have zero as coefficients. Thus L1 regularization produces sparse solutions, inherently performing feature selection. The feature is “expected” because more weight is given to a feature that shows up more often (i.e., is more general); a feature seen more often during training is therefore expected to be more common in the rest of the data).
It would have been obvious to one skilled in the art, before the effective filing date of the claimed invention, to modify Fei with the Lasso regularization of DataDiveBlog, as it prevents overfitting and improves generalization (Pg 2, Regularized Models, Para 1, Ln 1-2).
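
For illustration only, the following minimal sketch shows the effect described in the cited passage: minimizing E(X,Y)+α∥w∥ with an L1 norm drives the coefficient of a weak feature to exactly zero. It is not code from DataDiveBlog. The choice of half the mean squared error for E(X,Y), the proximal gradient solver, the toy data, and all names are assumptions made for the example.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X, y, alpha, step=0.1, iterations=5000):
    """Minimize (1/2n)*sum((Xw - y)^2) + alpha*||w||_1 by proximal gradient descent.

    Assumption: the step size is hand-picked to be small enough for the toy
    data below; it is not a general-purpose default.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iterations):
        residuals = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        grad = [sum(residuals[i] * X[i][j] for i in range(n)) / n for j in range(d)]
        w = [soft_threshold(w[j] - step * grad[j], step * alpha) for j in range(d)]
    return w


# Hypothetical data: feature 0 is strong (weight 2.0), feature 1 is weak (weight 0.05).
X = [[1.0, 0.5], [2.0, -0.3], [3.0, 0.2], [4.0, -0.4]]
y = [2.0 * a + 0.05 * b for a, b in X]

w_unregularized = lasso_fit(X, y, alpha=0.0)  # plain least squares
w_lasso = lasso_fit(X, y, alpha=0.5)          # L1-penalized cost function
```

With alpha set to zero the weak feature keeps its small coefficient; with the L1 penalty its coefficient becomes zero while the strong feature is retained with a slightly shrunken coefficient.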

Claim(s) 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Fei as applied to claim 13 above, and further in view of DataDiveBlog.

Regarding Claim 14:
Fei teaches the system of claim 13, but Fei does not explicitly teach wherein the at least one processor is configured to adjust the cost function to give higher priority to a feature that is expected to be more general within training data.
In the same field of Machine Learning Regularization, DataDiveBlog teaches wherein the at least one processor is configured to adjust the cost function to give higher priority to a feature that is expected to be more general within training data (Pg 2, Regularized Models, Para 1, Ln 1-3, Regularization is a method for adding additional constraints or penalty to a model, with the goal of preventing overfitting and improving generalization. Instead of minimizing a loss function E(X,Y), the loss function to minimize becomes E(X,Y)+α∥w∥. Pg 3, L1 Regularization/Lasso, Para 1, Ln 1-3, L1 regularization … forces weak features to have zero as coefficients. Thus L1 regularization produces sparse solutions, inherently performing feature selection. The feature is “expected” because more weight is given to a feature that shows up more often (i.e., is more general); a feature seen more often during training is therefore expected to be more common in the rest of the data).
It would have been obvious to one skilled in the art, before the effective filing date of the claimed invention, to modify Fei with the Lasso regularization of DataDiveBlog, as it prevents overfitting and improves generalization (Pg 2, Regularized Models, Para 1, Ln 1-2).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Fei et al. (US 20210012215 A1)
Synonym prediction using prior knowledge through regularization.
Tutubalina et al. (US 20200019611 A1)
Sentiment priors and regularization.
Zhang et al. “Prior Knowledge Integration for Neural Machine Translation using Posterior Regularization”
Machine translation using prior knowledge through regularization.
Song et al. “Learning Word Representations with Regularization from Prior Knowledge”
Word embeddings using prior knowledge dictionaries through regularization.
Grosse “Prior knowledge and overfitting”
Blog explaining effects of regularization, as well as how regularization is technically an incorporation of prior knowledge.
Wang “Incorporating linguistic knowledge for learning distributed word representations”
Embeddings using prior knowledge and regularization.
Su et al. (US 11043205 B1)
Intention Identification.
Brown et al. (US 20200364511 A1)
Intention Identification.
Shen et al. (US 20210027020 A1) (has provisional with prior date)
Intention Identification.
Perez et al. (US 20180121415 A1)
Intention Identification.
	
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEXANDER G MARLOW whose telephone number is (571)272-4536. The examiner can normally be reached Monday - Thursday 11:00 am - 9:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached on (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ALEXANDER G MARLOW/Assistant Examiner, Art Unit 2658

/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658