Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This office action is in response to application 16/685,651, which was filed 11/15/19. Claims 1-34 are pending in the application and have been considered.


Specification
Content of Specification
The specification is objected to as it appears to be missing labels for the following sections:

(g) BACKGROUND OF THE INVENTION: See MPEP § 608.01(c). The specification should set forth the Background of the Invention in two parts:
(1) Field of the Invention: A statement of the field of art to which the invention pertains. This statement may include a paraphrasing of the applicable U.S. patent classification definitions of the subject matter of the claimed invention. This item may also be titled “Technical Field.”
(2) Description of the Related Art including information disclosed under 37 CFR 1.97 and 37 CFR 1.98: A description of the related art known to the applicant and including, if applicable, references to specific related art and problems involved in the prior art which are solved by the applicant’s invention. This item may also be titled “Background Art.”
(h) BRIEF SUMMARY OF THE INVENTION: See MPEP § 608.01(d). A brief summary or general statement of the invention as set forth in 37 CFR 1.73. The summary is separate and distinct from the 

Paragraph [0014] appears to be Applicant’s Background of the Invention and paragraph [0015] appears to be Applicant’s Brief Summary of the Invention. The examiner suggests placing these before the “Brief Description of the Figures” section on page 1, and titling and renumbering the paragraphs/sections as appropriate.

Claim Objections
In claim 4, line 1, should “one or more accuracy prediction comprise” be “one or more accuracy predictions comprise”?

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed 


Claims 1-4, 9, 10, 13, 18-21, 26, 27, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Sun et al. (2018/0322370) in view of Relangi et al. (10,635,751).

Consider claim 1, Sun discloses a method comprising: obtaining, by at least one processor (processor, [0086]), a document comprising text tokens (considering “document” a written item, a query containing characters, i.e. “text tokens”, [0011]); determining, by the at least one processor and based on a pre-trained language model, word embeddings corresponding to the text tokens (a probability value is determined according to a pre-trained probability value evaluating model, for example, a word embedding model, [0103]-[0104]); determining, by the at least one processor and based on the word embeddings: named entities corresponding to the text tokens (performing named entity recognition, [0011]); and one or more accuracy predictions corresponding to the named entities (confidence level of the candidate character, [0036]); comparing, by the at least one processor, the one or more accuracy predictions with at least one threshold (summing the parameters and comparing to a threshold, [0036]-[0039]); and associating, by the at least one processor and based on the comparing, the named entities with one or more confidence levels (determining a bad case, [0039]).
Sun does not specifically mention delivering, by the at least one processor, the named entities and the one or more confidence levels.
Relangi discloses delivering, by the at least one processor, the named entities and the one or more confidence levels (one or more processors, Col 1 lines 60-62, outputting a confidence level to indicate the estimated accuracy of the named entities that have been labeled, Col 6 lines 13-16).
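For illustration only, the comparing, associating, and delivering steps as mapped above can be sketched in Python. This is a minimal sketch under stated assumptions and is not the method of Sun, Relangi, or the application: the function name, the 0.8 threshold, the "high"/"low" labels, and the sample data are all hypothetical.

```python
from typing import List, Tuple


def associate_confidence(
    entities: List[Tuple[str, str]],
    accuracy_predictions: List[float],
    threshold: float = 0.8,  # hypothetical threshold, not taken from any reference
) -> List[Tuple[str, str, str]]:
    """Compare each accuracy prediction with a threshold and attach a confidence level."""
    if len(entities) != len(accuracy_predictions):
        raise ValueError("each named entity needs exactly one accuracy prediction")
    results = []
    for (token, label), score in zip(entities, accuracy_predictions):
        # The comparison outcome decides which confidence level is associated.
        level = "high" if score >= threshold else "low"
        results.append((token, label, level))
    return results


# Hypothetical named entities and accuracy predictions.
sample_entities = [("Acme", "ORG"), ("2019", "DATE")]
sample_scores = [0.93, 0.41]

# "Delivering" is modeled here simply as returning and printing the result.
delivered = associate_confidence(sample_entities, sample_scores)
print(delivered)
```

In this sketch a single threshold yields two confidence levels; additional thresholds would yield additional levels in the same way.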



Consider claim 18, Sun discloses a system comprising: a non-volatile memory (ROM, [0206]); at least one processor (processor executes programs stored on memory, [0203]), coupled to the non-volatile memory, configured to: obtain a document comprising text tokens (considering “document” a written item, a query containing characters, i.e. “text tokens”, [0011]); determine, based on a pre-trained language model, word embeddings corresponding to the text tokens (a probability value is determined according to a pre-trained probability value evaluating model, for example, a word embedding model, [0103]-[0104]); determine, based on the word embeddings: named entities corresponding to the text tokens (performing named entity recognition, [0011]); and one or more accuracy predictions corresponding to the named entities (confidence level of the candidate character, [0036]); compare the one or more accuracy predictions with at least one threshold (summing the parameters and comparing to a threshold, [0036]-[0039]); associate, based on the comparing, the named entities with one or more confidence levels (determining a bad case, [0039]).
Sun does not specifically mention delivering, by the at least one processor, the named entities and the one or more confidence levels.
Relangi discloses delivering the named entities and the one or more confidence levels (outputting a confidence level to indicate the estimated accuracy of the named entities that have been labeled, Col 6 lines 13-16).


Consider claim 2, Sun does not, but Relangi discloses the document is a financial document (trends in data for financial analysis, Col 63-65, stored in a document system, Col 18 lines 53-54). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun such that the document is a financial document for reasons similar to those for claim 1.

Consider claim 3, Sun discloses a word embedding is a vector of real numbers (as those in the field of word embedding would have been aware, the word2vec tool produces vectors containing real numbers, [0109]). 

Consider claim 4, Sun discloses the one or more accuracy predictions comprise at least one of: a token-level accuracy prediction for a first text token of the text tokens; and a document-level accuracy prediction for the document (confidence level of the candidate character is considered “token-level” since a character is the smallest unit of text, [0036]). 

Consider claim 9, Sun discloses: the determining the named entities is based at least on a first decoder (Fig 1, perform named entity recognition for a to-be recognized query, using a “current named entity recognition system”, [0099], which is considered a “first decoder”); and the determining the one or more accuracy predictions is based at least on a second decoder (obtaining probability of forming a 


Consider claim 10, Sun does not, but Relangi discloses fine tuning the pre-trained language model based on at least one first outcome of the first decoder or at least one second outcome of the second decoder (training the named entity recognition model using the new labeled data set, when the pseudo labels output from the trained named entity recognition models match, Fig 1 steps 130, 140, 150, which is considered “fine tuning” a “pre-trained” language model because the training is iterative based on examples, so after partial training it is considered “pre-training” and the remaining examples are considered “fine tuning” examples, Col 7 lines 42-56).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun by fine tuning the pre-trained language model based on at least one first outcome of the first decoder or at least one second outcome of the second decoder for reasons similar to those for claim 1.

Consider claim 13, Sun discloses concatenating contiguous text tokens, that have the same associated named entity, to form a first sequence of text tokens; and extracting information based on the first sequence (considering a neighboring character together with a current character for probability of named entity evaluation is considered “concatenating”, the recognizing considered the “extracting”, [0100]-[0102]). 
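For illustration only, the claimed concatenation of contiguous text tokens sharing the same named entity, followed by extraction, can be sketched in Python. This is a minimal sketch under stated assumptions and does not reproduce Sun's character-level evaluation: the function name, the space separator, the "O" label for non-entity tokens, and the sample tokens are all hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def merge_contiguous(tagged_tokens: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Concatenate runs of adjacent tokens that carry the same entity label."""
    merged = []
    # groupby only groups adjacent items, so non-contiguous repeats stay separate.
    for label, group in groupby(tagged_tokens, key=lambda pair: pair[1]):
        text = " ".join(token for token, _ in group)
        merged.append((text, label))
    return merged


# Hypothetical tagged tokens; "O" marks a token outside any named entity.
tagged = [
    ("New", "LOC"),
    ("York", "LOC"),
    ("hosted", "O"),
    ("Jane", "PER"),
    ("Doe", "PER"),
]

spans = merge_contiguous(tagged)
# Extraction is modeled as keeping only the merged spans that are named entities.
extracted = [span for span in spans if span[1] != "O"]
print(extracted)
```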
Consider claim 19, Sun does not, but Relangi discloses the document is a financial document (trends in data for financial analysis, Col 63-65, stored in a document system, Col 18 lines 53-54). 


Consider claim 20, Sun discloses a word embedding is a vector of real numbers (as those in the field of word embedding would have been aware, the word2vec tool produces vectors containing real numbers, [0109]).

Consider claim 21, Sun discloses the one or more accuracy predictions comprise at least one of: a token-level accuracy prediction for a first text token of the text tokens; and a document-level accuracy prediction for the document (confidence level of the candidate character is considered “token-level” since a character is the smallest unit of text, [0036]).

Consider claim 26, Sun discloses: the determining the named entities is based at least on a first decoder (Fig 1, perform named entity recognition for a to-be recognized query, using a “current named entity recognition system”, [0099], which is considered a “first decoder”); and the determining the one or more accuracy predictions is based at least on a second decoder (obtaining probability of forming a word using the word embedding model, which is considered the “second decoder”, [0102]-[0103], Fig 1 step 102).

Consider claim 27, Sun does not, but Relangi discloses fine tuning the pre-trained language model based on at least one first outcome of the first decoder or at least one second outcome of the second decoder (training the named entity recognition model using the new labeled data set, when the pseudo labels output from the trained named entity recognition models match, Fig 1 steps 130, 140, 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun by fine tuning the pre-trained language model based on at least one first outcome of the first decoder or at least one second outcome of the second decoder for reasons similar to those for claim 1.

Consider claim 30, Sun discloses concatenating contiguous text tokens, that have the same associated named entity, to form a first sequence of text tokens; and extracting information based on the first sequence (considering a neighboring character together with a current character for probability of named entity evaluation is considered “concatenating”, [0100]-[0102]).

Claims 5-8, 11, 12, 22-25, 28, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Sun et al. (2018/0322370) in view of Relangi et al. (10,635,751), in further view of Devlin et al. (“BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. arXiv:1810.04805v2 [cs.CL] 24 May 2019).

Consider claim 5, Sun and Relangi do not, but Devlin discloses the pre-trained language model is a bidirectional transformer encoder model comprising a plurality of encoder layers (BERT’s model architecture is a multi-layer bidirectional transformer encoder, page 3, section 3). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun and Relangi such that the pre-trained language 

Consider claim 6, Sun and Relangi do not, but Devlin discloses an output of a first encoder layer, of the plurality of encoder layers, is an input to a second encoder layer of the plurality of encoder layers (the input embeddings are the sum of the token embeddings, the segmentation embeddings and the position embeddings, Figure 2, page 4). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun and Relangi such that an output of a first encoder layer, of the plurality of encoder layers, is an input to a second encoder layer of the plurality of encoder layers for reasons similar to those for claim 5.

Consider claim 7, Sun and Relangi do not, but Devlin discloses an encoder layer comprises a self-attention sublayer and a feedforward neural network sublayer (In this work, we denote the number of layers (i.e., Transformer blocks) as L, the hidden size as H, and the number of self-attention heads as A. We primarily report results on two model sizes: BERT_BASE (L=12, H=768, A=12, Total Parameters=110M) and BERT_LARGE (L=24, H=1024, A=16, Total Parameters=340M), page 3, Section 3, Model Architecture).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun and Relangi such that an encoder layer comprises a self-attention sublayer and a feedforward neural network sublayer for reasons similar to those for claim 5.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun and Relangi such that the encoder layer has a sequence of input values; and the self-attention sublayer comprises processing a first input value, of the input values, based at least on a second input value of the input values for reasons similar to those for claim 5.
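For illustration only, the encoder-layer limitations discussed above (stacked encoder layers, a self-attention sublayer followed by a feedforward sublayer, and self-attention processing one input value based on another) can be sketched in plain Python. This is a simplified sketch under stated assumptions and is not the BERT implementation described by Devlin: it omits learned projection weights, multiple attention heads, residual connections, and layer normalization, uses a ReLU as a stand-in for the feedforward network, and all function names and sample vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(inputs: List[Vector]) -> List[Vector]:
    """Each output position is a weighted mix of every input position."""
    dim = len(inputs[0])
    outputs = []
    for query in inputs:
        # Scaled dot-product score of this position against all positions.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in inputs
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * value[i] for w, value in zip(weights, inputs)) for i in range(dim)]
        )
    return outputs


def feed_forward(inputs: List[Vector]) -> List[Vector]:
    """Position-wise transform; a ReLU stands in for a learned network."""
    return [[max(0.0, x) for x in vec] for vec in inputs]


def encoder_layer(inputs: List[Vector]) -> List[Vector]:
    """One layer: self-attention sublayer, then feedforward sublayer."""
    return feed_forward(self_attention(inputs))


def encoder_stack(inputs: List[Vector], num_layers: int) -> List[Vector]:
    """The output of each layer is the input to the next layer."""
    hidden = inputs
    for _ in range(num_layers):
        hidden = encoder_layer(hidden)
    return hidden


# Hypothetical two-dimensional embeddings for three tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoded = encoder_stack(tokens, num_layers=2)
print(encoded)
```

Changing only the second input vector changes the first output vector of `self_attention`, which is the sense in which one input value is processed based on another.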

Consider claim 11, Sun and Relangi do not, but Devlin discloses: training one or more first parameters of the first decoder based on at least one first outcome of the first decoder and at least one second outcome of the second decoder; and training one or more second parameters of the second decoder based on the at least one second outcome of the second decoder and the at least one first outcome of the first decoder (considering the left to right context as the “first decoder” and right to left context as the “second decoder”, the bidirectional representation training uses outputs of these as inputs during training, pages 3-4, Section 3: BERT). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun and Relangi by training one or more first parameters of the first decoder based on at least one first outcome of the first decoder and at least one second outcome of the second decoder; and training one or more second parameters of the second decoder based on the at least one second outcome of the second decoder and the at least one first outcome of the first decoder for reasons similar to those for claim 5.

Consider claim 12, Sun and Relangi do not, but Devlin discloses a decoder comprises one or more of a self-attention sublayer, an encoder-decoder attention sublayer and a feedforward linear neural network sublayer (self-attention heads, page 3, Section 3, Model Architecture).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun and Relangi such that a decoder comprises one or more of a self-attention sublayer, an encoder-decoder attention sublayer and a feedforward linear neural network sublayer for reasons similar to those for claim 5.

Consider claim 22, Sun and Relangi do not, but Devlin discloses the pre-trained language model is a bidirectional transformer encoder model comprising a plurality of encoder layers (BERT’s model architecture is a multi-layer bidirectional transformer encoder, page 3, section 3). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun and Relangi such that the pre-trained language model is a bidirectional transformer encoder model comprising a plurality of encoder layers for reasons similar to those for claim 5.

Consider claim 23, Sun and Relangi do not, but Devlin discloses an output of a first encoder layer, of the plurality of encoder layers, is an input to a second encoder layer of the plurality of encoder layers (the input embeddings are the sum of the token embeddings, the segmentation embeddings and the position embeddings, Figure 2, page 4). 
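It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun and Relangi such that an output of a first encoder layer, of the plurality of encoder layers, is an input to a second encoder layer of the plurality of encoder layers for reasons similar to those for claim 5.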
Consider claim 24, Sun and Relangi do not, but Devlin discloses an encoder layer comprises a self-attention sublayer and a feedforward neural network sublayer (In this work, we denote the number of layers (i.e., Transformer blocks) as L, the hidden size as H, and the number of self-attention heads as A. We primarily report results on two model sizes: BERT_BASE (L=12, H=768, A=12, Total Parameters=110M) and BERT_LARGE (L=24, H=1024, A=16, Total Parameters=340M), page 3, Section 3, Model Architecture).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun and Relangi such that an encoder layer comprises a self-attention sublayer and a feedforward neural network sublayer for reasons similar to those for claim 5.
Consider claim 25, Sun and Relangi do not, but Devlin discloses the encoder layer has a sequence of input values; and the self-attention sublayer comprises processing a first input value, of the input values, based at least on a second input value of the input values (the input embeddings are the sum of the token embeddings, the segmentation embeddings and the position embeddings, Figure 2, page 4, self-attention heads, page 3, Section 3, Model Architecture).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun and Relangi such that the encoder layer has a sequence of input values; and the self-attention sublayer comprises processing a first input value, of the input values, based at least on a second input value of the input values for reasons similar to those for claim 5.


It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun and Relangi by training one or more first parameters of the first decoder based on at least one first outcome of the first decoder and at least one second outcome of the second decoder; and training one or more second parameters of the second decoder based on the at least one second outcome of the second decoder and the at least one first outcome of the first decoder for reasons similar to those for claim 5.

Consider claim 29, Sun and Relangi do not, but Devlin discloses a decoder comprises one or more of a self-attention sublayer, an encoder-decoder attention sublayer and a feedforward linear neural network sublayer (self-attention heads, page 3, Section 3, Model Architecture).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun and Relangi such that a decoder comprises one or more of a self-attention sublayer, an encoder-decoder attention sublayer and a feedforward linear neural network sublayer for reasons similar to those for claim 5.

Claims 14-17 and 31-34 are rejected under 35 U.S.C. 103 as being unpatentable over Sun et al. (2018/0322370) in view of Relangi et al. (10,635,751), in further view of Tucker et al. (2019/0354720).

Consider claim 14, Sun and Relangi do not, but Tucker discloses the obtaining the document is based on an optical character recognition processing of an image of the document (OCR, [0043]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun and Relangi such that the obtaining the document is based on an optical character recognition processing of an image of the document in order to better extract information, as suggested by Tucker ([0004]).

Consider claim 15, Sun and Relangi do not, but Tucker discloses the document is a structured document (extracting information from structured documents, [0035]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun and Relangi such that the document is a structured document for reasons similar to those for claim 14.

Consider claim 16, Sun and Relangi do not, but Tucker discloses a document comprises a plurality of pre-defined fields, wherein the text tokens are derived from the plurality of pre-defined fields (various fields in the request, [0060], Fig 5). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun and Relangi such that the document comprises a plurality of pre-defined fields, wherein the text tokens are derived from the plurality of pre-defined fields for reasons similar to those for claim 14.

Consider claim 17, Sun and Relangi do not, but Tucker discloses the document is an unstructured document (extracting information from unstructured documents, [0035]).



Consider claim 31, Sun and Relangi do not, but Tucker discloses the obtaining the document is based on an optical character recognition processing of an image of the document (OCR, [0043]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun and Relangi such that the obtaining the document is based on an optical character recognition processing of an image of the document for reasons similar to those for claim 14.

Consider claim 32, Sun and Relangi do not, but Tucker discloses the document is a structured document (extracting information from structured documents, [0035]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun and Relangi such that the document is a structured document for reasons similar to those for claim 14.

Consider claim 33, Sun and Relangi do not, but Tucker discloses a document comprises a plurality of pre-defined fields, wherein the text tokens are derived from the plurality of pre-defined fields (various fields in the request, [0060], Fig 5). 
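It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun and Relangi such that the document comprises a plurality of pre-defined fields, wherein the text tokens are derived from the plurality of pre-defined fields for reasons similar to those for claim 14.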

Consider claim 34, Sun and Relangi do not, but Tucker discloses the document is an unstructured document (extracting information from unstructured documents, [0035]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sun and Relangi such that the document is an unstructured document for reasons similar to those for claim 14.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
9,037,464 Mikolov et al. disclose computing numeric representations of words in a high-dimensional space (word2vec)
2021/0034701 Fei et al. disclose coreference-aware representation for neural named entity recognition
Sienčnik, Scharolta Katharina (“Adapting word2vec to Named Entity Recognition”. Proceedings of the 20th Nordic Conference of Computational Linguistics, 2015) demonstrates that word vectors built using word2vec can be used to improve the performance of a classifier during Named Entity Recognition
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jesse Pullias whose telephone number is 571/270-5135. The examiner can normally be reached on M-F 8:00 AM - 4:30 PM. The examiner’s fax number is 571/270-6135.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached on 571/272-7516.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).


/Jesse S Pullias/
Primary Examiner, Art Unit 2655
01/25/22