DETAILED ACTION
This action is responsive to the Amendment filed on 24 August 2022. Claims 1-10 are pending in the case. Claims 1, 5 and 6 are the independent claims.
This office action is FINAL.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Applicant’s Response
In Applicant’s response dated 24 August 2022 (hereinafter Response), Applicant amended Claims 1, 3-6, and 10; and argued against all objections and rejections previously set forth in the Office Action dated 24 May 2022.
Applicant’s amendments to claims 1, 3-6, and 10 to further clarify the metes and bounds of the invention are acknowledged.
Response to Amendment/Arguments
In response to Applicant's amendment with respect to the previous objection to claims 3, 4, and 10, the objection to those claims is respectfully withdrawn.
In response to Applicant’s amendment with respect to the previous objection of claim 5, the first reason for the objection has been cured by the amendment. However, claim 5 remains a substantial duplicate of claim 2 incorporating the limitations of claim 1.
In response to Applicant's argument with respect to the rejection of claims 1, 5, and 6 under 35 USC 103 as unpatentable over ZHANG in view of DANG (see Response, page 7), Examiner respectfully disagrees.
Applicant has added the limitation “wherein the learning of the first parameter and the second parameter includes changing the first parameter and the second parameter by decreasing loss acquired based on a difference between correct answer tag included in the correct answer information and probability distribution acquired based on the second input sentence” to the independent claims. The limitation is supported in the disclosure as originally filed at [0063-0064], [0066-0068].
Applicant’s response broadly discusses the teachings of ZHANG (“The reference ZHANG only discloses that when learning Named Entity Recognition from teacher data assumed to be correct, it combines tasks to predict the next character”) and states, without evidence, that the combination of references may not be relied upon to teach the claims (“ZHANG and DAND [sic] do not disclose or suggest learning by using paraphrase of teacher data, and do not disclose the above-noted features of amended claims 1, 5 and 6”).
DANG explicitly states (see page 3541 § 2.3.1): “Except for the token and abbreviation embeddings, the two others are fine-tuned during the D3NER training by backpropagating gradients as the D3NER layers are stacked on top of each others, allowing the NER loss during the supervised training be used to make update to these embeddings.”
As is well known in the art of Artificial Intelligence, “backpropagation” is “[a] common method of training a neural net in which the initial system output is compared to the desired output, and the system is adjusted until the difference between the two is minimized” (see footnote 1).
The “probability distribution” is the initial system output during training; the “correct answer tag” is the desired output; and the parameters of the system are adjusted to minimize the loss (the difference between the probability distribution and the correct answer tag).
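For illustration only, the correspondence described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the training procedure of ZHANG, DANG, or the instant application: the tag set, the starting scores, the learning rate, and all function names are hypothetical, cross-entropy is assumed as the loss, and the adjustable parameters are reduced to the raw tag scores rather than the weights of a BiLSTM-CRF.

```python
import math

def softmax(scores):
    """Turn raw tag scores into a probability distribution over tags."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, correct_index):
    """Loss: difference between the distribution and the correct answer tag."""
    return -math.log(probs[correct_index])

def gradient_step(scores, correct_index, learning_rate=0.5):
    """One update that changes the parameters so that the loss decreases.

    For softmax with cross-entropy, the gradient with respect to each score
    is (predicted probability - 1) for the correct tag and the predicted
    probability for every other tag.
    """
    probs = softmax(scores)
    return [
        s - learning_rate * (p - (1.0 if i == correct_index else 0.0))
        for i, (s, p) in enumerate(zip(scores, probs))
    ]

# Hypothetical tag set and starting parameters (assumptions for illustration).
tags = ["O", "B-CHEM", "I-CHEM"]
scores = [0.2, 0.1, -0.3]
correct_index = tags.index("B-CHEM")  # the "correct answer tag"

loss_before = cross_entropy(softmax(scores), correct_index)
updated_scores = gradient_step(scores, correct_index)
loss_after = cross_entropy(softmax(updated_scores), correct_index)

print(f"loss before update: {loss_before:.4f}")
print(f"loss after update:  {loss_after:.4f}")
```

In this sketch the probability assigned to the correct tag rises after the update, so the loss falls; in a full network, backpropagation passes the same gradient back through the layers to adjust their weights.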
As Applicant makes no other arguments against the rejection of record and presents no evidence of patentability over the cited references, claims 1-10 remain rejected under 35 USC 103 as unpatentable over ZHANG in view of DANG, with the rejections restated below in response to Applicant’s amendment.


Claim Objections
Claim 5 is objected to for reciting “A learning method…the extraction method comprising” and for being a substantial duplicate of claim 1+2 (as can be seen in the comparison below, the only difference is the preamble, which is not limiting).
Claim 5:
5. A learning method to be executed by a computer, the extraction method comprising: when a first input sentence in which a predetermined target is represented by a first named entity is input to a first machine learning model, learning a first parameter of the first machine learning model such that a value output from the first machine learning model approaches correct answer information corresponding to the first input sentence; when an intermediate representation generated by inputting the first input sentence to the first machine learning model and a second input sentence in which the predetermined target is represented by a second named entity are input to a second machine learning model, learning the first parameter and a second parameter of the second machine learning model such that a value output from the second machine learning model approaches correct answer information corresponding to the second input sentence;
inputting an input sentence to be an extraction target of a named entity to the first machine learning model in which the learned first parameter is set; and extracting a named entity from the input sentence based on a value output from the first machine learning model
wherein the learning of the first parameter and the second parameter includes changing the first parameter and the second parameter by decreasing loss acquired based on a difference between correct answer tag included in the correct answer information and probability distribution acquired based on the second input sentence.

Claim 1+2:
1. A learning method to be executed by a computer, the learning method comprising: when a first input sentence in which a predetermined target is represented by a first named entity is input to a first machine learning model, learning a first parameter of the first machine learning model such that a value output from the first machine learning model approaches correct answer information corresponding to the first input sentence; and when an intermediate representation generated by inputting the first input sentence to the first machine learning model and a second input sentence in which the predetermined target is represented by a second named entity are input to a second machine learning model, learning the first parameter and a second parameter of the second machine learning model such that a value output from the second machine learning model approaches correct answer information corresponding to the second input sentence
2. The learning method according to claim 1, further comprising: inputting an input sentence to be an extraction target of a named entity to the first machine learning model in which the learned first parameter is set; and extracting a named entity from the input sentence based on a value output from the first machine learning model.
(claim 1) wherein the learning of the first parameter and the second parameter includes changing the first parameter and the second parameter by decreasing loss acquired based on a difference between correct answer tag included in the correct answer information and probability distribution acquired based on the second input sentence.



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-10 are rejected under 35 U.S.C. 103 as being unpatentable over ZHANG et al. (Multitask Learning for Chinese Named Entity Recognition. © Springer Nature Switzerland AG 2018 R. Hong et al. (Eds.): PCM 2018, LNCS 11165, pp. 653–662, 2018. https://doi.org/10.1007/978-3-030-00767-6_60, previously cited) in view of DANG et al. (D3NER: biomedical named entity recognition using CRF-biLSTM improved with fine-tuned embeddings of various linguistic information. Bioinformatics, 34(20), 2018, 3539–3546 doi: 10.1093/bioinformatics/bty356. Advance Access Publication Date: 30 April 2018, previously cited).
Regarding claim 1, ZHANG teaches the learning method to be executed by a computer (abstract: a character-level tagger for Chinese NER, which incorporates multitask learning, self-attention and multi-step training methods to exploit richer features and further improve the model performance), the learning method comprising: 
when a first input sentence in which a predetermined target is represented by a first named entity is input to a first machine learning model, learning a first parameter of the first machine learning model such that a value output from the first machine learning model approaches correct answer information corresponding to the first input sentence ((p 655 § 3) BiLSTM-CRF model chosen as baseline model (p 656 § 3.1) Chinese sentence which is (p 656 § 3.2) encoded at the character level and decoded using CRF in order to learn parameters W and b); and 
((p 657 § 3.3) multi-task learning adds auxiliary task by predicting the next word of each character in the sentences) when an intermediate representation generated by inputting the first input sentence to the first machine learning model and a second input sentence are input to a second machine learning model, learning the first parameter and a second parameter of the second machine learning model such that a value output from the second machine learning model approaches correct answer information corresponding to the second input sentence ((p 657 § 3.3) “we combine the named entity recognition training target with the objective functions based on forward and backward prediction to a new multi-task learning goal … Self-attention will generate the feature representation of each Chinese character based on the sequence, which further improves the recognition accuracy of the entity boundary” (p 658) “Step I uses a simple BiLSTM-CRF model to identify the boundaries of entities; Step II uses the boundary results of Step I as an input, with a slightly expanded entity boundary (e.g. adding several characters before and after the detected entity boundary). In this way, model in Step II can learn more accurate position and entity-type information”; note Results starting (page 659 § 4.3)).

While ZHANG clearly has a first sentence and a second sentence (the first sentence padded with extra characters), ZHANG does not explicitly state that the second sentence is one in which the predetermined target is represented by a second named entity.
ZHANG does suggest a need for using an alternative representation of the predetermined target because (p 654) “former is difficult to determine the entity boundary due to abbreviation and entity nesting problems; and the latter is difficult because of the overlap between the conventional vocabulary” and (p 658) “In named entity recognition task, the proportion of positive and negative samples is uneven, and the entity labels are sparse.” In other words, only the first named entity for the predetermined target might have the entity labels needed for the initial training, while alternative forms of the predetermined target may not be labeled properly, may be nested, or may overlap with other terms, leading to improper segmentation. The system of ZHANG attempts to deal with these issues by using a second version of the training sentence which has been padded to better mark word boundaries.
Further, while ZHANG uses a BiLSTM-CRF model, ZHANG does not describe the details of the training of the parameters, thus ZHANG cannot be relied upon to expressly disclose wherein the learning of the first parameter and the second parameter includes changing the first parameter and the second parameter by decreasing loss acquired based on a difference between correct answer tag included in the correct answer information and probability distribution acquired based on the second input sentence.
DANG describes (p 3539 Abstract) D3NER, a novel biomedical named entity recognition (NER) model using conditional random fields and bidirectional long short-term memory improved with fine-tuned embeddings of various linguistic information. 
DANG explains (p 3540 c 1, bottom) Biomedical literature uses a lot of abbreviations, of which many do not follow a standard convention and are only used locally within the scope of authors’ articles. For example, the BC 5 CDR and NCBI corpora contain more than 6000 and 4000 abbreviations for chemicals and diseases, respectively. This ambiguous abbreviation usage causes some system’s errors… because the full form of an abbreviation is often longer and much more complex than the abbreviation itself, such the replacement introduces more syntactical complexity to sentences, possibly causing more errors when the model has to label separated tokens (within the full form) rather than the syntactically concrete abbreviation as a whole.
In order to deal with this ambiguity, DANG states of the improved system (p 3540 c 2, top): “Apart from pretrained word/token embeddings and character-level word embeddings, D3NER incorporates abbreviation embeddings and Part-of-speech (POS) embeddings.” Abbreviations are at least one example of a predetermined target being represented by a second named entity.
DANG describes the D3NER architecture (p 3541 c 1 § 2.3) as comprising four layers, namely TPAC embeddings, context representing biLSTM, project and NER layer, being structured in an architect as depicted in Figure 1.
DANG further explains that (§ 2.3.1) Except for the token and abbreviation embeddings, the two others are fine-tuned during the D3NER training by backpropagating gradients as the D3NER layers are stacked on top of each others, allowing the NER loss during the supervised training be used to make update to these embeddings.
As explained in the Response to Amendment/Arguments section above, backpropagation is “[a] common method of training a neural net in which the initial system output is compared to the desired output, and the system is adjusted until the difference between the two is minimized.”
Thus, DANG teaches it was known to train a system using both the original (pre-trained) word/token embeddings and abbreviation embeddings (an example of paraphrasing) in order to obtain better recognition results by backpropagating the loss (the difference between the expected output and the obtained output).
As is clear, the training of a biLSTM layer may be accomplished using backpropagation of loss (e.g. a comparison between the probability predicted by the system and the correct answer) in order to minimize the loss of the trained system, which results in adjusting the various parameters (e.g. weights) of the connections between nodes, teaching (as recited) wherein the learning of the first parameter and the second parameter includes changing the first parameter and the second parameter by decreasing loss acquired based on a difference between correct answer tag included in the correct answer information and probability distribution acquired based on the second input sentence.
Accordingly, it would have been obvious to one having ordinary skill in the art of machine learning, having the teachings of ZHANG (training a named entity recognition system using two versions of the same sentence with the same named entity) and DANG (training a named entity recognition system using two versions of the same named entity), to have tried using, as the second sentence of ZHANG, a sentence in which the predetermined target is represented by a second named entity, motivated by the teaching in DANG of training a system with abbreviations in order to decrease recognition errors (note that a similar replacement technique has been tried with other existing NER systems; see DANG (p 3540 c 1, bottom), which lists previous attempts at such replacement), where a biLSTM is trained using a known backpropagation method, with a reasonable expectation of success.
Regarding dependent claim 2, incorporating the rejection of claim 1, ZHANG in view of DANG, combined at least for the reasons discussed above, further teaches inputting an input sentence to be an extraction target of a named entity to the first machine learning model in which the learned first parameter is set; and extracting a named entity from the input sentence based on a value output from the first machine learning model (ZHANG explains (p 653 § 1) that Named entity recognition (NER) … aims to identify entities from raw text, and classifies the detected entities into one of predefined categories such as person, organization, location, etc; see also the testing of the NER model after training in discussion of results starting page 659).
Regarding dependent claim 3, incorporating the rejection of claim 1, ZHANG in view of DANG, combined at least for the reasons discussed above, further teaches wherein the first machine learning model and the second machine learning model include a recurrent neural network (RNN), and the learning the first parameter, and the learning the first parameter and the second parameter include learning a parameter of the RNN (ZHANG: BiLSTM-CRF; per instant application [0027] The LSTM which is one of recurrent neural networks (RNN) will be described as an example).
Regarding dependent claim 4, incorporating the rejection of claim 1, ZHANG in view of DANG, combined at least for the reasons discussed above, further teaches wherein the predetermined target includes a compound, a person, and a company (DANG is specific for chemical entity recognition (thus a compound); ZHANG explains (p 653 § 1) that Named entity recognition (NER) is a challenging task in NLP, which aims to identify entities from raw text, and classifies the detected entities into one of predefined categories such as person, organization, location, etc).
Regarding claim 5, ZHANG in view of DANG, combined at least for the reasons discussed above, similarly teaches the learning method to be executed by a computer, the extraction [or learning] method comprising: the method steps of claims 1 and 2 above, rejected under similar rationale.
Regarding claims 6-10, ZHANG in view of DANG, combined at least for the reasons discussed above, similarly teaches the information processing apparatus, comprising: a memory; and a processor coupled to the memory (inherent structural components for any computer-implemented named entity recognition system which is capable of being trained and tested) and the processor configured to: perform the operations of claims 1-4, thus rejected under similar rationale.

It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned. They are part of the literature of the art, relevant for all they contain.” In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)). Further, a reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including nonpreferred embodiments. Merck & Co. v. Biocraft Laboratories, 874 F.2d 804, 10 USPQ2d 1843 (Fed. Cir.), cert. denied, 493 U.S. 975 (1989). See also Upsher-Smith Labs. v. Pamlab, LLC, 412 F.3d 1319, 1323, 75 USPQ2d 1213, 1215 (Fed. Cir. 2005); Celeritas Technologies Ltd. v. Rockwell International Corp., 150 F.3d 1354, 1361, 47 USPQ2d 1516, 1522-23 (Fed. Cir. 1998).


CONCLUSION
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US 11093714 B1 (BHATIA) FIG 5 shows general training process for a neural network; in background: The parameters of a NN can be set in a process referred to as training. For example, a NN-based model can be trained using training data that includes input data and the correct or preferred output of the model for the corresponding input data. The NN can repeatedly process the input data, and the parameters (e.g., the weight matrices) of the NN can be modified in what amounts to a trial-and-error process until the model produces (or “converges” on) the correct or preferred output. The modification of weight values may be performed through a process referred to as “back propagation.” Back propagation includes determining the difference between the expected model output and the obtained model output, and then determining how to modify the values of some or all parameters of the model to reduce the difference between the expected model output and the obtained model output.
PANCHENDRARAJAN et al. (2018) Bidirectional LSTM-CRF for Named Entity Recognition. See network architecture in FIG 1, learning illustration in FIG 2, Sections 2.2, 2.3 on pages 534-535 which explain parameters are weights calculated for sequence of words and how loss is calculated and minimized.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMY M LEVY whose telephone number is (571)270-3771. The examiner can normally be reached Mon-Fri 8am-4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KIEU VU, can be reached at (571) 272-4057. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Amy M Levy/
Primary Examiner, Art Unit 2173

1 See https://www.thefreedictionary.com/Back-propagation.