DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the
first inventor to file provisions of the AIA.
Specification
Applicant is reminded of the proper content of an abstract of the disclosure.
A patent abstract is a concise statement of the technical disclosure of the patent and should include that which is new in the art to which the invention pertains. The abstract should not refer to purported merits or speculative applications of the invention and should not compare the invention with the prior art.
If the patent is of a basic nature, the entire technical disclosure may be new in the art, and the abstract should be directed to the entire disclosure. If the patent is in the nature of an improvement in an old apparatus, process, product, or composition, the abstract should include the technical disclosure of the improvement. The abstract should also mention by way of example any preferred modifications or alternatives. 
Where applicable, the abstract should include the following: (1) if a machine or apparatus, its organization and operation; (2) if an article, its method of making; (3) if a chemical compound, its identity and use; (4) if a mixture, its ingredients; (5) if a process, the steps.
Extensive mechanical and design details of an apparatus should not be included in the abstract. The abstract should be in narrative form and generally limited to a single paragraph within the range of 50 to 150 words in length.


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to
an abstract idea without significantly more. The claims recite various limitations that cover
performance of the limitations in the mind, but for the recitation of generic computer
components. Independent claims 1, 9, and 17 recite, “obtaining text data and determining a word or phrase to be marked in the text data; according to the word or phrase to be marked, constructing a first training sample of the text data corresponding to a word or phrase replacing task and a second training sample corresponding to a label marking task; and training a neural network model with a plurality of the first training samples and a plurality of the second training samples, respectively, until a loss function of the word or phrase replacing task and a loss function of the label marking task satisfy a preset condition, to obtain the label marking model.”
	The limitations of “obtaining…”, “according…”, and “training…”, under their
broadest reasonable interpretation, cover mental processes. More specifically, a human obtaining text data and choosing a word or phrase to be marked in the text; according to the word chosen, constructing a first list with words or phrases that replace the word or phrase in the text and constructing a second list labeling the word or phrase in the text data; training a 
head and hand, see MPEP 2106.04(a)(2)(III).
This judicial exception is not integrated into a practical application because the claims
recite additional elements of a “processor”, “storage”, “neural network model”, and “non-transitory computer-readable storage medium”. Furthermore, the specification, specifically para. 96, indicates a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to send data and instructions to, a storage system, at least one input device, and at least one output device. Furthermore, para. 89 indicates the electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. These elements are used to perform the claimed method/steps and are recited at a high level of generality using generic computer components, as there is no indication of how the special purpose hardware (para. 0115) cannot be generic computer components from the devices listed. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
	The remaining dependent claims, 2-8, 10-16, and 18-20, serve to further describe the method, system, and non-transitory computer-readable storage medium more generally alluded to in the parent claims. These claims deal with construction of the training samples according to the tasks on how to develop the samples as to segmenting the text and assigning 
The claims do not include additional elements that are sufficient to amount to
significantly more than the judicial exception, for the reasons discussed above with respect to
integration of the abstract idea into a practical application. Further, elements containing models, under their broadest reasonable interpretation, cover mental processes, as they serve to label information collected, analyzed, and/or inputted. The claims are not patent eligible.
Viewed as a whole, these additional claim elements do not provide meaningful
limitations to transform the abstract idea into a patent eligible application of the abstract idea
such that the claims amount to significantly more than the abstract idea itself. Therefore claims
1-20 are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter in the
form of an abstract idea without significantly more.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C.
102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness
rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35
U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims
the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang (CN
109766523) in view of Patra et al. (US Pub No. 2020/0342055 A1) hereinafter Patra.
Regarding claim 1,  Zhang teaches a method for creating a label marking model (Para. 10 pg. 3 under description heading, The present invention provides a-of-speech tagging method, comprises a convolutional neural network CNN model, a lock rod circulation unit BGRU model, two-way of the length of the memory network BLSTM model and the condition with the airport CRF model), comprising:
obtaining text data and determining a word or phrase to be marked in the text data (Step S301 on para. 18 pg. 7, training corpus of sample data conversion i.e. obtains text data through conversion; Step S302 on para. 19 pg. 7, detecting whether the first input text includes a rare word as to tag as the purpose of the document);
according to the word or phrase to be marked, constructing a first training sample of the text data corresponding to a word or phrase replacing task and a second training sample corresponding to a label marking task (Para. 19 – 20 pg. 7, S302, rare word replaced by preset characters i.e. V2 second input text from S303; S303 V1 is first input text i.e. label marking); and 
Zhang fails to explicitly disclose: 
Training a neural network model with a plurality of the first training samples and a plurality of the second training samples, respectively, until a loss function of the word or phrase replacing task and a loss function of the label marking task satisfy a preset condition, to obtain the label marking model. 
Zhang teaches a method of training a neural network model with a plurality of the first training samples and a plurality of the second training samples (Para. 25 pg. 7, updating of (Para. 25 - 29 pg. 7) to obtain a labeling model hence the title; however, it is silent as to whether the 3000 step and attenuation base are a preset condition to satisfy. 
In a related field of endeavor (e.g. technology useful for building NLP applications to implement text analysis see Para. 2), Patra discloses a method where the “training data comprises multiple inputs, each being referred to as sample in a set of samples. Each sample includes a value for each input neuron. A sample may be stored as a vector of input values” i.e. where the training samples represent a task respectively (see para. 114). Furthermore, the neural network includes a loss function where the arithmetic or geometric difference between correct and actual outputs may be measured as error according to a loss function, such that zero represents error free (i.e. completely accurate) behavior (see para. 121) in which training may cease when the error stabilizes (i.e. ceases to reduce) or vanishes beneath a threshold (i.e. approaches zero) (see para. 122).
Modifying Zhang’s neural network model training including both samples to include the features of Patra discloses:
Training a neural network model with a plurality of the first training samples and a plurality of the second training samples, respectively, until a loss function of the word or phrase 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Patra to the method of Zhang. Doing so would have been predictable to one of ordinary skill in the art given the similar nature between the two disclosures, for example both using natural language processing techniques for textual analysis. Further, doing so would have provided the users of Zhang with the added benefit of the model knowing when to cease training due to loss function results being compared to a threshold, i.e. a measure of accurate behavior (see para. 121 - 122), in which the model training may be supervised or unsupervised (see para. 123).
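For illustration only, the stopping criterion attributed to Patra above (training ceases when the error stabilizes or falls beneath a threshold, para. 121 - 122) may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Zhang, Patra, or the application: the one-parameter model, the function name train_until_condition, and the values of lr, threshold, and stall_tol are all hypothetical.

```python
def train_until_condition(samples, lr=0.05, threshold=1e-6,
                          stall_tol=1e-12, max_steps=10_000):
    """Fit y = w * x by gradient descent and report why training stopped.

    Hypothetical toy model: a single weight w and a mean squared error loss.
    """
    w = 0.0
    prev_loss = float("inf")
    loss = float("inf")
    for _ in range(max_steps):
        # Error between correct and actual outputs, measured by the loss.
        loss = sum((w * x - y) ** 2 for x, y in samples) / len(samples)
        if loss < threshold:
            # Error has fallen beneath the preset threshold.
            return w, loss, "below_threshold"
        if prev_loss - loss < stall_tol:
            # Error has stabilized (it has ceased to reduce).
            return w, loss, "stabilized"
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= lr * grad
        prev_loss = loss
    return w, loss, "max_steps"


# Consistent data: the loss can approach zero, so the threshold is met.
w_a, loss_a, reason_a = train_until_condition([(1, 2), (2, 4), (3, 6)])

# Conflicting data: the loss cannot reach zero, so training stops on a stall.
w_b, loss_b, reason_b = train_until_condition([(1, 2), (1, 4)])
```

In this sketch either condition ends training, which corresponds to the "preset condition" language only by analogy; the claimed method recites two loss functions, one per task.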

Regarding claim 2, in addition to the elements stated above regarding claim 1, the combination (Zhang in view of Patra) further discloses: 
Zhang teaches obtaining part-of-speeches of words or phrases in the text data after performing word segmentation on the text data (Para. 1 – 2 pg. 9, principles can be associated with each other at the reference; Para. 6 pg. 7, participle to text to be separated form first input text i.e. performing word segmentation on text data where para. 8 pg. 3 under description points out that the rare word is limited to the noun part-of-speech hence part of speeches are 
taking a word or phrase whose part-of-speech belongs to a preset part-of-speech as the word or phrase to be marked (Para. 19 pg. 7, S302 rare word detection due to its noun part of speech is taken to be labeled in the model).

Regarding claim 3, in addition to the elements stated above regarding claim 1, the combination (Zhang in view of Patra) as reasoned above in the rejection of claim 1 does not make obvious: 
obtaining a substitute word or phrase corresponding to the word or phrase to be marked;
after replacing the word or phrase to be marked in the text data with the substitute word or phrase, taking a class of the substitute word or phrase as a replacement class marking result of a replacement text; and
taking the replacement text and the replacement class marking result corresponding to the replacement text as the first training sample.
Zhang discloses, according to the word or phrase to be marked, constructing a first training sample of the text data corresponding to a word or phrase replacing task (Para. 19 – 20 pg. 7, S302, rare word replaced by preset characters i.e. V2 second input text from S303). While Zhang discloses constructing a first training sample corresponding to a word or phrase replacing task according to the word or phrase to be marked, it is silent as to whether the preset characters are substitute words or phrases corresponding to the word or phrase to be marked and taking the 
In a related field of endeavor (e.g. technology useful for building NLP applications to implement text analysis see Para. 2), Patra additionally discloses a method where the “training data comprises multiple inputs, each being referred to as sample in a set of samples. Each sample includes a value for each input neuron. A sample may be stored as a vector of input values” i.e. where the training samples represent a task respectively (see para. 114). Furthermore, Patra teaches that words of interest may be named entities, such as names of persons, locations, companies, and the like (see para. 2), i.e. nouns as words or phrases to be marked, for example. Para. 44 indicates that the candidate finder component gathers “NY,” “NYC,” “Big Apple,” “The City,” among others for entity linking from the redirects subgraph (where an example of a redirect may be Wikipedia). Para. 45 indicates that the different appearances of the entity “New_York” are different surface forms, and their categories or replacement classes are taken as nodes or vertices (explained further in para. 46 with Paris being a French or American city). Para. 44 indicates that a sample may be stored as a vector of input values, i.e. where the training samples represent replacement text and replacement class marking results, see Figure 4A.
Modifying Zhang’s neural network model training including both samples to include the features of Patra discloses:
obtaining a substitute word or phrase corresponding to the word or phrase to be marked (e.g. The rare word detection limited to nouns and replaced to preset characters as 
after replacing the word or phrase to be marked in the text data with the substitute word or phrase, taking a class of the substitute word or phrase as a replacement class marking result of a replacement text (e.g. the rare word detection limited to nouns and replaced with preset characters as taught by Zhang, now modified to replace with the substitute word or phrase and to take the class of the replacement word or phrase as the marking result as taught by Patra); and
taking the replacement text and the replacement class marking result corresponding to the replacement text as the first training sample (e.g. The rare word detection limited to nouns and replaced to preset characters as taught by Zhang represented as the first sample i.e. V2 now modified by the replacement text and the corresponding replacement class marking result as taught by Patra).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Patra to the method of Zhang. Doing so would have been predictable to one of ordinary skill in the art given the similar nature between the two disclosures, for example both using natural language processing techniques for textual analysis. Further, doing so would have provided the users of Zhang with the added benefits of the disclosed named entity disambiguation (NED): reducing training overhead by not requiring a large text corpus for effective training, providing flexibility by facilitating application to different domains, and helping to provide better quality of answer, performance, and accuracy even without large text mining, see para. 16.
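For illustration only, the first training sample as recited in claim 3 (a replacement text paired with the class of the substitute word or phrase) may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Zhang, Patra, or the application: the function name build_first_training_sample, the example sentence, and the class name "city" are hypothetical.

```python
def build_first_training_sample(tokens, index, substitute, substitute_class):
    """Replace the word to be marked and pair the result with a class label.

    tokens           -- the segmented text data (list of words)
    index            -- position of the word or phrase to be marked
    substitute       -- the substitute word or phrase
    substitute_class -- class of the substitute (hypothetical label set)
    """
    replaced = list(tokens)          # copy so the original text is unchanged
    replaced[index] = substitute     # replacement text
    return {
        "text": " ".join(replaced),
        "label": substitute_class,   # replacement class marking result
    }


original = ["I", "visited", "NY", "today"]
sample = build_first_training_sample(original, 2, "Paris", "city")
```

The sketch keeps the original token list intact so that the same text data remains available for constructing a second training sample for the label marking task.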


Regarding claim 4, in addition to the elements stated above regarding claim 1, the combination (Zhang in view of Patra) further discloses: 
obtaining a label word or phrase associated with the word or phrase to be marked (Para. 19 – 20 pg. 7, S302, Label word is itself as second input text is enabled to equal to the first input text as it obtains the first input text V1), and
taking the label word or phrase as a label marking result of the word or phrase to be marked (Para. 19 – 20 pg. 7, S302, rare word replaced by preset characters, if it is not, then the second input text is enabled to be equal to the first input text; S303 V1 is first input text i.e. label marking); and
taking the text data, the word or phrase to be marked and the label marking result corresponding to the word or phrase to be marked as the second training sample (Para. 19 – 20 pg. 7, S302, rare word replaced by preset characters, if it is not, then the second input text is enabled to be equal to the first input text; S303 V1 is first input text i.e. label marking i.e. second training sample as it differs from V2 which is the replacement).

Regarding claim 5, in addition to the elements stated above regarding claim 3, the combination (Zhang in view of Patra) further discloses: 
determining identification information of the word or phrase to be marked in a preset knowledge base (Patra teaches in Para. 44, an entity for “New York City” might appear also as “NY,” “NYC,” “Big Apple,” “The City,” among others. To help address this issue, according to an 
obtaining the substitute word or phrase in the preset knowledge base corresponding to the identification information (Para. 45, By building redirects graphs with DBpedia redirect links, such as redirects graph 400 of FIG. 4A, the candidate finder component 112 may match a surface form of a mention, such as an “NY” surface form 402 to a vertex 404 corresponding to the entity “NY” in the redirects graph, and then follow a redirect link or edge 406 from the vertex “NY” to a vertex 408 corresponding to the entity “New_York.”; e.g. where the detected rare words to be marked, replaced by preset characters as taught by Zhang, are now modified to be replaced with substitute words in the preset knowledge base corresponding to the identification information as taught by Patra).
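For illustration only, the redirect lookup quoted from Patra para. 45 (matching a surface form to a vertex and following a redirect edge to the entity vertex) may be sketched as follows. This is a minimal sketch under stated assumptions and is not Patra's implementation: the function name resolve_surface_form and the dictionary representation are hypothetical, and only the "NY" to "New_York" edge comes from the quoted passage; the other edges are assumed for the example.

```python
def resolve_surface_form(surface, redirects):
    """Follow redirect edges from a surface form to its final entity vertex."""
    seen = set()
    current = surface
    # Stop when there is no outgoing edge or when a cycle is detected.
    while current in redirects and current not in seen:
        seen.add(current)
        current = redirects[current]
    return current


# Hypothetical redirects graph stored as vertex -> redirect target.
REDIRECTS = {
    "NY": "New_York",          # edge described in the quoted passage
    "NYC": "New_York",         # assumed for illustration
    "Big Apple": "New_York",   # assumed for illustration
}

entity = resolve_surface_form("NY", REDIRECTS)
unknown = resolve_surface_form("Paris", REDIRECTS)
```

A surface form with no redirect edge is returned unchanged, which is one possible design choice; a knowledge base lookup could equally report it as unresolved.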

Regarding claim 6, in addition to the elements stated above regarding claim 3, the combination (Zhang in view of Patra) further discloses: 
taking the replacement text as input, and taking the replacement class marking result corresponding to the replacement text as output, so that the neural network model is able to, according to the input replacement text, output a probability that the input replacement text belongs to a replacement class (Para. 22 and 27 pg. 7. Input text into BGRU, a type of neural network i.e. Bidirectional Gated Recurrent Unit NN, of word or phrase to be marked as input V2 

Regarding claim 7, in addition to the elements stated above regarding claim 4, the combination (Zhang in view of Patra) further discloses: 
taking the text data and the word or phrase to be marked as input, and taking the label marking result corresponding to the word or phrase to be marked as output, so that the neural network model is able to, according to the input text data and the word or phrase to be marked, output a probability that the label word or phrase belong to the label marking result of the word or phrase to be marked (Para. 21 and 27 pg. 7. Input text into CNN of word or phrase to be marked as input V1 and taking the label marking result corresponding to V1 and V1’ as output, so it then outputs a probability through Adam (adaptive moment estimation) in probability theory square that variable X obeys some distribution according to loss function to the gradient of each parameter and dynamic adjustments are directed to the learning rate of each parameter i.e. probability of input correlation with output V1 and V1’ respectively).

Regarding claim 8, in addition to the elements stated above regarding claim 1, the combination (Zhang in view of Patra) as reasoned above in the rejection of claim 1 does not make obvious: 

completing the training with the word or phrase replacing task based on the training samples in the plurality of the first training samples corresponding to the two subtasks.
Zhang discloses, according to the word or phrase to be marked, constructing a first training sample of the text data corresponding to a word or phrase replacing task (Para. 19 – 20 pg. 7, S302, rare word replaced by preset characters i.e. V2 second input text from S303). While Zhang discloses constructing a first training sample, it is silent as to whether the task has subtasks of a label word or phrase replacing subtask and an appositive word or phrase replacing subtask, and as to completing the training with the word or phrase replacing task now containing the label word or phrase replacing subtask and the appositive word or phrase replacing subtask.
In a related field of endeavor (e.g. technology useful for building NLP applications to implement text analysis see Para. 2), Patra additionally discloses a method where the “training data comprises multiple inputs, each being referred to as sample in a set of samples. Each sample includes a value for each input neuron. A sample may be stored as a vector of input values” i.e. where the training samples represent a task respectively (see para. 114). Furthermore, Patra discloses pairwise similarities as depicted on figure 5A in which Indiana Pacers are connected through dots with Miami Heat i.e. relational words fitting a generic concept of basketball teams, where each of the mentions is associated with a set of one or more candidate entities i.e. examples of figures 4A and 4B with the recognized entities of figure 5A. Firstly, the mentions of entities are labeled and secondly pairwise similarities may contain one or more candidate entities, see para. 54.

dividing the word or phrase replacing task into a label word or phrase replacing subtask and an appositive word or phrase replacing subtask (e.g. the word or phrase replacing task as taught and represented by V2 by Zhang now modified to label word or phrase replacing subtask and an appositive word or phrase replacing subtask as taught by Patra, see para. 54); and
completing the training with the word or phrase replacing task based on the training samples in the plurality of the first training samples corresponding to the two subtasks (e.g. the neural network model trained and updated until error meets condition hence completion modified now to use the first training sample including the label word or phrase replacements and the appositive word or phrase replacement tasks as taught by Patra, see para. 54).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Patra to the method of Zhang. Doing so would have been predictable to one of ordinary skill in the art given the similar nature between the two disclosures, for example both using natural language processing techniques for textual analysis. Further, doing so would have provided the users of Zhang with the added benefits of the disclosed named entity disambiguation (NED): reducing training overhead by not requiring a large text corpus for effective training, providing flexibility by facilitating application to different domains, and helping to provide better quality of answer, performance, and accuracy even without large text mining, see para. 16.

Claim 9 is directed to a system claim corresponding to the method claim presented in claim 1 and is rejected under the same grounds stated above regarding claim 1; however, the combination of rejected claim 1 fails to disclose: 
at least one processor; and
a storage communicatively connected with the at least one processor; wherein the storage stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform a method for creating a label marking model. 
Zhang teaches a method for creating a label marking model, The present invention provides a-of-speech tagging method, comprises a convolutional neural network CNN model, a lock rod circulation unit BGRU model, two-way of the length of the memory network BLSTM model and the condition with the airport CRF model see Para. 10 pg. 3 under description heading. Where it involved a word or phrase to be marked, constructing a first training sample of the text data corresponding to a word or phrase replacing task and a second training sample corresponding to a label marking task, see Para. 19 – 20 pg. 7, S302, rare word replaced by preset characters i.e. V2 second input text from S303; S303 V1 is first input text i.e. label marking. While Zhang discloses a labeling system it is silent on the generic hardware components to perform the method. 
In a related field of endeavor (e.g. technology useful for building NLP applications to implement text analysis see Para. 2), Patra additionally discloses a method and computing devices, programs, and other computing elements in which the methods may be implemented, see para. 63, where the named entity disambiguation techniques results in improved 
Modifying Zhang’s labeling system of creating a neural network model training including both samples to include the features of Patra discloses:
at least one processor (e.g. the labeling system of neural network model training as taught by Zhang now modified to include a generic processor as taught by Patra, see para. 75); and
a storage communicatively connected with the at least one processor (e.g. the labeling system of creating a neural network model training as taught by Zhang now modified to include a generic processor to execute instructions stored in a memory i.e. storage as taught by Patra hence it communicates connectedly, see para. 75); 
wherein the storage stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform a method for creating a label marking model (e.g. the labeling system of creating a 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Patra to the method of Zhang. Doing so would have been predictable to one of ordinary skill in the art given the similar nature between the two disclosures, for example both using natural language processing techniques for textual analysis. Further, doing so would have provided the users of Zhang with generic hardware to execute instructions set forth by a method as recognized by Patra, see para. 65 and 75-78.

Claim 10 is directed to a system claim corresponding to the method claim presented in claim 2 and is rejected under the same grounds stated above regarding claim 2, in addition to the combination set forth in rejected claim 9.

Claim 11 is directed to a system claim corresponding to the method claim presented in claim 3 and is rejected under the same grounds stated above regarding claim 3, in addition to the combination set forth in rejected claim 9.

Claim 12 is directed to a system claim corresponding to the method claim presented in claim 4 and is rejected under the same grounds stated above regarding claim 4, in addition to the combination set forth in rejected claim 9.

Claim 13 is directed to a system claim corresponding to the method claim presented in claim 5 and is rejected under the same grounds stated above regarding claim 5.

Claim 14 is directed to a system claim corresponding to the method claim presented in claim 6 and is rejected under the same grounds stated above regarding claim 6.

Claim 15 is directed to a system claim corresponding to the method claim presented in claim 7 and is rejected under the same grounds stated above regarding claim 7.

Claim 16 is directed to a system claim corresponding to the method claim presented in claim 8 and is rejected under the same grounds stated above regarding claim 8, in addition to the combination set forth in rejected claim 9.

Claim 17 is directed to a non-transitory computer-readable storage medium corresponding to the system claim presented in claim 9 and is rejected under the same grounds stated above regarding claim 9.

Claim 18 is directed to a non-transitory computer-readable storage medium corresponding to the method claim presented in claim 11 and is rejected under the same grounds stated above regarding claim 11.

Claim 19 is directed to a non-transitory computer-readable storage medium corresponding to the method claim presented in claim 12 and is rejected under the same grounds stated above regarding claim 12.

Claim 20 is directed to a non-transitory computer-readable storage medium corresponding to the method claim presented in claim 14 and is rejected under the same grounds stated above regarding claim 14.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s
disclosure. 
	Fauceglia et al. (US 2018/0137404 A1) hereinafter Fau, discusses a system, method and computer program product for disambiguating one or more entity mentions (abstract). Local and global features may be modeled on convolutional neural networks and recurrent neural networks for entity linking, see figure 2. There are tasks present in which the network represents nodes of entity links as through Chelsea where it may be a soccer team or the city as shown on figure 5 and context is associated as mentioned in para. 9. 
	Glass et al. (US 2021/0004672 A1) discusses methods and systems for populating knowledge graphs (abstract). Figure 2 is a descriptive representation of the invention where attention is given to certain words in a corpus and they are labeled within the process of inputting into a deep neural network; these attention words may be entities and undergo a process as described in para. 31.


Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN E AMAYA HERNANDEZ whose telephone number is (571)272-2484. The examiner can normally be reached Monday - Thursday 7:30 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached on 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for 




/J.E.A. /Examiner, Art Unit 2655
/ANDREW C FLANDERS/Supervisory Patent Examiner, Art Unit 2655