Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Compact Prosecution
The examiner proposes an examiner-initiated interview to discuss a proposed amendment to overcome the current rejection.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hori et al. (20180157743) in view of Li et al. (20200356851).
Claim 1, Hori discloses a non-transitory computer readable storage medium comprising instructions that, when executed by at least one processor, (Section 0019, lines 4-7 – thus processor and memory storing codes for algorithm modules for predicting labels based on the extracted input data) cause a computing device to
receive a source data comprising a word sequence arranged in one or more sentences; (Section 0019, lines 1-4, receiving input data (source document) from an input device; see lines 8-10 and Fig. 5, El. 500)
classify, utilizing a machine-learning model, (Classifiers, Section 0007, lines 1-8) the word sequence as comprising a definition for a term; (Section 0015, lines 1-3: annotating or labeling raw data such as text documents with a set of labels relevant to the content of the data)
generate, utilizing the machine-learning model, labels for words within the word sequence corresponding to the term and the definition; (Section 0016, lines 6-8, “a method of generating multi-relevant labels”)



Figure 1: The Label Predictor reads on the Label generator.

(The secondary reference Li also addresses this limitation of generating labels; see Section 0059, lines 7-12: labels are predicted at each level.)

and extract the definition (Section 0043, lines 4-6, vocabulary) for the term from the source document based on classifying the word sequence and the labels for the words within the word sequence. (Section 0019, lines 2-4 and lines 8-13: the labels are generated based on the extracted input data (word sequence) from the source document)
Hori does not clearly teach that the input data is a document.
Li discloses a deep multi-level label learning technology that receives input data as a source document. (Section 0087, lines 1-3: “the mentioned same document” is a source document)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in Hori the teaching of Li of receiving input data from a source document. The motivation is that it gives the system more options from which to extract data.
Claim 2, Hori in view of Li discloses the non-transitory computer readable storage medium as recited in claim 1, further comprising instructions that, when executed by the at least one processor, (Hori: Section 0033, lines 4-6 – thus algorithm modules stored into a memory or a storage as program code) cause the computing device to classify the word sequence utilizing sequence classification layers of the machine-learning model; (Hori: Section 0032, lines 1-3, label classification method)
and generate the labels for the words within the word sequence utilizing sequence labeling layers of the machine-learning model. (Hori: Section 0032, line 3, the label predictor reads on the label generator that generates labels based on the extracted input data)
Claim 3, Hori in view of Li discloses the non-transitory computer readable storage medium as recited in claim 1, further comprising instructions that, when executed by the at least one processor, (Hori: Section 0033, lines 4-6 – thus algorithm modules stored into a memory or a storage as program code) cause the computing device to extract the definition for the term from the source document by extracting at least a first portion of the definition from a first sentence within the word sequence (Hori: Section 0027, line 3, “data instance” reads on sentences extracted from the source document) and at least a second portion of the definition from a second sentence (Li: Section 0099, lines 6-8, “known sentence”) within the word sequence. (Hori: assigning multiple labels to each data instance is considered a multi-label classification, where each data instance reads on the first and second portion or sentence)
Claim 4, Hori in view of Li discloses the non-transitory computer readable storage medium as recited in claim 1, further comprising instructions that, when executed by the at least one processor, (Hori: Section 0033, lines 4-6 – thus algorithm modules stored into a memory or a storage as program code) cause the computing device to generate, utilizing the machine-learning model, (Hori: Classifiers, Section 0007, lines 1-8) dependency-encoded-word vectors indicating a dependency path (Li: Section 0104-0105, the hierarchy in the tree method reads on the dependency path) between the term and the definition within the word sequence utilizing a global dependency tree. (Hori: Section 0011, lines 1-3, thus inter-label dependency to generate labels for the extracted input data from the source document)
Claim 5, Hori in view of Li discloses the non-transitory computer readable storage medium as recited in claim 4, further comprising instructions that, when executed by the at least one processor, (Hori: Section 0033, lines 4-6 – thus algorithm modules stored into a memory or a storage as program code) cause the computing device to generate the dependency-encoded-word vectors (Li: Section 0082, thus the word embeddings) by:
generating, utilizing a first set of encoding layers from the machine-learning model, (Hori: Classifiers, Section 0007, lines 1-8) a word representation vector for a particular word in the word sequence based on a word embedding (Hori: Section 0033, lines 10-15 – thus extracting a feature vector from an input vector including input data by a feature extractor, with relevant vectors for each data instance, which is any word extracted from the input source) and a parts-of-speech embedding vector associated with the particular word; (Li: Section 0086, lines 3-4, “POS tags”)
and generating, utilizing a second set of encoding layers from the machine-learning model, (Hori: Classifiers, Section 0007, lines 1-8) a dependency-encoded-word vector for the particular word based on the word representation vector for the particular word (Hori: Section 0043, lines 1-2, feature vector for input data instance) and word representation vectors for neighboring words in the global dependency tree. (Hori: Section 0043, lines 1-9, “Vocabulary size” reads on the neighboring words)
Claim 6, Hori in view of Li discloses further comprising instructions that, when executed by the at least one processor, cause the computing device to generate, utilizing the machine-learning model, (Hori: Classifiers, Section 0007, lines 1-8) the labels for the words within the word sequence by determining, for the particular word in the word sequence, (Hori: Section 0016, lines 6-8, “a method of generating multi-relevant labels”)
a feature vector by concatenating the word representation vector and the dependency-encoded-word vector of the particular word; (Hori: Section 0043, lines 1-6, the feature vector from the input data instance includes the salient features for classification)
and based on the feature vector, determining a label for the particular word indicating that the particular word is part of the term, the definition, a qualifier for the definition, or a non-definitional word. (Hori: Section 0043, lines 4-9 – the feature vector also includes a sequence of length T or word ID numbers, which represents the vocabulary size, where a label is selected or predicted)
Claim 7, Hori in view of Li discloses further comprising instructions that, when executed by the at least one processor, cause the computing device to determine the label for the particular word by converting the feature vector into a score vector (Hori: Section 0034, lines 9-11, “scores or probabilities of labels”) comprising a set of label scores corresponding to a set of possible labels for the particular word; (Hori: Section 0043, lines 2-8, the input data instance or text document of a word sequence of length T is converted to a feature vector, which is used for generating labels; Section 0016, lines 6-8, generating multi-relevant labels) and determining, utilizing a conditional random field model, the label for the particular word according to the score vector. (Hori: Section 0034, lines 8-10, “selecting a label with the highest score or probability” means labels are generated based on the score vector)
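For illustration only, the cited step of “selecting a label with the highest score or probability” can be sketched as follows. The function name, label names, and score values are hypothetical and are not taken from Hori or Li. The sketch scores one word in isolation; it does not implement a conditional random field.

```python
from typing import Dict


def select_label(score_vector: Dict[str, float]) -> str:
    """Return the label whose score is highest (ties resolve to the first key)."""
    return max(score_vector, key=score_vector.get)


# Hypothetical score vector for one word over four possible labels.
scores = {"term": 0.10, "definition": 0.72, "qualifier": 0.08, "other": 0.10}
chosen = select_label(scores)
```

Here `chosen` holds the single label with the largest entry in the score vector.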
Claim 8, Hori in view of Li discloses further comprising instructions that, when executed by the at least one processor, cause the computing device to classify, utilizing the machine-learning model, (Hori: Section 0037, lines 1-2, multi-label classifier) the word sequence as comprising the definition for the term by generating, utilizing a first set of sequence classification layers from the machine-learning model, (Hori: Section 0043, lines 2-8, the input data instance or text document of a word sequence of length T is converted to a feature vector)
a sequence representation vector for the word sequence by aggregating the dependency-encoded-word vectors for the words in the word sequence; (Hori: Section 0043, lines 1-6, the feature vector from the input data instance includes the salient features for classification)
and determining, utilizing a second set of sequence classification layers from the machine-learning model, (Hori: Section 0037, lines 1-2, multi-label classifier) that the word sequence comprises the definition for the term based on the sequence representation vector. (Hori: Section 0043, “Vocabulary size”)
Claim 9, Hori in view of Li discloses further comprising instructions that, when executed by the at least one processor, cause the computing device to generate the global dependency tree by parsing, utilizing natural language processing, a plurality of sentences of the word sequence to determine a plurality of dependency trees for the plurality of sentences; (Hori: Section 0007, lines 5-8 – thus label inter-dependency, which means sports and baseball are related but sports and presidential election are not related)
and generating the global dependency tree by linking the plurality of dependency trees using a global root node. (Hori: Section 0036, inference and label inter-dependency reads on the global dependency because it is used to determine the relationship between extracted text to be labeled)
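For illustration only, the claimed step of linking a plurality of per-sentence dependency trees under a global root node can be sketched as follows. The head-index encoding, the function name, and the example parses are assumptions made for this sketch and do not come from Hori, Li, or the application.

```python
from typing import List


def link_dependency_trees(sentence_heads: List[List[int]]) -> List[int]:
    """Merge per-sentence dependency trees under one added global root.

    Each sentence is a list of head indices local to that sentence, with -1
    marking the sentence root. In the result, node 0 is the global root
    (head -1) and every sentence root points to node 0.
    """
    global_heads = [-1]  # node 0: the added global root
    for heads in sentence_heads:
        offset = len(global_heads)  # global index of this sentence's first word
        for head in heads:
            global_heads.append(0 if head == -1 else head + offset)
    return global_heads


# Two hypothetical parsed sentences (one head index per word, -1 = sentence root).
merged = link_dependency_trees([[1, -1, 1], [-1, 0]])
```

In `merged`, the roots of both sentences become children of node 0, so any two words in the document are connected by a path through the single tree.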
Claim 10, Hori discloses a system comprising:
at least one computer memory device comprising a neural network (Section 0037, lines 1-2, multi-label classifier) and a source input comprising a word sequence arranged in a set of sentences; (Section 0019, lines 1-4, receiving input data (source document) from an input device; see lines 8-10)
and one or more servers (Section 0033, lines 4-6) configured to cause the system to generate dependency-encoded-word vectors indicating a dependency path between a term and a definition within the word sequence utilizing a global dependency tree; (Section 0011, lines 1-3, thus inter-label dependency to generate labels for the extracted input data from the source document)
classify, utilizing the neural network, (Section 0037, lines 1-2, multi-label classifier) the word sequence as comprising the definition for the term based on the dependency-encoded-word vectors; (Section 0043, “Vocabulary size” reads on the definition for the words)
generate, utilizing the neural network, (label predictor 403 in Fig. 4) labels for words from the word sequence (Section 0019, lines 2-4, thus the input data reads on the word sequence) corresponding to the term and the definition based on the dependency-encoded-word vectors; (Section 0016, lines 6-8, “a method of generating multi-relevant labels”)




and extract the definition for the term from the source document based on classifying the word sequence and the labels for the word sequence. (Section 0019, lines 2-4 and lines 8-13 the labels are generated based on the extracted input data (word sequence) from the source document)
Hori does not clearly teach that the input data is a document.
Li discloses a deep multi-level label learning technology that receives input data as a source document. (Section 0087, lines 1-3: “the mentioned same document” is a source document)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in Hori the teaching of Li of receiving input data from a source document. The motivation is that it gives the system more options from which to extract data.
Claim 11, Hori in view of Li discloses wherein the one or more servers are further configured to generate the dependency-encoded-word vectors utilizing encoding layers of the neural network; (Hori: Section 0011, lines 1-3, thus inter-label dependency)
classify the word sequence based on the dependency-encoded-word vectors utilizing sequence classification layers of the machine-learning model; (Hori: Section 0037, lines 1-2, multi-label classifier)
and generate the labels for the words within the word sequence based on the dependency-encoded-word vectors utilizing sequence labeling layers of the machine-learning model. (Hori: Section 0007, lines 5-8 – thus label inter-dependency, which means sports and baseball are related but sports and presidential election are not related)

Claim 12, Hori in view of Li discloses wherein the one or more servers are further configured to classify, utilizing the neural network, (Hori: Section 0037, lines 1-2, multi-label classifier) the word sequence as comprising the definition for the term by generating, utilizing max pooling, (Li: Section 0047, line 6, “Max-pooling layer”) a sequence representation vector for the word sequence by aggregating the dependency-encoded-word vectors for the words in the word sequence; (Hori: Section 0048, lines 1-5, sequence of one-hot vectors, where the sequence represents words from the source document)
and determining, utilizing a feed forward network, that the word sequence comprises the definition for the term based on the sequence representation vector. (Hori: Section 0005, lines 1-3 – thus the forward LSTM layer processes the input data to generate the multi-labels)
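For illustration only, aggregating word vectors by max pooling and passing the result through a feed-forward layer can be sketched as follows. The single sigmoid layer, the weights, the bias, and the example vectors are hypothetical choices for this sketch and are not values disclosed by Hori or Li.

```python
import math
from typing import List


def max_pool(word_vectors: List[List[float]]) -> List[float]:
    """Element-wise maximum over all word vectors in the sequence."""
    return [max(column) for column in zip(*word_vectors)]


def feed_forward(sequence_vector: List[float], weights: List[float], bias: float) -> float:
    """One linear layer followed by a sigmoid, giving a value between 0 and 1."""
    z = sum(w * x for w, x in zip(weights, sequence_vector)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical 3-dimensional vectors for a two-word sequence.
pooled = max_pool([[0.1, 0.9, -0.3], [0.4, 0.2, 0.5]])
probability = feed_forward(pooled, weights=[1.0, -0.5, 2.0], bias=-0.2)
```

The pooled vector has the same dimension as each word vector regardless of how many words the sequence contains, which is why a fixed-size feed-forward layer can follow it.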
Claim 13, Hori in view of Li discloses wherein the one or more servers are further configured to generate the dependency-encoded-word vectors (Hori: Section 0045, the vector shown in Section 0044 reads on the encoded word vectors) by:
generating, utilizing a bi-directional long-short-term-memory network from the neural network, (Li: Section 0043, lines 1-5, “BI-LSTM”) a word representation vector for a particular word in the word sequence based on a word embedding (Hori: Section 0033, lines 10-15 – thus extracting a feature vector from an input vector including input data by a feature extractor, with relevant vectors for each data instance, which is any word extracted from the input source) and a parts-of-speech embedding vector associated with the particular word; (Li: Section 0086, lines 3-4, “POS tags”)
and generating, utilizing a graph convolutional network from the neural network, (Li: Section 0049, lines 2-4, first convolutional neural network for word and topic feature extraction) a dependency-encoded-word vector for the particular word based on the word representation vector for the particular word and word representation vectors for neighboring words in the global dependency tree. (Hori: Section 0011, lines 1-3, thus inter-label dependency to generate labels for the extracted input data from the source document)

Claim 14, Hori in view of Li discloses wherein the one or more servers are further configured to determine a lowest common ancestor of the term and the definition from the global dependency tree; (Hori: Section 0011, lines 1-3, thus inter-label dependency to generate labels for the extracted input data from the source document)
and determine a dependency path associated with the term and the definition according to the lowest common ancestor of the term and the definition. (Li: Section 0069)


Figure 2: The hierarchical relations show the dependency tree.
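For illustration only, determining a lowest common ancestor of two nodes in a tree and the path between them through that ancestor can be sketched as follows. The head-index encoding (with -1 marking the root), the function names, and the example tree are assumptions made for this sketch and are not taken from Hori, Li, or the application.

```python
from typing import List


def path_to_root(heads: List[int], node: int) -> List[int]:
    """List the nodes from `node` up to the root, inclusive."""
    path = [node]
    while heads[node] != -1:
        node = heads[node]
        path.append(node)
    return path


def lowest_common_ancestor(heads: List[int], a: int, b: int) -> int:
    """Return the deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(heads, a))
    for node in path_to_root(heads, b):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes are not in the same tree")


def dependency_path(heads: List[int], a: int, b: int) -> List[int]:
    """Return the node sequence from a up to the LCA and then down to b."""
    lca = lowest_common_ancestor(heads, a, b)
    up = path_to_root(heads, a)
    up = up[: up.index(lca) + 1]      # a ... lca
    down = path_to_root(heads, b)
    down = down[: down.index(lca)]    # b ... child of lca
    return up + down[::-1]


# Hypothetical tree: node 0 is the root; nodes 2 and 4 are its children.
heads = [-1, 2, 0, 2, 0, 4]
lca = lowest_common_ancestor(heads, 1, 5)
path = dependency_path(heads, 1, 5)
```

In this example the two nodes sit under different children of the root, so their lowest common ancestor is the root itself and the path passes through it.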

Claim 15, Hori in view of Li discloses wherein the one or more servers are further configured to determine a latent sequence label based on a sequence representation vector for the word sequence and a latent term-definition label based on word representation vectors for the term and the definition; (Hori: Section 0065, lines 1-5, the hidden activation vector is obtained using the LSTM functions, and vectors are converted to an M-dimensional one-hot vector, where the hidden activation vector reads on the latent term-definition label)
and determine that a sequence representation vector is semantically consistent with word representation vectors for the term and the definition in response to determining that the latent sequence label is equal to the latent term-definition label. (Hori: Section 0097, lines 3-5, thus the true positives read on the case where the vector representations for the term and the definition label are equal)
Claim 16, Hori in view of Li discloses wherein the one or more servers are further configured to generate a classification probability distribution associated with classifying the word sequence as comprising the definition for the term; (Hori: Section 0061-0062, the relevance probability reads on the classification probability because it shows how accurate the prediction is)
determine a classification loss utilizing a classification loss function according to the classification probability distribution; (Hori: Section 0034, computing a relevance score or probability of each label in a pre-defined label set to the feature vector and the previously generated label)
generate a labeling probability distribution associated with generating the labels for the words within the word sequence; (Hori: Section 0007, lines 5-8 – thus label inter-dependency)
and determine a labeling loss utilizing a labeling loss function based on the labeling probability distribution. (Hori: Section 0068 – thus the cross entropy reads on the loss function)
Claim 17, Hori in view of Li discloses wherein the one or more servers are further configured to generate a dependency probability distribution associated with the dependency path between the term and the definition within the word sequence; (Hori: Section 0061-0062, the relevance probability reads on the classification probability because it shows how accurate the prediction is)
determine a dependency loss utilizing a dependency loss function based on the dependency probability distribution; (Hori: Section 0068 – thus the cross entropy reads on the loss function)
and jointly learn parameters of a plurality of layers of the neural network based on a joint loss comprising the dependency loss, the classification loss, and the labeling loss. (Hori: Section 0068 – thus the cross entropy reads on the loss function)
Claim 18, Hori in view of Li discloses wherein the one or more servers are further configured to determine the joint loss by applying a first weight to the dependency loss, (Li: Section 0046, lines 1-2, “Loss function L”) a second weight to the classification loss, (Hori: Section 0088, lines 1-2, cross entropy) and a third weight to the labeling loss. (Li: Section 0045, “Binary cross entropy loss” reads on the weight of the label loss)
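For illustration only, a joint loss formed by applying a separate weight to each of three component losses can be sketched as follows. The function names, the weights, and the loss values are hypothetical; neither Hori nor Li is relied upon for these particular numbers. The cross-entropy helper shows the standard form of that loss for a single example.

```python
import math
from typing import List


def cross_entropy(probabilities: List[float], target_index: int) -> float:
    """Negative log-probability assigned to the correct class."""
    return -math.log(probabilities[target_index])


def joint_loss(dependency_loss: float, classification_loss: float,
               labeling_loss: float, w_dep: float, w_cls: float,
               w_lab: float) -> float:
    """Weighted sum of the three component losses."""
    return (w_dep * dependency_loss
            + w_cls * classification_loss
            + w_lab * labeling_loss)


# Hypothetical component losses and weights.
total = joint_loss(0.5, 0.25, 1.0, w_dep=1.0, w_cls=2.0, w_lab=0.5)
example_ce = cross_entropy([0.5, 0.5], target_index=0)
```

Minimizing `total` updates all layers together, with each weight controlling how strongly its component loss influences the shared parameters.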
Claim 19, Hori discloses a method comprising receiving, by at least one processor, a source input comprising a word sequence arranged in one or more sentences; (Section 0019, lines 1-4, receiving input data (source document) from an input device; see lines 8-10)
performing a step for jointly determining a sequence classification for the word sequence and labeling for words within the word sequence; (Section 0015, lines 1-3: annotating or labeling raw data such as text documents with a set of labels relevant to the content of the data)
and extracting, by the at least one processor, (Section 0033, lines 4-6 – thus algorithm modules stored into a memory or a storage as program code) a definition for a term within the word sequence from the source document based on the sequence classification for the word sequence and the labeling for the words within the word sequence. (Section 0019, lines 2-4 and lines 8-13: the labels are generated based on the extracted input data (word sequence) from the source document)
Hori does not clearly teach that the input data is a document.
Li discloses a deep multi-level label learning technology that receives input data as a source document. (Section 0087, lines 1-3: “the mentioned same document” is a source document)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in Hori the teaching of Li of receiving input data from a source document. The motivation is that it gives the system more options from which to extract data.

Claim 20, Hori in view of Li discloses wherein extracting the definition for the term within the word sequence from the source document comprises extracting at least a first portion of the definition from a first sentence within the word sequence (Hori: Section 0027, line 3, “data instance” reads on sentences extracted from the source document) and at least a second portion of the definition from a second sentence within the word sequence. (Hori: assigning multiple labels to each data instance is considered a multi-label classification, where each data instance reads on the first and second portion or sentence)



	Cited Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Amrite (US10853580) discloses that a first text classifier may be custom built for a first data set that is related to particular subject matter or that is owned or curated by a particular entity, and a second text classifier may be custom built for a second data set that is related to different subject matter or that is owned or curated by a different entity. The custom nature of such text classifiers is due, for example, to the different interests and emphases of the entities that use the text classifiers or data sets, due to intrinsic differences in the data sets, or both.
Hashimoto (US20180121799) teaches that multiple levels of linguistic representation are used in a variety of ways in the field of Natural Language Processing (NLP). For example, part-of-speech (POS) tags are applied by syntactic parsers. The POS tags improve higher-level tasks, such as natural language inference, relation classification, sentiment analysis, or machine translation. However, higher-level tasks are not usually able to improve lower-level tasks, often because systems are unidirectional pipelines and not trained end-to-end.
	
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Akwasi M Sarpong whose telephone number is (571)270-3438. The examiner can normally be reached Mon-Fri. 8:00am-4:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KING D POON can be reached on 571-272-7440. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





	/AKWASI M SARPONG/
	Primary Examiner, Art Unit 2675
	09/26/2022