DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(1) because they include the following reference characters that are not oriented in the same direction as the view: “Deployment phase 210” in Figure 2 and “412” in Figure 4.
The drawings are objected to because:
In Figure 1, element 102 labeled “receive at least one existing knowledge graph” is inconsistent with paragraph 0056, lines 3-4, which states “Method 100 comprises receiving 102 a first text document.”
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Specification
The disclosure is objected to because of the following informalities:
In paragraph 0003, line 1, “against for a” should read “against a”.
In paragraph 0042, line 6, “be presented” should read “to be presented”.
In paragraph 0044, line 4, “Grated Recurrent Unit” should read “Gated Recurrent Unit”.
In paragraph 0074, line 16, “computing” should read “Computing”.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 6 and 13 – 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 6 recites the limitation "the group consisting of a neural network system, a reinforcement learning system, and a sequence-to-sequence machine-learning system" in lines 2-3. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the limitation "the group consisting of a neural network system, a reinforcement learning system, and a sequence-to-sequence machine-learning system" will be interpreted as "a group consisting of a neural network system, a reinforcement learning system, and a sequence-to-sequence machine-learning system".
Claim 13 recites the limitation "program instructions to train a second machine-learning system to develop a second prediction model adapted to predict first edges between the first entities, wherein existing entities and existing edges of an existing knowledge graph and determined first embedding vectors of the first entities and the first edges are used as first training data" in lines 11-14.  This limitation is indefinite because it is not clear how embedding vectors of “the first edges” can be used as training data to “predict first edges”.  For examination purposes, the limitation "program instructions to train a second machine-learning system to develop a second prediction model adapted to predict first edges between the first entities, wherein existing entities and existing edges of an existing knowledge graph and determined first embedding vectors of the first entities and the first edges are used as first training data" will be interpreted as "program instructions to train a second machine-learning system to develop a second prediction model adapted to predict first edges between the first entities, wherein existing entities and existing edges of an existing knowledge graph and determined first embedding vectors of the existing entities and the existing edges are used as first training data".
Claims 14 – 19 depend from claim 13, and thus recite the limitations of claim 13, and do not resolve the indefinite language from claim 13. Claim 16 also recites the limitation "the group consisting of a neural network system, a reinforcement learning system, and a sequence-to-sequence machine-learning system" in lines 2-3. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the limitation "the group consisting of a neural network system, a reinforcement learning system, and a sequence-to-sequence machine-learning system" will be interpreted as "a group consisting of a neural network system, a reinforcement learning system, and a sequence-to-sequence machine-learning system".
Claim 20 recites the limitation "program instructions to train a second machine-learning system to develop a second prediction model adapted to predict first edges between the first entities, wherein existing entities and existing edges of an existing knowledge graph and determined first embedding vectors of the first entities and the first edges are used as first training data" in lines 9-12.  This limitation is indefinite because it is not clear how embedding vectors of “the first edges” can be used as training data to “predict first edges”.  For examination purposes, the limitation "program instructions to train a second machine-learning system to develop a second prediction model adapted to predict first edges between the first entities, wherein existing entities and existing edges of an existing knowledge graph and determined first embedding vectors of the first entities and the first edges are used as first training data" will be interpreted as "program instructions to train a second machine-learning system to develop a second prediction model adapted to predict first edges between the first entities, wherein existing entities and existing edges of an existing knowledge graph and determined first embedding vectors of the existing entities and the existing edges are used as first training data".
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 4, 6 – 7, 9, 12 – 13, 15 – 16 and 19 – 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Pradhan et al. (“Knowledge Graph Generation with Deep Active Learning”), hereinafter Pradhan.
Regarding claim 1, Pradhan discloses a computer-implemented method for building a new knowledge graph, the method comprising:
receiving a first text document (Section I, lines 36-37, "The NER model was trained on publicly available CoNLL 2003 NER [6] dataset"; Section IV-A, lines 1-4, "CoNLL-2003: The CoNLL-2003 [6] dataset has 4 types of entities namely PER, ORG, LOC, and MISC. The details regarding the number of tokens and sentences is shown in Table I.");
training a first machine-learning system to develop a first prediction model adapted to predict first entities in the first text document (Section I, lines 27-29, "Named Entity Recognition is the task of identifying the entities and classifying them into their types."; Section I, lines 33-35, "A trainable NER model was used to make the system domain-independent and deep active learning was used to reduce the data required for domain-specific entity recognition.");
wherein labelled entities from the first text document are used as first training data (Section IV-A, lines 1-4, "CoNLL-2003: The CoNLL-2003 [6] dataset has 4 types of entities namely PER, ORG, LOC, and MISC. The details regarding the number of tokens and sentences is shown in Table I."; Section V, lines 62-66, "The training batch size was set as 8, to increase the number of back propagation steps during each cycle of active learning, as the training starts with a small number of labelled sentences, with small addition of new labelled data.");
training a second machine-learning system to develop a second prediction model adapted to predict first edges between the first entities (Section I, lines 44-47, "Open Relation Extraction (OpenRE) [8] is the task of identifying the word or the sequence of words, which describes the relationships between the entity pairs, in a sentence."; Section I, lines 61-63, "Similar to NER module, a trainable OpenRE module was created to keep the module domain independent and easily adaptable to deep active learning."; The relationships between the entity pairs reads on the edges between the entities.),
wherein existing entities and existing edges of an existing knowledge graph and determined first embedding vectors of the existing entities and the existing edges are used as second training data (Section IV-B, lines 1-5, "TACRED Dataset [11] was used for OpenRE system. TACRED Dataset was automatically annotated using OpenIE 5.0 [36] [37] [38] [39]. Only triplets from OpenIE 5.0 having a named entity in both the arguments were used to create the dataset for training OpenRE module."; Section VI, lines 29-33, "Supervised OpenIE model requires more sentences than BERT to attain similar performance. One of the main reason for this can be the language model learnt by the base BERT model as compared to the use of GLOVE embeddings in Supervised OpenIE."; The triplets from OpenIE 5.0 read on the existing knowledge graph, and the GLOVE embeddings in Supervised OpenIE read on the embedding vectors.);
receiving a set of second text documents (Section III, lines 3-8, "Our knowledge graph extraction learning system has following components: (i) Named Entity Extraction: Identifying domain specific named entity from the text, and (ii) Relation Identification: Identifying open relations from the text and classifying relations into one of the existing knowledge graph population (KGP) relations."; The text reads on a set of second text documents.);
determining second embedding vectors from text segments from the set of second text documents (Section III-C, lines 6-8, "We used Glove [29] embedding and character level CNN to encode the characters in the text to get combined embedding having character-level and word-level features.");
predicting second entities in the set of second text documents by using the set of second text documents and the second embedding vectors as inputs for the first trained machine-learning model (Section III, lines 3-8, "Our knowledge graph extraction learning system has following components: (i) Named Entity Extraction: Identifying domain specific named entity from the text, and (ii) Relation Identification: Identifying open relations from the text and classifying relations into one of the existing knowledge graph population (KGP) relations."; Section III-C, lines 5-8, "The architecture diagram of our NER system is shown in Figure 3. We used Glove [29] embedding and character level CNN to encode the characters in the text to get combined embedding having character-level and word-level features."; Identifying domain specific named entities from the text reads on predicting second entities in the set of second text documents.);
predicting second edges in the set of second text documents by using the second entities and associated embedding vectors of the second entities as input for the second trained machine-learning model (Section III, lines 3-8, "Our knowledge graph extraction learning system has following components: (i) Named Entity Extraction: Identifying domain specific named entity from the text, and (ii) Relation Identification: Identifying open relations from the text and classifying relations into one of the existing knowledge graph population (KGP) relations."; Section III-E, lines 4-8, "Our OpenRE task can be considered as a sequence labelling task. We experimented two different architectures for the above sequence labelling task, (i) inspired by Supervised Open Information Extraction [20], and (ii) BERT where it was modelled as an NER task."; Section VI, lines 29-33, "Supervised OpenIE model requires more sentences than BERT to attain similar performance. One of the main reason for this can be the language model learnt by the base BERT model as compared to the use of GLOVE embeddings in Supervised OpenIE."; Identifying open relations from the text reads on predicting second edges in the set of second text documents.);
and building triplets of the second entities and the related second edges to build a new knowledge graph (Abstract, lines 6-10, "The system performs the following tasks to extract knowledge graph from the text: (i) Named Entity Recognition (NER), and (ii) Relation Identification (Open Relation Extraction (OpenRE) and Classification)."; Section I, lines 61-63, "To identify the direction of relationship in the sentence, the entities were classified as either entity 1 or entity 2, where the output triplet was in the form of Entity 1 → Relation → Entity 2.").
Regarding claim 4, Pradhan discloses the computer-implemented method as claimed in claim 1.
Pradhan further discloses:
wherein the first machine-learning system and the second machine-learning system are trained using a supervised machine-learning method (Section V, lines 62-66, "The training batch size was set as 8, to increase the number of back propagation steps during each cycle of active learning, as the training starts with a small number of labelled sentences, with small addition of new labelled data."; Section III-E, lines 4-8, "Our OpenRE task can be considered as a sequence labelling task. We experimented two different architectures for the above sequence labelling task, (i) inspired by Supervised Open Information Extraction [20], and (ii) BERT where it was modelled as an NER task."; Training the Named Entity Recognition model with labelled data reads on training the first machine-learning system using a supervised machine-learning method, and using Supervised Open Information Extraction for the Relation Identification model reads on training the second machine-learning system using a supervised machine-learning method.).
Regarding claim 6, as best understood based on the 35 U.S.C. 112(b) issues identified above, Pradhan discloses the computer-implemented method as claimed in claim 1.
Pradhan further discloses:
wherein the second machine-learning system is selected from the group consisting of a neural network system, a reinforcement learning system, and a sequence-to-sequence machine-learning system (Abstract, lines 6-14, "The system performs the following tasks to extract knowledge graph from the text: (i) Named Entity Recognition (NER), and (ii) Relation Identification (Open Relation Extraction (OpenRE) and Classification). The system uses deep active learning to calculate confidence scores using maximum normalized log-probability on each prediction for both NER, and relation identification. We experimented with both LSTM and transformer based models for NER and relation identification tasks."; The relation identification reads on the second machine-learning system, and the long short-term memory (LSTM) model is an example of a neural network.).
Regarding claim 7, Pradhan discloses the computer-implemented method as claimed in claim 1.
Pradhan further discloses:
wherein an entity of the second entities is of an entity type (Section I, lines 24-29, "Automating knowledge graph extraction involves following two major tasks: (i) Named Entity Extraction and (ii) Relation Identification. a) Named Entity Recognition: Named Entity Recognition is the task of identifying the entities and classifying them into their types.").
Regarding claim 9, Pradhan discloses the computer-implemented method as claimed in claim 1.
Pradhan further discloses:
wherein the first document is a plurality of documents (Section I, lines 36-37, "The NER model was trained on publicly available CoNLL 2003 NER [6] dataset"; Section IV-A, lines 1-4, "CoNLL-2003: The CoNLL-2003 [6] dataset has 4 types of entities namely PER, ORG, LOC, and MISC. The details regarding the number of tokens and sentences is shown in Table I.").
Regarding claim 12, Pradhan discloses the computer-implemented method as claimed in claim 1.
Pradhan further discloses:
wherein, as input for the training of the first machine-learning model, determined first embedding vectors of the labelled entities are used as training data (Section III-C, lines 2-8, "Named entity recognition is a very important and challenging task. NER is one of the most important component of knowledge graph extraction learning system. The architecture diagram of our NER system is shown in Figure 3. We used Glove [29] embedding and character level CNN to encode the characters in the text to get combined embedding having character-level and word-level features."; Section V, lines 62-66, "The training batch size was set as 8, to increase the number of back propagation steps during each cycle of active learning, as the training starts with a small number of labelled sentences, with small addition of new labelled data.").
Regarding claim 13, as best understood based on the 35 U.S.C. 112(b) issues identified above, Pradhan discloses a knowledge graph construction system for building a knowledge graph, the knowledge graph construction system comprising:
one or more computer processors; one or more computer readable storage media; program instructions stored on the computer readable storage media for execution by at least one of the one or more processors (Abstract, lines 6-10, "The system performs the following tasks to extract knowledge graph from the text: (i) Named Entity Recognition (NER), and (ii) Relation Identification (Open Relation Extraction (OpenRE) and Classification)."; Section V, lines 1-4, “The two different training methods of active learning were adopted during the training process: (i) training the model from scratch in every iteration, and (ii) fine tuning the model trained during the previous cycle.”; Section VI, lines 1-3, “The LSTM-CNN based NER model described in Section III was able to achieve 88.71% F1 score when trained tested on CoNLL dataset.”; Performing training and testing demonstrates the use of a processor executing instructions stored in memory to implement the system to extract a knowledge graph.),
the program instructions comprising:
program instructions to receive a first text document (Section I, lines 36-37, "The NER model was trained on publicly available CoNLL 2003 NER [6] dataset"; Section IV-A, lines 1-4, "CoNLL-2003: The CoNLL-2003 [6] dataset has 4 types of entities namely PER, ORG, LOC, and MISC. The details regarding the number of tokens and sentences is shown in Table I.");
program instructions to train a first machine-learning system to develop a first prediction model adapted to predict first entities in the first text document (Section I, lines 27-29, "Named Entity Recognition is the task of identifying the entities and classifying them into their types."; Section I, lines 33-35, "A trainable NER model was used to make the system domain-independent and deep active learning was used to reduce the data required for domain-specific entity recognition.");
wherein labelled entities from the first text document are used as training data (Section IV-A, lines 1-4, "CoNLL-2003: The CoNLL-2003 [6] dataset has 4 types of entities namely PER, ORG, LOC, and MISC. The details regarding the number of tokens and sentences is shown in Table I."; Section V, lines 62-66, "The training batch size was set as 8, to increase the number of back propagation steps during each cycle of active learning, as the training starts with a small number of labelled sentences, with small addition of new labelled data.");
program instructions to train a second machine-learning system to develop a second prediction model adapted to predict first edges between the first entities (Section I, lines 44-47, "Open Relation Extraction (OpenRE) [8] is the task of identifying the word or the sequence of words, which describes the relationships between the entity pairs, in a sentence."; Section I, lines 61-63, "Similar to NER module, a trainable OpenRE module was created to keep the module domain independent and easily adaptable to deep active learning."; The relationships between the entity pairs reads on the edges between the entities.),
wherein existing entities and existing edges of an existing knowledge graph and determined first embedding vectors of the first entities and the first edges are used as first training data (Section IV-B, lines 1-5, "TACRED Dataset [11] was used for OpenRE system. TACRED Dataset was automatically annotated using OpenIE 5.0 [36] [37] [38] [39]. Only triplets from OpenIE 5.0 having a named entity in both the arguments were used to create the dataset for training OpenRE module."; Section VI, lines 29-33, "Supervised OpenIE model requires more sentences than BERT to attain similar performance. One of the main reason for this can be the language model learnt by the base BERT model as compared to the use of GLOVE embeddings in Supervised OpenIE."; The triplets from OpenIE 5.0 read on the existing knowledge graph, and the GLOVE embeddings in Supervised OpenIE read on the embedding vectors.);
program instructions to receive a set of second text documents (Section III, lines 3-8, "Our knowledge graph extraction learning system has following components: (i) Named Entity Extraction: Identifying domain specific named entity from the text, and (ii) Relation Identification: Identifying open relations from the text and classifying relations into one of the existing knowledge graph population (KGP) relations."; The text reads on a set of second text documents.);
program instructions to determine second embedding vectors from text segments from the set of second text documents (Section III-C, lines 6-8, "We used Glove [29] embedding and character level CNN to encode the characters in the text to get combined embedding having character-level and word-level features.");
program instructions to predict second entities in the set of second text documents by using the set of second text documents and the second embedding vectors as inputs for the first trained machine-learning model (Section III, lines 3-8, "Our knowledge graph extraction learning system has following components: (i) Named Entity Extraction: Identifying domain specific named entity from the text, and (ii) Relation Identification: Identifying open relations from the text and classifying relations into one of the existing knowledge graph population (KGP) relations."; Section III-C, lines 5-8, "The architecture diagram of our NER system is shown in Figure 3. We used Glove [29] embedding and character level CNN to encode the characters in the text to get combined embedding having character-level and word-level features."; Identifying domain specific named entities from the text reads on predicting second entities in the set of second text documents.);
program instructions to predict second edges in the set of second text documents by using the second entities and associated embedding vectors of the second entities as inputs for the second trained machine-learning model (Section III, lines 3-8, "Our knowledge graph extraction learning system has following components: (i) Named Entity Extraction: Identifying domain specific named entity from the text, and (ii) Relation Identification: Identifying open relations from the text and classifying relations into one of the existing knowledge graph population (KGP) relations."; Section III-E, lines 4-8, "Our OpenRE task can be considered as a sequence labelling task. We experimented two different architectures for the above sequence labelling task, (i) inspired by Supervised Open Information Extraction [20], and (ii) BERT where it was modelled as an NER task."; Section VI, lines 29-33, "Supervised OpenIE model requires more sentences than BERT to attain similar performance. One of the main reason for this can be the language model learnt by the base BERT model as compared to the use of GLOVE embeddings in Supervised OpenIE."; Identifying open relations from the text reads on predicting second edges in the set of second text documents.);
and program instructions to build triplets of the second entities and the related second edges to build a new knowledge graph (Abstract, lines 6-10, "The system performs the following tasks to extract knowledge graph from the text: (i) Named Entity Recognition (NER), and (ii) Relation Identification (Open Relation Extraction (OpenRE) and Classification)."; Section I, lines 61-63, "To identify the direction of relationship in the sentence, the entities were classified as either entity 1 or entity 2, where the output triplet was in the form of Entity 1 → Relation → Entity 2.").
Regarding claim 15, as best understood based on the 35 U.S.C. 112(b) issues identified above, Pradhan discloses the knowledge graph construction system as claimed in claim 13.
Pradhan further discloses:
wherein the first machine-learning system and the second machine-learning system are trained using a supervised machine-learning method (Section V, lines 62-66, "The training batch size was set as 8, to increase the number of back propagation steps during each cycle of active learning, as the training starts with a small number of labelled sentences, with small addition of new labelled data."; Section III-E, lines 4-8, "Our OpenRE task can be considered as a sequence labelling task. We experimented two different architectures for the above sequence labelling task, (i) inspired by Supervised Open Information Extraction [20], and (ii) BERT where it was modelled as an NER task."; Training the Named Entity Recognition model with labelled data reads on training the first machine-learning system using a supervised machine-learning method, and using Supervised Open Information Extraction for the Relation Identification model reads on training the second machine-learning system using a supervised machine-learning method.).
Regarding claim 16, as best understood based on the 35 U.S.C. 112(b) issues identified above, Pradhan discloses the knowledge graph construction system as claimed in claim 13.
Pradhan further discloses:
wherein the second machine-learning system is selected from the group consisting of a neural network system, a reinforcement learning system, and a sequence-to-sequence machine-learning system (Abstract, lines 6-14, "The system performs the following tasks to extract knowledge graph from the text: (i) Named Entity Recognition (NER), and (ii) Relation Identification (Open Relation Extraction (OpenRE) and Classification). The system uses deep active learning to calculate confidence scores using maximum normalized log-probability on each prediction for both NER, and relation identification. We experimented with both LSTM and transformer based models for NER and relation identification tasks."; The relation identification reads on the second machine-learning system, and the long short-term memory (LSTM) model is an example of a neural network.).
Regarding claim 19, as best understood based on the 35 U.S.C. 112(b) issues identified above, Pradhan discloses the knowledge graph construction system as claimed in claim 13.
Pradhan further discloses:
wherein, as input for the training of the first machine-learning model, determined first embedding vectors of the labelled entities are used (Section III-C, lines 2-8, "Named entity recognition is a very important and challenging task. NER is one of the most important component of knowledge graph extraction learning system. The architecture diagram of our NER system is shown in Figure 3. We used Glove [29] embedding and character level CNN to encode the characters in the text to get combined embedding having character-level and word-level features."; Section V, lines 62-66, "The training batch size was set as 8, to increase the number of back propagation steps during each cycle of active learning, as the training starts with a small number of labelled sentences, with small addition of new labelled data.").
Regarding claim 20, as best understood based on the 35 U.S.C. 112(b) issues identified above, Pradhan discloses a computer program product for building a knowledge graph, the computer program product comprising:
one or more computer readable storage media and program instructions stored on the one or more computer readable storage media (Abstract, lines 6-10, "The system performs the following tasks to extract knowledge graph from the text: (i) Named Entity Recognition (NER), and (ii) Relation Identification (Open Relation Extraction (OpenRE) and Classification)."; Section V, lines 1-4, “The two different training methods of active learning were adopted during the training process: (i) training the model from scratch in every iteration, and (ii) fine tuning the model trained during the previous cycle.”; Section VI, lines 1-3, “The LSTM-CNN based NER model described in Section III was able to achieve 88.71% F1 score when trained tested on CoNLL dataset.”; Performing training and testing demonstrates the use of a processor executing instructions stored in memory to implement the system to extract a knowledge graph.),
the program instructions comprising:
program instructions to receive a first text document (Section I, lines 36-37, "The NER model was trained on publicly available CoNLL 2003 NER [6] dataset"; Section IV-A, lines 1-4, "CoNLL-2003: The CoNLL-2003 [6] dataset has 4 types of entities namely PER, ORG, LOC, and MISC. The details regarding the number of tokens and sentences is shown in Table I.");
program instructions to train a first machine-learning system to develop a first prediction model adapted to predict first entities in the first text document (Section I, lines 27-29, "Named Entity Recognition is the task of identifying the entities and classifying them into their types."; Section I, lines 33-35, "A trainable NER model was used to make the system domain-independent and deep active learning was used to reduce the data required for domain-specific entity recognition.");
wherein labelled entities from the first text document are used as training data (Section IV-A, lines 1-4, "CoNLL-2003: The CoNLL-2003 [6] dataset has 4 types of entities namely PER, ORG, LOC, and MISC. The details regarding the number of tokens and sentences is shown in Table I."; Section V, lines 62-66, "The training batch size was set as 8, to increase the number of back propagation steps during each cycle of active learning, as the training starts with a small number of labelled sentences, with small addition of new labelled data.");
program instructions to train a second machine-learning system to develop a second prediction model adapted to predict first edges between the first entities (Section I, lines 44-47, "Open Relation Extraction (OpenRE) [8] is the task of identifying the word or the sequence of words, which describes the relationships between the entity pairs, in a sentence."; Section I, lines 61-63, "Similar to NER module, a trainable OpenRE module was created to keep the module domain independent and easily adaptable to deep active learning."; The relationships between the entity pairs reads on the edges between the entities.),
wherein existing entities and existing edges of an existing knowledge graph and determined first embedding vectors of the first entities and the first edges are used as first training data (Section IV-B, lines 1-5, "TACRED Dataset [11] was used for OpenRE system. TACRED Dataset was automatically annotated using OpenIE 5.0 [36] [37] [38] [39]. Only triplets from OpenIE 5.0 having a named entity in both the arguments were used to create the dataset for training OpenRE module."; Section VI, lines 29-33, "Supervised OpenIE model requires more sentences than BERT to attain similar performance. One of the main reason for this can be the language model learnt by the base BERT model as compared to the use of GLOVE embeddings in Supervised OpenIE."; The triplets from OpenIE 5.0 read on the existing knowledge graph, and the GLOVE embeddings in Supervised OpenIE read on the embedding vectors.);
program instructions to receive a set of second text documents (Section III, lines 3-8, "Our knowledge graph extraction learning system has following components: (i) Named Entity Extraction: Identifying domain specific named entity from the text, and (ii) Relation Identification: Identifying open relations from the text and classifying relations into one of the existing knowledge graph population (KGP) relations."; The text reads on a set of second text documents.);
program instructions to determine second embedding vectors from text segments from the set of second text documents (Section III-C, lines 6-8, "We used Glove [29] embedding and character level CNN to encode the characters in the text to get combined embedding having character-level and word-level features.");
program instructions to predict second entities in the set of second text documents by using the set of second text documents and the second embedding vectors as inputs for the first trained machine-learning model (Section III, lines 3-8, "Our knowledge graph extraction learning system has following components: (i) Named Entity Extraction: Identifying domain specific named entity from the text, and (ii) Relation Identification: Identifying open relations from the text and classifying relations into one of the existing knowledge graph population (KGP) relations."; Section III-C, lines 5-8, "The architecture diagram of our NER system is shown in Figure 3. We used Glove [29] embedding and character level CNN to encode the characters in the text to get combined embedding having character-level and word-level features."; Identifying domain specific named entities from the text reads on predicting second entities in the set of second text documents.);
program instructions to predict second edges in the set of second text documents by using the second entities and associated embedding vectors of the second entities as inputs for the second trained machine-learning model (Section III, lines 3-8, "Our knowledge graph extraction learning system has following components: (i) Named Entity Extraction: Identifying domain specific named entity from the text, and (ii) Relation Identification: Identifying open relations from the text and classifying relations into one of the existing knowledge graph population (KGP) relations."; Section III-E, lines 4-8, "Our OpenRE task can be considered as a sequence labelling task. We experimented two different architectures for the above sequence labelling task, (i) inspired by Supervised Open Information Extraction [20], and (ii) BERT where it was modelled as an NER task."; Section VI, lines 29-33, "Supervised OpenIE model requires more sentences than BERT to attain similar performance. One of the main reason for this can be the language model learnt by the base BERT model as compared to the use of GLOVE embeddings in Supervised OpenIE."; Identifying open relations from the text reads on predicting second edges in the set of second text documents.);
and program instructions to build triplets of the second entities and the related second edges to build a new knowledge graph (Abstract, lines 6-10, "The system performs the following tasks to extract knowledge graph from the text: (i) Named Entity Recognition (NER), and (ii) Relation Identification (Open Relation Extraction (OpenRE) and Classification)."; Section I, lines 61-63, "To identify the direction of relationship in the sentence, the entities were classified as either entity 1 or entity 2, where the output triplet was in the form of Entity 1 → Relation → Entity 2.").
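For illustration only, the triplet-building limitation mapped above (Entity 1 → Relation → Entity 2) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Pradhan or of the application: the names `Triplet` and `build_triplets`, the adjacency-list representation, and the sample data are all hypothetical.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """Hypothetical container for one (Entity 1, Relation, Entity 2) triplet."""
    head: str
    relation: str
    tail: str


def build_triplets(predicted_edges):
    """Group predicted (entity_1, relation, entity_2) tuples by head entity.

    The resulting dict is an assumed adjacency-list view of a knowledge graph.
    """
    graph = {}
    for head, relation, tail in predicted_edges:
        graph.setdefault(head, []).append(Triplet(head, relation, tail))
    return graph


# Invented sample predictions, for illustration only.
edges = [
    ("Acme Corp", "acquired", "Widget Inc"),
    ("Acme Corp", "based in", "Ohio"),
]
kg = build_triplets(edges)
```

In this sketch each predicted edge becomes one directed triplet, and the graph is simply the collection of triplets indexed by their first entity.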
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2 – 3, 10 – 11, 14 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Pradhan in view of Song et al. ("Building and Querying an Enterprise Knowledge Graph"), hereinafter Song.
Regarding claim 2, Pradhan discloses the computer-implemented method as claimed in claim 1, but does not specifically disclose further comprising: responsive to a second entity having a confidence level value below a predetermined entity threshold value, removing the second entity from the second entities.
Song teaches:
responsive to a second entity having a confidence level value below a predetermined entity threshold value, removing the second entity from the second entities (Paragraph 0035, lines 1-4, "The present disclosure provides systems and methods for generating, maintaining, and using a knowledge graph for an enterprise using multiple mining methods and systems"; Paragraph 0195, lines 23-25, "The probability distribution for the entity may then exceed a threshold and the new entity can become established."; Establishing an entity when the probability distribution for the entity exceeds a threshold reads on removing an entity having a confidence level value below a predetermined entity threshold value.).
Song teaches establishing an entity when the probability distribution for the entity exceeds a threshold in order to generate a knowledge graph with increased flexibility and coverage of information (Paragraph 0004, lines 1-7, "Systems and methods are disclosed for enterprise knowledge graph mining using multiple toolkits and entity annotations with neural entity recognition. The use of multiple toolkits for an enterprise knowledge graph mining allows for more flexibility and coverage of information, as different technologies may tend to specialize on different types of entities based on the same source content").
Pradhan and Song are considered to be analogous to the claimed invention because they are in the same field of automatic knowledge graph generation.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pradhan to incorporate the teachings of Song to establish an entity when the probability distribution for the entity exceeds a threshold.  Doing so would allow for generating a knowledge graph with increased flexibility and coverage of information.
Regarding claim 3, Pradhan discloses the computer-implemented method as claimed in claim 1, but does not specifically disclose further comprising: responsive to a second edge having a confidence level value below a predetermined edge threshold value, removing the second edge from the second edges.
Song teaches:
responsive to a second edge having a confidence level value below a predetermined edge threshold value, removing the second edge from the second edges (Section 3.2, lines 56-59, "For each pair of entities, our system may extract multiple relationships; only those relationships with a confidence score above a pre-defined threshold are then added to our knowledge graph.").
Song teaches adding relationships with a confidence score above a pre-defined threshold to a knowledge graph in order to generate a knowledge graph with increased flexibility and coverage of information (Paragraph 0004, lines 1-7, "Systems and methods are disclosed for enterprise knowledge graph mining using multiple toolkits and entity annotations with neural entity recognition. The use of multiple toolkits for an enterprise knowledge graph mining allows for more flexibility and coverage of information, as different technologies may tend to specialize on different types of entities based on the same source content").
Pradhan and Song are considered to be analogous to the claimed invention because they are in the same field of automatic knowledge graph generation.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pradhan to incorporate the teachings of Song to add relationships with a confidence score above a pre-defined threshold to a knowledge graph.  Doing so would allow for generating a knowledge graph with increased flexibility and coverage of information.
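For illustration only, the threshold limitations of claims 2 and 3 (removing a predicted entity or edge whose confidence level value is below a predetermined threshold) can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of Song or of the application: the function name `filter_by_confidence`, the dictionary layout, and the threshold values are hypothetical.

```python
def filter_by_confidence(items, threshold):
    """Keep only items whose confidence is at or above the threshold.

    Items below the threshold are removed, which is equivalent to
    admitting only items that meet the threshold.
    """
    return [item for item in items if item["confidence"] >= threshold]


# Invented predictions, for illustration only.
entities = [
    {"text": "Acme Corp", "confidence": 0.92},
    {"text": "the thing", "confidence": 0.40},
]
edges = [
    {"head": "Acme Corp", "relation": "acquired", "tail": "Widget Inc", "confidence": 0.81},
    {"head": "Acme Corp", "relation": "near", "tail": "Ohio", "confidence": 0.30},
]

ENTITY_THRESHOLD = 0.5  # assumed value
EDGE_THRESHOLD = 0.6    # assumed value

kept_entities = filter_by_confidence(entities, ENTITY_THRESHOLD)
kept_edges = filter_by_confidence(edges, EDGE_THRESHOLD)
```

The sketch shows why admitting only items above a threshold and removing items below it produce the same retained set, which is the correspondence relied upon in the mapping above.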
Regarding claim 10, Pradhan discloses the computer-implemented method as claimed in claim 1, but does not specifically disclose further comprising: storing provenance data to a document of the set of second text documents for the second entities and the second edges together with the triplets.
Song teaches:
storing provenance data to a document of the set of second text documents for the second entities and the second edges together with the triplets (Section 4.2, lines 13-19, "In addition to the three basic elements in a triple (i.e., subject, predicate and object), a fourth element can also be added, turning a triple to a quad. This fourth element is generally used to provide provenance information of the triple, such as its source [8] and trustworthiness [9]. Such provenance information can be used to evaluate the quality of a triple.").
Song teaches storing provenance data with knowledge graph subject/predicate/object triples in order to generate a knowledge graph with increased flexibility and coverage of information (Paragraph 0004, lines 1-7, "Systems and methods are disclosed for enterprise knowledge graph mining using multiple toolkits and entity annotations with neural entity recognition. The use of multiple toolkits for an enterprise knowledge graph mining allows for more flexibility and coverage of information, as different technologies may tend to specialize on different types of entities based on the same source content").
Pradhan and Song are considered to be analogous to the claimed invention because they are in the same field of automatic knowledge graph generation.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pradhan to incorporate the teachings of Song to store provenance data with knowledge graph subject/predicate/object triples.  Doing so would allow for generating a knowledge graph with increased flexibility and coverage of information.
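For illustration only, storing provenance together with a triplet, as in the quoted passage of Song describing a fourth element that turns a triple into a quad, can be sketched as follows. This is a minimal sketch under stated assumptions: the names `Quad` and `attach_provenance`, the field names, and the document identifier are hypothetical and are not taken from Song or the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quad:
    """Hypothetical subject/predicate/object triple plus a provenance element."""
    subject: str
    predicate: str
    obj: str
    source: str  # identifier of the document the triple was extracted from


def attach_provenance(triples, source_document):
    """Extend each (subject, predicate, object) triple with its source document."""
    return [Quad(s, p, o, source_document) for s, p, o in triples]


# Invented triple and document identifier, for illustration only.
triples = [("Acme Corp", "acquired", "Widget Inc")]
quads = attach_provenance(triples, "doc-0001.txt")
```

Here the fourth field records only the source document; a trustworthiness score, which the quoted passage also mentions, could be carried in an additional field.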
Regarding claim 11, Pradhan discloses the computer-implemented method as claimed in claim 1, but does not specifically disclose: wherein the set of second text documents is at least one of an article, a book, a newspaper, conference proceedings, a magazine, a chat protocol, a manuscript, handwritten notes, server log, and email thread.
Song teaches:
wherein the set of second text documents is at least one of an article, a book, a newspaper, conference proceedings, a magazine, a chat protocol, a manuscript, handwritten notes, server log, and email thread (Section I, lines 35-40, "Furthermore, the data we have covers a variety of domains, such as media, geography, finance, legal, academia and entertainment. In terms of the format, data may be structured (e.g., database records) or unstructured (e.g., news articles, court dockets and financial reports).").
Song teaches generating knowledge graphs for articles in order to generate a knowledge graph for an article with increased flexibility and coverage of information (Paragraph 0004, lines 1-7, "Systems and methods are disclosed for enterprise knowledge graph mining using multiple toolkits and entity annotations with neural entity recognition. The use of multiple toolkits for an enterprise knowledge graph mining allows for more flexibility and coverage of information, as different technologies may tend to specialize on different types of entities based on the same source content").
Pradhan and Song are considered to be analogous to the claimed invention because they are in the same field of automatic knowledge graph generation.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pradhan to incorporate the teachings of Song to generate knowledge graphs for articles.  Doing so would allow for generating a knowledge graph for an article with increased flexibility and coverage of information.
Regarding claim 14, as best understood based on the 35 U.S.C. 112(b) issues identified above, Pradhan discloses the knowledge graph construction system as claimed in claim 13, but does not specifically disclose further comprising: responsive to a second entity having a confidence level value below a predetermined entity threshold value, program instructions to remove the second entity from the second entities.
Song teaches:
responsive to a second entity having a confidence level value below a predetermined entity threshold value, program instructions to remove the second entity from the second entities (Paragraph 0035, lines 1-4, "The present disclosure provides systems and methods for generating, maintaining, and using a knowledge graph for an enterprise using multiple mining methods and systems"; Paragraph 0195, lines 23-25, "The probability distribution for the entity may then exceed a threshold and the new entity can become established."; Establishing an entity when the probability distribution for the entity exceeds a threshold reads on removing an entity having a confidence level value below a predetermined entity threshold value.).
Song teaches establishing an entity when the probability distribution for the entity exceeds a threshold in order to generate a knowledge graph with increased flexibility and coverage of information (Paragraph 0004, lines 1-7, "Systems and methods are disclosed for enterprise knowledge graph mining using multiple toolkits and entity annotations with neural entity recognition. The use of multiple toolkits for an enterprise knowledge graph mining allows for more flexibility and coverage of information, as different technologies may tend to specialize on different types of entities based on the same source content").
Pradhan and Song are considered to be analogous to the claimed invention because they are in the same field of automatic knowledge graph generation.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pradhan to incorporate the teachings of Song to establish an entity when the probability distribution for the entity exceeds a threshold.  Doing so would allow for generating a knowledge graph with increased flexibility and coverage of information.
Regarding claim 18, as best understood based on the 35 U.S.C. 112(b) issues identified above, Pradhan discloses the knowledge graph construction system as claimed in claim 13, but does not specifically disclose further comprising: program instructions to store provenance data to a document of the set of second text documents for the second entities and the second edges together with the triplets.
Song teaches:
program instructions to store provenance data to a document of the set of second text documents for the second entities and the second edges together with the triplets (Section 4.2, lines 13-19, "In addition to the three basic elements in a triple (i.e., subject, predicate and object), a fourth element can also be added, turning a triple to a quad. This fourth element is generally used to provide provenance information of the triple, such as its source [8] and trustworthiness [9]. Such provenance information can be used to evaluate the quality of a triple.").
Song teaches storing provenance data with knowledge graph subject/predicate/object triples in order to generate a knowledge graph with increased flexibility and coverage of information (Paragraph 0004, lines 1-7, "Systems and methods are disclosed for enterprise knowledge graph mining using multiple toolkits and entity annotations with neural entity recognition. The use of multiple toolkits for an enterprise knowledge graph mining allows for more flexibility and coverage of information, as different technologies may tend to specialize on different types of entities based on the same source content").
Pradhan and Song are considered to be analogous to the claimed invention because they are in the same field of automatic knowledge graph generation.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pradhan to incorporate the teachings of Song to store provenance data with knowledge graph subject/predicate/object triples.  Doing so would allow for generating a knowledge graph with increased flexibility and coverage of information.
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Pradhan in view of Jain et al. (US Patent Application Publication No. 2021/0089614), hereinafter Jain.
Regarding claim 5, Pradhan discloses the computer-implemented method as claimed in claim 4, but does not specifically disclose: wherein the supervised machine-learning method for the first machine-learning system is a random forest machine-learning method.
Jain teaches:
wherein the supervised machine-learning method for the first machine-learning system is a random forest machine-learning method (Paragraph 0049, lines 1-15, "In one or more implementations, the named entity category identification module 204 is implemented at least in part as a machine learning system. Machine learning systems refer to a computer representation that can be tuned (e.g., trained) based on inputs to approximate unknown functions. In particular, machine learning systems can include a system that utilizes algorithms to learn from, and make predictions on, known data by analyzing the known data to learn to generate outputs that reflect patterns and attributes of the known data. For instance, a machine learning system can include decision trees, support vector machines, linear regression, logistic regression, Bayesian networks, random forest learning, dimensionality reduction algorithms, boosting algorithms, artificial neural networks, deep learning, and so forth."; Implementing named entity identification machine learning as random forest learning reads on using a random forest machine-learning method for the first machine-learning system.).
Jain teaches implementing named entity identification machine learning as random forest learning in order to identify entities to automatically apply a style to occurrences of entity categories in digital content (Paragraph 0005, lines 1-14, "To mitigate the drawbacks of digital content creation systems, an automatic content styling system as implemented by a computing device is described to automatically apply a style to occurrences of one or more named entity categories in digital content. An indication of a style to apply to digital content and an indication of at least one named entity category to which the style is to be applied are obtained. One or more occurrences of the at least one named entity category in the digital content are identified by a machine learning system trained to identify the at least one named entity category. Each of the one or more occurrences of the at least one named entity category in the digital content is automatically formatted with the style, resulting in styled digital content that is caused to be displayed.").
Pradhan and Jain are considered to be analogous to the claimed invention because they are in the same field of entity identification.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pradhan to incorporate the teachings of Jain to implement named entity identification machine learning as random forest learning.  Doing so would allow for identifying entities to automatically apply a style to occurrences of entity categories in digital content.
Claims 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Pradhan in view of Al-Zaidy et al. (“Extracting Semantic Relations for Scholarly Knowledge Base Construction”), hereinafter Al-Zaidy.
Regarding claim 8, Pradhan discloses the computer-implemented method as claimed in claim 1, but does not specifically disclose further comprising: executing a parser for each predicted first entity; and determining at least one entity instance.
Al-Zaidy teaches:
executing a parser for each predicted first entity (Section III, lines 1-6, "Our system is given a set of documents in PDF format, and generates a knowledge base represented as a graph. The system is comprised of three main modules: a parser, an entity-relation extractor, and a taxonomy graph constructor.");
and determining at least one entity instance (Abstract, lines 10-12, "Our method extracts semantic entities as concepts and instances along with their attributes from the fully body text of documents.").
Al-Zaidy teaches using a parser for entity extraction and extracting entity instances in order to construct a scientific taxonomy or a knowledge graph with increased precision (Abstract, lines 12-25, "We extract two types of relationships between concepts in the text using an iterative learning algorithm. External data sources from the web such as the Microsoft concept graph, as well as query logs, are utilized to evaluate the quality of the extracted concepts and relations. The concepts are used to construct a scientific taxonomy covering the research content of the documents. To evaluate the system we apply our approach on a set of 10k scholarly documents and conduct several evaluations to show the effectiveness of the proposed methods. We show that our system obtains a 23% improvement in precision over existing web IE tools when they are applied to scholarly documents."; Section I, lines 14-15, "One approach for semantically structuring scholarly documents is to represent them as knowledge graphs [2].").
Pradhan and Al-Zaidy are considered to be analogous to the claimed invention because they are in the same field of entity identification.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pradhan to incorporate the teachings of Al-Zaidy to use a parser for entity extraction and extract entity instances.  Doing so would allow for constructing a scientific taxonomy or a knowledge graph with increased precision.
Regarding claim 17, as best understood based on the 35 U.S.C. 112(b) issues identified above, Pradhan discloses the knowledge graph construction system as claimed in claim 13, but does not specifically disclose further comprising: program instructions to execute a parser for each first entity; and program instructions to determine at least one entity instance.
Al-Zaidy teaches:
program instructions to execute a parser for each first entity (Section III, lines 1-6, "Our system is given a set of documents in PDF format, and generates a knowledge base represented as a graph. The system is comprised of three main modules: a parser, an entity-relation extractor, and a taxonomy graph constructor.");
and program instructions to determine at least one entity instance (Abstract, lines 10-12, "Our method extracts semantic entities as concepts and instances along with their attributes from the fully body text of documents.").
Al-Zaidy teaches using a parser for entity extraction and extracting entity instances in order to construct a scientific taxonomy or a knowledge graph with increased precision (Abstract, lines 12-25, "We extract two types of relationships between concepts in the text using an iterative learning algorithm. External data sources from the web such as the Microsoft concept graph, as well as query logs, are utilized to evaluate the quality of the extracted concepts and relations. The concepts are used to construct a scientific taxonomy covering the research content of the documents. To evaluate the system we apply our approach on a set of 10k scholarly documents and conduct several evaluations to show the effectiveness of the proposed methods. We show that our system obtains a 23% improvement in precision over existing web IE tools when they are applied to scholarly documents."; Section I, lines 14-15, "One approach for semantically structuring scholarly documents is to represent them as knowledge graphs [2].").
Pradhan and Al-Zaidy are considered to be analogous to the claimed invention because they are in the same field of entity identification.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pradhan to incorporate the teachings of Al-Zaidy to use a parser for entity extraction and extract entity instances.  Doing so would allow for constructing a scientific taxonomy or a knowledge graph with increased precision.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to James Boggs whose telephone number is (571)272-2968. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JAMES BOGGS/Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657