DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings are objected to because:
In Figure 1, element 150 is referenced in the specification on page 13, lines 21-22, as “knowledge graph construction module 150” and on page 14, line 12, as “relation determination module 150”.
In Figure 4, element 410, line 1, “sel-structure attention” should read “self-structure attention”.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Specification
The disclosure is objected to because of the following informalities:
On page 17, line 10, “entity/relation capsules 149” should read “entity/relation capsules 142”.
On page 19, line 21, “make a service appointment” should read “making a service appointment”.
On page 19, line 26, “before seeking for help” should read “before seeking help”.
On page 19, lines 28-29, “the user of the administrator” should read “the user or the administrator”.
On page 20, lines 28-29, “has a predefined dimensions” should read “has predefined dimensions”.
On page 21, line 14, “sentence embedding 135 are used” should read “sentence embeddings 135 are used”.
On page 22, line 5, “determines” should read “and determines”.
On page 22, line 10, “run another iteration” should read “runs another iteration”.
On page 22, line 27, “each document have” should read “each document has”.
On page 25, line 10, “which is advantages” should read “which is advantageous”.
On page 25, line 13, “and contains” should read “and contain”.
On page 25, line 19, “use this” should read “uses this”.
Appropriate correction is required.
Claim Objections
Claims 4, 6 and 11 are objected to because of the following informalities:
In claim 4, line 6, “the tokens comprises” should read “the tokens comprise”.
In claim 6, line 6, “the tokens comprises” should read “the tokens comprise”.
In claim 11, line 20, “construct the knowledge graph” should read “constructing the knowledge graph”.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1 – 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitation "form the third number of primary capsule layers, each of the third number of primary capsule layers corresponding to one of the third number of sentence embeddings" in lines 9-11.  There is insufficient antecedent basis for “the third number of primary capsule layers” in this claim limitation.  Claim 1 previously recites “a third number of sentence embeddings”, but it is not clear how “the third number of primary capsule layers” relates to “a third number of sentence embeddings”.  For examination purposes, the limitation “the third number of primary capsule layers” will be interpreted to mean a number of primary capsule layers with the same quantity as “a third number of sentence embeddings”.  This limitation is also indefinite because the meaning of "form the third number of primary capsule layers” is not clear.  Forming primary capsule layers can be interpreted as forming primary capsules for the capsule neural network, or it can be interpreted as forming layers of the capsule neural network.  The specification, on page 16, lines 3-4, recites “The fixed length sentence embedding 135 is in a form of 2D matrix, and is regarded as the primary capsules 136.”  From this description, forming primary capsule layers seems to be referring to forming primary capsules for the capsule neural network.  For examination purposes, the limitation "form the third number of primary capsule layers” will be interpreted to mean forming primary capsules for the capsule neural network.
Claim 1 also recites the limitation "use a set transformer to learn the first number of entity capsule layers and the second number of relation capsule layers from the third number of primary capsule layers, an i-th entity and a j-th entity from the first number of entity capsule layers and an m-th relation from the second number of relation capsule layers form a head entity-tail entity-relation triple" in lines 12-16.  There is insufficient antecedent basis for “the first number of entity capsule layers” and “the second number of relation capsule layers” in this claim limitation.  Claim 1 previously recites “a first number of entities” and “a second number of relations”, but it is not clear how “the first number of entity capsule layers” relates to “a first number of entities”, and how “the second number of relation capsule layers” relates to “a second number of relations”.  For examination purposes, the limitation “the first number of entity capsule layers” will be interpreted to mean a number of entity capsule layers with the same quantity as “a first number of entities”, and the limitation “the second number of relation capsule layers” will be interpreted to mean a number of relation capsule layers with the same quantity as “a second number of relations”.  This limitation is also indefinite because the meaning of “to learn the first number of entity capsule layers and the second number of relation capsule layers” is not clear.  Learning entity and relation capsule layers can be interpreted as learning entities and relations for the documents, or it can be interpreted as learning parameters of the capsule neural network.  
The specification, on page 16, lines 4-7, recites “After obtaining the primary capsules 136, the knowledge learning module 130 is further configured to use a set transformer mechanism to learn the relationship between the primary capsules 136 and abstract entity and relation capsules, and obtain entity/relation capsules 142.”  From this description, the set transformer seems to be obtaining the entities and relations for the documents.  For examination purposes, the limitation “to learn the first number of entity capsule layers and the second number of relation capsule layers” will be interpreted to mean that the system learns the entities and relations contained in the provided documents.
In addition, Claim 1 recites the limitation “project the i-th entity in an entity space into a m-th relation space to form the i-th projection, project the j-th entity in the entity space into the m-th relation space to form the j-th projection, and determine the m-th relation exists for the i-th entity and the j-th entity if a sum of the i-th projection and the m-relation substantially equals to the j-th projection” in lines 17-21.  There is insufficient antecedent basis for “the i-th projection” and “the j-th projection” in this claim limitation.  For examination purposes, the limitation “project the i-th entity in an entity space into a m-th relation space to form the i-th projection, project the j-th entity in the entity space into the m-th relation space to form the j-th projection, and determine the m-th relation exists for the i-th entity and the j-th entity if a sum of the i-th projection and the m-relation substantially equals to the j-th projection” will be interpreted as: project the i-th entity in an entity space into a m-th relation space to form an i-th projection, project the j-th entity in the entity space into the m-th relation space to form a j-th projection, and determine the m-th relation exists for the i-th entity and the j-th entity if a sum of the i-th projection and the m-relation substantially equals to the j-th projection.
Claims 2 – 10 depend from claim 1, and thus recite the limitations of claim 1, and do not resolve the indefinite language from claim 1.  
Claim 6 also recites the limitation “encoding words in the at least one sentence into a plurality of one-hot vectors, each of the plurality of one-hot vectors corresponding to one of the tokens in the at least one sentence, wherein the tokens comprise words and punctuations” in lines 4-6.  There is insufficient antecedent basis for “the tokens” in this claim limitation.  For examination purposes, the limitation “encoding words in the at least one sentence into a plurality of one-hot vectors, each of the plurality of one-hot vectors corresponding to one of the tokens in the at least one sentence, wherein the tokens comprise words and punctuations” will be interpreted as: encoding tokens in the at least one sentence into a plurality of one-hot vectors, each of the plurality of one-hot vectors corresponding to one of the tokens in the at least one sentence, wherein the tokens comprise words and punctuations.
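For illustration only, the interpretation adopted above for claim 6 (one one-hot vector per token, where tokens include both words and punctuation) can be sketched as follows. The regular-expression tokenizer, the function names, the vocabulary ordering, and the sample sentence are assumptions made for this sketch; they are not taken from the claims or the specification.

```python
import re

def tokenize(sentence):
    # Assumed tokenizer: runs of word characters are words,
    # and each punctuation mark is its own token.
    return re.findall(r"\w+|[^\w\s]", sentence)

def one_hot_encode(tokens):
    # Build a vocabulary from the sentence itself (an assumption;
    # a real system would use a fixed, pre-built vocabulary).
    vocab = sorted(set(tokens))
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for tok in tokens:
        vec = [0] * len(vocab)
        vec[index[tok]] = 1  # exactly one position is set per token
        vectors.append(vec)
    return vocab, vectors

# Hypothetical sentence, not from the application.
sentence = "Reset the router, then call support."
tokens = tokenize(sentence)
vocab, vectors = one_hot_encode(tokens)
```

Under this reading, the comma and the period each receive their own one-hot vector, which is consistent with interpreting "encoding words" as encoding tokens.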
Claim 11 recites the limitation "forming, by the computing device, the third number of primary capsule layers, each of the third number of primary capsule layers corresponding to one of the third number of sentence embeddings" in lines 7-9.  There is insufficient antecedent basis for “the third number of primary capsule layers” in this claim limitation.  Claim 11 previously recites “a third number of sentence embeddings”, but it is not clear how “the third number of primary capsule layers” relates to “a third number of sentence embeddings”.  For examination purposes, the limitation “the third number of primary capsule layers” will be interpreted to mean a number of primary capsule layers with the same quantity as “a third number of sentence embeddings”.  This limitation is also indefinite because the meaning of "forming, by the computing device, the third number of primary capsule layers” is not clear.  Forming primary capsule layers can be interpreted as forming primary capsules for the capsule neural network, or it can be interpreted as forming layers of the capsule neural network.  The specification, on page 16, lines 3-4, recites “The fixed length sentence embedding 135 is in a form of 2D matrix, and is regarded as the primary capsules 136.”  From this description, forming primary capsule layers seems to be referring to forming primary capsules for the capsule neural network.  For examination purposes, the limitation "forming, by the computing device, the third number of primary capsule layers” will be interpreted to mean forming primary capsules for the capsule neural network.
Claim 11 also recites the limitation “using, by the computing device, a set transformer to learn the first number of entity capsule layers and the second number of relation capsule layers from the third number of primary capsule layers, an i-th entity and a j-th entity from the first number of entity capsule layers and an m-th relation from the second number of relation capsule layers form a head entity-tail entity-relation triple” in lines 10-14.  There is insufficient antecedent basis for “the first number of entity capsule layers” and “the second number of relation capsule layers” in this claim limitation.  Claim 11 previously recites “a first number of entities” and “a second number of relations”, but it is not clear how “the first number of entity capsule layers” relates to “a first number of entities”, and how “the second number of relation capsule layers” relates to “a second number of relations”.  For examination purposes, the limitation “the first number of entity capsule layers” will be interpreted to mean a number of entity capsule layers with the same quantity as “a first number of entities”, and the limitation “the second number of relation capsule layers” will be interpreted to mean a number of relation capsule layers with the same quantity as “a second number of relations”.  This limitation is also indefinite because the meaning of “to learn the first number of entity capsule layers and the second number of relation capsule layers” is not clear.  Learning entity and relation capsule layers can be interpreted as learning entities and relations for the documents, or it can be interpreted as learning parameters of the capsule neural network.
The specification, on page 16, lines 4-7, recites “After obtaining the primary capsules 136, the knowledge learning module 130 is further configured to use a set transformer mechanism to learn the relationship between the primary capsules 136 and abstract entity and relation capsules, and obtain entity/relation capsules 142.”  From this description, the set transformer seems to be obtaining the entities and relations for the documents.  For examination purposes, the limitation “to learn the first number of entity capsule layers and the second number of relation capsule layers” will be interpreted to mean that the system learns the entities and relations contained in the provided documents.
In addition, Claim 11 recites the limitation “projecting, by the computing device, the i-th entity in an entity space into an m-th relation space to form the i-th projection, projecting the j-th entity in the entity space into the m-th relation space to form the j-th projection, and determining the m-th relation exists for the i-th entity and the j-th entity if a sum of the i-th projection and the m-relation substantially equals to the j-th projection” in lines 15-19.  There is insufficient antecedent basis for “the i-th projection” and “the j-th projection” in this claim limitation.  For examination purposes, the limitation “projecting, by the computing device, the i-th entity in an entity space into an m-th relation space to form the i-th projection, projecting the j-th entity in the entity space into the m-th relation space to form the j-th projection, and determining the m-th relation exists for the i-th entity and the j-th entity if a sum of the i-th projection and the m-relation substantially equals to the j-th projection” will be interpreted as: projecting, by the computing device, the i-th entity in an entity space into an m-th relation space to form an i-th projection, projecting the j-th entity in the entity space into the m-th relation space to form a j-th projection, and determining the m-th relation exists for the i-th entity and the j-th entity if a sum of the i-th projection and the m-relation substantially equals to the j-th projection.
Claims 12 – 17 depend from claim 11, and thus recite the limitations of claim 11, and do not resolve the indefinite language from claim 11.
Claim 18 recites the limitation "form the third number of primary capsule layers, each of the third number of primary capsule layers corresponding to one of the third number of sentence embeddings" in lines 8-10.  There is insufficient antecedent basis for “the third number of primary capsule layers” in this claim limitation.  Claim 18 previously recites “a third number of sentence embeddings”, but it is not clear how “the third number of primary capsule layers” relates to “a third number of sentence embeddings”.  For examination purposes, the limitation “the third number of primary capsule layers” will be interpreted to mean a number of primary capsule layers with the same quantity as “a third number of sentence embeddings”.  This limitation is also indefinite because the meaning of "form the third number of primary capsule layers” is not clear.  Forming primary capsule layers can be interpreted as forming primary capsules for the capsule neural network, or it can be interpreted as forming layers of the capsule neural network.  The specification, on page 16, lines 3-4, recites “The fixed length sentence embedding 135 is in a form of 2D matrix, and is regarded as the primary capsules 136.”  From this description, forming primary capsule layers seems to be referring to forming primary capsules for the capsule neural network.  For examination purposes, the limitation "form the third number of primary capsule layers” will be interpreted to mean forming primary capsules for the capsule neural network.
Claim 18 also recites the limitation “use a set transformer to learn the first number of entity capsule layers and the second number of relation capsule layers from the third number of primary capsule layers, an i-th entity and a j-th entity from the first number of entity capsule layers and an m-th relation from the second number of relation capsule layers form a head entity-tail entity-relation triple” in lines 11-15.  There is insufficient antecedent basis for “the first number of entity capsule layers” and “the second number of relation capsule layers” in this claim limitation.  Claim 18 previously recites “a first number of entities” and “a second number of relations”, but it is not clear how “the first number of entity capsule layers” relates to “a first number of entities”, and how “the second number of relation capsule layers” relates to “a second number of relations”.  For examination purposes, the limitation “the first number of entity capsule layers” will be interpreted to mean a number of entity capsule layers with the same quantity as “a first number of entities”, and the limitation “the second number of relation capsule layers” will be interpreted to mean a number of relation capsule layers with the same quantity as “a second number of relations”.  This limitation is also indefinite because the meaning of “to learn the first number of entity capsule layers and the second number of relation capsule layers” is not clear.  Learning entity and relation capsule layers can be interpreted as learning entities and relations for the documents, or it can be interpreted as learning parameters of the capsule neural network.
The specification, on page 16, lines 4-7, recites “After obtaining the primary capsules 136, the knowledge learning module 130 is further configured to use a set transformer mechanism to learn the relationship between the primary capsules 136 and abstract entity and relation capsules, and obtain entity/relation capsules 142.”  From this description, the set transformer seems to be obtaining the entities and relations for the documents.  For examination purposes, the limitation “to learn the first number of entity capsule layers and the second number of relation capsule layers” will be interpreted to mean that the system learns the entities and relations contained in the provided documents.
In addition, Claim 18 recites the limitation “project the i-th entity in an entity space into a m-th relation space to form the i-th projection, project the j-th entity in the entity space into the m-th relation space to form the j-th projection, and determine the m-th relation exists for the i-th entity and the j-th entity if a sum of the i-th projection and the m-relation substantially equals to the j-th projection” in lines 16-20.  There is insufficient antecedent basis for “the i-th projection” and “the j-th projection” in this claim limitation.  For examination purposes, the limitation “project the i-th entity in an entity space into a m-th relation space to form the i-th projection, project the j-th entity in the entity space into the m-th relation space to form the j-th projection, and determine the m-th relation exists for the i-th entity and the j-th entity if a sum of the i-th projection and the m-relation substantially equals to the j-th projection” will be interpreted as: project the i-th entity in an entity space into a m-th relation space to form an i-th projection, project the j-th entity in the entity space into the m-th relation space to form a j-th projection, and determine the m-th relation exists for the i-th entity and the j-th entity if a sum of the i-th projection and the m-relation substantially equals to the j-th projection.
Claims 19 – 20 depend from claim 18, and thus recite the limitations of claim 18, and do not resolve the indefinite language from claim 18.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 – 3, 11 – 12 and 18 – 19 are rejected under 35 U.S.C. 103 as being unpatentable over Cheng et al. (“Knowledge Graph Representation Learning with Multi-Scale Capsule-Based Embedding Model Incorporating Entity Descriptions”), hereinafter Cheng, in view of Lee et al. (“Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks”), hereinafter Lee, and Y. Lin et al. (“Learning Entity and Relation Embeddings for Knowledge Graph Completion”), hereinafter Lin.
Regarding claim 1, as best understood based on the 35 U.S.C. 112(b) issues identified above, Cheng discloses a system for knowledge graph construction, wherein the system comprises a computing device, the computing device comprises a processor and a storage device storing computer executable code (Section IV-F, lines 1-2, "We use the Adam Optimizer [45] to train the proposed KGRL model by minimizing the cross entropy loss function in (20)."; Section V-B, lines 1-3, "We evaluate MCapsEED on the task of link prediction, the goal of which is to predict a missing entity given a relation and another entity in a triple."; Training the model and conducting experiments with the model demonstrates the use of a computing device with a processor executing instructions stored in memory.),
and the computer executable code, when executed at the processor, is configured to:
provide a first number of entities, a second number of relations, and a plurality of documents, each of the plurality of documents comprising at least one sentence (Section V-A, lines 1-7, "The experiment contain two parts: (1) Comparison of MCapsEED with existing Capsule-based KGRL models, CapsE and MCapsE, to verify whether incorporating entity description information improves the performance of KGRL models. The experimental datasets are consistent with that of in MCapsE and CapsE, namely FB15k-237 and WN18RR."; The FB15k-237 and WN18RR datasets read on the documents, sentences, entities, and relations.);
convert each of the at least one sentence into a third number of sentence embeddings (Section IV-C, lines 15-17, "In the input layer, word embeddings and position embeddings of an entity description are concatenated to form the sentence embedding, which serve as model input information.");
form the third number of primary capsule layers, each of the third number of primary capsule layers corresponding to one of the third number of sentence embeddings (Section IV-E, lines 24-28, "We use two capsule layers in MCapsE. In the first layer, we construct k capsules for each feature map list. We encapsulate features in the same dimension in the feature map list into a same capsule to capture features at different positions in the triple embedding."; The k capsules encapsulating features in the same dimension as the feature map list reads on primary capsule layers corresponding to the number of sentence embeddings.);
use a transformer to learn the first number of entity capsule layers and the second number of relation capsule layers from the third number of primary capsule layers, an i-th entity and a j-th entity from the first number of entity capsule layers and an m-th relation from the second number of relation capsule layers form a head entity-tail entity-relation triple (Section IV-B, lines 6-17, "The preprocessed entity descriptions are fed into the framework as the input of the Entity Description Encoder, where Transformer in combination with relation attention mechanism is used to encode head and tail entity descriptions into vector representations h_d and t_d. Through dynamic gate mechanism h_d and t_d are integrated with structured representations of head and tail entities from TransE model, h_s and t_s, to obtain the synthetic representations of the head and tail entities, v_h and v_t. MCapsE perform representation learning on v_h and v_t, and the structured representation v_r of the relation to obtain the final representations of the head entity, the tail entity and the relation."; The machine learning using a Transformer to obtain the final representations of the head entity, the tail entity, and the relation reads on using a transformer to learn the entities and relations of a head entity-tail entity-relation triple.);
and construct the knowledge graph using the determined m-th relation (Section I, lines 1-4, "A Knowledge Graph (KG) is a graph of data intended to accumulate and convey knowledge of the real world, whose nodes represent entities of interest and whose edges represent relations between these entities [1]."; Section I, lines 11-15, "KGs evolved from the Semantic Web [12], [13], the essence of which is a directed graph composed of entities connected by relations. Each edge is a triple of the fact (head entity, relation, tail entity) (denoted as (h; r; t))."; A knowledge graph defined as nodes representing entities of interest and edges representing relations between the entities reads on constructing a knowledge graph using the determined relations.).
Cheng does not specifically disclose: a set transformer.
Lee teaches the use of a set transformer (Section 3, lines 1-7, "In this section, we motivate and describe the Set Transformer: an attention-based neural network that is designed to process sets of data. Similar to other architectures, a Set Transformer consists of an encoder followed by a decoder (cf. Section 2.1), but a distinguishing feature is that each layer in the encoder and decoder attends to their inputs to produce activations.").
Lee teaches using a set transformer in order to model interactions among elements in a set (Abstract, lines 7-17, "We present an attention-based neural network module, the Set Transformer, specifically designed to model interactions among elements in the input set. The model consists of an encoder and a decoder, both of which rely on attention mechanisms. In an effort to reduce computational complexity, we introduce an attention scheme inspired by inducing point methods from sparse Gaussian process literature. It reduces computation time of self-attention from quadratic to linear in the number of elements in the set.").
Cheng and Lee are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng to incorporate the teachings of Lee to use a set transformer.  Doing so would allow for modeling interactions among elements in a set.
Cheng in view of Lee does not specifically disclose: project the i-th entity in an entity space into a m-th relation space to form the i-th projection, project the j-th entity in the entity space into the m-th relation space to form the j-th projection, and determine the m-th relation exists for the i-th entity and the j-th entity if a sum of the i-th projection and the m-relation substantially equals to the j-th projection.
Y. Lin teaches:
project the i-th entity in an entity space into a m-th relation space to form the i-th projection, project the j-th entity in the entity space into the m-th relation space to form the j-th projection, and determine the m-th relation exists for the i-th entity and the j-th entity if a sum of the i-th projection and the m-relation substantially equals to the j-th projection (Page 2181, right column, lines 28-36, "To address this issue, we propose a new method, which models entities and relations in distinct spaces, i.e., entity space and multiple relation spaces (i.e., relation-specific entity spaces), and performs translation in the corresponding relation space, hence named as TransR. The basic idea of TransR is illustrated in Fig. 1. For each triple (h, r, t), entities in the entity space are first projected into r-relation space as h_r and t_r with operation M_r, and then h_r + r ≈ t_r.").
Y. Lin teaches modeling entities in entity space, modeling relations in relation space, and projecting the entities into relation space to determine the relationship between the head entity and tail entity by adding the relation to the head entity projection to find the tail entity projection, in order to learn embeddings that improve models to perform the tasks of link prediction, triple classification, and relational fact extraction (Abstract, lines 11-21, "In this paper, we propose TransR to build entity and relation embeddings in separate entity space and relation spaces. Afterwards, we learn embeddings by first projecting entities from entity space to corresponding relation space and then building translations between projected entities. In experiments, we evaluate our models on three tasks including link prediction, triple classification and relational fact extraction. Experimental results show significant and consistent improvements compared to state-of-the-art baselines including TransE and TransH.").
Cheng, Lee, and Y. Lin are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee to incorporate the teachings of Y. Lin to model entities in entity space, model relations in relation space, and project the entities into relation space to determine the relationship between the head entity and tail entity by adding the relation to the head entity projection to find the tail entity projection.  Doing so would allow for learning embeddings that improve models to perform the tasks of link prediction, triple classification, and relational fact extraction.
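For illustration only, the projection-and-translation test quoted from Y. Lin above (project the head and tail entities into the relation space with M_r, then check whether the head projection plus the relation approximately equals the tail projection) can be sketched as follows. The row-vector convention (h_r = h M_r with M_r of shape k x d), the squared-distance score, the tolerance `tol` standing in for "substantially equals", and all numeric values are assumptions made for this sketch; neither the claims nor the cited passage fixes a threshold.

```python
def project(e, M_r):
    # e: entity vector of length k; M_r: k x d projection matrix.
    # Returns the length-d projection e * M_r in the relation space.
    k, d = len(M_r), len(M_r[0])
    return [sum(e[i] * M_r[i][j] for i in range(k)) for j in range(d)]

def transr_score(h, r, t, M_r):
    # Squared Euclidean distance between (h_r + r) and t_r.
    h_r = project(h, M_r)
    t_r = project(t, M_r)
    return sum((a + b - c) ** 2 for a, b, c in zip(h_r, r, t_r))

def relation_holds(h, r, t, M_r, tol=1e-6):
    # tol is a hypothetical threshold for "substantially equals".
    return transr_score(h, r, t, M_r) <= tol

# Hypothetical values: entity space of dimension 3, relation space of dimension 2.
M_r = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h = [1.0, 2.0, 3.0]        # i-th (head) entity
t = [2.0, 0.0, 4.0]        # j-th (tail) entity
r = [2.0, -1.0]            # m-th relation vector
t_other = [0.0, 0.0, 0.0]  # an unrelated entity

holds = relation_holds(h, r, t, M_r)
holds_other = relation_holds(h, r, t_other, M_r)
```

With these values the head projection is [4, 5] and the tail projection is [6, 4], so adding r reproduces the tail projection and the relation is found to exist, while the unrelated entity fails the test.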
Regarding claim 2, as best understood based on the 35 U.S.C. 112(b) issues identified above, Cheng in view of Lee and Y. Lin discloses the system as claimed in claim 1.
Lee further teaches:
wherein the set transformer comprises an encoder and decoder, the encoder comprises a plurality of self-attention blocks (SABs), and the decoder comprises a pooling by multi-head attention block (PMA) and a plurality of SABs (Section 3, lines 1-4, "In this section, we motivate and describe the Set Transformer: an attention-based neural network that is designed to process sets of data. Similar to other architectures, a Set Transformer consists of an encoder followed by a decoder"; Section 3.1, lines 19-24, "The MAB is an adaptation of the encoder block of the Transformer (Vaswani et al., 2017) without positional encoding and dropout. Using the MAB, we define the Set Attention Block (SAB) as SAB(X) := MAB(X,X)"; Section 3.2, lines 3-8, "We instead propose to aggregate features by applying multihead attention on a learnable set of k seed vectors S ∈ R^{k×d}. Let Z ∈ R^{n×d} be the set of features constructed from an encoder. Pooling by Multihead Attention (PMA) with k seed vectors is defined as PMA_k(Z) = MAB(S, rFF(Z))"; Section 3.3, lines 3-4, "The encoder Encoder : X → Z ∈ R^{n×d} is a stack of SABs or ISABs"; Section 3.3, lines 12-16, "the decoder aggregates them into a single or a set of vectors which is fed into a feed-forward network to get final outputs. Note that PMA with k > 1 seed vectors should be followed by SABs to model the correlation between k outputs.").
Lee teaches using a set transformer with an encoder comprising multiple self-attention blocks (SABs) and a decoder comprising pooling by multi-head attention (PMA) followed by multiple self-attention blocks (SABs), in order to model interactions among elements in a set while reducing computational complexity and computation time (Abstract, lines 7-17, "We present an attention-based neural network module, the Set Transformer, specifically designed to model interactions among elements in the input set. The model consists of an encoder and a decoder, both of which rely on attention mechanisms. In an effort to reduce computational complexity, we introduce an attention scheme inspired by inducing point methods from sparse Gaussian process literature. It reduces computation time of self-attention from quadratic to linear in the number of elements in the set.").
Cheng, Lee, and Y. Lin are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin to further incorporate the teachings of Lee to use a set transformer with an encoder comprising multiple self-attention blocks (SABs) and a decoder comprising pooling by multi-head attention (PMA) followed by multiple self-attention blocks (SABs).  Doing so would allow for modeling interactions among elements in a set while reducing computational complexity and computation time.
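For illustration only, the encoder and decoder arrangement quoted from Lee can be sketched as follows. This is a simplified sketch and not Lee's implementation: it assumes single-head attention and omits the learned projections, layer normalization, and rFF sublayers. The function names (attend, mab, sab, pma, set_transformer) and the sample values are hypothetical.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys):
    """Single-head scaled dot-product attention; keys double as values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(d)])
    return out

def mab(x, y):
    """Simplified multihead attention block: residual connection around attention.
    Assumption: layer norm, learned projections, and the rFF sublayer are omitted."""
    att = attend(x, y)
    return [[xi + ai for xi, ai in zip(xr, ar)] for xr, ar in zip(x, att)]

def sab(x):
    """Set attention block: SAB(X) := MAB(X, X)."""
    return mab(x, x)

def pma(seeds, z):
    """Pooling by multihead attention: k seed vectors attend over the set Z."""
    return mab(seeds, z)

def set_transformer(x, seeds, n_enc=2, n_dec=2):
    """Encoder: a stack of SABs. Decoder: PMA followed by SABs."""
    z = x
    for _ in range(n_enc):
        z = sab(z)
    out = pma(seeds, z)
    for _ in range(n_dec):
        out = sab(out)
    return out

x = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]  # a set of three 2-d elements
seeds = [[1.0, 0.0], [0.0, 1.0]]           # k = 2 seed vectors (fixed here; learnable in Lee)
pooled = set_transformer(x, seeds)         # k output vectors
```

Because the pooling step attends over the whole encoded set, reordering the input elements leaves the pooled output unchanged up to floating-point rounding.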
Regarding claim 3, as best understood based on the 35 U.S.C. 112(b) issues identified above, Cheng in view of Lee and Y. Lin discloses the system as claimed in claim 1.
Y. Lin further teaches:
wherein the computer executable code is configured to project the i-th entity and the j-th entity into the m-th relation space using a projection matrix, and the projection matrix is learned during training (Page 2183, left column, lines 26-28, "For each relation r, we set a projection matrix M_r ∈ R^{k×d}, which may projects entities from entity space to relation space."; Page 2183, left column, line 45 - right column, line 2, "The basic idea is that, we first segment input instances into several groups. Formally, for a specific relation r, all entity pairs (h, t) in the training data are clustered into multiple groups, and entity pairs in each group are expected to exhibit similar r relation. All entity pairs (h, t) are represented with their vector offsets (h - t) for clustering, where h and t are obtained with TransE. Afterwards, we learn a separate relation vector r_c for each cluster and matrix M_r for each relation, respectively.").
Y. Lin teaches projecting entities into relation space using a projection matrix, with the projection matrix learned during training, in order to learn embeddings that improve models to perform the tasks of link prediction, triple classification, and relational fact extraction (Abstract, lines 11-21, "In this paper, we propose TransR to build entity and relation embeddings in separate entity space and relation spaces. Afterwards, we learn embeddings by first projecting entities from entity space to corresponding relation space and then building translations between projected entities. In experiments, we evaluate our models on three tasks including link prediction, triple classification and relational fact extraction. Experimental results show significant and consistent improvements compared to state-of-the-art baselines including TransE and TransH.").
Cheng, Lee, and Y. Lin are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin to further incorporate the teachings of Y. Lin to project entities into relation space using a projection matrix, with the projection matrix learned during training.  Doing so would allow for learning embeddings that improve models to perform the tasks of link prediction, triple classification, and relational fact extraction.
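For illustration only, a learned projection matrix of the kind quoted from Y. Lin can be sketched as follows. This is a minimal sketch under stated assumptions: it applies plain gradient descent to a single positive triple and updates only the projection matrix, whereas the margin-based ranking objective, negative sampling, and the updates to entity and relation vectors are omitted. The function names and the learning rate are hypothetical.

```python
def project(entity, m_r):
    """Row-vector projection entity * M_r: entity has k entries, M_r is k x d."""
    d = len(m_r[0])
    return [sum(entity[i] * m_r[i][j] for i in range(len(entity))) for j in range(d)]

def residual(h, r, t, m_r):
    """h_r + r - t_r, which is near zero when the relation fits the pair."""
    h_r, t_r = project(h, m_r), project(t, m_r)
    return [a + b - c for a, b, c in zip(h_r, r, t_r)]

def score(h, r, t, m_r):
    """Squared L2 norm of the residual; lower means a better fit."""
    return sum(e * e for e in residual(h, r, t, m_r))

def train_step(h, r, t, m_r, lr=0.05):
    """One gradient-descent update of M_r for a single positive triple.
    d(score)/dM_r[i][j] = 2 * (h[i] - t[i]) * residual[j]."""
    e = residual(h, r, t, m_r)
    return [[m_r[i][j] - lr * 2.0 * (h[i] - t[i]) * e[j] for j in range(len(e))]
            for i in range(len(h))]
```

Repeated calls to train_step on the same triple shrink the score, which is the sense in which the projection matrix is learned rather than fixed.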
Regarding claim 11, as best understood based on the 35 U.S.C. 112(b) issues identified above, Cheng discloses a method, comprising:
providing, by a computing device, a first number of entities, a second number of relations, and a plurality of documents, each of the plurality of documents comprising at least one sentence (Section V-A, lines 1-7, "The experiment contain two parts: (1) Comparison of MCapsEED with existing Capsule-based KGRL models, CapsE and MCapsE, to verify whether incorporating entity description information improves the performance of KGRL models. The experimental datasets are consistent with that of in MCapsE and CapsE, namely FB15k-237 and WN18RR."; The FB15k-237 and WN18RR datasets read on the documents, sentences, entities, and relations.);
converting, by the computing device, each of the at least one sentence into a third number of sentence embeddings (Section IV-C, lines 15-17, "In the input layer, word embeddings and position embeddings of an entity description are concatenated to form the sentence embedding, which serve as model input information.");
forming, by the computing device, the third number of primary capsule layers, each of the third number of primary capsule layers corresponding to one of the third number of sentence embeddings (Section IV-E, lines 24-28, "We use two capsule layers in MCapsE. In the first layer, we construct k capsules for each feature map list. We encapsulate features in the same dimension in the feature map list into a same capsule to capture features at different positions in the triple embedding."; The k capsules encapsulating features in the same dimension as the feature map list reads on primary capsule layers corresponding to the number of sentence embeddings.);
using, by the computing device, a transformer to learn the first number of entity capsule layers and the second number of relation capsule layers from the third number of primary capsule layers, an i-th entity and a j-th entity from the first number of entity capsule layers and an m-th relation from the second number of relation capsule layers form a head entity-tail entity-relation triple (Section IV-B, lines 6-17, "The preprocessed entity descriptions are fed into the framework as the input of the Entity Description Encoder, where Transformer in combination with relation attention mechanism is used to encode head and tail entity descriptions into vector representations hd and td. Through dynamic gate mechanism hd and td are integrated with structured representations of head and tail entities from TransE model, hs and ts, to obtain the synthetic representations of the head and tail entities, vh and vt. MCapsE perform representation learning on vh and vt, and the structured representation vr of the relation to obtain the final representations of the head entity, the tail entity and the relation."; The machine learning using a Transformer to obtain the final representations of the head entity, the tail entity, and the relation reads on using a transformer to learn the entities and relations of a head entity-tail entity-relation triple.);
and constructing the knowledge graph using the determined m-th relation (Section I, lines 1-4, "A Knowledge Graph (KG) is a graph of data intended to accumulate and convey knowledge of the real world, whose nodes represent entities of interest and whose edges represent relations between these entities [1]."; Section I, lines 11-15, "KGs evolved from the Semantic Web [12], [13], the essence of which is a directed graph composed of entities connected by relations. Each edge is a triple of the fact (head entity, relation, tail entity) (denoted as (h; r; t))."; A knowledge graph defined as nodes representing entities of interest and edges representing relations between the entities reads on constructing a knowledge graph using the determined relations.).
Cheng does not specifically disclose: a set transformer.
Lee teaches the use of a set transformer (Section 3, lines 1-7, "In this section, we motivate and describe the Set Transformer: an attention-based neural network that is designed to process sets of data. Similar to other architectures, a Set Transformer consists of an encoder followed by a decoder (cf. Section 2.1), but a distinguishing feature is that each layer in the encoder and decoder attends to their inputs to produce activations.").
Lee teaches using a set transformer in order to model interactions among elements in a set (Abstract, lines 7-17, "We present an attention-based neural network module, the Set Transformer, specifically designed to model interactions among elements in the input set. The model consists of an encoder and a decoder, both of which rely on attention mechanisms. In an effort to reduce computational complexity, we introduce an attention scheme inspired by inducing point methods from sparse Gaussian process literature. It reduces computation time of self-attention from quadratic to linear in the number of elements in the set.").
Cheng and Lee are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng to incorporate the teachings of Lee to use a set transformer.  Doing so would allow for modeling interactions among elements in a set.
Cheng in view of Lee does not specifically disclose: projecting, by the computing device, the i-th entity in an entity space into an m-th relation space to form the i-th projection, projecting the j-th entity in the entity space into the m-th relation space to form the j-th projection, and determining the m-th relation exists for the i-th entity and the j-th entity if a sum of the i-th projection and the m-relation substantially equals to the j-th projection.
Y. Lin teaches:
projecting, by the computing device, the i-th entity in an entity space into an m-th relation space to form the i-th projection, projecting the j-th entity in the entity space into the m-th relation space to form the j-th projection, and determining the m-th relation exists for the i-th entity and the j-th entity if a sum of the i-th projection and the m-relation substantially equals to the j-th projection (Page 2181, right column, lines 28-36, "To address this issue, we propose a new method, which models entities and relations in distinct spaces, i.e., entity space and multiple relation spaces (i.e., relation-specific entity spaces), and performs translation in the corresponding relation space, hence named as TransR. The basic idea of TransR is illustrated in Fig. 1. For each triple (h, r, t), entities in the entity space are first projected into r-relation space as h_r and t_r with operation M_r, and then h_r + r ≈ t_r.").
Y. Lin teaches modeling entities in entity space, modeling relations in relation space, and projecting the entities into relation space to determine the relationship between the head entity and tail entity by adding the relation to the head entity projection to find the tail entity projection, in order to learn embeddings that improve models to perform the tasks of link prediction, triple classification, and relational fact extraction (Abstract, lines 11-21, "In this paper, we propose TransR to build entity and relation embeddings in separate entity space and relation spaces. Afterwards, we learn embeddings by first projecting entities from entity space to corresponding relation space and then building translations between projected entities. In experiments, we evaluate our models on three tasks including link prediction, triple classification and relational fact extraction. Experimental results show significant and consistent improvements compared to state-of-the-art baselines including TransE and TransH.").
Cheng, Lee, and Y. Lin are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee to incorporate the teachings of Y. Lin to model entities in entity space, model relations in relation space, and project the entities into relation space to determine the relationship between the head entity and tail entity by adding the relation to the head entity projection to find the tail entity projection.  Doing so would allow for learning embeddings that improve models to perform the tasks of link prediction, triple classification, and relational fact extraction.
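For illustration only, the relationship h_r + r ≈ t_r quoted from Y. Lin can be turned into a yes/no check as sketched below. This is a sketch under stated assumptions: the tolerance tol is a hypothetical stand-in for "substantially equals", which neither the claim language nor the reference quantifies, and the function names are likewise hypothetical.

```python
import math

def project(entity, m_r):
    """Project an entity vector (k entries) into relation space with M_r (k x d)."""
    d = len(m_r[0])
    return [sum(e * row[j] for e, row in zip(entity, m_r)) for j in range(d)]

def relation_holds(head, tail, rel_vec, m_r, tol=0.1):
    """True when head_r + r lies within tol (L2 distance) of tail_r.
    tol is an assumed threshold, not a value taken from the reference."""
    h_r, t_r = project(head, m_r), project(tail, m_r)
    dist = math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h_r, rel_vec, t_r)))
    return dist <= tol
```

With an identity projection, head [1, 0], tail [1, 1], and relation vector [0, 1], the check passes because the projected head plus the relation vector lands on the projected tail. With relation vector [1, 0] it fails.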
Regarding claim 12, as best understood based on the 35 U.S.C. 112(b) issues identified above, Cheng in view of Lee and Y. Lin discloses the method as claimed in claim 11.
Lee further teaches:
wherein the set transformer comprises an encoder and decoder, the encoder comprises a plurality of self-attention blocks (SABs), and the decoder comprises a pooling by multi-head attention block (PMA) and a plurality of SABs (Section 3, lines 1-4, "In this section, we motivate and describe the Set Transformer: an attention-based neural network that is designed to process sets of data. Similar to other architectures, a Set Transformer consists of an encoder followed by a decoder"; Section 3.1, lines 19-24, "The MAB is an adaptation of the encoder block of the Transformer (Vaswani et al., 2017) without positional encoding and dropout. Using the MAB, we define the Set Attention Block (SAB) as SAB(X) := MAB(X,X)"; Section 3.2, lines 3-8, "We instead propose to aggregate features by applying multihead attention on a learnable set of k seed vectors S ∈ R^{k×d}. Let Z ∈ R^{n×d} be the set of features constructed from an encoder. Pooling by Multihead Attention (PMA) with k seed vectors is defined as PMA_k(Z) = MAB(S, rFF(Z))"; Section 3.3, lines 3-4, "The encoder Encoder : X → Z ∈ R^{n×d} is a stack of SABs or ISABs"; Section 3.3, lines 12-16, "the decoder aggregates them into a single or a set of vectors which is fed into a feed-forward network to get final outputs. Note that PMA with k > 1 seed vectors should be followed by SABs to model the correlation between k outputs.").
Lee teaches using a set transformer with an encoder comprising multiple self-attention blocks (SABs) and a decoder comprising pooling by multi-head attention (PMA) followed by multiple self-attention blocks (SABs), in order to model interactions among elements in a set while reducing computational complexity and computation time (Abstract, lines 7-17, "We present an attention-based neural network module, the Set Transformer, specifically designed to model interactions among elements in the input set. The model consists of an encoder and a decoder, both of which rely on attention mechanisms. In an effort to reduce computational complexity, we introduce an attention scheme inspired by inducing point methods from sparse Gaussian process literature. It reduces computation time of self-attention from quadratic to linear in the number of elements in the set.").
Cheng, Lee, and Y. Lin are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin to further incorporate the teachings of Lee to use a set transformer with an encoder comprising multiple self-attention blocks (SABs) and a decoder comprising pooling by multi-head attention (PMA) followed by multiple self-attention blocks (SABs).  Doing so would allow for modeling interactions among elements in a set while reducing computational complexity and computation time.
Regarding claim 18, as best understood based on the 35 U.S.C. 112(b) issues identified above, Cheng discloses a non-transitory computer readable medium storing computer executable code, wherein the computer executable code, when executed at a processor of a computing device (Section IV-F, lines 1-2, "We use the Adam Optimizer [45] to train the proposed KGRL model by minimizing the cross entropy loss function in (20)."; Section V-B, lines 1-3, "We evaluate MCapsEED on the task of link prediction, the goal of which is to predict a missing entity given a relation and another entity in a triple."; Training the model and conducting experiments with the model demonstrates the use of a computing device with a processor executing instructions stored in memory.),
is configured to:
provide a first number of entities, a second number of relations, and a plurality of documents, each of the plurality of documents comprising at least one sentence (Section V-A, lines 1-7, "The experiment contain two parts: (1) Comparison of MCapsEED with existing Capsule-based KGRL models, CapsE and MCapsE, to verify whether incorporating entity description information improves the performance of KGRL models. The experimental datasets are consistent with that of in MCapsE and CapsE, namely FB15k-237 and WN18RR."; The FB15k-237 and WN18RR datasets read on the documents, sentences, entities, and relations.);
convert each of the at least one sentence into a third number of sentence embeddings (Section IV-C, lines 15-17, "In the input layer, word embeddings and position embeddings of an entity description are concatenated to form the sentence embedding, which serve as model input information.");
form the third number of primary capsule layers, each of the third number of primary capsule layers corresponding to one of the third number of sentence embeddings (Section IV-E, lines 24-28, "We use two capsule layers in MCapsE. In the first layer, we construct k capsules for each feature map list. We encapsulate features in the same dimension in the feature map list into a same capsule to capture features at different positions in the triple embedding."; The k capsules encapsulating features in the same dimension as the feature map list reads on primary capsule layers corresponding to the number of sentence embeddings.);
use a transformer to learn the first number of entity capsule layers and the second number of relation capsule layers from the third number of primary capsule layers, an i-th entity and a j-th entity from the first number of entity capsule layers and an m-th relation from the second number of relation capsule layers form a head entity-tail entity-relation triple (Section IV-B, lines 6-17, "The preprocessed entity descriptions are fed into the framework as the input of the Entity Description Encoder, where Transformer in combination with relation attention mechanism is used to encode head and tail entity descriptions into vector representations hd and td. Through dynamic gate mechanism hd and td are integrated with structured representations of head and tail entities from TransE model, hs and ts, to obtain the synthetic representations of the head and tail entities, vh and vt. MCapsE perform representation learning on vh and vt, and the structured representation vr of the relation to obtain the final representations of the head entity, the tail entity and the relation."; The machine learning using a Transformer to obtain the final representations of the head entity, the tail entity, and the relation reads on using a transformer to learn the entities and relations of a head entity-tail entity-relation triple.);
and construct the knowledge graph using the determined m-th relation (Section I, lines 1-4, "A Knowledge Graph (KG) is a graph of data intended to accumulate and convey knowledge of the real world, whose nodes represent entities of interest and whose edges represent relations between these entities [1]."; Section I, lines 11-15, "KGs evolved from the Semantic Web [12], [13], the essence of which is a directed graph composed of entities connected by relations. Each edge is a triple of the fact (head entity, relation, tail entity) (denoted as (h; r; t))."; A knowledge graph defined as nodes representing entities of interest and edges representing relations between the entities reads on constructing a knowledge graph using the determined relations.).
Cheng does not specifically disclose: a set transformer.
Lee teaches the use of a set transformer (Section 3, lines 1-7, "In this section, we motivate and describe the Set Transformer: an attention-based neural network that is designed to process sets of data. Similar to other architectures, a Set Transformer consists of an encoder followed by a decoder (cf. Section 2.1), but a distinguishing feature is that each layer in the encoder and decoder attends to their inputs to produce activations.").
Lee teaches using a set transformer in order to model interactions among elements in a set (Abstract, lines 7-17, "We present an attention-based neural network module, the Set Transformer, specifically designed to model interactions among elements in the input set. The model consists of an encoder and a decoder, both of which rely on attention mechanisms. In an effort to reduce computational complexity, we introduce an attention scheme inspired by inducing point methods from sparse Gaussian process literature. It reduces computation time of self-attention from quadratic to linear in the number of elements in the set.").
Cheng and Lee are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng to incorporate the teachings of Lee to use a set transformer.  Doing so would allow for modeling interactions among elements in a set.
Cheng in view of Lee does not specifically disclose: project the i-th entity in an entity space into a m-th relation space to form the i-th projection, project the j-th entity in the entity space into the m-th relation space to form the j-th projection, and determine the m-th relation exists for the i-th entity and the j-th entity if a sum of the i-th projection and the m-relation substantially equals to the j-th projection.
Y. Lin teaches:
project the i-th entity in an entity space into a m-th relation space to form the i-th projection, project the j-th entity in the entity space into the m-th relation space to form the j-th projection, and determine the m-th relation exists for the i-th entity and the j-th entity if a sum of the i-th projection and the m-relation substantially equals to the j-th projection (Page 2181, right column, lines 28-36, "To address this issue, we propose a new method, which models entities and relations in distinct spaces, i.e., entity space and multiple relation spaces (i.e., relation-specific entity spaces), and performs translation in the corresponding relation space, hence named as TransR. The basic idea of TransR is illustrated in Fig. 1. For each triple (h, r, t), entities in the entity space are first projected into r-relation space as h_r and t_r with operation M_r, and then h_r + r ≈ t_r.").
Y. Lin teaches modeling entities in entity space, modeling relations in relation space, and projecting the entities into relation space to determine the relationship between the head entity and tail entity by adding the relation to the head entity projection to find the tail entity projection, in order to learn embeddings that improve models to perform the tasks of link prediction, triple classification, and relational fact extraction (Abstract, lines 11-21, "In this paper, we propose TransR to build entity and relation embeddings in separate entity space and relation spaces. Afterwards, we learn embeddings by first projecting entities from entity space to corresponding relation space and then building translations between projected entities. In experiments, we evaluate our models on three tasks including link prediction, triple classification and relational fact extraction. Experimental results show significant and consistent improvements compared to state-of-the-art baselines including TransE and TransH.").
Cheng, Lee, and Y. Lin are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee to incorporate the teachings of Y. Lin to model entities in entity space, model relations in relation space, and project the entities into relation space to determine the relationship between the head entity and tail entity by adding the relation to the head entity projection to find the tail entity projection.  Doing so would allow for learning embeddings that improve models to perform the tasks of link prediction, triple classification, and relational fact extraction.
Regarding claim 19, as best understood based on the 35 U.S.C. 112(b) issues identified above, Cheng in view of Lee and Y. Lin discloses the non-transitory computer readable medium as claimed in claim 18.
Lee further teaches:
wherein the set transformer comprises an encoder and decoder, the encoder comprises a plurality of self-attention blocks (SABs), and the decoder comprises a pooling by multi-head attention block (PMA) and a plurality of SABs (Section 3, lines 1-4, "In this section, we motivate and describe the Set Transformer: an attention-based neural network that is designed to process sets of data. Similar to other architectures, a Set Transformer consists of an encoder followed by a decoder"; Section 3.1, lines 19-24, "The MAB is an adaptation of the encoder block of the Transformer (Vaswani et al., 2017) without positional encoding and dropout. Using the MAB, we define the Set Attention Block (SAB) as SAB(X) := MAB(X,X)"; Section 3.2, lines 3-8, "We instead propose to aggregate features by applying multihead attention on a learnable set of k seed vectors S ∈ R^{k×d}. Let Z ∈ R^{n×d} be the set of features constructed from an encoder. Pooling by Multihead Attention (PMA) with k seed vectors is defined as PMA_k(Z) = MAB(S, rFF(Z))"; Section 3.3, lines 3-4, "The encoder Encoder : X → Z ∈ R^{n×d} is a stack of SABs or ISABs"; Section 3.3, lines 12-16, "the decoder aggregates them into a single or a set of vectors which is fed into a feed-forward network to get final outputs. Note that PMA with k > 1 seed vectors should be followed by SABs to model the correlation between k outputs.").
Lee teaches using a set transformer with an encoder comprising multiple self-attention blocks (SABs) and a decoder comprising pooling by multi-head attention (PMA) followed by multiple self-attention blocks (SABs), in order to model interactions among elements in a set while reducing computational complexity and computation time (Abstract, lines 7-17, "We present an attention-based neural network module, the Set Transformer, specifically designed to model interactions among elements in the input set. The model consists of an encoder and a decoder, both of which rely on attention mechanisms. In an effort to reduce computational complexity, we introduce an attention scheme inspired by inducing point methods from sparse Gaussian process literature. It reduces computation time of self-attention from quadratic to linear in the number of elements in the set.").
Cheng, Lee, and Y. Lin are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin to further incorporate the teachings of Lee to use a set transformer with an encoder comprising multiple self-attention blocks (SABs) and a decoder comprising pooling by multi-head attention (PMA) followed by multiple self-attention blocks (SABs).  Doing so would allow for modeling interactions among elements in a set while reducing computational complexity and computation time.
Claims 4, 13 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Cheng in view of Lee and Y. Lin, and further in view of Francis et al. (“Embedding Images and Sentences in a Common Space with a Recurrent Capsule Network”), hereinafter Francis, and Z. Lin et al. (“A Structured Self-attentive Sentence Embedding”), hereinafter Z. Lin.
Regarding claim 4, as best understood based on the 35 U.S.C. 112(b) issues identified above, Cheng in view of Lee and Y. Lin discloses the system as claimed in claim 1, but does not specifically disclose: wherein the computer executable code is configured to convert each of the at least one sentence into the third number of sentence embeddings by: encoding tokens in the at least one sentence into a plurality of one-hot vectors, each of the plurality of one-hot vectors corresponding to one of the tokens in the at least one sentence, wherein the tokens comprise words and punctuations; embedding each of the plurality of one-hot vectors into a word embedding.
Francis teaches:
wherein the computer executable code is configured to convert each of the at least one sentence into the third number of sentence embeddings by: encoding tokens in the at least one sentence into a plurality of one-hot vectors, each of the plurality of one-hot vectors corresponding to one of the tokens in the at least one sentence, wherein the tokens comprise words and punctuations (Figure 3, lines 1-2, "In the sentence embedding part, a sentence is represented by a sequence of one-hot vectors (w_1, ..., w_n).");
embedding each of the plurality of one-hot vectors into a word embedding (Figure 3, lines 1-2, "In the sentence embedding part, a sentence is represented by a sequence of one-hot vectors (w_1, ..., w_n). It is transformed into a list of word embeddings (x_1, ..., x_n) through a multiplication by a word embedding matrix W_w.").
Francis teaches representing a sentence by a sequence of one-hot vectors and transforming the sequence of one-hot vectors into word embeddings in order to implement a knowledge graph learning system that can be trained with less data (Abstract, lines 2-5, "In this paper, we propose a domain-independent semi-automatic knowledge graph learning system that can be trained with less amount of data, to identify entities and relations from a large text corpus.").
Cheng, Lee, Y. Lin, and Francis are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin to incorporate the teachings of Francis to represent a sentence by a sequence of one-hot vectors and transform the sequence of one-hot vectors into word embeddings.  Doing so would allow for implementing a knowledge graph learning system that can be trained with less data.
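For illustration only, the one-hot encoding and embedding-matrix multiplication quoted from Francis can be sketched as follows. The vocabulary, the matrix values, and the function names are hypothetical and chosen only to make the sketch runnable. Punctuation is treated as a token like any word.

```python
def one_hot(index, vocab_size):
    """Vector of zeros with a single 1.0 at the token's vocabulary index."""
    vec = [0.0] * vocab_size
    vec[index] = 1.0
    return vec

def embed(one_hot_vec, w_w):
    """Multiply a one-hot row vector by the embedding matrix W_w (vocab x dim)."""
    dim = len(w_w[0])
    return [sum(o * row[j] for o, row in zip(one_hot_vec, w_w)) for j in range(dim)]

def sentence_to_embeddings(tokens, vocab, w_w):
    """Map each token to its one-hot vector, then to its word embedding."""
    return [embed(one_hot(vocab[tok], len(vocab)), w_w) for tok in tokens]

vocab = {"book": 0, "a": 1, "service": 2, ".": 3}   # hypothetical vocabulary
w_w = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]  # hypothetical 4 x 2 matrix
embeddings = sentence_to_embeddings(["book", "a", "service", "."], vocab, w_w)
```

Multiplying a one-hot vector by the matrix selects the matrix row at the token's index, so the multiplication is equivalent to a table lookup.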
Cheng in view of Lee and Y. Lin and further in view of Francis does not specifically disclose: performing LSTM on the word embeddings to obtain a plurality of feature vectors, each feature vector corresponding to one of the tokens in the at least one sentence; and performing a self-structure attention on the plurality of feature vectors to obtain the third number of sentence embeddings.
Z. Lin teaches:
performing LSTM on the word embeddings to obtain a plurality of feature vectors, each feature vector corresponding to one of the tokens in the at least one sentence (Section 2.1, lines 1-5, "The proposed sentence embedding model consists of two parts. The first part is a bidirectional LSTM, and the second part is the self-attention mechanism, which provides a set of summation weight vectors for the LSTM hidden states. These set of summation weight vectors are dotted with the LSTM hidden states, and the resulting weighted LSTM hidden states are considered as an embedding for the sentence."; Section 2.1, lines 11-12, "Suppose we have a sentence, which has n tokens, represented in a sequence of word embeddings. S = (w1, w2, . . . wn)"; Section 2.1, lines 20-22, "Let the hidden unit number for each unidirectional LSTM be u. For simplicity, we note all the n hts as H, who have the size n-by-2u. H = (h1, h2, . . . hn)"; The hidden states read on feature vectors, and the n word embeddings and the n hidden states read on each feature vector corresponding to one of the tokens in the at least one sentence.);
and performing a self-structure attention on the plurality of feature vectors to obtain the third number of sentence embeddings (Section 2.1, lines 1-5, "The proposed sentence embedding model consists of two parts. The first part is a bidirectional LSTM, and the second part is the self-attention mechanism, which provides a set of summation weight vectors for the LSTM hidden states. These set of summation weight vectors are dotted with the LSTM hidden states, and the resulting weighted LSTM hidden states are considered as an embedding for the sentence."; The hidden states read on feature vectors.).
Z. Lin teaches the use of a long short-term memory neural network (LSTM) to convert word embeddings to hidden states, and the use of a self-attention mechanism to convert the hidden states to a sentence embedding, in order to implement a sentence embedding method that improves performance of author profiling, sentiment classification, and textual entailment tasks (Abstract, lines 1-10, "This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence. We also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding comes with an easy way of visualizing what specific parts of the sentence are encoded into the embedding. We evaluate our model on 3 different tasks: author profiling, sentiment classification and textual entailment. Results show that our model yields a significant performance gain compared to other sentence embedding methods in all of the 3 tasks.").
Cheng, Lee, Y. Lin, Francis, and Z. Lin are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin and further in view of Francis to incorporate the teachings of Z. Lin to use a long short-term memory neural network (LSTM) to convert word embeddings to hidden states, and use a self-attention mechanism to convert the hidden states to a sentence embedding.  Doing so would allow for implementing a sentence embedding method that improves performance of author profiling, sentiment classification, and textual entailment tasks.
Regarding claim 13, as best understood based on the 35 U.S.C. 112(b) issues identified above, Cheng in view of Lee and Y. Lin discloses the method as claimed in claim 11, but does not specifically disclose: wherein the step of converting each of the at least one sentence into a third number of sentence embeddings comprises: encoding tokens in the at least one sentence into a plurality of one-hot vectors, each of the plurality of one-hot vectors corresponding to one of the tokens in the at least one sentence; embedding each of the plurality of one-hot vectors into a word embedding.
Francis teaches:
wherein the step of converting each of the at least one sentence into a third number of sentence embeddings comprises: encoding tokens in the at least one sentence into a plurality of one-hot vectors, each of the plurality of one-hot vectors corresponding to one of the tokens in the at least one sentence (Figure 3, lines 1-2, "In the sentence embedding part, a sentence is represented by a sequence of one-hot vectors (w1, …, wn).");
embedding each of the plurality of one-hot vectors into a word embedding (Figure 3, lines 1-2, "In the sentence embedding part, a sentence is represented by a sequence of one-hot vectors (w1, …, wn). It is transformed into a list of word embeddings (x1, …, xn) through a multiplication by a word embedding matrix Ww.").
Francis teaches representing a sentence by a sequence of one-hot vectors and transforming the sequence of one-hot vectors into word embeddings in order to implement a knowledge graph learning system that can be trained with less data (Abstract, lines 2-5, "In this paper, we propose a domain-independent semi-automatic knowledge graph learning system that can be trained with less amount of data, to identify entities and relations from a large text corpus.").
Cheng, Lee, Y. Lin, and Francis are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin to incorporate the teachings of Francis to represent a sentence by a sequence of one-hot vectors and to transform the sequence of one-hot vectors into word embeddings.  Doing so would allow for implementing a knowledge graph learning system that can be trained with less data.
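For illustration only, the one-hot encoding and embedding steps discussed above can be sketched as follows. The vocabulary, the matrix values, and the helper names one_hot and embed are hypothetical and are not taken from Francis or from the claims; the sketch merely shows that multiplying a one-hot vector by a word embedding matrix selects the matrix row associated with that token.

```python
# Hypothetical toy vocabulary and embedding matrix (values are illustrative only).
vocab = {"the": 0, "cat": 1, "sat": 2}
W = [
    [0.1, 0.2],  # embedding row for "the"
    [0.3, 0.4],  # embedding row for "cat"
    [0.5, 0.6],  # embedding row for "sat"
]


def one_hot(index, vocab_size):
    """Return a vector with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(vocab_size)]


def embed(one_hot_vec, embedding_matrix):
    """Multiply a one-hot row vector by the embedding matrix."""
    dim = len(embedding_matrix[0])
    return [
        sum(one_hot_vec[v] * embedding_matrix[v][d] for v in range(len(one_hot_vec)))
        for d in range(dim)
    ]


tokens = ["the", "cat", "sat"]
one_hots = [one_hot(vocab[t], len(vocab)) for t in tokens]  # one vector per token
embeddings = [embed(v, W) for v in one_hots]                # one embedding per token
```

In this sketch there is exactly one one-hot vector and one word embedding per token, and each embedding equals the corresponding row of the hypothetical matrix W.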
Cheng in view of Lee and Y. Lin and further in view of Francis does not specifically disclose: performing LSTM on the word embeddings to obtain a plurality of feature vectors, each feature vector corresponding to one of the tokens in the at least one sentence; and performing a self-structure attention on the plurality of feature vectors to obtain the third number of sentence embeddings.
Z. Lin teaches:
performing LSTM on the word embeddings to obtain a plurality of feature vectors, each feature vector corresponding to one of the tokens in the at least one sentence (Section 2.1, lines 1-5, "The proposed sentence embedding model consists of two parts. The first part is a bidirectional LSTM, and the second part is the self-attention mechanism, which provides a set of summation weight vectors for the LSTM hidden states. These set of summation weight vectors are dotted with the LSTM hidden states, and the resulting weighted LSTM hidden states are considered as an embedding for the sentence."; Section 2.1, lines 11-12, "Suppose we have a sentence, which has n tokens, represented in a sequence of word embeddings. S = (w1, w2, . . . wn)"; Section 2.1, lines 20-22, "Let the hidden unit number for each unidirectional LSTM be u. For simplicity, we note all the n hts as H, who have the size n-by-2u. H = (h1, h2, . . . hn)"; The hidden states read on feature vectors, and the n word embeddings and the n hidden states read on each feature vector corresponding to one of the tokens in the at least one sentence.);
and performing a self-structure attention on the plurality of feature vectors to obtain the third number of sentence embeddings (Section 2.1, lines 1-5, "The proposed sentence embedding model consists of two parts. The first part is a bidirectional LSTM, and the second part is the self-attention mechanism, which provides a set of summation weight vectors for the LSTM hidden states. These set of summation weight vectors are dotted with the LSTM hidden states, and the resulting weighted LSTM hidden states are considered as an embedding for the sentence."; The hidden states read on feature vectors.).
Z. Lin teaches the use of a long short-term memory neural network (LSTM) to convert word embeddings to hidden states, and the use of a self-attention mechanism to convert the hidden states to a sentence embedding, in order to implement a sentence embedding method that improves performance of author profiling, sentiment classification, and textual entailment tasks (Abstract, lines 1-10, "This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence. We also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding comes with an easy way of visualizing what specific parts of the sentence are encoded into the embedding. We evaluate our model on 3 different tasks: author profiling, sentiment classification and textual entailment. Results show that our model yields a significant performance gain compared to other sentence embedding methods in all of the 3 tasks.").
Cheng, Lee, Y. Lin, Francis, and Z. Lin are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin and further in view of Francis to incorporate the teachings of Z. Lin to use a long short-term memory neural network (LSTM) to convert word embeddings to hidden states, and use a self-attention mechanism to convert the hidden states to a sentence embedding.  Doing so would allow for implementing a sentence embedding method that improves performance of author profiling, sentiment classification, and textual entailment tasks.
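For illustration only, the self-attention step described in the quoted passage of Z. Lin (summation weight vectors dotted with the LSTM hidden states) can be sketched as follows. The sketch assumes a two-layer scoring form, A = softmax(W_s2 tanh(W_s1 H^T)) and M = A H, which is not recited in the quoted passage; the hidden states H are given directly rather than produced by an LSTM, and all weight values and function names are hypothetical.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two row-major matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def structured_self_attention(H, W_s1, W_s2):
    """H: n x 2u hidden states; W_s1: d_a x 2u; W_s2: r x d_a (assumed shapes)."""
    hidden_scores = [[math.tanh(v) for v in row] for row in matmul(W_s1, transpose(H))]
    A = [softmax(row) for row in matmul(W_s2, hidden_scores)]  # r x n weights
    M = matmul(A, H)                                           # r x 2u weighted sums
    return A, M


# Hypothetical values: n = 3 tokens, 2u = 2, d_a = 2, r = 2.
H = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
W_s1 = [[0.5, -0.5], [0.25, 0.75]]
W_s2 = [[1.0, 0.0], [0.0, 1.0]]
A, M = structured_self_attention(H, W_s1, W_s2)
```

Each row of A is one set of summation weights over the n hidden states, and each row of M is the corresponding weighted sum, so the number of rows r sets how many embedding vectors are produced for the sentence in this sketch.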
Regarding claim 20, as best understood based on the 35 U.S.C. 112(b) issues identified above, Cheng in view of Lee and Y. Lin discloses the non-transitory computer readable medium as claimed in claim 18, but does not specifically disclose: wherein the computer executable code is configured to convert each of the at least one sentence into the third number of sentence embeddings by: encoding words in the at least one sentence into a plurality of one-hot vectors, each of the plurality of one-hot vectors corresponding to one of the tokens in the at least one sentence; embedding each of the plurality of one-hot vectors into a word embedding.
Francis teaches:
wherein the computer executable code is configured to convert each of the at least one sentence into the third number of sentence embeddings by: encoding words in the at least one sentence into a plurality of one-hot vectors, each of the plurality of one-hot vectors corresponding to one of the tokens in the at least one sentence (Figure 3, lines 1-2, "In the sentence embedding part, a sentence is represented by a sequence of one-hot vectors (w1, …, wn).");
embedding each of the plurality of one-hot vectors into a word embedding (Figure 3, lines 1-2, "In the sentence embedding part, a sentence is represented by a sequence of one-hot vectors (w1, …, wn). It is transformed into a list of word embeddings (x1, …, xn) through a multiplication by a word embedding matrix Ww.").
Francis teaches representing a sentence by a sequence of one-hot vectors and transforming the sequence of one-hot vectors into word embeddings in order to implement a knowledge graph learning system that can be trained with less data (Abstract, lines 2-5, "In this paper, we propose a domain-independent semi-automatic knowledge graph learning system that can be trained with less amount of data, to identify entities and relations from a large text corpus.").
Cheng, Lee, Y. Lin, and Francis are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin to incorporate the teachings of Francis to represent a sentence by a sequence of one-hot vectors and to transform the sequence of one-hot vectors into word embeddings.  Doing so would allow for implementing a knowledge graph learning system that can be trained with less data.
Cheng in view of Lee and Y. Lin and further in view of Francis does not specifically disclose: performing LSTM on the word embeddings to obtain a plurality of feature vectors, each feature vector corresponding to one of the tokens in the at least one sentence; and performing a self-structure attention on the plurality of feature vectors to obtain the third number of sentence embeddings.
Z. Lin teaches:
performing LSTM on the word embeddings to obtain a plurality of feature vectors, each feature vector corresponding to one of the tokens in the at least one sentence (Section 2.1, lines 1-5, "The proposed sentence embedding model consists of two parts. The first part is a bidirectional LSTM, and the second part is the self-attention mechanism, which provides a set of summation weight vectors for the LSTM hidden states. These set of summation weight vectors are dotted with the LSTM hidden states, and the resulting weighted LSTM hidden states are considered as an embedding for the sentence."; Section 2.1, lines 11-12, "Suppose we have a sentence, which has n tokens, represented in a sequence of word embeddings. S = (w1, w2, . . . wn)"; Section 2.1, lines 20-22, "Let the hidden unit number for each unidirectional LSTM be u. For simplicity, we note all the n hts as H, who have the size n-by-2u. H = (h1, h2, . . . hn)"; The hidden states read on feature vectors, and the n word embeddings and the n hidden states read on each feature vector corresponding to one of the tokens in the at least one sentence.);
and performing a self-structure attention on the plurality of feature vectors to obtain the third number of sentence embeddings (Section 2.1, lines 1-5, "The proposed sentence embedding model consists of two parts. The first part is a bidirectional LSTM, and the second part is the self-attention mechanism, which provides a set of summation weight vectors for the LSTM hidden states. These set of summation weight vectors are dotted with the LSTM hidden states, and the resulting weighted LSTM hidden states are considered as an embedding for the sentence."; The hidden states read on feature vectors.).
Z. Lin teaches the use of a long short-term memory neural network (LSTM) to convert word embeddings to hidden states, and the use of a self-attention mechanism to convert the hidden states to a sentence embedding, in order to implement a sentence embedding method that improves performance of author profiling, sentiment classification, and textual entailment tasks (Abstract, lines 1-10, "This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence. We also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding comes with an easy way of visualizing what specific parts of the sentence are encoded into the embedding. We evaluate our model on 3 different tasks: author profiling, sentiment classification and textual entailment. Results show that our model yields a significant performance gain compared to other sentence embedding methods in all of the 3 tasks.").
Cheng, Lee, Y. Lin, Francis, and Z. Lin are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin and further in view of Francis to incorporate the teachings of Z. Lin to use a long short-term memory neural network (LSTM) to convert word embeddings to hidden states, and use a self-attention mechanism to convert the hidden states to a sentence embedding.  Doing so would allow for implementing a sentence embedding method that improves performance of author profiling, sentiment classification, and textual entailment tasks.
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Cheng in view of Lee, Y. Lin, Francis, and Z. Lin, and further in view of Wang et al. (“Text-Enhanced Representation Learning for Knowledge Graph”), hereinafter Wang.
Regarding claim 5, as best understood based on the 35 U.S.C. 112(b) issues identified above, Cheng in view of Lee, Y. Lin, Francis, and Z. Lin discloses the system as claimed in claim 4, but does not specifically disclose: wherein the step of embedding each of the plurality of one-hot vectors into the word embedding is performed using word2vec.
Wang teaches:
wherein the step of embedding each of the plurality of one-hot vectors into the word embedding is performed using word2vec (Section 3.2, lines 24-30, "we train a word2vec model [Mikolov et al., 2013] on the entity-annotated text corpus D0 by treating each entity as an ordinary word. Thus we get the node representation x ∈ R^k for each node x in G, because the co-occurrence network is directly generated from D0. Based on these representations, we define the pointwise textual context embedding of xi as the weighted average of the vectors of the nodes in n(xi)"; The node vectors read on the one-hot vectors, and the textual context embedding reads on the word embedding.).
Wang teaches the use of a word2vec model for generating embeddings from node vectors in order to implement a knowledge graph learning method that expands the semantic structure of the knowledge graph (Abstract, lines 11-23, "In this paper, we propose a novel knowledge graph representation learning method by taking advantage of the rich context information in a text corpus. The rich textual context information is incorporated to expand the semantic structure of the knowledge graph and each relation is enabled to own different representations for different head and tail entities to better handle 1-to-N, N-to-1 and N-to-N relations. Experiments on multiple benchmark datasets show that our proposed method successfully addresses the above issues and significantly outperforms the state-of-the-art methods.").
Cheng, Lee, Y. Lin, Francis, Z. Lin, and Wang are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee, Y. Lin, Francis, and Z. Lin to incorporate the teachings of Wang to use a word2vec model to generate embeddings from node vectors.  Doing so would allow for implementing a knowledge graph learning method that expands the semantic structure of the knowledge graph.
Claims 6 – 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Cheng in view of Lee and Y. Lin, and further in view of Francis, Devlin et al. (“BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”), hereinafter Devlin, and Z. Lin.
Regarding claim 6, as best understood based on the 35 U.S.C. 112(b) issues identified above, Cheng in view of Lee and Y. Lin discloses the system as claimed in claim 1, but does not specifically disclose: wherein the computer executable code is configured to convert each of the at least one sentence into the third number of sentence embeddings by: encoding words in the at least one sentence into a plurality of one-hot vectors, each of the plurality of one-hot vectors corresponding to one of the tokens in the at least one sentence, wherein the tokens comprise words and punctuations.
Francis teaches:
wherein the computer executable code is configured to convert each of the at least one sentence into the third number of sentence embeddings by: encoding words in the at least one sentence into a plurality of one-hot vectors, each of the plurality of one-hot vectors corresponding to one of the tokens in the at least one sentence, wherein the tokens comprise words and punctuations (Figure 3, lines 1-2, "In the sentence embedding part, a sentence is represented by a sequence of one-hot vectors (w1, …, wn).").
Francis teaches representing a sentence by a sequence of one-hot vectors and transforming the sequence of one-hot vectors into word embeddings in order to implement a knowledge graph learning system that can be trained with less data (Abstract, lines 2-5, "In this paper, we propose a domain-independent semi-automatic knowledge graph learning system that can be trained with less amount of data, to identify entities and relations from a large text corpus.").
Cheng, Lee, Y. Lin, and Francis are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin to incorporate the teachings of Francis to represent a sentence by a sequence of one-hot vectors and to transform the sequence of one-hot vectors into word embeddings.  Doing so would allow for implementing a knowledge graph learning system that can be trained with less data.
Cheng in view of Lee and Y. Lin and further in view of Francis does not specifically disclose: transforming one-hot vectors by a transformer to obtain a plurality of feature vectors, each feature vector corresponding to one of the tokens in the at least one sentence.
Devlin teaches:
transforming one-hot vectors by a transformer to obtain a plurality of feature vectors, each feature vector corresponding to one of the tokens in the at least one sentence (Abstract, lines 1-4, "We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers."; Section 3, lines 63-67, "As shown in Figure 1, we denote input embedding as E, the final hidden vector of the special [CLS] token as C ∈ R^H, and the final hidden vector for the i-th input token as T_i ∈ R^H."; The hidden vectors read on the feature vectors, and generating a hidden vector for each input token reads on each feature vector corresponding to one of the tokens.).
Devlin teaches the use of a transformer to generate hidden vectors from input tokens in order to generate bidirectional representations from unlabeled text for use in question answering and language inference tasks (Abstract, lines 6-15, "BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.").
Cheng, Lee, Y. Lin, Francis, and Devlin are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin and further in view of Francis to incorporate the teachings of Devlin to use a transformer to generate hidden vectors from input tokens.  Doing so would allow for generating bidirectional representations from unlabeled text for use in question answering and language inference tasks.
Cheng in view of Lee and Y. Lin and further in view of Francis and Devlin does not specifically disclose: performing a self-structure attention on the plurality of feature vectors to obtain the third number of sentence embeddings.
Z. Lin teaches:
performing a self-structure attention on the plurality of feature vectors to obtain the third number of sentence embeddings (Section 2.1, lines 1-5, "The proposed sentence embedding model consists of two parts. The first part is a bidirectional LSTM, and the second part is the self-attention mechanism, which provides a set of summation weight vectors for the LSTM hidden states. These set of summation weight vectors are dotted with the LSTM hidden states, and the resulting weighted LSTM hidden states are considered as an embedding for the sentence."; The hidden states read on feature vectors.).
Z. Lin teaches the use of a self-attention mechanism to convert the hidden states to a sentence embedding in order to implement a sentence embedding method that improves performance of author profiling, sentiment classification, and textual entailment tasks (Abstract, lines 1-10, "This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence. We also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding comes with an easy way of visualizing what specific parts of the sentence are encoded into the embedding. We evaluate our model on 3 different tasks: author profiling, sentiment classification and textual entailment. Results show that our model yields a significant performance gain compared to other sentence embedding methods in all of the 3 tasks.").
Cheng, Lee, Y. Lin, Francis, Devlin, and Z. Lin are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin and further in view of Francis and Devlin to incorporate the teachings of Z. Lin to use a self-attention mechanism to convert the hidden states to a sentence embedding.  Doing so would allow for implementing a sentence embedding method that improves performance of author profiling, sentiment classification, and textual entailment tasks.
Regarding claim 7, as best understood based on the 35 U.S.C. 112(b) issues identified above, Cheng in view of Lee and Y. Lin and further in view of Francis, Devlin, and Z. Lin discloses the system as claimed in claim 6.
Devlin further teaches:
wherein the transformer comprises bidirectional encoder representations from transformers (BERT) (Abstract, lines 1-4, "We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers."; Section 3, lines 63-67, "As shown in Figure 1, we denote input embedding as E, the final hidden vector of the special [CLS] token as C ∈ R^H, and the final hidden vector for the i-th input token as T_i ∈ R^H."; The hidden vectors read on the feature vectors, and generating a hidden vector for each input token reads on each feature vector corresponding to one of the tokens.).
Devlin teaches the use of a bidirectional encoder representations from transformers (BERT) model to generate hidden vectors from input tokens in order to generate bidirectional representations from unlabeled text for use in question answering and language inference tasks (Abstract, lines 6-15, "BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.").
Cheng, Lee, Y. Lin, Francis, Devlin, and Z. Lin are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin and further in view of Francis, Devlin, and Z. Lin to further incorporate the teachings of Devlin to use a bidirectional encoder representations from transformers (BERT) model to generate hidden vectors from input tokens.  Doing so would allow for generating bidirectional representations from unlabeled text for use in question answering and language inference tasks.
Regarding claim 14, as best understood based on the 35 U.S.C. 112(b) issues identified above, Cheng in view of Lee and Y. Lin discloses the method as claimed in claim 11, but does not specifically disclose: wherein the step of converting each of the at least one sentence into a third number of sentence embeddings comprises: encoding tokens in the at least one sentence into a plurality of one-hot vectors, each of the plurality of one-hot vectors corresponding to one of the tokens in the at least one sentence.
Francis teaches:
wherein the step of converting each of the at least one sentence into a third number of sentence embeddings comprises: encoding tokens in the at least one sentence into a plurality of one-hot vectors, each of the plurality of one-hot vectors corresponding to one of the tokens in the at least one sentence (Figure 3, lines 1-2, "In the sentence embedding part, a sentence is represented by a sequence of one-hot vectors (w1, …, wn).").
Francis teaches representing a sentence by a sequence of one-hot vectors and transforming the sequence of one-hot vectors into word embeddings in order to implement a knowledge graph learning system that can be trained with less data (Abstract, lines 2-5, "In this paper, we propose a domain-independent semi-automatic knowledge graph learning system that can be trained with less amount of data, to identify entities and relations from a large text corpus.").
Cheng, Lee, Y. Lin, and Francis are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin to incorporate the teachings of Francis to represent a sentence by a sequence of one-hot vectors and to transform the sequence of one-hot vectors into word embeddings.  Doing so would allow for implementing a knowledge graph learning system that can be trained with less data.
Cheng in view of Lee and Y. Lin and further in view of Francis does not specifically disclose: transforming one-hot vectors by a transformer to obtain a plurality of feature vectors, each feature vector corresponding to one of the words in the at least one sentence.
Devlin teaches:
transforming one-hot vectors by a transformer to obtain a plurality of feature vectors, each feature vector corresponding to one of the words in the at least one sentence (Abstract, lines 1-4, "We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers."; Section 3, lines 63-67, "As shown in Figure 1, we denote input embedding as E, the final hidden vector of the special [CLS] token as C ∈ R^H, and the final hidden vector for the i-th input token as T_i ∈ R^H."; The hidden vectors read on the feature vectors, and generating a hidden vector for each input token reads on each feature vector corresponding to one of the tokens.).
Devlin teaches the use of a transformer to generate hidden vectors from input tokens in order to generate bidirectional representations from unlabeled text for use in question answering and language inference tasks (Abstract, lines 6-15, "BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.").
Cheng, Lee, Y. Lin, Francis, and Devlin are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin and further in view of Francis to incorporate the teachings of Devlin to use a transformer to generate hidden vectors from input tokens.  Doing so would allow for generating bidirectional representations from unlabeled text for use in question answering and language inference tasks.
Cheng in view of Lee and Y. Lin and further in view of Francis and Devlin does not specifically disclose: performing a self-structure attention on the plurality of feature vectors to obtain the third number of sentence embeddings.
Z. Lin teaches:
performing a self-structure attention on the plurality of feature vectors to obtain the third number of sentence embeddings (Section 2.1, lines 1-5, "The proposed sentence embedding model consists of two parts. The first part is a bidirectional LSTM, and the second part is the self-attention mechanism, which provides a set of summation weight vectors for the LSTM hidden states. These set of summation weight vectors are dotted with the LSTM hidden states, and the resulting weighted LSTM hidden states are considered as an embedding for the sentence."; The hidden states read on feature vectors.).
Z. Lin teaches the use of a self-attention mechanism to convert the hidden states to a sentence embedding in order to implement a sentence embedding method that improves performance of author profiling, sentiment classification, and textual entailment tasks (Abstract, lines 1-10, "This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence. We also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding comes with an easy way of visualizing what specific parts of the sentence are encoded into the embedding. We evaluate our model on 3 different tasks: author profiling, sentiment classification and textual entailment. Results show that our model yields a significant performance gain compared to other sentence embedding methods in all of the 3 tasks.").
Cheng, Lee, Y. Lin, Francis, Devlin, and Z. Lin are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin and further in view of Francis and Devlin to incorporate the teachings of Z. Lin to use a self-attention mechanism to convert the hidden states to a sentence embedding.  Doing so would allow for implementing a sentence embedding method that improves performance of author profiling, sentiment classification, and textual entailment tasks.
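For illustration only, the self-attention step quoted from Z. Lin above (summation weight vectors applied to per-token hidden states to form a sentence embedding) can be sketched as follows. This is a minimal sketch, not the implementation of Z. Lin or of the application: the function name, the tanh scoring form, the parameter matrices w1 and w2, and the random toy inputs are all assumptions chosen for the example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attentive_embedding(hidden_states, w1, w2):
    """Turn n hidden states (n x d) into r weighted sums (r x d).

    w1 (d_a x d) and w2 (r x d_a) are hypothetical learned parameters;
    each row of w2 yields one summation weight vector over the tokens.
    """
    n = len(hidden_states)
    d = len(hidden_states[0])
    # Project every hidden state and squash it with tanh.
    projected = [
        [math.tanh(sum(w * h for w, h in zip(row, h_t))) for row in w1]
        for h_t in hidden_states
    ]
    weights, embedding = [], []
    for hop in w2:
        # One score per token, normalised into a summation weight vector.
        scores = [sum(w * p for w, p in zip(hop, p_t)) for p_t in projected]
        a = softmax(scores)
        weights.append(a)
        # Weighted sum of the hidden states gives one row of the embedding.
        embedding.append(
            [sum(a[t] * hidden_states[t][j] for t in range(n)) for j in range(d)]
        )
    return embedding, weights


# Toy data (assumed): 4 tokens, hidden size 3, d_a = 5, r = 2 attention rows.
rng = random.Random(0)
hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
w2 = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(2)]
embedding, weights = self_attentive_embedding(hidden, w1, w2)
```

In this sketch each row of the result is one weighted combination of the hidden states, so the number of rows corresponds to the number of summation weight vectors rather than to the number of tokens.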
Claims 8 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Cheng in view of Lee and Y. Lin, and further in view of Xu et al. (“Product Knowledge Graph Embedding for E-commerce”), hereinafter Xu.
Regarding claim 8, as best understood based on the 35 U.S.C. 112(b) issues identified above, Cheng in view of Lee and Y. Lin discloses the system as claimed in claim 1, but does not specifically disclose: wherein the plurality of documents are product descriptions, the entities are a plurality of products, the relations comprise a fitting relation between the plurality of products, and the computer executable code is further configured to: upon receiving a query product, query the knowledge graph using the query product to obtain a query entity corresponding to the query product and a fitting entity having the fitting relation to the query entity; and provide a fitting product corresponding to the fitting entity.
Xu teaches:
wherein the plurality of documents are product descriptions (Page 673, right column, lines 4-7, "In e-commerce, available data sources often include customer view, purchase, search, substitution records, as well as product descriptions and hierarchical category information."),
the entities are a plurality of products (Page 672, right column, lines 19-22, "By treating products, words and category labels as entities and relations as edges, the multi-relation product knowledge can be efficiently summarized by product knowledge graph like Figure 1b."),
the relations comprise a fitting relation between the plurality of products (Page 672, right column, lines 19-22, "By treating products, words and category labels as entities and relations as edges, the multi-relation product knowledge can be efficiently summarized by product knowledge graph like Figure 1b."; Page 672, right column, lines 1-3, "Product relations, including complement (co buy), co-view and substitute, are central for marketing, advertising and recommendation."; Page 672, right column, lines 9-15, "We use the search and describe relation to summarizing interactions between natural language and products. On top of the descriptions, products are often grouped into hierarchical categories shown in Figure 1a, which motivates the IsA relationship. In this paper, we focus on the above six key relations for product knowledge, which should satisfy most e-commerce applications."; The six relations for product knowledge used for the product knowledge graph relations (complement, co-view, substitute, search, describe, and IsA) read on the fitting relation.),
and the computer executable code is further configured to: upon receiving a query product, query the knowledge graph using the query product to obtain a query entity corresponding to the query product and a fitting entity having the fitting relation to the query entity (Page 674, left column, lines 43-48, "As for PKG, product knowledge completion is notably more important due to the sparsity issue in e-commerce data. Relation extraction and question answering can also find their counterparts in e-commerce settings, such as user understanding and searching. A key downstream application for PKG is recommender system."; Page 674, left column, lines 16-21, "Relations in PKG are semantically more complicated, as we illustrate with the below examples of complement for TV: • (Remote control, complement, TV): accessory; • (TV mount frame, complement, TV): structural attachment; • (Audio speaker, complement, TV): enhancement; • (HDMI Cable switcher, complement, TV): add-on."; Relation extraction and question answering read on querying the knowledge graph, and the example of using a complement relation to identify a remote control as an accessory to a TV reads on obtaining a query entity corresponding to the query product and a fitting entity having the fitting relation to the query entity.);
and provide a fitting product corresponding to the fitting entity (Page 674, left column, lines 16-21, "Relations in PKG are semantically more complicated, as we illustrate with the below examples of complement for TV: • (Remote control, complement, TV): accessory; • (TV mount frame, complement, TV): structural attachment; • (Audio speaker, complement, TV): enhancement; • (HDMI Cable switcher, complement, TV): add-on."; The example of using a complement relation to identify a remote control as an accessory to a TV reads on providing a fitting product corresponding to the fitting entity.).
Xu teaches the use of a product knowledge graph, trained on product descriptions, with entities representing products and relations representing relations between products, to query a product to find a related product, in order to learn intrinsic product relations as product knowledge for e-commerce applications including marketing, advertisement, search ranking, and recommendation (Abstract, lines 1-6, "In this paper, we propose a new product knowledge graph (PKG) embedding approach for learning the intrinsic product relations as product knowledge for e-commerce. We define the key entities and summarize the pivotal product relations that are critical for general e-commerce applications including marketing, advertisement, search ranking and recommendation.").
Cheng, Lee, Y. Lin, and Xu are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin to incorporate the teachings of Xu to use a product knowledge graph, trained on product descriptions, with entities representing products and relations representing relations between products, to query a product to find a related product.  Doing so would allow for learning intrinsic product relations as product knowledge for e-commerce applications including marketing, advertisement, search ranking, and recommendation.
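For illustration only, the querying step mapped to Xu above can be sketched with the complement triples quoted from Xu. Xu describes an embedding approach; this sketch substitutes a plain dictionary lookup, and the names TRIPLES, build_index, and fitting_products are assumptions introduced for the example.

```python
from collections import defaultdict

# (head, relation, tail) triples taken from the complement examples quoted from Xu.
TRIPLES = [
    ("Remote control", "complement", "TV"),
    ("TV mount frame", "complement", "TV"),
    ("Audio speaker", "complement", "TV"),
    ("HDMI Cable switcher", "complement", "TV"),
]


def build_index(triples):
    """Index head entities by (tail entity, relation) for direct lookup."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[(tail, relation)].append(head)
    return index


def fitting_products(index, query_product, relation="complement"):
    """Return the entities that hold the given relation to the query product."""
    return list(index.get((query_product, relation), []))


index = build_index(TRIPLES)
results = fitting_products(index, "TV")
print(results)
```

Here the query product "TV" is resolved to the query entity, and each returned head entity stands in for a fitting entity under the complement relation.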
Regarding claim 15, as best understood based on the 35 U.S.C. 112(b) issues identified above, Cheng in view of Lee and Y. Lin discloses the method as claimed in claim 11, but does not specifically disclose: wherein the plurality of documents are product descriptions, the entities are a plurality of products, the relations comprise a fitting relation between the plurality of products, and the computer executable code is further configured to: upon receiving a query product, query the knowledge graph using the query product to obtain a query entity corresponding to the query product and a fitting entity having the fitting relation to the query entity; and provide a fitting product corresponding to the fitting entity.
Xu teaches:
wherein the plurality of documents are product descriptions (Page 673, right column, lines 4-7, "In e-commerce, available data sources often include customer view, purchase, search, substitution records, as well as product descriptions and hierarchical category information."),
the entities are a plurality of products (Page 672, right column, lines 19-22, "By treating products, words and category labels as entities and relations as edges, the multi-relation product knowledge can be efficiently summarized by product knowledge graph like Figure 1b."),
the relations comprise a fitting relation between the plurality of products (Page 672, right column, lines 19-22, "By treating products, words and category labels as entities and relations as edges, the multi-relation product knowledge can be efficiently summarized by product knowledge graph like Figure 1b."; Page 672, right column, lines 1-3, "Product relations, including complement (co buy), co-view and substitute, are central for marketing, advertising and recommendation."; Page 672, right column, lines 9-15, "We use the search and describe relation to summarizing interactions between natural language and products. On top of the descriptions, products are often grouped into hierarchical categories shown in Figure 1a, which motivates the IsA relationship. In this paper, we focus on the above six key relations for product knowledge, which should satisfy most e-commerce applications."; The six relations for product knowledge used for the product knowledge graph relations (complement, co-view, substitute, search, describe, and IsA) read on the fitting relation.),
and the computer executable code is further configured to: upon receiving a query product, query the knowledge graph using the query product to obtain a query entity corresponding to the query product and a fitting entity having the fitting relation to the query entity (Page 674, left column, lines 43-48, "As for PKG, product knowledge completion is notably more important due to the sparsity issue in e-commerce data. Relation extraction and question answering can also find their counterparts in e-commerce settings, such as user understanding and searching. A key downstream application for PKG is recommender system."; Page 674, left column, lines 16-21, "Relations in PKG are semantically more complicated, as we illustrate with the below examples of complement for TV: • (Remote control, complement, TV): accessory; • (TV mount frame, complement, TV): structural attachment; • (Audio speaker, complement, TV): enhancement; • (HDMI Cable switcher, complement, TV): add-on."; Relation extraction and question answering read on querying the knowledge graph, and the example of using a complement relation to identify a remote control as an accessory to a TV reads on obtaining a query entity corresponding to the query product and a fitting entity having the fitting relation to the query entity.);
and provide a fitting product corresponding to the fitting entity (Page 674, left column, lines 16-21, "Relations in PKG are semantically more complicated, as we illustrate with the below examples of complement for TV: • (Remote control, complement, TV): accessory; • (TV mount frame, complement, TV): structural attachment; • (Audio speaker, complement, TV): enhancement; • (HDMI Cable switcher, complement, TV): add-on."; The example of using a complement relation to identify a remote control as an accessory to a TV reads on providing a fitting product corresponding to the fitting entity.).
Xu teaches the use of a product knowledge graph, trained on product descriptions, with entities representing products and relations representing relations between products, to query a product to find a related product, in order to learn intrinsic product relations as product knowledge for e-commerce applications including marketing, advertisement, search ranking, and recommendation (Abstract, lines 1-6, "In this paper, we propose a new product knowledge graph (PKG) embedding approach for learning the intrinsic product relations as product knowledge for e-commerce. We define the key entities and summarize the pivotal product relations that are critical for general e-commerce applications including marketing, advertisement, search ranking and recommendation.").
Cheng, Lee, Y. Lin, and Xu are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin to incorporate the teachings of Xu to use a product knowledge graph, trained on product descriptions, with entities representing products and relations representing relations between products, to query a product to find a related product.  Doing so would allow for learning intrinsic product relations as product knowledge for e-commerce applications including marketing, advertisement, search ranking, and recommendation.
Claims 9 – 10 and 16 – 17 are rejected under 35 U.S.C. 103 as being unpatentable over Cheng in view of Lee and Y. Lin, and further in view of Oltramari et al. (US Patent Application Publication No. 2021/0303990), hereinafter Oltramari.
Regarding claim 9, as best understood based on the 35 U.S.C. 112(b) issues identified above, Cheng in view of Lee and Y. Lin discloses the system as claimed in claim 1, but does not specifically disclose: wherein the plurality of documents are product question and answers, the entities are a plurality of product questions and a plurality of product answers, the relations comprise a true relation linking one of the plurality of product answers to one of the plurality of product questions, and the computer executable code is further configured to: upon receiving a product question, query the knowledge graph using the product question to obtain one of the product question entities, and obtain one of the product answer entities having the true relation to the one of the product question entities; and provide an answer corresponding to the one of the product answer entities.
Oltramari teaches:
wherein the plurality of documents are product question and answers (Paragraph 0038, lines 11-17, "According to an example, HMI 14a may be configured and/or programmed to provide product or service information to a user, and when the user poses a query to the HMI 14a (e.g., via speech, typed text, or the like), the HMI 14a may provide an accurate answer using dialogue computer 10 which may be located remotely from HMI 14a."; Paragraph 0040, lines 1-3, "According to at least one example, the structured data 42 comprises at least a question-and-answer (Q&A) pair feature set."; Providing product or service information to a user reads on product question and answers, and the question-and-answer (Q&A) pair feature set reads on product question and answer documents.),
the entities are a plurality of product questions and a plurality of product answers, the relations comprise a true relation linking one of the plurality of product answers to one of the plurality of product questions (Paragraph 0043, lines 1-11, "The various knowledge types of the knowledge graph 46 may be comprised of triples which are interconnected to form data structure. According to an example, a triple may comprise a subject element, a relationship element, and an object element. According to an example, knowledge graph 46 may be configured to improve the answer determination of the language model 44 - e.g., the triples may comprise a subject of a sentence (subject element), a predicate of a sentence (relationship element), and an object of the sentence (object element), wherein the object is part of the predicate of the sentence."; The subject elements read on the product question entities, the object elements read on the product answer entities, and the relationship elements read on the true relations linking product question entities to product answer entities.),
and the computer executable code is further configured to: upon receiving a product question, query the knowledge graph using the product question to obtain one of the product question entities, and obtain one of the product answer entities having the true relation to the one of the product question entities (Paragraph 0060, lines 1-12, "In block 720, computer 10 may determine one or more triples from knowledge graph 46 using the query. According to a non-limiting example, the computer 10 may evaluate the query sentence, and based on the evaluation, computer 10 may determine at least one associated knowledge type (e.g., one of a declarative commonsense knowledge type, a taxonomic knowledge type, a relational knowledge type, a procedural knowledge type, a sentiment knowledge type, a metaphorical knowledge type, or other suitable type) and provide one or more triples from the knowledge graph 46 based on one or more appropriate knowledge types."; Paragraph 0044, lines 1-3, "As shown in FIGS. 4 and 5, the triples may be injected into the language model 44 to assist in the determination of an answer to the user's query."; The query sentence reads on the product question, and the knowledge graph triples read on the product question entity, the product answer entity, and the relation between the product question entity and the product answer entity.);
and provide an answer corresponding to the one of the product answer entities (Paragraph 0046, lines 1-6, "According to an example (see FIGS. 4-5), output node values of at least some of the output nodes j27-j30 are provided to an output selection 48. Output selection 48 is configured to determine which of the answers provided by the output nodes should be selected as an answer the user's query.").
Oltramari teaches the use of a product knowledge graph, trained on question and answers, with entities representing questions and answers and relations representing connections between questions and answers, to answer product questions, in order to improve the user experience by providing more accurate responses to user queries (Paragraph 0024, lines 24-28, "The dialogue computer 10 described herein improves the user experience - e.g., by providing more accurate responses to user queries, users are less likely to become frustrated with a system that provides a computer-generated response.").
Cheng, Lee, Y. Lin, and Oltramari are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin to incorporate the teachings of Oltramari to use a product knowledge graph, trained on question and answers, with entities representing questions and answers and relations representing connections between questions and answers, to answer product questions.  Doing so would allow for improving the user experience by providing more accurate responses to user queries.
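For illustration only, the triple retrieval mapped to Oltramari above (subject element, relationship element, object element, selected using the query sentence) can be sketched as follows. The triples, the word-matching rule, and the names KNOWLEDGE_GRAPH and retrieve_triples are hypothetical and are not taken from Oltramari, which describes injecting the retrieved triples into a language model rather than returning them directly.

```python
import re

# Hypothetical (subject, relationship, object) triples; not from the cited reference.
KNOWLEDGE_GRAPH = [
    ("dishwasher", "has_warranty", "two years"),
    ("dishwasher", "requires", "rinse aid"),
    ("blender", "has_warranty", "one year"),
]


def retrieve_triples(query, triples, relationship=None):
    """Return triples whose subject appears as a word in the query sentence.

    An optional relationship filter narrows the result to one relation type.
    """
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    return [
        (subj, rel, obj)
        for subj, rel, obj in triples
        if subj in words and (relationship is None or rel == relationship)
    ]


matches = retrieve_triples(
    "How long is the dishwasher warranty?", KNOWLEDGE_GRAPH, relationship="has_warranty"
)
answers = [obj for _, _, obj in matches]
print(answers)
```

In this sketch the subject element plays the role of the question-side entity and the object element plays the role of the answer-side entity.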
Regarding claim 10, as best understood based on the 35 U.S.C. 112(b) issues identified above, Cheng in view of Lee and Y. Lin discloses the system as claimed in claim 1, but does not specifically disclose: wherein the plurality of documents are product service requests and answers, the entities are a plurality of product service request entities and a plurality of product service answer entities, the relations comprise a true relation linking one of the plurality of product service request entities to one of the plurality of product service answer entities, and the computer executable code is further configured to: upon receiving a product service request, query the knowledge graph using the product service request to obtain one of the product service request entities, and obtain one of the product service answer entities having the true relation to the one of the product service request entities; and provide an answer corresponding to the one of the product service answer entities.
Oltramari teaches:
wherein the plurality of documents are product service requests and answers (Paragraph 0022, lines 5-7, "In other embodiments, the query may pertain to a predefined category of information (e.g., customer technical support for a product or service)."; Paragraph 0040, lines 1-3, "According to at least one example, the structured data 42 comprises at least a question-and-answer (Q&A) pair feature set."; Customer technical support for a product or service reads on product service, and the question-and-answer (Q&A) pair feature set reads on product service requests and answers documents.),
the entities are a plurality of product service request entities and a plurality of product service answer entities, the relations comprise a true relation linking one of the plurality of product service request entities to one of the plurality of product service answer entities (Paragraph 0043, lines 1-11, "The various knowledge types of the knowledge graph 46 may be comprised of triples which are interconnected to form data structure. According to an example, a triple may comprise a subject element, a relationship element, and an object element. According to an example, knowledge graph 46 may be configured to improve the answer determination of the language model 44 - e.g., the triples may comprise a subject of a sentence (subject element), a predicate of a sentence (relationship element), and an object of the sentence (object element), wherein the object is part of the predicate of the sentence."; The subject elements read on the product service request entities, the object elements read on the product service answer entities, and the relationship elements read on the true relations linking product service request entities to product service answer entities.),
and the computer executable code is further configured to: upon receiving a product service request, query the knowledge graph using the product service request to obtain one of the product service request entities, and obtain one of the product service answer entities having the true relation to the one of the product service request entities (Paragraph 0060, lines 1-12, "In block 720, computer 10 may determine one or more triples from knowledge graph 46 using the query. According to a non-limiting example, the computer 10 may evaluate the query sentence, and based on the evaluation, computer 10 may determine at least one associated knowledge type (e.g., one of a declarative commonsense knowledge type, a taxonomic knowledge type, a relational knowledge type, a procedural knowledge type, a sentiment knowledge type, a metaphorical knowledge type, or other suitable type) and provide one or more triples from the knowledge graph 46 based on one or more appropriate knowledge types."; Paragraph 0044, lines 1-3, "As shown in FIGS. 4 and 5, the triples may be injected into the language model 44 to assist in the determination of an answer to the user's query."; The query sentence reads on the product service request, and the knowledge graph triples read on the product service request entity, the product service answer entity, and the relation between the product service request entity and the product service answer entity.);
and provide an answer corresponding to the one of the product service answer entities (Paragraph 0046, lines 1-6, "According to an example (see FIGS. 4-5), output node values of at least some of the output nodes j27-j30 are provided to an output selection 48. Output selection 48 is configured to determine which of the answers provided by the output nodes should be selected as an answer the user's query.").
Oltramari teaches the use of a product knowledge graph, trained on question and answers, with entities representing questions and answers and relations representing connections between questions and answers, to answer product customer service questions, in order to improve the user experience by providing more accurate responses to user queries (Paragraph 0024, lines 24-28, "The dialogue computer 10 described herein improves the user experience - e.g., by providing more accurate responses to user queries, users are less likely to become frustrated with a system that provides a computer-generated response.").
Cheng, Lee, Y. Lin, and Oltramari are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin to incorporate the teachings of Oltramari to use a product knowledge graph, trained on question and answers, with entities representing questions and answers and relations representing connections between questions and answers, to answer product customer service questions.  Doing so would allow for improving the user experience by providing more accurate responses to user queries.
Regarding claim 16, as best understood based on the 35 U.S.C. 112(b) issues identified above, Cheng in view of Lee and Y. Lin discloses the method as claimed in claim 11, but does not specifically disclose: wherein the plurality of documents are product question and answers, the entities are a plurality of product questions and a plurality of product answers, the relations comprise a true relation linking one of the plurality of product answers to one of the plurality of product questions, and the computer executable code is further configured to: upon receiving a product question, query the knowledge graph using the product question to obtain one of the product question entities, and obtain one of the product answer entities having the true relation to the one of the product question entities; and provide an answer corresponding to the one of the product answer entities.
Oltramari teaches:
wherein the plurality of documents are product question and answers (Paragraph 0038, lines 11-17, "According to an example, HMI 14a may be configured and/or programmed to provide product or service information to a user, and when the user poses a query to the HMI 14a (e.g., via speech, typed text, or the like), the HMI 14a may provide an accurate answer using dialogue computer 10 which may be located remotely from HMI 14a."; Paragraph 0040, lines 1-3, "According to at least one example, the structured data 42 comprises at least a question-and-answer (Q&A) pair feature set."; Providing product or service information to a user reads on product question and answers, and the question-and-answer (Q&A) pair feature set reads on product question and answer documents.),
the entities are a plurality of product questions and a plurality of product answers, the relations comprise a true relation linking one of the plurality of product answers to one of the plurality of product questions (Paragraph 0043, lines 1-11, "The various knowledge types of the knowledge graph 46 may be comprised of triples which are interconnected to form data structure. According to an example, a triple may comprise a subject element, a relationship element, and an object element. According to an example, knowledge graph 46 may be configured to improve the answer determination of the language model 44 - e.g., the triples may comprise a subject of a sentence (subject element), a predicate of a sentence (relationship element), and an object of the sentence (object element), wherein the object is part of the predicate of the sentence."; The subject elements read on the product question entities, the object elements read on the product answer entities, and the relationship elements read on the true relations linking product question entities to product answer entities.),
and the computer executable code is further configured to: upon receiving a product question, query the knowledge graph using the product question to obtain one of the product question entities, and obtain one of the product answer entities having the true relation to the one of the product question entities (Paragraph 0060, lines 1-12, "In block 720, computer 10 may determine one or more triples from knowledge graph 46 using the query. According to a non-limiting example, the computer 10 may evaluate the query sentence, and based on the evaluation, computer 10 may determine at least one associated knowledge type (e.g., one of a declarative commonsense knowledge type, a taxonomic knowledge type, a relational knowledge type, a procedural knowledge type, a sentiment knowledge type, a metaphorical knowledge type, or other suitable type) and provide one or more triples from the knowledge graph 46 based on one or more appropriate knowledge types."; Paragraph 0044, lines 1-3, "As shown in FIGS. 4 and 5, the triples may be injected into the language model 44 to assist in the determination of an answer to the user's query."; The query sentence reads on the product question, and the knowledge graph triples read on the product question entity, the product answer entity, and the relation between the product question entity and the product answer entity.);
and provide an answer corresponding to the one of the product answer entities (Paragraph 0046, lines 1-6, "According to an example (see FIGS. 4-5), output node values of at least some of the output nodes j27-j30 are provided to an output selection 48. Output selection 48 is configured to determine which of the answers provided by the output nodes should be selected as an answer the user's query.").
Oltramari teaches the use of a product knowledge graph, trained on question and answers, with entities representing questions and answers and relations representing connections between questions and answers, to answer product questions, in order to improve the user experience by providing more accurate responses to user queries (Paragraph 0024, lines 24-28, "The dialogue computer 10 described herein improves the user experience - e.g., by providing more accurate responses to user queries, users are less likely to become frustrated with a system that provides a computer-generated response.").
Cheng, Lee, Y. Lin, and Oltramari are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin to incorporate the teachings of Oltramari to use a product knowledge graph, trained on question and answers, with entities representing questions and answers and relations representing connections between questions and answers, to answer product questions.  Doing so would allow for improving the user experience by providing more accurate responses to user queries.
Regarding claim 17, as best understood based on the 35 U.S.C. 112(b) issues identified above, Cheng in view of Lee and Y. Lin discloses the method as claimed in claim 11, but does not specifically disclose: wherein the plurality of documents are product service requests and answers, the entities are a plurality of product service request entities and a plurality of product service answer entities, the relations comprise a true relation linking one of the plurality of product service request entities to one of the plurality of product service answer entities, and the computer executable code is further configured to: upon receiving a product service request, query the knowledge graph using the product service request to obtain one of the product service request entities, and obtain one of the product service answer entities having the true relation to the one of the product service request entities; and provide an answer corresponding to the one of the product service answer entities.
Oltramari teaches:
wherein the plurality of documents are product service requests and answers (Paragraph 0022, lines 5-7, "In other embodiments, the query may pertain to a predefined category of information (e.g., customer technical support for a product or service)."; Paragraph 0040, lines 1-3, "According to at least one example, the structured data 42 comprises at least a question-and-answer (Q&A) pair feature set."; Customer technical support for a product or service reads on product service, and the question-and-answer (Q&A) pair feature set reads on product service requests and answers documents.),
the entities are a plurality of product service request entities and a plurality of product service answer entities, the relations comprise a true relation linking one of the plurality of product service request entities to one of the plurality of product service answer entities (Paragraph 0043, lines 1-11, "The various knowledge types of the knowledge graph 46 may be comprised of triples which are interconnected to form data structure. According to an example, a triple may comprise a subject element, a relationship element, and an object element. According to an example, knowledge graph 46 may be configured to improve the answer determination of the language model 44 - e.g., the triples may comprise a subject of a sentence (subject element), a predicate of a sentence (relationship element), and an object of the sentence (object element), wherein the object is part of the predicate of the sentence."; The subject elements read on the product service request entities, the object elements read on the product service answer entities, and the relationship elements read on the true relations linking product service request entities to product service answer entities.),
and the computer executable code is further configured to: upon receiving a product service request, query the knowledge graph using the product service request to obtain one of the product service request entities, and obtain one of the product service answer entities having the true relation to the one of the product service request entities (Paragraph 0060, lines 1-12, "In block 720, computer 10 may determine one or more triples from knowledge graph 46 using the query. According to a non-limiting example, the computer 10 may evaluate the query sentence, and based on the evaluation, computer 10 may determine at least one associated knowledge type (e.g., one of a declarative commonsense knowledge type, a taxonomic knowledge type, a relational knowledge type, a procedural knowledge type, a sentiment knowledge type, a metaphorical knowledge type, or other suitable type) and provide one or more triples from the knowledge graph 46 based on one or more appropriate knowledge types."; Paragraph 0044, lines 1-3, "As shown in FIGS. 4 and 5, the triples may be injected into the language model 44 to assist in the determination of an answer to the user's query."; The query sentence reads on the product service request, and the knowledge graph triples read on the product service request entity, the product service answer entity, and the relation between the product service request entity and the product service answer entity.);
and provide an answer corresponding to the one of the product service answer entities (Paragraph 0046, lines 1-6, "According to an example (see FIGS. 4-5), output node values of at least some of the output nodes j27-j30 are provided to an output selection 48. Output selection 48 is configured to determine which of the answers provided by the output nodes should be selected as an answer the user's query.").
Oltramari teaches the use of a product knowledge graph, trained on questions and answers, with entities representing questions and answers and relations representing connections between questions and answers, to answer product customer service questions, in order to improve the user experience by providing more accurate responses to user queries (Paragraph 0024, lines 24-28, "The dialogue computer 10 described herein improves the user experience - e.g., by providing more accurate responses to user queries, users are less likely to become frustrated with a system that provides a computer-generated response.").
Cheng, Lee, Y. Lin, and Oltramari are considered to be analogous to the claimed invention because they are in the same field of machine learning systems.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng in view of Lee and Y. Lin to incorporate the teachings of Oltramari to use a product knowledge graph, trained on questions and answers, with entities representing questions and answers and relations representing connections between questions and answers, to answer product customer service questions.  Doing so would allow for improving the user experience by providing more accurate responses to user queries.
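For illustration only, the claim mapping relied upon above (subject element as request entity, relationship element as relation, object element as answer entity) may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Oltramari or from the instant application: the names Triple, TRIPLES, match_entity, and answer, the sample data, and the word-overlap matching are all hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Triple(NamedTuple):
    subject: str   # assumed to play the role of a request entity
    relation: str  # assumed label: "true" links a request to a valid answer
    obj: str       # assumed to play the role of an answer entity


# Hypothetical toy knowledge graph; the contents are invented for illustration.
TRIPLES = [
    Triple("screen will not turn on", "true", "hold the power button for ten seconds"),
    Triple("screen will not turn on", "false", "replace the ink cartridge"),
    Triple("printer shows paper jam", "true", "open the rear tray and remove the sheet"),
]


def match_entity(request: str, triples: Sequence[Triple]) -> Optional[str]:
    """Return the subject entity sharing the most words with the request.

    Word overlap is a stand-in for whatever matching a real system uses.
    """
    words = set(request.lower().split())
    best, best_overlap = None, 0
    # dict.fromkeys keeps first-seen order, so ties resolve deterministically.
    for subject in dict.fromkeys(t.subject for t in triples):
        overlap = len(words & set(subject.split()))
        if overlap > best_overlap:
            best, best_overlap = subject, overlap
    return best


def answer(request: str, triples: Sequence[Triple] = TRIPLES) -> Optional[str]:
    """Query the graph with a request and return an answer entity, if any."""
    entity = match_entity(request, triples)
    if entity is None:
        return None
    for t in triples:
        if t.subject == entity and t.relation == "true":
            return t.obj
    return None
```

In this sketch, a call such as answer("My screen will not turn on") first obtains the matching request entity and then returns the answer entity joined to it by the "true" relation; a request that matches no entity yields None.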
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to James Boggs whose telephone number is (571)272-2968. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JAMES BOGGS/Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657