DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Response to Amendments and Arguments
Regarding the rejection of claims 1-20 under 35 U.S.C. §102(a)(1), applicant amended independent claims 1, 12 and 18 by adding several new limitations. Applicant argued (Remarks, pages 8-11) that the cited Tang reference (“HIN: Hierarchical Inference Network for Document-Level Relation Extraction”, submitted by the applicant in an IDS, published in May 2020) fails to teach the newly added limitations in the amended independent claims 1, 12 and 18.

The examiner has reviewed the prior art of record and performed an updated search. The examiner believes that Tang implicitly discloses the argued limitations because Tang refers readers to Yao et al. (“DocRED: A Large-Scale Document-Level Relation Extraction Dataset”, 2019) for more details about relation extraction using bilinear functions.

The Yao reference was also submitted by the applicant in an IDS filed on 08/26/2020. Yao discloses calculating probabilities for different relation types for each entity pair using bilinear functions with a sigmoid activation function (Yao, Section 5, Equations 1 and 2).

    [Image media_image1.png: Yao, Section 5, Equations 1 and 2]

Please note that in Eqs. 1 and 2, Wr and br are trainable parameters that depend on the relation type. There are many relation types (Yao, Section 5, different supporting evidence types; Section 2.1, a large number of fine-grained relation types; Table 1, showing that the DocRED dataset has 96 relation types).
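
For illustration only, a relation-type-specific bilinear score followed by a sigmoid can be sketched as below. The sketch assumes the common form sigmoid(head^T Wr tail + br); the function names, the toy two-dimensional vectors, and the parameter values are hypothetical and are not taken from Yao or Tang. It shows only that each relation type r carries its own Wr and br.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bilinear_score(head, tail, W, b):
    """Return sigmoid(head^T W tail + b) for a single relation type."""
    total = b
    for i, h in enumerate(head):
        for j, t in enumerate(tail):
            total += h * W[i][j] * t
    return sigmoid(total)


def relation_probabilities(head, tail, params):
    """params maps a relation name to its own (W_r, b_r) pair (hypothetical layout)."""
    return {r: bilinear_score(head, tail, W, b) for r, (W, b) in params.items()}


# Toy, made-up values: two relation types, each with its own weight matrix and bias.
head = [1.0, 0.0]
tail = [0.0, 1.0]
params = {
    "rel_a": ([[0.0, 2.0], [0.0, 0.0]], 0.0),
    "rel_b": ([[0.0, -2.0], [0.0, 0.0]], 0.0),
}
probs = relation_probabilities(head, tail, params)
```

Each relation type is scored independently, so the outputs are per-relation probabilities rather than a distribution that must sum to one across relation types.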

The claimed “a first bilinear layer”, “a second bilinear layer” and “a third bilinear layer” are just different bilinear functions for different relation types such as “a head entity – tail entity”, “sentence relation” or “a combination of sentence relation with attention”. Yao disclosed using many different bilinear functions because of many different relation types.

 Applicant’s arguments with respect to amended independent claims have been considered but are moot because the arguments do not apply to a rejection under 35 U.S.C. §103 based on the combined references.


	Claim Rejections - 35 USC § 103
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Tang et al. (“HIN: Hierarchical Inference Network for Document-Level Relation Extraction”, applicant submitted IDS, published in May 2020, referred to as Tang) in view of Yao et al. (“DocRED: A Large-Scale Document-Level Relation Extraction Dataset”, applicant submitted IDS, published in 2019, referred to as Yao).

Tang discloses using a hierarchical inference network (HIN) for extracting relations between entities at the document level (Introduction). For comparison, Tang refers interested readers to the Yao reference regarding predicting relations using bilinear functions (Tang, Section 4.2; Section 5, related work).

Yao discloses a large-scale document-level relation extraction dataset. Yao further discloses calculating a probability for each relation type for each entity pair (Section 5).

Regarding claims 1, 12, and 18, Tang teaches a system comprising a computing device, the computing device comprising a processor and a storage device storing computer executable code, (see section 4, p. 7-8 where the model is computationally executed on the stored DocRED dataset) wherein the computer executable code, when executed at the processor, is configured to: provide a head entity and a document containing the head entity; (see p. 3, section 2, para. 1, where “The RE model is given a relation candidate (Ea,Eb,D) and expected to output the relations between Ea and Eb, where Ea and Eb are entities in the document D.”)
process the head entity and the document by a language model to obtain head extraction corresponding to the head entity, tail extractions corresponding to tail entities in the document, (see p. 4, section 3.1, which describes information extracted corresponding to the entities, such as entity type embeddings: “utilize the entity type information to enrich the representation of the input.”) and sentence extraction corresponding to sentences in the document; (see p. 2, section 1, para. 3, where “sentence-level inference information represents the information related to relational facts in each sentence, document-level inference information aggregates all the necessary information in supporting sentences (sentence 1, 3 and 8) and discards information in noise sentences.”) 
predict a head entity-tail entity relation between the head extraction and the tail extractions using a first bilinear layer; (see p. 2, section 1, para. 4, “Specifically, inspired by translation constraint [1], which models a relational fact r(eh, et) with eh+ r ≈ et, we apply this translation constraint to target entity pair. Besides, a bi-affine layer is also used to obtain bilinear representation for the target entity pair.”, and see also p. 4,  section 3.2, para. 1, “we compute the entity-level inference information for target entity pair.” where the entity-level inference information is the predicted relation)
combine the sentence extraction and a relation vector corresponding to the head extraction and the tail extraction using a second bilinear layer to obtain a sentence-relation combination; (see p. 6, section 3.3, para. 5, for sentence-level inference representation Isj: “comparing sentence vector Sj with entity-level inference representation Ie, we can derive sentence-level inference representation Isj for the j-th sentence”)
predict an evidence sentence from the document using a third bilinear layer based on the sentence-relation combination and attention extracted from the language model, wherein the evidence sentence supports the head-tail relation; (see p. 6, section 3.3, para. 5, “We build a BiLSTM followed by the attention network on top of the sentence-level inference vectors (Is) to aggregate all essential evidence information scattered in different sentences”)
and update parameters of the language model and the first, second and third bilinear layers based on the predicted head entity-tail entity relation, the predicted evidence sentence (see p. 5, section 3.2, para. 3, where “learnable projection matrices corresponding to the k-th subspace. For each of these projected versions, we perform the entity-level inference in parallel.” and p. 6-7, section 3.3, para. 3 and 7, where there are learnable parameters in the function of the layer to “aggregate the representations of those informative words to form a sentence vector.” and to “aggregate all essential evidence information scattered in different sentences:”), 
and labels of the document containing true head entity-tail entity relation and true evidence sentence (see p. 7, section 3.4, para. 2, where “A binary label vector y is set to indicate the set of true relations holding between the entity pair, where 1 means a relation is in the set, and 0 otherwise.” and the vector is used in training loss).  
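
As a minimal sketch of the binary label vector quoted above (1 where a relation is in the set of true relations for the entity pair, 0 otherwise), the encoding could look like the following. The helper name `multi_hot` and the relation names are hypothetical and do not appear in Tang.

```python
def multi_hot(true_relations, relation_types):
    """Return a 0/1 vector aligned with relation_types: 1 if the relation holds, else 0."""
    true_set = set(true_relations)
    return [1 if r in true_set else 0 for r in relation_types]


# Hypothetical relation inventory; an entity pair may hold several relations at once.
relation_types = ["rel_a", "rel_b", "rel_c"]
y = multi_hot({"rel_a", "rel_c"}, relation_types)
```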
	
        Tang does not explicitly disclose more details about using different bilinear layers with a sigmoid activation function and refers interested readers to the Yao reference. 

Yao discloses using various bilinear functions for many relation types (Yao, Section 5, different supporting evidence types; Section 2.1, a large number of fine-grained relation types; Table 1, showing that the DocRED dataset used in their experiments has 96 relation types).

The claimed “a first bilinear layer”, “a second bilinear layer” and “a third bilinear layer” are just different bilinear functions for different relation types such as “a head entity – tail entity”, “sentence relation” or “a combination of sentence relation with attention”. Yao discloses using sigmoid activation functions (See Eq. 2 in section 5). 

         It would have been obvious to a person having ordinary skill in the art at the time the invention was filed to combine Tang’s teaching with Yao’s teaching to use three different bilinear layers with sigmoid activation functions. One having ordinary skill in the art would have been motivated to make such a modification to improve performance of relation extraction at a document level (Yao, section 7, conclusion). 

Regarding claim 2, Tang in view of Yao further teaches wherein the language model and the first, second and third bilinear layers are trained using a plurality of labeled documents, at least one of the labeled documents has E number of entities, the at least one of the labeled documents is expanded to E number of samples, (see p. 7, section 4.1, para. 1, for “DocRED dataset [15], which is the largest human-annotated document-level RE dataset constructed from Wikidata and Wikipedia. DocRED contains over 5,053 documents, 40,276 sentences, 132,375 entities and 96 frequent relation types.”)
each of the E number of samples comprises the at least one of the labeled documents and a head entity corresponding to one of the E number of entities, and E is a positive integer. (see p. 2, Fig. 1, for an example where DocRED comprises a head entity (e.g., labeled as the Subject) corresponding to one of the plurality of entities)  
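
The claimed expansion of one labeled document with E entities into E samples, each pairing the document with a different head entity, can be illustrated by the sketch below. The function name `expand_document`, the dictionary layout, and the example strings are hypothetical and are not drawn from the application, Tang, or Yao.

```python
def expand_document(document, entities):
    """Produce one sample per entity, each pairing the full document with one head entity."""
    return [{"head": entity, "document": document} for entity in entities]


# Made-up document with E = 3 entities, yielding 3 samples.
document = "Alice works at Acme in Paris."
entities = ["Alice", "Acme", "Paris"]
samples = expand_document(document, entities)
```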

Regarding claim 3, Tang in view of Yao further discloses the computer executable code is configured to update the parameters based on a loss function, the loss function is defined by Loss = LRE + λ1 * LαEvi , LRE is relation prediction loss, LαEvi, is sentence prediction loss, λ1, is a weight factor having a value equals to or greater than 0.  (see p. 7, Equation 15, for the binary cross entropy loss, where “yr ϵ {0, 1} is the true value on label r and l is the number of relations… threshold is used to control the number of extracted relational facts on test set.”)
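
A minimal sketch of the claimed combination Loss = LRE + λ1 * LαEvi follows. It assumes, for illustration only, that both terms are summed binary cross entropy losses; the claim text quoted above does not state the form of the sentence prediction loss, and the function names and toy probabilities are hypothetical.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Summed binary cross entropy over paired predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def combined_loss(rel_probs, rel_labels, evi_probs, evi_labels, lambda_1=1.0):
    """Loss = L_RE + lambda_1 * L_Evi, with lambda_1 >= 0 as recited in the claim."""
    if lambda_1 < 0:
        raise ValueError("lambda_1 must be greater than or equal to 0")
    l_re = binary_cross_entropy(rel_probs, rel_labels)
    l_evi = binary_cross_entropy(evi_probs, evi_labels)
    return l_re + lambda_1 * l_evi


# Toy values: one relation prediction and one evidence prediction, both at 0.5.
loss = combined_loss([0.5], [1], [0.5], [0], lambda_1=2.0)
```

Setting lambda_1 to 0 reduces the combined loss to the relation prediction term alone, which is the boundary case the claim permits.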

Regarding claim 4, Tang in view of Yao further teaches wherein the language model comprises at least one of generative pre-training model (GPT), GPT-2, bidirectional encoder representations from transformers (BERT), robustly optimized BERT approach (roBERTa), and reparameterized transformer-XL network (XLnet).  (see p. 1, Abstract, “We also demonstrate that using BERT representations can further substantially boost the performance.”)

Regarding claim 5, Tang in view of Yao further teaches wherein the computer executable code is configured to extract the attention from last 2 to 5 layers of the language model.  (see p. 4, Fig. 2, where the attention is extracted from last 3 layers, and see p. 6, section 3.3, para. 2-5, where the attention mechanism is used “to enable our model to selectively assign higher weights for the informative words and lower weights for the other words.” and to “distinguish crucial sentence-level inference information for overall document-level inference representation”)

Regarding claim 6, Tang in view of Yao further teaches wherein the computer executable code is configured to extract the attention from last 3 layers of the language model.  (see p. 4, Fig. 2, where the attention is extracted from last 3 layers, and see p. 6, section 3.3, para. 2-5, where the attention mechanism is used “to enable our model to selectively assign higher weights for the informative words and lower weights for the other words.” and to “distinguish crucial sentence-level inference information for overall document-level inference representation”)

Regarding claims 7, 8, 9, 16, and 20, Tang in view of Yao further teaches wherein the first bilinear layer is defined by [image: media_image2.png] [image: media_image3.png], and [image: media_image2.png] is a predicted value of an i-th of a plurality of relations between head entity h and j-th tail entity tk, δ represents a sigmoid function, Wi is learned weights of the first bilinear layer, and bi is a bias of the first bilinear layer.  (see p. 6, Equations 5-9, where uw, bw ϵ Rd and Ww ϵ Rdxd are learnable parameters, Ie ϵ Rd describes relation features between Ea and Eb at entity level)
wherein the second bilinear layer is defined by [image: media_image4.png] [image: media_image5.png] is a predicted possibility of a j-th sentence sj in the document being a supporting sentence with regard to an i-th relation ri, [image: media_image6.png] and [image: media_image7.png] are learnable parameters of the second bilinear layer in regard to the i-th relation.  (see p. 7, Equations 11-13, para. 7, where us, bs ϵ Rd and Ws ϵ Rdxd are learnable parameters, and Id ϵ Rd is the “document-level inference representation which represents all the inference information that we can obtain from the document”)
wherein the third bilinear layer is defined by [image: media_image8.png] [image: media_image9.png], [image: media_image8.png] is predicted probability of the j-th sentence in the document being a support sentence to the i-th relation with regard to the k-th tail entity, δ represents a sigmoid function, Wia is learned weights of the third bilinear layer, and bia is a bias of the third bilinear layer.  (see p. 8, Equations 15 and 16 for the sigmoid function calculating the probability of each relation and they concatenate “entity-level inference representation Ie and document-level inference representation Id together to form the final inference representation.”, where Wr, br are the weight matrix and bias for the linear transformation)

Regarding claim 10, Tang in view of Yao further teaches wherein after training, the language model and the first, second and third bilinear layers are configured to provide relation prediction and evidence prediction for a query entry having a query head entity and a query document containing the query head entity.  (see p. 1, Abstract, where “Translation constraint and bilinear transformation are applied to target entity pair (e.g., comprising the head and tail entity) in multiple subspaces to get entity-level inference information (relation prediction). Next, we model the inference between entity-level information and sentence representation (e.g., evidence prediction) to achieve sentence-level inference information. Finally, a hierarchical aggregation approach is adopted to obtain the document-level inference information.”)

Regarding claim 11, Tang in view of Yao further teaches wherein the computer executable code is further configured to provide tail entities of the document. (see p. 11, Table 3, where the predicted results of the executable HIN-BERT architecture include tail entities)




Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jialong He, whose telephone number is (571) 270-5359.  The examiner can normally be reached on Monday – Friday, 8:00AM – 4:30PM, EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre Desir, can be reached on (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JIALONG HE/Primary Examiner, Art Unit 2659