DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
2.	The information disclosure statement (IDS) submitted on 04/14/2020 has been received, entered into the record, and considered.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Duty of Disclosure
3.	The applicant is reminded that they have a duty to disclose all pertinent references.  From Section 2001.04 of the MPEP:  “A patent by its very nature is affected with a public interest. The public interest is best served, and the most effective patent examination occurs when, at the time an application is being examined, the Office is aware of and evaluates the teachings of all information material to patentability. Each individual associated with the filing and prosecution of a patent application has a duty of candor and good faith in dealing with the Office, which includes a duty to disclose to the Office all information known to that individual to be material to patentability as defined in this section. The duty to disclose information exists with respect to each pending claim until the claim is cancelled or withdrawn from consideration, or the application becomes abandoned. Information material to the patentability of a claim that is cancelled or withdrawn from consideration need not be submitted if the information is not material to the patentability of any claim remaining under consideration in the application. There is no duty to submit information which is not material to the patentability of any existing claim. The duty to disclose all information known to be material to patentability is deemed to be satisfied if all information known to be material to patentability of any claim issued in a patent was cited by the Office or submitted to the Office in the manner prescribed by §§ 1.97(b)-(d) and 1.98. However, no patent will be granted on an application in connection with which fraud on the Office was practiced or attempted or the duty of disclosure was violated through bad faith or intentional misconduct. The Office encourages applicants to carefully examine:”
(1)	Inventor Monath has written numerous papers directed to co-reference, including discussions of hierarchical/agglomerative clustering (see, for example, “Supervised hierarchical clustering with exponential linkage”, dated 2019; “Scalable hierarchical clustering with tree grafting”, dated 2019; “Gradient-based hierarchical clustering using continuous representations of trees in hyperbolic space”, dated 2019; and “Scalable Hierarchical Agglomerative Clustering”, dated 2021).  These papers (including their cited references) have not been disclosed to the Office.
Claim Rejections - 35 USC § 103
4.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
5.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
6.	This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
7.	Claims 1, 8, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Barhom et al. (Article entitled “Revisiting Joint Modeling of Cross-document Entity and Event Coreference Resolution”, dated 4 June 2019) in view of Clark (Article entitled “Neural Coreference Resolution”, dated 2015).
8.	Regarding claims 1, 8, and 15, Barhom teaches a computer-implemented method, system, and computer program product comprising:
A)  determining, by a processor, mentions in the input documents by parsing the input documents (Abstract, Section 3); 
B)  each mention comprises a first vector and a second vector (Abstract, Section 3.1); 
C)  the first vector comprising spelling data and the second vector comprising context data (Abstract, Section 3.1); 
D)  generating, by the processor, a hierarchical tree by generating a plurality of leaf nodes (Section 3.3, Figure 2); 
E)  each leaf node corresponding to a respective mention that is determined (Sections 3 and 3.3, Figure 2); 
F)  computing, by the processor, for each node, a similarity score based on the first vector and the second vector associated with each node (Sections 3 and 3.3, Figure 2); 
G)  populating, by the processor, the hierarchical tree iteratively until a root node is created (Sections 3 and 3.3, Figure 2); 
H)  each iteration comprising: merging two nodes that have the highest similarity scores (Sections 3 and 3.3, Figure 2); 
I)  wherein merging two nodes comprises creating an entity node at a hierarchical level that is above the two nodes being merged (Sections 3 and 3.3, Figure 2);
J)  computing the similarity score for the entity node (Sections 3 and 3.3, Figure 2).
	The examiner notes that Barhom teaches “determining, by a processor, mentions in the input documents by parsing the input documents” as “Recognizing coreferring events and entities across multiple texts is crucial for many NLP applications. Despite the task’s importance, research focus was given mostly to within-document entity coreference, with rather little attention to the other variants. We propose a neural architecture for cross-document coreference resolution. Inspired by Lee et al. (2012), we jointly model entity and event coreference. We represent an event (entity) mention using its lexical span, surrounding context, and relation to entity (event) mentions via predicate-arguments structures” (Abstract) and “We start by computing the mention representations (Section 3.1), which couple the entity and event clustering processes” (Section 3).  The examiner further notes that ascertaining mentions from multiple texts (i.e. documents) entails parsing such documents for such mentions.
	The examiner further notes that Barhom teaches “each mention comprises a first vector and a second vector” as “Recognizing coreferring events and entities across multiple texts is crucial for many NLP applications. Despite the task’s importance, research focus was given mostly to within-document entity coreference, with rather little attention to the other variants. We propose a neural architecture for cross-document coreference resolution. Inspired by Lee et al. (2012), we jointly model entity and event coreference. We represent an event (entity) mention using its lexical span, surrounding context, and relation to entity (event) mentions via predicate-arguments structures” (Abstract), “We combine word-level and character-level features. We compute word-level representations using pre-trained word embeddings… The span vector s(m) is a concatenation of the word- and character-level vectors” (Section 3.1), and “The context surrounding a mention may indicate its compatibility with other candidate mentions… We set the context vector c(m) to the contextual representation of m’s head word, taking the average of the 3 ELMo layers” (Section 3.1).  The examiner further notes that a mention clearly includes a span vector (i.e. the claimed first vector) and a context vector (i.e. the claimed second vector).
	The examiner further notes that Barhom teaches “the first vector comprising spelling data and the second vector comprising context data” as “Recognizing coreferring events and entities across multiple texts is crucial for many NLP applications. Despite the task’s importance, research focus was given mostly to within-document entity coreference, with rather little attention to the other variants. We propose a neural architecture for cross-document coreference resolution. Inspired by Lee et al. (2012), we jointly model entity and event coreference. We represent an event (entity) mention using its lexical span, surrounding context, and relation to entity (event) mentions via predicate-arguments structures” (Abstract), “We combine word-level and character-level features. We compute word-level representations using pre-trained word embeddings… The span vector s(m) is a concatenation of the word- and character-level vectors” (Section 3.1), and “The context surrounding a mention may indicate its compatibility with other candidate mentions… We set the context vector c(m) to the contextual representation of m’s head word, taking the average of the 3 ELMo layers” (Section 3.1).  The examiner further notes that a mention clearly includes a span vector (i.e. spelling data) and a context vector (i.e. context data).
	The examiner further notes that Barhom teaches “generating, by the processor, a hierarchical tree by generating a plurality of leaf nodes” as “We use agglomerative clustering that greedily merges multiple cluster pairs with the highest cluster-pair scores (line 8) until the scores are below a pre-defined threshold $\delta_2$. The algorithm starts with high-precision merges, leaving less precise decisions to a latter stage, when more information becomes available. We define the cluster-pair score as the average mention linkage score: $S_{cp}(c_i, c_j) = \frac{1}{|c_i| \cdot |c_j|} \sum_{m_i \in c_i} \sum_{m_j \in c_j} S(m_i, m_j)$. The same steps are repeated for events (lines 10-12), and repeat iteratively until no merges are available or up to a predefined number of iterations (line 4)” (Section 3.3).  The examiner further notes that the use of agglomerative clustering (which is bottom-up) entails the generation of a hierarchical “tree” that starts with the mentions at the leaf nodes.  Moreover, the instant specification itself explicitly states that it used agglomerative clustering for generating a “tree” (See Paragraph 78).
	The examiner further notes that Barhom teaches “each leaf node corresponding to a respective mention that is determined” as “We start by computing the mention representations (Section 3.1), which couple the entity and event clustering processes” (Section 3) and “We use agglomerative clustering that greedily merges multiple cluster pairs with the highest cluster-pair scores (line 8) until the scores are below a pre-defined threshold $\delta_2$. The algorithm starts with high-precision merges, leaving less precise decisions to a latter stage, when more information becomes available. We define the cluster-pair score as the average mention linkage score: $S_{cp}(c_i, c_j) = \frac{1}{|c_i| \cdot |c_j|} \sum_{m_i \in c_i} \sum_{m_j \in c_j} S(m_i, m_j)$. The same steps are repeated for events (lines 10-12), and repeat iteratively until no merges are available or up to a predefined number of iterations (line 4)” (Section 3.3).  The examiner further notes that the use of agglomerative clustering (which is bottom-up) entails the generation of a hierarchical “tree” that starts with the mentions at the leaf nodes.  Moreover, the instant specification itself explicitly states that it used agglomerative clustering for generating a “tree” (See Paragraph 78).
	The examiner further notes that Barhom teaches “populating, by the processor, the hierarchical tree iteratively until a root node is created” as “We start by computing the mention representations (Section 3.1), which couple the entity and event clustering processes” (Section 3) and “We use agglomerative clustering that greedily merges multiple cluster pairs with the highest cluster-pair scores (line 8) until the scores are below a pre-defined threshold $\delta_2$. The algorithm starts with high-precision merges, leaving less precise decisions to a latter stage, when more information becomes available. We define the cluster-pair score as the average mention linkage score: $S_{cp}(c_i, c_j) = \frac{1}{|c_i| \cdot |c_j|} \sum_{m_i \in c_i} \sum_{m_j \in c_j} S(m_i, m_j)$. The same steps are repeated for events (lines 10-12), and repeat iteratively until no merges are available or up to a predefined number of iterations (line 4)” (Section 3.3).  The examiner further notes that the use of agglomerative clustering (which is bottom-up) entails the generation of a hierarchical “tree” that starts with the mentions at the leaf nodes.  Moreover, the instant specification itself explicitly states that it used agglomerative clustering for generating a “tree” (See Paragraph 78).  Furthermore, an iterative clustering clearly is executed that results in the generation of parent clusters via a merging process of the highest cluster scores (which is based on similarity scores of the first and second vectors).  Additionally, the number of iterations can be predefined such that no more merges are available (including setting the threshold to a value that results in a super root being generated in the bottom-up tree).
	The examiner further notes that Barhom teaches “each iteration comprising: merging two nodes that have the highest similarity scores” as “We start by computing the mention representations (Section 3.1), which couple the entity and event clustering processes” (Section 3) and “We use agglomerative clustering that greedily merges multiple cluster pairs with the highest cluster-pair scores (line 8) until the scores are below a pre-defined threshold $\delta_2$. The algorithm starts with high-precision merges, leaving less precise decisions to a latter stage, when more information becomes available. We define the cluster-pair score as the average mention linkage score: $S_{cp}(c_i, c_j) = \frac{1}{|c_i| \cdot |c_j|} \sum_{m_i \in c_i} \sum_{m_j \in c_j} S(m_i, m_j)$. The same steps are repeated for events (lines 10-12), and repeat iteratively until no merges are available or up to a predefined number of iterations (line 4)” (Section 3.3).  The examiner further notes that the use of agglomerative clustering (which is bottom-up) entails the generation of a hierarchical “tree” that starts with the mentions at the leaf nodes.  Moreover, the instant specification itself explicitly states that it used agglomerative clustering for generating a “tree” (See Paragraph 78).  Furthermore, an iterative clustering clearly is executed that results in the generation of parent clusters via a merging process of the highest cluster scores (which is based on similarity scores of the first and second vectors) in a greedy manner such that the most similar clusters are merged at each iteration.
	The examiner further notes that Barhom teaches “wherein merging two nodes comprises creating an entity node at a hierarchical level that is above the two nodes being merged” as “We start by computing the mention representations (Section 3.1), which couple the entity and event clustering processes” (Section 3) and “We use agglomerative clustering that greedily merges multiple cluster pairs with the highest cluster-pair scores (line 8) until the scores are below a pre-defined threshold $\delta_2$. The algorithm starts with high-precision merges, leaving less precise decisions to a latter stage, when more information becomes available. We define the cluster-pair score as the average mention linkage score: $S_{cp}(c_i, c_j) = \frac{1}{|c_i| \cdot |c_j|} \sum_{m_i \in c_i} \sum_{m_j \in c_j} S(m_i, m_j)$. The same steps are repeated for events (lines 10-12), and repeat iteratively until no merges are available or up to a predefined number of iterations (line 4)” (Section 3.3).  The examiner further notes that the use of agglomerative clustering (which is bottom-up) entails the generation of a hierarchical “tree” that starts with the mentions at the leaf nodes.  Moreover, the instant specification itself explicitly states that it used agglomerative clustering for generating a “tree” (See Paragraph 78).  Furthermore, an iterative clustering clearly is executed that results in the generation of parent clusters via a merging process of the highest cluster scores (which is based on similarity scores of the first and second vectors).
	The examiner further notes that Barhom teaches “computing the similarity score for the entity node” as “We start by computing the mention representations (Section 3.1), which couple the entity and event clustering processes” (Section 3) and “We use agglomerative clustering that greedily merges multiple cluster pairs with the highest cluster-pair scores (line 8) until the scores are below a pre-defined threshold $\delta_2$. The algorithm starts with high-precision merges, leaving less precise decisions to a latter stage, when more information becomes available. We define the cluster-pair score as the average mention linkage score: $S_{cp}(c_i, c_j) = \frac{1}{|c_i| \cdot |c_j|} \sum_{m_i \in c_i} \sum_{m_j \in c_j} S(m_i, m_j)$. The same steps are repeated for events (lines 10-12), and repeat iteratively until no merges are available or up to a predefined number of iterations (line 4)” (Section 3.3).  The examiner further notes that the use of agglomerative clustering (which is bottom-up) entails the generation of a hierarchical “tree” that starts with the mentions at the leaf nodes.  Moreover, the instant specification itself explicitly states that it used agglomerative clustering for generating a “tree” (See Paragraph 78).  Furthermore, an iterative clustering clearly is executed that results in the generation of parent clusters via a merging process of the highest cluster scores (which is based on similarity scores of the first and second vectors).  Such parent cluster(s) (i.e. entity nodes) then have clustering scores calculated (which are based on the similarity scores) in the iterative process.
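	For illustration only, the greedy average-linkage merging quoted from Barhom (Section 3.3) can be sketched as follows.  This is a minimal sketch under stated assumptions and is not the reference's implementation: the names `cluster_pair_score`, `agglomerate`, and `toy_score`, the use of plain numbers as stand-ins for mentions, and the threshold values are all hypothetical and introduced here.

```python
from itertools import combinations


def cluster_pair_score(ci, cj, pair_score):
    """Average mention-linkage score between clusters ci and cj."""
    total = sum(pair_score(mi, mj) for mi in ci for mj in cj)
    return total / (len(ci) * len(cj))


def agglomerate(mentions, pair_score, threshold):
    """Greedily merge the best-scoring cluster pair until the best score
    falls below `threshold` or a single cluster remains.

    Returns the surviving clusters and the merge history, which records
    the bottom-up tree (each merge is a parent over two child clusters).
    """
    clusters = [(m,) for m in mentions]  # every mention starts as a leaf
    merges = []
    while len(clusters) > 1:
        left, right = max(
            combinations(clusters, 2),
            key=lambda p: cluster_pair_score(p[0], p[1], pair_score),
        )
        score = cluster_pair_score(left, right, pair_score)
        if score < threshold:
            break
        clusters.remove(left)
        clusters.remove(right)
        clusters.append(left + right)  # parent cluster above both children
        merges.append((left, right, score))
    return clusters, merges


def toy_score(a, b):
    """Hypothetical pairwise scorer: closer numbers score higher."""
    return 1.0 / (1.0 + abs(a - b))
```

	With a threshold of 0.0 every merge is accepted in this sketch, so the loop ends with a single cluster that plays the role of a root; with a higher threshold the loop stops earlier and leaves several clusters.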
	Barhom does not explicitly teach:
K)  identifying, by the processor, the nodes with the similarity scores above a predetermined value as entities for which coreference has been performed in the input documents.
	Clark, however, teaches “identifying, by the processor, the nodes with the similarity scores above a predetermined value as entities for which coreference has been performed in the input documents” as “At test time, the classifier acts as an agent that builds up coreference chains with agglomerative clustering. It begins in a start state where each mention is in a separate single-element cluster. At each step, it observes the current state, which consists of all partially formed coreference clusters produced so far, and selects some action which merges two existing clusters. The action will result in a new state with new candidate actions and the process is repeated.  The number possible cluster merges is quadratic in the number of clusters, so considering all of them would make inference very slow. To deal with this, I took the approach of Clark and Manning [7] and pruned the search space with scores produced from the mention pair classifier. First, the agent orders all mention pairs in the document in descending order according to their pairwise scores. This causes clustering to occur in an easy-first fashion, where harder decisions are delayed until more information is available. Secondly, the agent discards all mention pairs that score below a threshold t under the assumption that the clusters containing these pairs are unlikely to be coreferent. Algorithm 1 shows the full test-time procedure” (Section 3.1).
	The examiner further notes that although the primary reference of Barhom clearly teaches the use of agglomerative clustering for performing co-reference, there is no explicit teaching of the use of a threshold for identifying co-reference above such a threshold.  Nevertheless, the secondary reference of Clark teaches that merged clusters (i.e. nodes) in agglomerative clustering above a threshold are not discarded as they are co-referent.  The combination would result in the use of such a threshold to identify co-reference in the merged clusters of the agglomerative clustering of Barhom.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Clark’s teachings would have allowed Barhom’s method to increase the speed of co-reference identification, as noted by Clark (Section 3.1).
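	For illustration only, the pruning step quoted from Clark (Section 3.1), in which mention pairs are ordered by descending pairwise score and pairs scoring below a threshold t are discarded, can be sketched as follows.  This is a minimal sketch under stated assumptions: the function name `prune_and_order_pairs` and the dictionary layout of the scores are hypothetical and are not taken from the reference.

```python
def prune_and_order_pairs(pair_scores, t):
    """Drop mention pairs scoring below t and order the rest easy-first.

    `pair_scores` maps a (mention_a, mention_b) tuple to its pairwise score.
    Returns a list of ((mention_a, mention_b), score) items, highest first.
    """
    kept = [(pair, score) for pair, score in pair_scores.items() if score >= t]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

	Only the retained pairs would then be considered as candidate merges, which is the source of the speed-up described in the quoted passage.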
9.	Claims 2-3, 6, 9-10, 13, 16-17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Barhom et al. (Article entitled “Revisiting Joint Modeling of Cross-document Entity and Event Coreference Resolution”, dated 4 June 2019) in view of Clark (Article entitled “Neural Coreference Resolution”, dated 2015) as applied to claims 1, 8, and 15 above, and further in view of Wick et al. (Article entitled “A Discriminative Hierarchical Model for Fast Coreference at Large Scale”, dated 14 July 2012). 
10.	Regarding claims 2, 9, and 16, Barhom and Clark do not explicitly teach a computer-implemented method, system, and computer program product comprising: 
A)  wherein merging two nodes further comprises generating a deep set of descendant mentions that are represented by the two nodes being merged.
	Wick, however, teaches “wherein merging two nodes further comprises generating a deep set of descendant mentions that are represented by the two nodes being merged” as “Latent entity nodes (white boxes) summarize subtrees. Pairwise factors (black squares) measure compatibilities between child and parent nodes, avoiding quadratic blow-up. Corresponding decision variables (open circles) indicate whether one node is the child of another. Mentions (gray boxes) are leaves. Deciding whether to merge these two entities requires evaluating just a single factor (red square), corresponding to the new child-parent relationship” (Figure 1) and “The key insight is that each node in the tree functions as a highly compact information-rich summary of its children” (Section 1).
	The examiner further notes that the secondary reference of Wick teaches that the merging process in hierarchical clustering entails the merged clusters including the descendant mentions of the child nodes as clearly shown in Figure 1.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Wick’s teachings would have allowed the methods of Barhom and Clark to reason about co-reference efficiently, as noted by Wick (Section 1).

Regarding claims 3, 10, and 17, Barhom and Clark do not explicitly teach a computer-implemented method, system, and computer program product comprising: 
A)  wherein the descendant mentions include the mentions represented by the two nodes and the mentions of the nodes that are descendants of the two nodes in the hierarchical tree.
	Wick, however, teaches “wherein the descendant mentions include the mentions represented by the two nodes and the mentions of the nodes that are descendants of the two nodes in the hierarchical tree” as “Latent entity nodes (white boxes) summarize subtrees. Pairwise factors (black squares) measure compatibilities between child and parent nodes, avoiding quadratic blow-up. Corresponding decision variables (open circles) indicate whether one node is the child of another. Mentions (gray boxes) are leaves. Deciding whether to merge these two entities requires evaluating just a single factor (red square), corresponding to the new child-parent relationship” (Figure 1) and “The key insight is that each node in the tree functions as a highly compact information-rich summary of its children” (Section 1).
	The examiner further notes that the secondary reference of Wick teaches that the merging process in hierarchical clustering entails the merged clusters including the descendant mentions of the child nodes as clearly shown in Figure 1 (See “each node in the tree functions as a highly compact information-rich summary of its children”).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Wick’s teachings would have allowed the methods of Barhom and Clark to reason about co-reference efficiently, as noted by Wick (Section 1).
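	For illustration only, the tree structure described in the quoted Wick passages (mentions as leaves, entity nodes summarizing their subtrees) can be sketched as follows.  This is a minimal sketch under stated assumptions: the `Node` class, its fields, and the `merge` helper are hypothetical names introduced here and do not reproduce Wick's factor-graph model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    mention: Optional[str] = None  # set only on leaf (mention) nodes
    children: List["Node"] = field(default_factory=list)

    def descendant_mentions(self) -> List[str]:
        """Collect every mention found at the leaves below this node."""
        if not self.children:
            return [self.mention]
        found: List[str] = []
        for child in self.children:
            found.extend(child.descendant_mentions())
        return found


def merge(left: Node, right: Node) -> Node:
    """Create an entity node one level above the two nodes being merged."""
    return Node(children=[left, right])
```

	In this sketch the mentions reachable from a merged node are the mentions of both children together with those of all of their descendants.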

	Regarding claims 6, 13, and 19, Barhom further teaches a computer-implemented method, system, and computer program product comprising:
A)  wherein merging the two nodes comprises using a neural network to encode the descendant mentions into a fixed length vector by transforming the descendant mention into corresponding scalar values using a non-linear transformation and summing the scalar values (Abstract, Sections 3.1 and 3.3).
	The examiner notes that Barhom teaches “wherein merging the two nodes comprises using a neural network to encode the descendant mentions into a fixed length vector by transforming the descendant mention into corresponding scalar values using a non-linear transformation and summing the scalar values” as “We propose a neural architecture for cross-document coreference resolution” (Abstract), “Recognizing coreferring events and entities across multiple texts is crucial for many NLP applications. Despite the task’s importance, research focus was given mostly to within-document entity coreference, with rather little attention to the other variants. We propose a neural architecture for cross-document coreference resolution. Inspired by Lee et al. (2012), we jointly model entity and event coreference. We represent an event (entity) mention using its lexical span, surrounding context, and relation to entity (event) mentions via predicate-arguments structures” (Abstract), “We combine word-level and character-level features. We compute word-level representations using pre-trained word embeddings… The span vector s(m) is a concatenation of the word- and character-level vectors” (Section 3.1), “The context surrounding a mention may indicate its compatibility with other candidate mentions… We set the context vector c(m) to the contextual representation of m’s head word, taking the average of the 3 ELMo layers” (Section 3.1), and “We use agglomerative clustering that greedily merges multiple cluster pairs with the highest cluster-pair scores (line 8) until the scores are below a pre-defined threshold $\delta_2$. The algorithm starts with high-precision merges, leaving less precise decisions to a latter stage, when more information becomes available. We define the cluster-pair score as the average mention linkage score: $S_{cp}(c_i, c_j) = \frac{1}{|c_i| \cdot |c_j|} \sum_{m_i \in c_i} \sum_{m_j \in c_j} S(m_i, m_j)$. The same steps are repeated for events (lines 10-12), and repeat iteratively until no merges are available or up to a predefined number of iterations (line 4)” (Section 3.3).  The examiner further notes that Barhom teaches the use of an equation, applied in the merging process, that includes a summation operation and that represents the vectors of the child nodes (i.e. descendants).
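	For illustration only, the general operation recited in the limitation above (a non-linear transformation applied to each descendant mention, followed by summation into a fixed length vector) can be sketched as follows.  This is a minimal sketch under stated assumptions: the function name `encode_mentions`, the choice of tanh as the non-linearity, and the `weights` and `bias` parameters are hypothetical and are not taken from the claims or from Barhom.

```python
import math


def encode_mentions(mention_vectors, weights, bias):
    """Encode a variable-size set of mention vectors into one fixed-length vector.

    Each mention vector x is mapped to tanh(W x + b), one scalar per row of W,
    and the scalars are summed across mentions. The output length equals
    len(weights) no matter how many mentions are supplied.
    """
    out = [0.0] * len(weights)
    for x in mention_vectors:
        for k, (row, b) in enumerate(zip(weights, bias)):
            out[k] += math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
    return out
```

	Because the per-mention results are combined by summation, the size of the output in this sketch does not change with the number of mentions supplied.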
Allowable Subject Matter
11.	Claims 4, 11, and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
	Specifically, although the prior art (See Wick and Xu) clearly stores descendant mentions for each merged node in agglomerative clustering, the detailed limitations directed towards the specific filtering of descendant mentions such that the filtered descendant mentions are stored with each entity node are not taught by the prior art, in conjunction with the rest of the limitations of the parent claims.
	Dependent claims 5 and 12 are deemed allowable for depending on the deemed allowable subject matter of dependent claims 4 and 11 respectively. 
	Claims 7, 14, and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
	Specifically, although the prior art (See Wick and Xu) clearly stores descendant mentions for each merged node in agglomerative clustering, the detailed limitations directed towards the specific filtering of descendant mentions to be encoded are not taught by the prior art, in conjunction with the rest of the limitations of the parent claims.
Conclusion
12.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. PGPUB 2016/0110421 issued to Baron et al. on 25 March 2010.  The subject matter disclosed therein is pertinent to that of claims 1-20 (e.g., methods to perform co-referencing).
U.S. PGPUB 2021/0034701 issued to Fei et al. on 04 February 2021.  The subject matter disclosed therein is pertinent to that of claims 1-20 (e.g., methods to perform co-referencing).
Contact Information
13.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Mahesh Dwivedi whose telephone number is (571) 272-2731.  The examiner can normally be reached on Monday to Friday 8:20 am – 4:40 pm.
	If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fred Ehichioya, can be reached at (571) 272-4034.  The fax number for the organization where this application or proceeding is assigned is (571) 273-8300.
	Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov.  Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).


Mahesh Dwivedi
Primary Examiner
Art Unit 2168

June 13, 2022
/MAHESH H DWIVEDI/Primary Examiner, Art Unit 2168