DETAILED ACTION
This action is in response to the request for continuing examination received July 14, 2021. After consideration of applicant's amendments and/or remarks:
Claims 1-21 are rejected under 35 USC § 103.


Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-21 are rejected under 35 U.S.C. 103 as being unpatentable over Subramanya et al., U.S. Patent No. 9,224,103 B1, in view of Abdel-Reheem et al., U.S. PG-Publication No. 2015/0286629 A1, further in view of Roy et al., "Learning domain-specific word embeddings from sparse cybersecurity texts." arXiv preprint arXiv:1709.07470, 2017 (hereinafter Roy), further in view of Burns et al., U.S. PG-Publication No. 2007/0239704 A1.

Claim 1
	Subramanya discloses a system for referral object processing for textual annotations. Subramanya discloses an "automatic entity annotation system" for annotating a "large corpus with inter-document references." Subramanya, 4:16-29.
	Subramanya discloses the system comprising: a memory storing machine executable code; and one or more processors coupled to the memory and configurable to execute the machine executable code. The "automatic entity annotation system 100" is implemented on a computer device 600 comprising a processor 602 and memory 604. Id. at 4:30-37; 12:10-22.
	Subramanya discloses to cause the one or more processors to: parse a document to identify a reference identifier to an external object, the external object associated with information not contained in the document. System 100 comprises an "annotated dataset creation engine 110" that analyzes crawled documents 120 "to create a set of high-quality entity mentions found within the crawled documents." Crawled documents 120 "include documents that have one or more references to documents in [an] authoritative source." Engine 110 "may look for documents that have links to [the authoritative source], parse the links, and generate a data set of entity mentions from the documents." Id. at 5:18-6:16. Figure 2 illustrates a method "for creating automatically annotated documents for evaluating and training of semantic analysis engines" performed by system 100. The "system may extract free text from [a] document" (215) and then "extract anchor text associated with a reference to an authoritative document" (220). In one embodiment, the system parses HTML to identify anchor tags establishing a hyperlink to the authoritative document. Id. at 7:36-8:20. Accordingly, Subramanya discloses parsing a document 120 to identify anchor tags (i.e. reference identifier) to an authoritative document (i.e. external object).
	Subramanya discloses retrieve the external object using the reference identifier. Continuing with FIG. 2, the system determines whether the "reference may be considered an entity mention" (225-235) and then generates "a set of automatically annotated entity mentions . . . used to train and/or evaluate one or more semantic analysis engines" (250). Id. at 8:59-9:6. Figure 3 illustrates a method "for using the set of annotated documents to train and/or evaluate an entity tagging engine" that is performed "as part of step 250." System 100 may "obtain the content of the target page for [an] entity mention of a particular document," wherein "the target page is the authoritative document that the reference . . . refers to." Id. at 9:7-45.
	Subramanya discloses extract the information associated with the external object based on [data] detected in the external object. Continuing with FIG. 3, system 100 obtains "the content of the target page" (305), and categories associated with the target page (310). Then, the system uses "categories associated with the target page to associate one or more entity tags with [an] entity mention" (315). In the described example, the system obtains a target page describing British astronomer John Smith and extracts categories (e.g. "1711 Births," "1795 Deaths," and "Professor of Astronomy") from that page. Id. Accordingly, system 100 extracts categories (i.e. information) associated with a linked target page (i.e. external object), based on content in the target page.
	Subramanya discloses convert the extracted information into the textual annotations associated with the reference identifier in the document. Continuing with FIG. 3, system 100 uses "a single category or a combination of categories to assign entity tags to an entity mention." In the described example, the category data is used to associate the entity tag "18th Century Astronomer" with the John Smith entity mention. Id.
	Subramanya discloses enter the textual annotations to a corpus of content so that the extracted information is associated with the reference identifier in the document by the system. Engine 110 analyzes documents to identify links to other documents that are "entity mentions." Then, engine 110 "may store the documents, the entity mentions within the document, and context regarding the entity mention in a file, such as annotated document set 122." Id. at 5:50-6:16. Engine 110 uses "the context of an entity mention in the document to assign one or more entity tags to the mention." Id. at 6:46-48. Annotated document set 122 "provides a large body of documents that have been annotated with entity mentions and their context." Id. at 7:1-4. Accordingly, system 100 generates an annotated document set 122 (i.e. annotated corpus of content), wherein the annotations associate entity mentions (i.e. links to other documents; reference identifier) with their entity tags (i.e. context; extracted information).
	Subramanya discloses that entity tagging engine 110 "may strive to classify entity mentions found in free text into one or more entity types." Id. at 9:50-52. Semantic analysis, such as a noun-phrase engine, is used "to identify spans in the document that could be mentions of entities." Id. at 11:30-34. However, Subramanya does not expressly disclose extract the information associated with the external object based on at least one data pattern detected in the external object.
	Abdel-Reheem discloses extract the information associated with the external object based on at least one data pattern detected in the external object. Abdel-Reheem discloses a named entity extractor 100 configured to "detect an instance of a named entity in a web page and classify the named entity as being an organization or other predefined class." Abdel-Reheem, ¶¶ 5; 22. A "document augmentation component 110" is configured to take the output of extractor 100 "to augment a document and provide insight to an end user." Id. at ¶¶ 23; 26. In one embodiment, the named entity extractor 302 "is a machine learning component . . . trained to label sequences of words and phrases." Extractor 302 "learns the possible patterns in which a named entity may appear based on observations of the named entity in [a] training data." Id. at ¶¶ 32-34. Further, Abdel-Reheem discloses a web-based document corpus embodiment, wherein a page contains a hyperlink to an external labeled (tagged) page. The page containing the hyperlink "can inherit that label and use it to tag the anchor text in this paragraph automatically with a . . . label." Id. at ¶¶ 42-43.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the system for annotating a corpus of documents with entity tags referring to entity mentions in linked content of Subramanya to incorporate named entity extraction based on identifying patterns as taught by Abdel-Reheem. One of ordinary skill in the art would be motivated to integrate entity extraction based on textual patterns into the entity tagging system of Subramanya, with a reasonable expectation of success, in order to augment documents with entity information for use by an information retrieval system and thereby improve search query results. See Abdel-Reheem, ¶ 28.
	Subramanya, in view of Abdel-Reheem, does not expressly disclose convert the textual annotations to first word embeddings for a machine learning model of a machine learning engine used to search a corpus of content including the document; and combine the first word embeddings with second word embeddings for the document for the machine learning model.
	Roy discloses a framework comprising a "mechanism to encode diverse types of domain knowledge such as domain vocabulary, semantic categories and semantic relations as text annotations" and a "Word and Annotation Embedding (WAE) algorithm to incorporate text annotations in word embeddings." Roy, 2 (Introduction).
	Roy discloses convert the textual annotations to first word embeddings for a machine learning model of a machine learning engine used to search a corpus of content including the document. Roy uses a "text annotation-based mechanism" to "convert different types of domain knowledge into text annotations." Id. at 3 (Representing Knowledge as Text Annotations). The Word and Annotation Embedding (WAE) algorithm converts these textual annotations into first word embeddings. Id. at 4 (model output "includes a vector representation of each annotation"). Each WAE model is trained using "a text corpus plus some additional domain knowledge." Id. at 6 (Experiments). Roy discloses an embodiment wherein the dataset comprises a corpus of cybersecurity documents. Id. at 2-3 (Datasets). In this embodiment, Roy discloses "applying learned word embeddings to support Information Extraction for cybersecurity texts." Id. at 7 (Conclusion). Accordingly, Roy discloses training a WAE model (i.e. machine learning model of a machine learning engine) using a corpus of textual documents; and then using the model to extract information (i.e. search for information) from the corpus of textual documents.
	Roy discloses combine the first word embeddings with second word embeddings for the document for the machine learning model. Roy discloses that the output of the model "includes not only a vector representation of each word but also a vector representation of each annotation in the same embedding space." Both the Annotation-Assisted Word Prediction (AAWP) embodiment and Joint Word and Annotation Prediction (JWAP) embodiment train a model to generate a "weight matrix Qw" comprising learned word representations (i.e. vectors representing second word embeddings) and a "weight matrix Qa" comprising learned annotation representations (i.e. vectors representing first word embeddings). Id. at 4-5.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the system for annotating a corpus of 
	Subramanya/Abdel-Reheem/Roy does not expressly disclose wherein the parsing the document comprises: performing optical character recognition on the document; identifying, using a neural network model, the reference identifier based on a calculated similarity value determined from comparing data from the optical character recognition to one or more reference identifiers used to train the network model; and retrieving the external object using the reference identifier from parsing the document.
	Burns discloses wherein the parsing the document comprises: performing optical character recognition on the document. Burns discloses a "method for aggregating citations for a corpus of documents having disparate formats." Burns, ¶ 5. The method implements an "extraction component 221" comprising a "converter 221B component" that "normalizes the documents to a common format." In one embodiment, optical character recognition is used to convert an image file to a text format. Id. at ¶ 27.
	Burns discloses identifying, using a neural network model, the reference identifier based on a calculated similarity value determined from comparing data from the optical character recognition to one or more reference identifiers used to train the network model. Burns discloses a "citation service 160" for processing documents to generate a citation listing used "to determine Id. at ¶ 20. Extraction component 221 further comprises a "harvesting engine 221A" that "retrieves … references to the documents … based on one or more rules 221[C] that define the document style and structure." The rules 221C comprise "profile information" utilized to "verify that the document includes one or more citations." Harvesting engine 221A searches "for indications that suggest a heading is a known portion of a research article, such as a reference section, title, references, footnote, endnote, etc." When the document includes a reference section, each line is processed "to determine line atoms, such as author, title, year, and publication … to provide access to the citation information for each normalized document." Id. at ¶ 28. Extraction component 221 is used to identify reference identifiers (e.g. citation information, references, footnote, or endnote) within OCR text based on rules defining document style and structure.
	Burns discloses one embodiment wherein "the extraction component includes machine instruction" utilizing "machine-training algorithm that improves the accuracy when extracting information." The machine-training algorithm parses through sample documents "and errors are corrected by a user so that the machine can learn from the errors to modify a neural network that captures specialized knowledge." The algorithm learns from sample documents and user feedback (i.e. reference identifiers used to train the network model). The algorithm provides the "strongest possible extraction probability" (i.e. similarity value) through "utilizing a sample size … stored in the database to tune the extraction component." Id. at ¶ 29. Accordingly, in one embodiment the extraction component 221 is a machine learning neural network algorithm that identifies reference identifiers. The algorithm operates by comparing OCR textual data to known reference identifier examples (e.g. sample size and user error feedback used in training process).
	Burns discloses retrieving the external object using the reference identifier from parsing the document. Burns discloses that the "harvesting component may utilize reference information included in [a] current document to indirectly retrieve a subsequent document." Id. at ¶ 27.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of identifying reference information within text of Subramanya/Abdel-Reheem/Roy to incorporate OCR and the neural network for identifying reference information taught by Burns. One of ordinary skill in the art would be motivated to integrate the OCR and the neural network for identifying reference information into Subramanya/Abdel-Reheem/Roy, with a reasonable expectation of success, in order to increase the reliability of extracting citations "from documents having disparate formats." See Burns, ¶ 41.
 
Claim 2
	Subramanya discloses wherein the reference identifier comprises one of a hyperlink, a page identifier, a heading, a location identifier, an image identifier, a callout banner, or a table number. Figure 2 illustrates a method "for creating automatically annotated documents for evaluating and training of semantic analysis engines" performed by system 100. The "system may extract free text from [a] document" (215) and then "extract anchor text associated with a reference to an authoritative document" (220). In one embodiment, the system parses HTML to identify anchor tags establishing a hyperlink to the authoritative document. Subramanya, 7:36-8:20.

Claim 3
	Roy discloses train the machine learning model of the machine learning engine using the first word embeddings and the second word embeddings from at least the document and the textual annotations. Roy discloses that the output of the model "includes not only a vector representation of each word but also a vector representation of each annotation in the same embedding space." Both the Annotation-Assisted Word Prediction (AAWP) embodiment and Joint Word and Annotation Prediction (JWAP) embodiment train a model to generate a "weight matrix Qw" comprising learned word representations (i.e. vectors representing second word embeddings) and a "weight matrix Qa" comprising learned annotation representations (i.e. vectors representing first word embeddings). Roy, 4-5. The training is performed using "a text corpus" and "additional domain knowledge," i.e. annotations. Id. at 6.

Claim 4
	Abdel-Reheem discloses execute, using the machine learning engine a search of the corpus of content based on a received search query, wherein the search is performed using at least the document and the textual annotations. Abdel-Reheem discloses an "information retrieval system 102" that "creates an index 106 of the documents" based on metadata comprising the named entity extractor results. Abdel-Reheem, ¶ 28. Named entity extractor 100 "detects named entities in the document." Then, document augmentation component 110 can generate "a query using the detected named entity" and use "the query to retrieve search results from an information retrieval system 102." Id. at ¶ 24. One embodiment of the named entity extractor 302 is "a machine learning component." Id. at ¶ 32.

Claim 5
	Abdel-Reheem discloses wherein the machine executable code further causes the one or more processors to: in response to the search, determine a portion of the document identified by the search comprises one of the textual annotations. Document augmentation component 110 retrieves information corresponding to the query and a "document viewing application may display the retrieved information inline with the document." FIG. 6 illustrates a web-based document corpus embodiment, wherein a page contains a hyperlink to an external labeled (tagged) page. The page containing the hyperlink "can inherit that label and use it to tag the anchor text in this paragraph automatically with a . . . label." Abdel-Reheem, ¶¶ 42-43. FIG. 8 illustrates "a method of labeling sentences for use in the method of FIG. 6," wherein the output is labeled sentences 812 (i.e. identified portions of the documents).
	Abdel-Reheem discloses provide the information associated with the external object based on the portion comprising the one of the textual annotations. In one embodiment, the information retrieved in response to the query is "displayed . . . so that it overlays the document reading application display." Id. at ¶ 25.

Claim 6
	Abdel-Reheem discloses wherein the textual annotations comprise searchable text generated using the information not contained within the document and wherein the searchable text is associated with a portion of the document having the reference identifier in the corpus of content. In one embodiment, the named entity extractor 302 "is a machine learning component . . . trained to label sequences of words and phrases." Extractor 302 "learns the possible patterns in which a named entity may appear based on observations of the named entity in [a] training data." Abdel-Reheem, ¶¶ 32-34. FIG. 6 illustrates a web-based document corpus embodiment, wherein a page contains a hyperlink to an external labeled (tagged) page. The page containing the hyperlink "can inherit that label and use it to tag the anchor text in this paragraph automatically with a . . . label." Id. at ¶¶ 42-43. FIG. 8 illustrates "a method of labeling sentences for use in the method of FIG. 6," wherein the labeling process "collects 804 sentences having hyperlinks to labeled pages" (i.e. portion of content having the reference identifier).

Claim 7
	Abdel-Reheem discloses wherein the information is extracted using at least one of natural language processing, image processing, optical character recognition, or website data extraction. In one embodiment, the named entity extractor 302 "is a machine learning component . . . trained to label sequences of words and phrases." Extractor 302 "learns the possible patterns in which a named entity may appear based on observations of the named entity in [a] training data." Abdel-Reheem, ¶¶ 32-34. The patterns involve natural language processing: e.g. "features such as one or more word previous to the named entity, one or more words next to the named entity, the leading and trailing three characters of the named entity," etc.

Claims 8-10 and 11-14
	Claims 8-10 and 11-14 recite a method comprising steps for performing the functions of the system recited in claims 1-3 and 4-7. Accordingly, claims 8-10 and 11-14 are rejected as indicated in the rejection of claims 1-3 and 4-7.

Claims 15-17 and 18-21



Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 8, and 15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. See Burns et al., U.S. PG-Publication No. 2007/0239704 A1.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FRANK D MILLS whose telephone number is (571)270-3172. The examiner can normally be reached on M-F 10-6 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KAVITA PADMANABHAN, can be reached on (571)272-8352. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/FRANK D MILLS/
Primary Examiner, Art Unit 2176
September 11, 2021