DETAILED ACTION
	Claims 11 and 16-17 are rejected under 35 USC § 112.
	Claims 1-19 are rejected under 35 USC § 103.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 11 and 16-17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 11 recites steps of "identifying proper noun vectors … and mapping the proper nouns to vectors in a vector library." It is unclear whether the antecedent basis for "the proper nouns" is the identified 'proper noun vector' or some other 'proper noun.' Examiner suggests amending the claim to recite "mapping the proper noun vectors to vectors in a vector library."
Claim 16 recites "wherein similar vectors are likely to be hashed to the same locality sensitive hashing bucket." The terms "similar" and "likely" are representative of an approximation. The specification does not provide any standard for measuring the degree of similarity and likelihood of hashing necessary to be considered "similar" or "likely." Further, there is no standard of measuring the degree that is accepted by those of ordinary skill in the art. See MPEP 2173.05(b)(I).
Claim 17 recites "wherein dissimilar vector pairs are not checked for similarity." The term "dissimilar" is representative of an approximation. The specification does not provide any standard for measuring the degree of similarity necessary to be considered "dissimilar." Further, there is no standard of measuring the degree that is accepted by those of ordinary skill in the art. See MPEP 2173.05(b)(I). 


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6-7, and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Lowe et al., U.S. Patent No. 9,522,412 B1, in view of Jain et al., U.S. PG-Publication No. 2016/0299955 A1, further in view of Moore, U.S. PG-Publication No. 2006/0116867 A1.

Claim 1
	Lowe discloses a method for automated document graphing. Lowe discloses a method of refining Boolean queries comprising steps of "obtaining a query" and "searching a corpus of documents based on the query." Lowe, 2:9-29. The method is implemented using a "computational linguistics system 802" comprising a "natural-language processing module 820" and a "graph repository 822." End users of the system "may submit queries to view … document relationships (e.g., graphs of such relationships) … based on unstructured plain text documents." Id. at 12:7-45; FIG. 10. Module 820 analyzes documents and "store[s] resulting graphs in the graph repository 822." Graphs are constructed using "distributional semantic analysis" performed by "an unsupervised machine learning technique." These techniques represent "the distribution of terms in documents … as relatively high-dimensional vectors," wherein "semantic similarity is measured by … similarity of the vectors, for instance, cosine similarity." Id. at 14:62-15:65. Accordingly, module 820 constructs a document graph from a corpus using unsupervised machine learning techniques (i.e. automatically).
	Lowe discloses in an information processing apparatus comprising at least one computer processor: receiving, from a document repository, a plurality of source documents. System 802 further comprises a "document-ingest module 816" and a "corpora repository 818." Id. at 12:33-45. Module 816 obtains collections of documents "from different document sources 806, such as remote, third-party repositories of documents" and stores those documents in repository 818. The corpora of unstructured plain text documents are analyzed to generate graphs. Id. at 13:21-28.
	Lowe discloses extracting text from each of the plurality of source documents. In one embodiment, "the unstructured text may be included within structured portions of other documents, for example rendered text within markup instructions in a webpage, or unstructured text embedded in a serialized data format." In these instances, the text analyzed for "constructing graph topologies" is the "human readable plain text … with markup instructions and scripting language instructions removed." Id. at 13:29-58. Accordingly, Lowe discloses extracting unstructured text from a webpage by removing markup and scripting instructions.
	Lowe discloses removing stop words from the extracted text. The method measures relationship between documents by evaluating n-grams of text, wherein "n-grams are sequences of one or more terms." Lowe expressly discloses that "relatively uninformative terms, like stop words … may be omitted" in this evaluation. Id. at 15:53-65.
	Lowe discloses representing each of the remaining words in the extracted text as a vector. Lowe discloses measuring relationships between documents using "a feature vector in which each value of the vector indicates the presence, number of occurrences, or frequency of an n-gram in the document." Id. Lowe discloses that documents are "clustered according to their corresponding vectors in [a] concept space" or "similarity of documents [are] determined … by comparing their respective vectors in this space, e.g., based on cosine similarity." Id. at 18:4-8. In one embodiment, feature vectors are "determined for documents in a corpus," and the method iterates through each of the vectors to "create a graph of reachable vectors, where nodes on the graph are identified in response to non-core corresponding vectors being within a threshold distance of a core vector in the graph." Id. at 18:32-35.
	Lowe discloses storing the vectors in a library. Lowe discloses that "a corpus may be represented by arranging the feature vectors into a term-document matrix" or a "co-occurrence matrix." Id. at 16:34-54. These matrices are stored in memory for analysis. Id. at 17:4-47.
	Lowe discloses omitting "stop words" from extracted text. However, Lowe does not expressly disclose cleaning the extracted text by removing unnecessary characters from the extracted text.
	Jain discloses cleaning the extracted text by removing unnecessary characters from the extracted text. Jain discloses a "text mining system for extracting relevant text from a plurality of input data sets." Jain, ¶ 6. The system comprises a text analysis module 60 that includes a data handling module 62 "configured to convert … input data sets to an analytics text set … by cleaning up the input data sets." Module 62 filters "non-relevant elements from the input data sets." Id. at ¶¶ 40-41. Data cleaning operations include removal of phone numbers, special characters, stop words, URLs, white spaces, email addresses, and the like. Id. at ¶ 58.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of generating a document graph using text analysis of Lowe to incorporate cleaning extracted text for analysis as taught by Jain. One of ordinary skill in the art would be motivated to integrate cleaning extracted text for analysis into Lowe, with a reasonable expectation of success, in order to "remove noise from the input data sets." Jain, ¶ 58.
	Lowe-Jain does not expressly disclose rewriting multi-word proper nouns in the extracted text; converting a plurality of words in the extracted text to lemma form; and rewriting compound terms in the extracted text.
	Moore discloses rewriting multi-word proper nouns in the extracted text. Moore discloses methods used to determine relationships among words. Moore, ¶ 43; See Also ¶ 61 (method used to generate word association scores or relationships between words in different languages). The method is implemented using parsing components 204/206 configured to parse text into "logical forms." The parsers may "identify certain lexical compounds as if they are single units." These lexical compounds are "multi-word expressions" including "proper names." Id. at ¶¶ 62; 81. A statistical word association component 208 "hypothesizes the occurrence of compounds" in textual data and "replaced the hypothesized compounds with a single token." Id. at ¶ 65; See Also ¶¶ 68; 76 (compounds are replaced by fused tokens).
	Moore discloses converting a plurality of words in the extracted text to lemma form. Moore discloses that during parsing "words … are converted to normalized word forms (lemmas)." Id. at ¶¶ 48; 62.
	Moore discloses rewriting compound terms in the extracted text. The method is implemented using parsing components 204/206 configured to parse text into "logical forms." The parsers may "identify certain lexical compounds as if they are single units." These lexical compounds are "multi-word expressions" including "time expressions, dates, measure expressions, etc." Id. at ¶¶ 62; 81; See Also ¶¶ 16-19 (compound terms such as 'back up,' 'roll back,' and 'log on'); ¶¶ 75-78 (discussing a subclass of "lexicalized multi-word expressions" called "captoids"). A statistical word association component 208 "hypothesizes the occurrence of compounds" in textual data and "replaced the hypothesized compounds with a single token." Id. at ¶ 65; See Also ¶¶ 68; 76 (compounds are replaced by fused tokens).
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of generating a document graph using text analysis of Lowe-Jain to incorporate converting text to lemma form and rewriting multi-word expressions as taught by Moore. One of ordinary skill in the art would be motivated to integrate converting text to lemma form and rewriting multi-word expressions into Lowe-Jain, with a reasonable expectation of success, in order to increase accuracy in textual analysis of lexical relationships by considering compound multi-word sequences instead of only using single words. See Moore, ¶¶ 13-20.

Claim 2
	Lowe discloses wherein the source documents comprise binary files. The retrieved and stored corpora may include "a web page" or "unstructured text embedded in a serialized data format," or "text … encoded as Unicode or ASCII text." Lowe, 13:29-58.

Claim 3
	Jain discloses wherein the unnecessary characters include at least one of a mark, a symbol, a control character, an email address, and a URL. Data cleaning operations include removal of phone numbers, special characters, stop words, URLs, white spaces, email addresses, and the like. Jain, ¶ 58.
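	For illustration only, the kind of cleaning operation characterized above (removal of URLs, email addresses, special characters, and stop words) might be sketched as follows. This sketch is not drawn from Jain or from the application; the regular expressions, the stop-word set, and the function name clean_text are assumptions made for the example.

```python
import re

# Hypothetical patterns and stop-word set, chosen only for illustration.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
NON_WORD_RE = re.compile(r"[^A-Za-z0-9\s]")
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "at", "or", "is"}


def clean_text(text: str) -> list[str]:
    """Strip URLs, email addresses, and special characters, then drop stop words."""
    text = URL_RE.sub(" ", text)       # remove URLs first so their punctuation goes with them
    text = EMAIL_RE.sub(" ", text)     # remove email addresses
    text = NON_WORD_RE.sub(" ", text)  # remove remaining marks and symbols
    return [w for w in text.lower().split() if w not in STOP_WORDS]
```

	Under these assumptions, an input such as "Contact us at info@example.com or visit https://example.com/docs!" is reduced to the three words "contact," "us," and "visit."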

Claim 4
	Moore discloses wherein the step of rewriting multi-word proper nouns comprises: identifying the multi-word proper noun; and rewriting the multi-word proper noun as a single term by inserting an underscore between the words in the multi-word proper noun. Moore discloses methods used to determine relationships among words. Moore, ¶ 43; See Also ¶ 61 (method used to generate word association scores or relationships between words in different languages). The method is implemented using parsing components 204/206 configured to parse text into "logical forms." The parsers may "identify certain lexical compounds as if they are single units." These lexical compounds are "multi-word expressions" including "proper names." Id. at ¶¶ 62; 81. A statistical word association component 208 "hypothesizes the occurrence of compounds" in textual data and "replaced the hypothesized compounds with a single token." Id. at ¶ 65; See Also ¶¶ 68; 76 (compounds are replaced by fused tokens). The fused tokens are generated "by placing underscores between the words in each word sequence." Id. at ¶ 81; See Also ¶¶ 68; 75 (describing fused token of words E2 and E3 as "E2_E3" and fused token of compound expression 'log on' as "log_on").
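	As a minimal sketch of the underscore-fusing operation discussed above, and assuming the multi-word expressions have already been identified, the rewriting step might look like the following. The function name fuse_compounds and the representation of known compounds as a set of word tuples are hypothetical; Moore's statistical identification of compounds is not reproduced here.

```python
def fuse_compounds(tokens, compounds):
    """Replace each known multi-word sequence with one underscore-joined token.

    `compounds` is a hypothetical set of word tuples, e.g. {("log", "on")}.
    """
    max_len = max((len(c) for c in compounds), default=1)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first so longer compounds win over shorter ones.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in compounds:
                out.append("_".join(candidate))
                i += n
                break
        else:
            # No compound starts at this position; keep the single word.
            out.append(tokens[i])
            i += 1
    return out
```

	With the compounds ("log", "on") and ("New", "York"), the word sequence "please log on to New York server" becomes "please log_on to New_York server," which mirrors the "log_on" form cited from Moore.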

Claim 6
	Moore discloses wherein the lemma form comprises a dictionary or canonical form. Moore discloses that during parsing words are converted to normalized (i.e. canonical) word forms (lemmas). Id. at ¶¶ 48; 62.

Claim 7
	Moore discloses wherein the step of rewriting compound terms comprises: identifying the compound terms; and rewriting the compound terms as a single term by inserting an underscore between the terms in the compound terms. The method is implemented using parsing components 204/206 configured to parse text into "logical forms." The parsers may "identify certain lexical compounds as if they are single units." These lexical compounds are "multi-word expressions" including "time expressions, dates, measure expressions, etc." Id. at ¶¶ 62; 81; See Also ¶¶ 16-19 (compound terms such as 'back up,' 'roll back,' and 'log on'); ¶¶ 75-78 (discussing a subclass of "lexicalized multi-word expressions" called "captoids"). A statistical word association component 208 "hypothesizes the occurrence of compounds" in textual data and "replaced the hypothesized compounds with a single token." Id. at ¶ 65; See Also ¶¶ 68; 76 (compounds are replaced by fused tokens). The fused tokens are generated "by placing underscores between the words in each word sequence." Id. at ¶ 81; See Also ¶¶ 68; 75 (describing fused token of words E2 and E3 as "E2_E3" and fused token of compound expression 'log on' as "log_on").

Claim 11
	Lowe discloses a method for document vectorization. Lowe discloses a method of refining Boolean queries comprising steps of "obtaining a query" and "searching a corpus of documents based on the query." Lowe, 2:9-29. The method is implemented using a "computational linguistics system 802" comprising a "natural-language processing module 820" and a "graph repository 822." End users of the system "may submit queries to view … document relationships (e.g., graphs of such relationships) … based on unstructured plain text documents." Id. at 12:7-45; FIG. 10. Module 820 analyzes documents and "store[s] resulting graphs in the graph repository 822." Graphs are constructed using "distributional semantic analysis" performed by "an unsupervised machine learning technique." These techniques represent "the distribution of terms in documents … as relatively high-dimensional vectors," wherein "semantic similarity is measured by … similarity of the vectors, for instance, cosine similarity." Id. at 14:62-15:65.
	Lowe discloses in an information processing apparatus comprising at least one computer processor: receiving, from a document repository, a plurality of source documents. System 802 further comprises a "document-ingest module 816" and a "corpora repository 818." Id. at 12:33-45. Module 816 obtains collections of documents "from different document sources 806, such as remote, third-party repositories of documents" and stores those documents in repository 818. The corpora of unstructured plain text documents are analyzed to generate graphs. Id. at 13:21-28.
	Lowe discloses extracting text from each of the plurality of source documents. In one embodiment, "the unstructured text may be included within structured portions of other documents, for example rendered text within markup instructions in a webpage, or unstructured text embedded in a serialized data format." In these instances, the text analyzed for "constructing graph topologies" is the "human readable plain text … with markup instructions and scripting language instructions removed." Id. at 13:29-58. Accordingly, Lowe discloses extracting unstructured text from a webpage by removing markup and scripting instructions.
	Lowe discloses separating the text into a plurality of parts and cleaning the extracted text. The method measures relationships between documents by evaluating n-grams of text, wherein "n-grams are sequences of one or more terms." Lowe expressly discloses that "relatively uninformative terms, like stop words … may be omitted" in this evaluation. Id. at 15:53-65. Accordingly, the text is separated into a plurality of n-grams (i.e. parts of text) and cleaned by removing stop words.
	Lowe discloses mapping each of the remaining words to a vector in the vector library. Lowe discloses measuring relationships between documents using "a feature vector in which each value of the vector indicates the presence, number of occurrences, or frequency of an n-gram in the document." Id. Lowe discloses that documents are "clustered according to their corresponding vectors in [a] concept space" or "similarity of documents [are] determined … by comparing their respective vectors in this space, e.g., based on cosine similarity." Id. at 18:4-8.
	Lowe discloses generating a document vector for the vectors in the document. In one embodiment, feature vectors are "determined for documents in a corpus," and the method iterates through each of the vectors to "create a graph of reachable vectors, where nodes on the graph are identified in response to non-core corresponding vectors being within a threshold distance of a core vector in the graph." Id. at 18:32-35.
	Lowe discloses omitting "stop words" from extracted text. However, Lowe does not expressly disclose cleaning the extracted text by removing unnecessary characters from the extracted text.
	Jain discloses cleaning the extracted text by removing unnecessary characters from the extracted text. Jain discloses a "text mining system for extracting relevant text from a plurality of input data sets." Jain, ¶ 6. The system comprises a text analysis module 60 that includes a data handling module 62 "configured to convert … input data sets to an analytics text set … by cleaning up the input data sets." Module 62 filters "non-relevant elements from the input data sets." Id. at ¶¶ 40-41. Data cleaning operations include removal of phone numbers, special characters, stop words, URLs, white spaces, email addresses, and the like. Id. at ¶ 58.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of generating a document graph using text analysis of Lowe to incorporate cleaning extracted text for analysis as taught by Jain. One of ordinary skill in the art would be motivated to integrate cleaning extracted text for analysis into Lowe, with a reasonable expectation of success, in order to "remove noise from the input data sets." Jain, ¶ 58.
	Lowe-Jain does not expressly disclose identifying proper noun vectors in the plurality of parts; converting a plurality of words in the extracted text to lemma form; and rewriting compound term vectors in the extracted text.
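	Moore discloses identifying proper noun vectors in the plurality of parts. Moore discloses methods used to determine relationships among words. Moore, ¶ 43; See Also ¶ 61 (method used to generate word association scores or relationships between words in different languages). The method is implemented using parsing components 204/206 configured to parse text into "logical forms." The parsers may "identify certain lexical compounds as if they are single units." These lexical compounds are "multi-word expressions" including "proper names." Id. at ¶¶ 62; 81. A statistical word association component 208 "hypothesizes the occurrence of compounds" in textual data and "replaced the hypothesized compounds with a single token." Id. at ¶ 65; See Also ¶¶ 68; 76 (compounds are replaced by fused tokens).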
	Moore discloses converting a plurality of words in the extracted text to lemma form. Moore discloses that during parsing "words … are converted to normalized word forms (lemmas)." Id. at ¶¶ 48; 62.
	Moore discloses rewriting compound term vectors in the extracted text. The method is implemented using parsing components 204/206 configured to parse text into "logical forms." The parsers may "identify certain lexical compounds as if they are single units." These lexical compounds are "multi-word expressions" including "time expressions, dates, measure expressions, etc." Id. at ¶¶ 62; 81; See Also ¶¶ 16-19 (compound terms such as 'back up,' 'roll back,' and 'log on'); ¶¶ 75-78 (discussing a subclass of "lexicalized multi-word expressions" called "captoids"). A statistical word association component 208 "hypothesizes the occurrence of compounds" in textual data and "replaced the hypothesized compounds with a single token." Id. at ¶ 65; See Also ¶¶ 68; 76 (compounds are replaced by fused tokens).
	Moore, in view of Lowe, suggests mapping the proper nouns to vectors in a vector library and mapping the compound term vectors to vectors in the vector library. First, Lowe discloses mapping terms to vectors in a vector library. Lowe discloses that "a corpus may be represented by arranging the feature vectors into a term-document matrix" or a "co-occurrence matrix." Id. at 17:4-47. Second, Moore discloses identifying proper nouns and multi-word expressions as a single fused token (i.e. a single term). Moore, ¶¶ 62; 81. One of ordinary skill in the art would recognize that the single fused token of Moore could be a term mapped to a vector in a vector library, as taught by Lowe.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of generating a document graph using text analysis of Lowe-Jain to incorporate converting text to lemma form and rewriting multi-word expressions as taught by Moore. One of ordinary skill in the art would be motivated to integrate converting text to lemma form and rewriting multi-word expressions into Lowe-Jain, with a reasonable expectation of success, in order to increase accuracy in textual analysis of lexical relationships by considering compound multi-word sequences instead of only using single words. See Moore, ¶¶ 13-20.

Claim 12
	Lowe discloses retrieving a plurality of document vectors for a plurality of documents in a document library. Natural language processing module 820 analyzes the corpora and "store[s] resulting graphs in [a] graph repository 822" using "techniques by which distribution of terms in documents is represented as relatively high-dimensional vectors, and semantic similarity is measured by according to similarity of the vectors, for instance cosine similarity." Lowe, 15:17-52.
	Lowe discloses calculating a cosine similarity for the document vector; calculating a cosine similarity for each of the plurality of documents in the document library; and comparing the cosine similarity for the document vector to the cosine similarities for each of the plurality of documents in the document library. Lowe discloses that document similarity is determined by comparing each respective document vector in a concept space, "based on cosine similarity," wherein similarity of n-grams representing entities are "determined based on similarity of resulting vectors in [a] co-occurrence matrix, e.g. based on cosine similarity." Id. at 17:48-18:31.
	Lowe discloses storing the cosine similarity comparisons in a document graph. Lowe discloses that feature vectors are "determined for documents in a corpus," and then the method "iterate[s] through each of the vectors" to "create a graph of reachable vectors, where nodes on the graph are identified in response to non-core corresponding vectors being within a threshold distance of a core vector in the graph." Id. at 18:33-55.
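	For reference, the cosine similarity of two document vectors u and v is their dot product divided by the product of their lengths. The following sketch shows one way pairwise comparisons could be stored as graph edges. It is a simplification offered for illustration, not Lowe's density-based procedure: the function names, the dictionary-based graph, and the 0.8 threshold are assumptions.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_similarity_graph(doc_vectors, threshold=0.8):
    """Return {(id_a, id_b): score} for each document pair at or above the threshold.

    `doc_vectors` maps a document id to its vector; the threshold is a hypothetical value.
    """
    edges = {}
    for (a, u), (b, v) in combinations(sorted(doc_vectors.items()), 2):
        score = cosine_similarity(u, v)
        if score >= threshold:
            edges[(a, b)] = score
    return edges
```

	In this sketch, two documents whose vectors point in the same direction receive a score near 1 and are joined by an edge, while documents with orthogonal vectors receive a score of 0 and are not.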


Claims 5 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Lowe et al., U.S. Patent No. 9,522,412 B1, in view of Jain et al., U.S. PG-Publication No. 2016/0299955 A1, further in view of Moore, U.S. PG-Publication No. 2006/0116867 A1, further in view of Williams et al., U.S. PG-Publication No. 2019/0347668 A1.

Claim 5
	Williams discloses wherein the multi-word proper noun is identified as a multi-word proper noun using a knowledge graph. Williams discloses methods "for generating a cluster of correlated content from [a] primary content object" (e.g. a document). Williams, ¶ 62. The method uses models 118 that "access a corpus of content extracted by crawling a relevant set of pages" and apply key phrases 112 "to establish clusters, which arrange topics around a core topic based on semantic similarity." The method uses the crawled information "to generate specially formatted vectors … used to produce word embeddings in a multi-layer neural network." Id. at ¶¶ 66-67; See Also ¶¶ 84-85 (models 118 include vector space wherein "documents that contain similar semantic meaning are close together" using cosine similarity). The method is implemented using a "directed content system 200" comprising "one or more knowledge graphs 210 representing specific types of entities" and a machine learning system 212 "used to extract … entities." Id. at ¶ 114. An information extraction system 204 parses information and extracts information of known types "which can be inserted into the knowledge graph." Id. at ¶ 116. Entities stored in knowledge graphs 210 include "organizations (e.g. companies), people, governments, cities, states, counties … and products" (i.e. proper nouns). Id. at ¶ 117. Figure 12 illustrates "a visual representation of a portion of an example knowledge graph 210 representation." Id. at ¶ 127. Further, Williams discloses knowledge graphs comprising multi-word expressions. Id. at ¶ 159 (Chief Technology Officer); ¶ 263 (gold status, platinum status).
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of extracting text for analysis of Lowe-Jain-Moore to incorporate the knowledge graphs taught by Williams. One of ordinary skill in the art would be motivated to integrate knowledge graphs into Lowe-Jain-Moore, with a reasonable expectation of success, in order to "allow for direct extraction of information of known types … without much further processing" and to use known entities, events, and/or relationships to "reconcile any newly derived data … with data already present in the knowledge graph." Williams, ¶¶ 116; 131.

Claim 8
	Williams discloses wherein the compound terms are identified as a compound terms using a knowledge graph. Williams discloses methods "for generating a cluster of correlated content from [a] primary content object" (e.g. a document). Williams, ¶ 62. The method uses models 118 that "access a corpus of content extracted by crawling a relevant set of pages" and apply key phrases 112 "to establish clusters, which arrange topics around a core topic based on semantic similarity." The method uses the crawled information "to generate specially formatted vectors … used to produce word embeddings in a multi-layer neural network." Id. at ¶¶ 66-67; See Also ¶¶ 84-85 (models 118 include vector space wherein "documents that contain similar semantic meaning are close together" using cosine similarity). The method is implemented using a "directed content system 200" comprising "one or more knowledge graphs 210 representing specific types of entities" and a machine learning system 212 "used to extract … entities." Id. at ¶ 114. An information extraction system 204 parses information and extracts information of known types "which can be inserted into the knowledge graph." Id. at ¶ 116. Entities stored in knowledge graphs 210 include "organizations (e.g. companies), people, governments, cities, states, counties … and products" (i.e. proper nouns). Id. at ¶ 117. Figure 12 illustrates "a visual representation of a portion of an example knowledge graph 210 representation." Id. at ¶ 127. Further, Williams discloses knowledge graphs comprising multi-word expressions. Id. at ¶ 159 (Chief Technology Officer); ¶ 263 (gold status, platinum status).
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of extracting text for analysis of Lowe-Jain-Moore to incorporate the knowledge graphs taught by Williams. One of ordinary skill in the art would be motivated to integrate knowledge graphs into Lowe-Jain-Moore, with a reasonable expectation of success, in order to "allow for direct extraction of information of known types … without much further processing" and to use known entities, events, and/or relationships to "reconcile any newly derived data … with data already present in the knowledge graph." Williams, ¶¶ 116; 131.


Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Lowe et al., U.S. Patent No. 9,522,412 B1, in view of Jain et al., U.S. PG-Publication No. 2016/0299955 A1, further in view of Moore, U.S. PG-Publication No. 2006/0116867 A1, further in view of Smith et al., U.S. Patent No. 9,710,544 B1.

Claim 9
	Smith discloses wherein the step of removing stop words from the extracted text comprises: identifying a word in the extracted text as a stop word using a list of stop words. Smith discloses a "computational linguistic analysis of a collection of documents" to generate "a semantic graph … where nodes represent documents, entities mentioned in the document … and edges … are given by pairwise relationships between each couple of documents, mentioned entities, or other features of the documents." Smith, 3:15-32. The analysis is performed on "a corpus of documents that have undergone some processing," the processing including removing stop words. Id. at 5:42-60. Smith discloses that "a list of stop words [is] maintained and compared against parsed values to determine whether to disregard the parsed term." Id. at 8:43-49.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of generating a document .


Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Lowe et al., U.S. Patent No. 9,522,412 B1, in view of Jain et al., U.S. PG-Publication No. 2016/0299955 A1, further in view of Moore, U.S. PG-Publication No. 2006/0116867 A1, further in view of Mittal, U.S. PG-Publication No. 2019/0005049 A1.

Claim 10
	Mittal discloses wherein the step of representing each of the remaining words in the extracted text as a vector comprises: applying a skip gram algorithm to the remaining words. Mittal discloses search system methods using "statistical attributes (such as terms frequency or proximity scores) to provide contextually relevant results from searching a corpus of texts relating to a domain of knowledge." Mittal, ¶ 32. The method processes text to generate a "search index … stored in [a] corpus-data database 1140." Id. at ¶ 50.  In one embodiment, the indexer "prepares a skip gram model for a text vector or a text of a corpus of texts." Id. at ¶¶ 489-492; 537-539; See Also ¶¶ 36-37; 42 ("the computational engine uses skip-grams … and/or other algorithms to add feature data to a word-pair feature data structure entry").
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of generating a document .


Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Lowe et al., U.S. Patent No. 9,522,412 B1, in view of Jain et al., U.S. PG-Publication No. 2016/0299955 A1, further in view of Moore, U.S. PG-Publication No. 2006/0116867 A1, further in view of Brinker et al., U.S. PG-Publication No. 2008/0205774 A1.

Claim 13
	Brinker discloses calculating a locality sensitive hash for the document vector. Brinker discloses "a method of clustering a plurality of documents from a data stream." The method comprises steps of generating a "feature vector for each document," retrieving a set of cluster centroids (e.g. feature vectors of corresponding clusters) based on a feature vector of a document, wherein the retrieving uses "a locality sensitive hashing function to limit the number of candidate clusters to be searched." Brinker, ¶¶ 10; 40; 47; 55.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of determining document similarity of Lowe-Jain-Moore to incorporate a locality sensitive hashing function as taught by Brinker. One of ordinary skill in the art would be motivated to integrate a locality sensitive hashing function into Lowe-Jain-Moore, with a reasonable expectation of success, in order to .


	Claims 14-18 are rejected under 35 U.S.C. 103 as being unpatentable over Lowe et al., U.S. Patent No. 9,522,412 B1, in view of Rajpara, U.S. Patent No. 10,095,747 B1, further in view of Brinker et al., U.S. PG-Publication No. 2008/0205774 A1.

Claim 14
	Lowe discloses a method of document querying using a document graph. Lowe discloses a method of refining Boolean queries comprising steps of "obtaining a query" and "searching a corpus of documents based on the query." Lowe, 2:9-29. The method is implemented using a "computational linguistics system 802" comprising a "natural-language processing module 820" and a "graph repository 822." End users of the system "may submit queries to view … document relationships (e.g., graphs of such relationships) … based on unstructured plain text documents." Id. at 12:7-45; FIG. 10. Module 820 analyzes documents and "store[s] resulting graphs in the graph repository 822." Graphs are constructed using "distributional semantic analysis" performed by "an unsupervised machine learning technique." These techniques represent "the distribution of terms in documents … as relatively high-dimensional vectors," wherein "semantic similarity is measured by … similarity of the vectors, for instance, cosine similarity." Id. at 14:62-15:65.
	Lowe discloses, in an information processing apparatus comprising at least one computer processor: retrieving a query. Lowe discloses obtaining a query comprising elements (e.g. keywords). Id. at 4:25-40.
	Lowe discloses calculating a cosine similarity score for [a] document vector and a plurality of document vectors for each document. Natural language processing module 820 analyzes the corpora and "store[s] resulting graphs in [a] graph repository 822" using "techniques by which distribution of terms in documents is represented as relatively high-dimensional vectors, and semantic similarity is measured by according to similarity of the vectors, for instance cosine similarity." Lowe, 15:17-52.
	Lowe discloses selecting the documents vector having the closest cosine similarity with the source document vector. Lowe discloses that document similarity is determined by comparing each respective document vector in a concept space, "based on cosine similarity," wherein similarity of n-grams representing entities are "determined based on similarity of resulting vectors in [a] co-occurrence matrix, e.g. based on cosine similarity." Id. at 17:48-18:31. Lowe discloses that entries responsive to a query may be ranked "based on both the number of times the keywords appear in the documents and the context in which those keywords appear, for example, in association with other terms related to those keywords (e.g., having a greater than a threshold co-occurrence rate)." Id. at 5:49-6:4.
	Lowe discloses identifying the document associated with the document vector. Figure 4 illustrates a graphical user interface comprising region 60 of representative articles corresponding to the query, selected "based on the strength of a score by which the articles, or documents were assigned to the item." Id. at 8:9-29.
	Lowe does not expressly disclose a query comprising a source document, wherein the source document is associated with a source document vector.
	Rajpara discloses retrieving a query comprising a source document. Rajpara discloses a method comprising the steps of "receiving a plurality of documents related to e-discovery" and "identifying other documents from the plurality of documents based on a similarity between text in the other documents and the at least one selection of text in [a] seed document." Rajpara, 4:25-51. The method receives a "document search query" including an identification of "a seed document." Id. at 8:14-23.
	Rajpara, in view of Lowe, suggests wherein the source document is associated with a source document vector. Rajpara finds similar documents by determining keywords in the seed document. Rajpara, 8:34-56; 9:48-61; See Also 13:44-48 ("keywords may be keywords from the selected text of the seed document"). Rajpara ranks other documents "based on precedent of important keywords in the other documents." Id. at 10:44-55. Lowe discloses obtaining a query comprised of keywords. Lowe, 4:25-40 (query elements are keywords). One of ordinary skill in the art would recognize that the keywords obtained from the seed document of Rajpara could be used in the query of Lowe. Further, Lowe discloses determining similar documents using the distribution of terms (e.g. keywords) in documents "represented as relatively high-dimensional vectors, and semantic similarity is measured … according to similarity of the vectors, for instance, cosine similarity." Id. at 15:17-52.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the document querying method using vector cosine similarity of Lowe to incorporate a query comprising a seed document as taught by Rajpara. One of ordinary skill in the art would be motivated to integrate the query comprising 
	Lowe-Rajpara does not expressly disclose identifying a locality sensitive hashing bucket for the source document and calculating a cosine similarity score for the source document vector and a plurality of document vectors for each document associated with the locality sensitive hashing bucket.
	Brinker discloses identifying a locality sensitive hashing bucket for the source document and calculating a cosine similarity score for the source document vector and a plurality of document vectors for each document associated with the locality sensitive hashing bucket. Brinker discloses "a method of clustering a plurality of documents from a data stream." The method comprises steps of generating a "feature vector for each document," retrieving a set of cluster centroids (e.g. feature vectors of corresponding clusters) based on a feature vector of a document, wherein the retrieving uses "a locality sensitive hashing function to limit the number of candidate clusters to be searched." Brinker, ¶¶ 10; 40; 47; 55. Further, Brinker discloses that a feature vector of a document is compared to the feature vector of each cluster to determine a "cosine distance." Id. at ¶ 47.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of determining document similarity of Lowe-Rajpara to incorporate a locality sensitive hashing function as taught by Brinker. One of ordinary skill in the art would be motivated to integrate a locality sensitive hashing function into Lowe-Rajpara, with a reasonable expectation of success, in order to increase efficiency and accuracy in clustering documents from continuous high density data 

Claim 15
	Lowe discloses identifying a plurality of words common to the source document and the identified document. Figure 3 illustrates a user interface comprising "information about responsive search results" (i.e. results responsive to the query). Lowe, 6:57-61. Region 44 of the interface comprises narrowing terms, wherein the narrowing terms are "entities mentioned" or "keywords appearing in the documents." Id. at 8:30-61. Accordingly, each narrowing term corresponds to an entity or keyword common to all the documents (and the keywords/entity extracted from the seed document of Rajpara).
	Lowe discloses presenting the identified document and the plurality of words common to the source document and the identified document. Region 44 presents items corresponding to words (e.g. keywords or entities) common to all the documents associated with that item. Figure 4 illustrates that selecting an item causes region 60 to display "representative articles" corresponding to the item (i.e. identified documents).

Claim 16
	Brinker discloses wherein similar vectors are likely to be hashed to the same locality sensitive hashing bucket. Brinker discloses that the locality sensitive hashing function is a "probabilistic dimension reduction scheme to return a set of candidate clusters." Brinker, ¶ 40. The candidate clusters include clusters "with the feature vector … most similar to the feature vector of the document with a high probability." Id. at ¶ 43.

Claim 17
	Brinker discloses wherein dissimilar vector pairs are not checked for similarity. Conversely, clusters not determined to be candidate clusters by the locality sensitive hashing feature are not checked for similarity (i.e. determined to have a low probability of similarity). See Brinker, ¶ 43.
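	For illustration only, the general technique discussed for Claims 14, 16 and 17 (hashing vectors into locality sensitive hashing buckets, then computing cosine similarity only against vectors in the same bucket) can be sketched as follows. This sketch uses random-hyperplane hashing, which is a swapped-in technique chosen for brevity and is not asserted to be the hashing function of Brinker or of the claims. All function names, parameters, and sample vectors are assumptions introduced here.

```python
import math
import random
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def make_hyperplanes(num_planes, dim, seed=0):
    """Random hyperplane normals (assumed hashing scheme, seeded for repeatability)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def lsh_bucket(vector, hyperplanes):
    """Bucket key: one bit per hyperplane, set by the side the vector falls on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(doc_vectors, hyperplanes):
    """Map each bucket key to the ids of the documents hashed into it."""
    index = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        index[lsh_bucket(vec, hyperplanes)].append(doc_id)
    return index


def closest_in_bucket(source_vec, doc_vectors, index, hyperplanes):
    """Score only documents sharing the source's bucket; others are never compared."""
    candidates = index.get(lsh_bucket(source_vec, hyperplanes), [])
    if not candidates:
        return None
    return max(candidates, key=lambda d: cosine_similarity(source_vec, doc_vectors[d]))


if __name__ == "__main__":
    planes = make_hyperplanes(num_planes=4, dim=2)
    docs = {"a": [1.0, 0.0], "b": [2.0, 0.0], "c": [-1.0, 0.0]}
    idx = build_index(docs, planes)
    print(closest_in_bucket([3.0, 0.0], docs, idx, planes))
```

	In this sketch, vectors pointing in the same direction always share a bucket, vectors separated by a small angle share one with high probability, and vectors in other buckets are skipped without any similarity computation.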

Claim 18
	Lowe discloses querying a graph database for documents having a similar cosine similarity. Lowe discloses a method of refining Boolean queries comprising steps of "obtaining a query" and "searching a corpus of documents based on the query." Lowe, 2:9-29. The method is implemented using a "computational linguistics system 802" comprising a "natural-language processing module 820" and a "graph repository 822." End users of the system "may submit queries to view … document relationships (e.g., graphs of such relationships) … based on unstructured plain text documents." Id. at 12:7-45; FIG. 10. Module 820 analyzes documents and "store[s] resulting graphs in the graph repository 822." Graphs are constructed using "distributional semantic analysis" performed by "an unsupervised machine learning technique." These techniques represent "the distribution of terms in documents … as relatively high-dimensional vectors," wherein "semantic similarity is measured by … similarity of the vectors, for instance, cosine similarity." Id. at 14:62-15:65.


Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Lowe et al., U.S. Patent No. 9,522,412 B1, in view of Rajpara, U.S. Patent No. 10,095,747 B1, further in view of Brinker et al., U.S. PG-Publication No. 2008/0205774 A1, further in view of Williams et al., U.S. PG-Publication No. 2019/0347668 A1.

Claim 19
	Williams discloses wherein the graph database comprises a knowledge graph. Williams discloses methods "for generating a cluster of correlated content from [a] primary content object" (e.g. a document). Williams, ¶ 62. The method uses models 118 that "access a corpus of content extracted by crawling a relevant set of pages" and apply key phrases 112 "to establish clusters, which arrange topics around a core topic based on semantic similarity." The method uses the crawled information "to generate specially formatted vectors … used to produce word embeddings in a multi-layer neural network." Id. at ¶¶ 66-67; See Also ¶¶ 84-85 (models 118 include vector space wherein "documents that contain similar semantic meaning are close together" using cosine similarity). The method is implemented using a "directed content system 200" comprising "one or more knowledge graphs 210 representing specific types of entities" and a machine learning system 212 "used to extract … entities." Id. at ¶ 114. An information extraction system 204 parses information and extracts information of known types "which can be inserted into the knowledge graph." Id. at ¶ 116. Entities stored in knowledge graphs 210 include "organizations (e.g. companies), people, governments, cities, states, counties … and products" (i.e. proper nouns). Id. at ¶ 117. Figure 12 illustrates "a visual representation of a portion of an example knowledge graph 210 representation." Id. at ¶ 127. Further, Williams discloses Id. at ¶ 159 (Chief Technology Officer); ¶ 263 (gold status, platinum status).
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of extracting text for analysis of Lowe-Rajpara-Brinker to incorporate the knowledge graphs taught by Williams. One of ordinary skill in the art would be motivated to integrate knowledge graphs into Lowe-Rajpara-Brinker, with a reasonable expectation of success, in order to "allow for direct extraction of information of known types … without much further processing" and to use known entities, events, and/or relationships to "reconcile any newly derived data … with data already present in the knowledge graph." Williams, ¶¶ 116; 131.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FRANK D MILLS whose telephone number is (571)270-3172.  The examiner can normally be reached on M-F 10-6 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KAVITA PADMANABHAN can be reached on (571)272-8352.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/FRANK D MILLS/
Primary Examiner, Art Unit 2176
July 31, 2021