Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This action is in response to the communication filed on October 29, 2021. Claims 1-3 and 5-45 are pending.

Response to Arguments
Applicant's arguments filed on October 29, 2021 have been fully considered but they are not persuasive.
 
Applicant argues that Hazlehurst fails to teach harmonizing and normalizing queries.

In response, the examiner respectfully disagrees. Hazlehurst teaches that a query is initiated by user 86 in step 260 via a graphical user interface, and that in step 262 liaison 88 gets the words or phrases entered by user 86. In step 264, that text is passed to a grinder 100 (FIG. 7), which creates a query (document) index 102 (FIG. 7) from the text. The grinder 100 performs initial processing: it parses the text to identify features, stems inflected word forms, looks up word equivalents via an optional thesaurus and word stemmer 115 to collapse alternative representations of words into singular forms, eliminates "stop words", etc. (i.e., it harmonizes and normalizes the query); see column 22 lines 32-37 and column 9 lines 7-16. This teaching satisfies the argued limitation.
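
For illustration only, the kind of initial processing described above (stop-word removal, stemming, and collapsing word equivalents through a thesaurus) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the grinder 100 of Hazlehurst: the stop-word list, the thesaurus entries, the suffix-stripping rule, and the function names `stem` and `normalize_query` are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny resources (assumptions for illustration).
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "in", "to"}
THESAURUS = {"automobile": "car", "auto": "car"}  # variant -> single form


def stem(word: str) -> str:
    """Crude suffix stripping; a stand-in for a real word stemmer."""
    for suffix in ("ing", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize_query(text: str) -> list[str]:
    """Lowercase, tokenize, drop stop words, stem, then collapse equivalents."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    features = []
    for token in tokens:
        if token in STOP_WORDS:
            continue
        token = stem(token)
        features.append(THESAURUS.get(token, token))
    return features


features = normalize_query("Searching the autos")
```

Under these assumptions, the free-text query is reduced to a short list of normalized features that could then be indexed or turned into a query vector.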

The claim objections are withdrawn in view of the claim amendments.

Examiner’s note: To expedite prosecution, applicant is encouraged to call the examiner with any concerns regarding this rejection.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-3 and 5-45 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Hazlehurst et al. (Pub. No.: US 5974412 A).

As to claim 1 Hazlehurst teaches a computer-implemented method for comparing text documents comprising the steps of 
a) building a database comprising first text document data associated with a plurality of first text documents (Column 2 lines 3-23: the system automatically indexes large quantities of documents in a database, wherein the documents are chunks of text);
b) receiving a query (Column 22 lines 30-32: a query is initiated by a user, such as a free text search);
c) harmonizing the query and converting the query to second text document data (Column 22 lines 30-61: in step 262, liaison 88 gets the words or phrases entered by user 86. In step 264, that text is passed to a grinder 100 (FIG. 7) which creates a query (document) index 102 (FIG. 7) from the text);
d) comparing second text document data to first text document data (column 10 lines 26-28, Column 22 lines 30-61: a collator maintains a corpus of documents which are compared against queries by liaisons 88 to identify documents of interest to users 86); and
e) computing at least one similarity measure between second text document data and first document data (Column 22 lines 30-61: collators 108 utilize the "find_similar" function 352 (FIG. 15B) to find similar documents and return a recommendations list 233).

As to claim 2 Hazlehurst teaches wherein first text document data comprises document vectors generated from keywords comprised in first text documents and/or from words semantically related to said keywords (column 2 lines 57-67). 

As to claim 3 Hazlehurst teaches the query comprises a second text document and/or information identifying a second text document associated with second text document data comprised within the first text document data already stored within the memory component (column 20 lines 39-60, column 22, lines 30 – 61, column 10 lines 22-35). 

As to claim 5 Hazlehurst teaches converting the query to second text document data comprises normalizing the query (column 22 lines 39-60). 

As to claim 6 Hazlehurst teaches normalizing the query comprises retrieving at least synonyms, hypernyms, hyponyms, stop words and/or subject specific stop words from an external database and generating a list of the query's keywords based at least partially on said retrieved words (column 8 line 66 to column 9 line 41).

As to claim 7 Hazlehurst teaches the list of the query's keywords is generated by removing stop words and/or subject-specific stop words and including at least one of the synonyms, hypernyms and hyponyms of the query's words (column 8 line 66 to column 9 line 41).

As to claim 8 Hazlehurst teaches converting the query to second text document data comprises generating at least one query vector (column 22 lines 41-47). 

As to claim 9 Hazlehurst teaches the query vector is generated by identifying keywords and/or synonyms of keywords from the query and identifying said keywords with components of a vector in a multidimensional vector space (column 8 line 66 to column 9 line 41).

As to claim 10 Hazlehurst teaches the query vector comprises from 100 to 500 components, preferably from 200 to 400 components, even more preferably from 200 to 300 components (column 22 lines 41-60). 

As to claim 11 Hazlehurst teaches converting the query to second text document data comprises generating at least one query vector and the query vector is generated by identifying keywords and/or synonyms of keywords from the query and identifying said keywords with components of a vector in a multidimensional vector space, wherein the keywords are assigned a weight (column 8 line 66 to column 9 line 41).

As to claim 12 Hazlehurst teaches weights are assigned at least partially based on the general subject of the query (column 22 lines 6-10). 

As to claim 13 Hazlehurst teaches computing the similarity measure comprises applying at least one or a combination of the Cosine Index, Jaccard Index, Dice Index, Inclusion Index, Pearson Correlation Coefficient, Levenshtein Distance, Jaro-Winkler Distance and/or Needleman-Wunsch Algorithm (column 12 lines 8-13).

As to claim 14 Hazlehurst teaches, after step d), the steps of f) validating the at least one similarity measure using at least one statistical algorithm and g) outputting the at least one similarity measure (column 4 lines 60-67).

As to claim 15 Hazlehurst teaches the query is received from a user interface and the similarity measure is returned via said interface (column 12 lines 4-15).

As to claim 16 Hazlehurst teaches the database comprises patent literature-related text documents and wherein building the database and/or converting the query comprises removing stop words associated with patent literature-related text documents (column 9 lines 7-21). 

As to claim 17 Hazlehurst teaches patent-related stop words are removed by computing the entropy associated with terms comprised in first text document data and/or in the query and removing terms with low entropy (column 8 lines 40-50). 

As to claim 18 Hazlehurst teaches generating a term vector comprising keywords extracted from the plurality of first text documents (column 24 lines 38-43). 

As to claim 19 Hazlehurst teaches the components of the document vectors and the query vector are generated with respect to the components of the term vector (column 13 lines 22-45). 

As to claim 20 Hazlehurst teaches first text document data comprises document vectors generated from keywords comprised in first text documents or from words semantically related to said keywords, wherein converting the query to second text document data comprises generating at least one query vector, and wherein the similarity measure between second text document data and first document data is computed by using the cosine index to compute the distance between the query vector and the document vectors (column 11 lines 33-45, column 13 lines 22-45, column 15 lines 30-43). 

As to claim 21 Hazlehurst teaches a computer-implemented method for processing of similarities in text documents comprising 
a) harmonizing at least one incoming query (Column 22 lines 30-61: query is initiated by user such as free text search. In step 262, liaison 88 gets the words or phrases entered by user 86. In step 264, that text is passed to a grinder 100 (FIG. 7) which creates a query (document) index 102 (FIG. 7) from the text); 
b) normalizing the at least one incoming harmonized query (Column 22 lines 30-61: query is initiated by user such as free text search. In step 262, liaison 88 gets the words or phrases entered by user 86. In step 264, that text is passed to a grinder 100 (FIG. 7) which creates a query (document) index 102 (FIG. 7) from the text). Note that liaison 88 receives the query and passes it to a grinder 100, where the query is harmonized and normalized. The examiner's interpretation is based on applicant's specification, page 4, lines 18-32; and 
c) constructing at least one query vector using the at least one normalized harmonized query (column 22 lines 30-60, column 8 line 66 to column 9 line 41, column 21 lines 35-36: map the query vector into collator centroid space 134); and 
d) computing at least one similarity measure between the at least one query vector and at least one further text document, wherein the at least one further text document underwent the previous steps (column 22 lines 30-60, Column 23 lines 25-39: collators 108 utilize the "find_similar" function 352 (FIG. 15B) to find similar documents and return a recommendations list 233 and steps are repeated for each fact in expert recommendations list).

As to claim 22 Hazlehurst teaches the text document comprises at least one or a combination of technical text, scientific text, patent text, and/or product description (column 7 lines 18-30). 

As to claim 23 Hazlehurst teaches harmonizing comprises correcting typographical errors, choosing a particular spelling convention and physical unit convention and adjusting the text based on it, and/or representing formulas (for example chemical formulas, gene sequences and/or protein representations) in a standard way (column 8 line 66 to column 9 line 41).

As to claim 24 Hazlehurst teaches normalizing comprises identifying and removing stop words, reducing words to common word stems, analysing the stems for synonyms and/or identifying word sequences and compound words (column 8 line 66 to column 9 line 41).

As to claim 25 Hazlehurst teaches normalizing further comprises identifying and removing stop words associated with a certain type of text documents, preferably by computing the entropy of terms within a plurality of text documents of said type and removing words with low entropy (column 7 lines 32-40). 

As to claim 26 Hazlehurst teaches computing the similarity measure comprises applying at least one or a combination of the Cosine Index, Jaccard Index, Dice Index, Inclusion Index, Pearson Correlation Coefficient, Levenshtein Distance, Jaro-Winkler Distance and/or Needleman-Wunsch Algorithm (column 12 lines 8-13).

As to claim 27 Hazlehurst teaches after step d), the following steps: f) validating the at least one similarity measure using at least one statistical algorithm; and g) outputting the at least one similarity measure (column 4 lines 60-67). 

As to claim 28 Hazlehurst teaches a computer implemented system comprising: 
a) at least one memory component adapted for at least storing a database comprising a plurality of first text document data associated with first text documents (column 7 lines 8-57: The storage system 60 and IFQE system 84 in one embodiment are located on a computer system and maintain documents in the computer system memory); 
b) at least one input device adapted for receiving a query comprising a second text document and/or information identifying a second text document, said second text document associated with second text document data comprised within first text document data already stored within the memory component (column 22 lines 30-61, column 7 line 8 to column 8 line 11: query is initiated by user such as free text search. In step 262, liaison 88 gets the words or phrases entered by user 86. In step 264, that text is passed to a grinder 100 (FIG. 7) which creates a query (document) index 102 (FIG. 7) from the text, and in step 268, collators 108 utilize the "find_similar" function 352 (FIG. 15B) to find similar documents and return a recommendations list 233); 
c) at least one processing component configured to harmonize the query and normalize the harmonized query, convert the normalized harmonized query into second text document data and/or retrieving second text document data associated with the normalized harmonized query from storage within the at least one memory component and compare second text document data to the first text document data stored within the at least one memory component (column 22 lines 30-61, column 10 lines 22-35: query is initiated by user such as free text search. In step 262, liaison 88 gets the words or phrases entered by user 86. In step 264, that text is passed to a grinder 100 (FIG. 7) which creates a query (document) index 102 (FIG. 7) from the text, and in step 268, collators 108 utilize the "find_similar" function 352 (FIG. 15B) to find similar documents and return a recommendations list 233). Note that liaison 88 receives the query and passes it to a grinder 100, where the query is harmonized and normalized. The examiner's interpretation is based on applicant's specification, page 4, lines 18-32; and 
d) at least one output device adapted for returning information identifying at least one similar first text document associated with first text document data, said similar first text document most similar among first text documents to the query (column 10 lines 39-55, column 20 lines 7-13, column 22, lines 30 – 61: step 268, collators 108 utilize the "find_similar" function 352 (FIG. 15B) to find similar documents and return a recommendations list 233). 

As to claim 29 Hazlehurst teaches the first text document data comprises a plurality of document vectors and wherein the second text document data comprises a query vector (column 2 lines 57-67).

As to claim 30 Hazlehurst teaches the memory component comprises first text document data associated with scientific articles and/or technological descriptions and/or patent literature and/or product description (Column 2 lines 8-23).

As to claim 31 Hazlehurst teaches second text document data is obtained by harmonizing and normalizing the second text document and constructing at least one query vector (column 20 line 64 to column 22 line 3).

As to claim 32 Hazlehurst teaches the comparison between first text document data and second text document data yields a similarity index (column 7 line 42 to column 8 line 31).

As to claim 33 Hazlehurst teaches the output device returns information associated with a plurality of first text documents ordered by the similarity index from most similar to least similar, said first text documents associated with first text document data yielding the highest similarity index with second text document data (31) (column 20 line 64 to column 22 line 3).

As to claim 34 Hazlehurst teaches the similarity index is based on lexical and/or semantic comparison between text documents (column 2 lines 57-67). 

As to claim 35 Hazlehurst teaches the processing component identifies keywords during harmonizing and normalizing of the incoming second text document (column 8 line 66 to column 9 line 41).

As to claim 36 Hazlehurst teaches the processing component assigns weight to keywords based on an entropy algorithm (column 22 lines 6-10). 

As to claim 37 Hazlehurst teaches the processing component is adapted to divide the second text document into at least two parts for parallelized computing, preferably into at least four parts (column 7 lines 18-30). 

As to claim 38 Hazlehurst teaches the processing component comprises at least two, preferably at least four, more preferably at least eight kernels (column 7 lines 8-17). 

As to claim 39 Hazlehurst teaches the processing component is adapted to update first document data stored within the memory component regularly (column 7 lines 8-17). 

As to claim 40 Hazlehurst teaches the input device is further adapted to permit specifying the query by listing words and/or sentences that similar text documents must comprise and/or must not comprise (column 8 line 66 to column 9 line 41).

As to claim 41 Hazlehurst teaches the input device is further adapted to permit specifying the query by specifying the number of most similar text documents to be outputted (column 22 lines 30-61). 

As to claim 42 Hazlehurst teaches the memory component comprises RAM (random-access memory) (column 7 lines 8-17). 

As to claim 43 Hazlehurst teaches the memory component further comprises a term vector comprising keywords extracted from the plurality of first text documents (column 24 lines 38-43). 

As to claim 44 Hazlehurst teaches the processing component is adapted to generate the components of the document vectors and the query vector with respect to the components of the term vector (column 2 lines 57-67). 

As to claim 45 Hazlehurst teaches the processing component is adapted to compare the second text document data to the first text document data by using the cosine index to compute the distance between the query vector and the document vectors (column 2 lines 57-67).
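
For illustration only, the cosine-index comparison between a query vector and document vectors built over a shared term vector can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce the "find_similar" function 352 of Hazlehurst: raw term counts are assumed as vector components, and the names `to_vector`, `cosine_index`, and `rank_documents`, as well as the sample terms and documents, are hypothetical.

```python
import math
from collections import Counter


def to_vector(tokens, terms):
    """Build a count vector whose components follow the order of `terms`."""
    counts = Counter(tokens)
    return [float(counts[t]) for t in terms]


def cosine_index(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query_tokens, documents, terms):
    """Score every document against the query and sort from most to least similar."""
    query_vector = to_vector(query_tokens, terms)
    scored = [
        (doc_id, cosine_index(query_vector, to_vector(tokens, terms)))
        for doc_id, tokens in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical term vector and corpus, for illustration only.
terms = ["car", "engine", "search"]
documents = {"d1": ["car", "engine"], "d2": ["search"]}
ranked = rank_documents(["car"], documents, terms)
```

Under these assumptions, `ranked` lists document identifiers with their cosine index, ordered from most similar to least similar to the query.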

Examiner's Note: The examiner has cited particular columns and line numbers or paragraphs in the references as applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to the specific limitations within the individual claims, other passages and figures may apply as well. It is respectfully requested that the applicant, in preparing responses, fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

The prior art made of record, listed on form PTO-892, and not relied upon, if any, is considered pertinent to applicant's disclosure.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MD I UDDIN whose telephone number is (571)270-3559. The examiner can normally be reached M-F, 8:00 am to 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed, can be reached at 571-272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: 



/MD I UDDIN/Primary Examiner, Art Unit 2169