Detailed Action

AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on July 19, 2022 has been entered.

Claims
Claims 1, 9, and 17 have been amended. Claims 2 and 10 have been canceled. Claims 1, 3-9, and 11-21 are pending and rejected in the application. 

Arguments 

Applicant Argues: 
Generating a document identifier for a document and then encoding a document identifier with a frequency of a term for inclusion in the term posting for the term (e.g., generating the document identifier for Doc M3 and encoding the document identifier for Doc M3 with the frequency of the term “grocery” for inclusion in the term posting for the term “grocery”) does not teach “generating the reference document content identifier based on the encoded plurality of frequently occurring terms by combining the encoded plurality of frequently occurring terms”, let alone “... into a bit array representation populated with the encoded plurality of frequently occurring terms, the bit array representation having bits set at locations representing the identified frequently occurring terms” (refer also to FIGS. 7A-7C of the present application), as recited in claim 1 (and recited in a similar manner in claims 9 and 17).

Examiner Responds:
Applicant's 35 USC § 103 arguments, noted above, with respect to claims 1, 3-9, and 11-21 have been considered but are moot in view of the new ground(s) of rejection. 

Claim Rejections – 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 4, 5, 9, 11, 12, 13, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Gipp et al. U.S. Patent Publication (2011/0264672; hereinafter: Gipp) in view of Shen et al. U.S. Patent Publication (2012/0271813; hereinafter: Shen) and further in view of Hopcroft et al. U.S. Patent Publication (2016/0378796; hereinafter: Hopcroft) and further in view of King et al. U.S. Patent Publication (2011/0035656; hereinafter: King).

Claims 1, 9, and 17
As to claims 1, 9, and 17, Gipp discloses a system for document similarity analysis, comprising: 
a computing device comprising a computer processor (paragraph[0014], “With reference to FIG. 1, computing device 100 includes a bus 110 that directly or indirectly couples the following devices: memory 112, one or more processors 114…etc.”); 
a document content identifier encoding engine, executable on the computer processor to (paragraph[0056]-paragraph[0057], “In the example shown in FIG. 2, similar documents are detected for the document Input Document (ID). For this, the document Citing Document (CD) is analyzed and evaluated….etc.”): 
receive, by a processor, a selection of a reference document specified by a user via a user interface (paragraph[0006], “the searching person enters a document (input document), which is considered to be interesting or relevant for a certain subject area…etc.”); 

Gipp does not appear to explicitly disclose convert, by the processor, the selected reference document into a reference document content identifier from content of the selected reference document, the converting further comprising: 
tokenize the content of the selected reference document;
identify a plurality of frequently occurring terms in the tokenized content of the reference document; 
for each of the identified frequently occurring terms, encode the respective identified frequently occurring term by mapping the respective identified frequently occurring term to a number; and 
generate the reference document content identifier based on the encoded plurality of frequently occurring terms by combining the encoded frequently occurring terms into a bit array representation populated with the encoded plurality of frequently occurring terms, the bit array representation having bits set at locations representing the identified frequently occurring terms; 
compare the reference document content identifier with a plurality of archived document content identifiers, each associated with one of a plurality of archived documents stored in a document repository; 
generate a plurality of document similarity values based on comparisons of the reference document content identifier with the plurality of archived document content identifiers; and 
return, by the processor, a document list for presentation on the user interface, the document list listing a set of documents identified from the plurality of archived documents based on the plurality of document similarity values.
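For orientation only, the converting steps recited above (tokenize, identify frequently occurring terms, map each term to a number, set bits at those locations) can be sketched as follows. This is a minimal illustration and not the applicant's or any cited reference's implementation: the 64-bit width, the CRC-32 hash, the top-k cutoff, and all function names are assumptions chosen for the example.

```python
import re
import zlib
from collections import Counter

NUM_BITS = 64  # assumed width of the bit array; the claims do not specify one


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def frequent_terms(tokens, k=5):
    """Return the k most frequent tokens (ties broken alphabetically)."""
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def encode_term(term, num_bits=NUM_BITS):
    """Map a term to a number: hash it, then reduce modulo the array width."""
    return zlib.crc32(term.encode("utf-8")) % num_bits


def content_identifier(text, k=5, num_bits=NUM_BITS):
    """Combine the encoded terms into a bit array held in a Python int."""
    bits = 0
    for term in frequent_terms(tokenize(text), k):
        bits |= 1 << encode_term(term, num_bits)  # set the bit at that location
    return bits
```

Two distinct terms may hash to the same location, so the number of set bits can be smaller than the number of frequent terms.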

However, Shen discloses convert, by the processor, the selected reference document into a reference document content identifier from content of the selected reference document (paragraph[0032], “An alternative encoding mechanism in which the document identifier is interleaved with the term frequency takes into consideration the fact that many term frequencies are below a threshold (e.g., 3). Accordingly, a quantity of bits is reserved to store the encoded document information (i.e., the document identifier and the term frequency)…etc.”), the converting further comprising: 
tokenize the content of the selected reference document (paragraph[0032], “An alternative encoding mechanism in which the document identifier is interleaved with the term frequency takes into consideration the fact that many term frequencies are below a threshold (e.g., 3). Accordingly, a quantity of bits is reserved to store the encoded document information (i.e., the document identifier and the term frequency). Under this encoding mechanism, when the term frequency is less than a frequency-value threshold (e.g., 3), the quantity of bits is used to encode the term frequency and additional bytes are not used to encode a frequency field...etc.”);
identify a plurality of frequently occurring terms in the tokenized content of the reference document (paragraph[0028]-paragraph[0032], “FIG. 2 depicts that information set 242 includes information that is relevant to document M3, which includes the term "grocery." For example, information set 242 includes the label "Doc M3 Info," which indicates that information 242 includes information related to Doc M3. Information set 242 is encoded in a manner that balances a desire to limit memory utilization with an objective of reducing CPU usage required to decode information set 242. For example, a document identifier used to identify Doc M3 is interleaved with a term frequency of "grocery."…etc.”); 
for each of the identified frequently occurring terms, encode the respective identified frequently occurring term by mapping the respective identified frequently occurring term to a number (Figure 2, paragraph[0032]-paragraph[0035], “Accordingly, a quantity of bits is reserved to store the encoded document information (i.e., the document identifier and the term frequency…etc.”). It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Gipp with the teachings of Shen to encode a document identifier with a frequency of terms, which would result in the claimed invention. The skilled artisan would have been motivated to improve the teachings of Gipp with the teachings of Shen to have a multi-layer search engine index to serve a query (Shen: paragraph[0011]). 

The combination of Gipp and Shen does not appear to explicitly disclose generate the reference document content identifier based on the encoded plurality of frequently occurring terms by combining the encoded plurality of frequently occurring terms into a bit array representation populated with the encoded plurality of frequently occurring terms, the bit array representation having bits set at locations representing the identified frequently occurring terms; 
compare the reference document content identifier with a plurality of archived document content identifiers, each associated with one of a plurality of archived documents stored in a document repository; 
generate a plurality of document similarity values based on comparisons of the reference document content identifier with the plurality of archived document content identifiers; and 
return, by the processor, a document list for presentation on the user interface, the document list listing a set of documents identified from the plurality of archived documents based on the plurality of document similarity values.

However, Hopcroft discloses generate the reference document content identifier based on the encoded plurality of frequently occurring terms by combining the encoded plurality of frequently occurring terms into a bit array representation populated with the encoded plurality of frequently occurring terms (paragraph[0086]-paragraph[0087], “FIG. 7 illustrates an example of a very simple search index 700 using bit vectors. The search index 700 stores 16 bit vectors, each bit vector comprising an array of bits. The bit vectors include four long row bit vectors 702 with each bit corresponding to a single document. As can be seen in FIG. 7, each long row bit vector 702 includes 32 bits such that the search index 700 indexes information for 32 documents. The search index 700 also stores a number of short row bit vectors. In particular, the search index 700 stores four rank-1 bit vectors 704 (i.e., two documents per bit), four rank-2 bit vectors 706 (i.e., four documents per bit), and four rank-3 bit vectors 708 (i.e., eight documents per bit)…etc.”), the bit array representation having bits set at locations representing the identified frequently occurring terms (figures 7 and 21, paragraph[0086]-paragraph[0088], “bits are set in the bit vectors based on the presence of terms in the documents…FIG. 7 and the above discussion, each bit vector includes bits that are consecutive in storage to represent which documents contain one or more of the terms represented by the bit vector…etc.”). It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Gipp with the teachings of Shen and Hopcroft to have a bit vector representing terms in documents, which would result in the claimed invention. 
The skilled artisan would have been motivated to improve the teachings of Gipp with the teachings of Shen and Hopcroft to remove invalid matching documents returned from a bit vector search index (Hopcroft: paragraph[0005]). 
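As a rough illustration of the quoted Hopcroft passage, the sketch below models only the long row bit vectors, in which each bit corresponds to a single document and is set when the document contains the term. It is a simplified reading under stated assumptions: one term per row, no rank-1 to rank-3 short rows, and a bitwise AND to intersect rows for a query. The function names and the AND step are hypothetical and are not taken from the reference.

```python
NUM_DOCS = 32  # the quoted FIG. 7 example indexes 32 documents


def build_rows(docs_by_term):
    """Map each term to an int whose bit d is set when document d contains it."""
    rows = {}
    for term, doc_ids in docs_by_term.items():
        row = 0
        for d in doc_ids:
            if not 0 <= d < NUM_DOCS:
                raise ValueError(f"document id {d} out of range")
            row |= 1 << d
        rows[term] = row
    return rows


def matching_docs(rows, terms):
    """AND the rows of all query terms; return documents whose bit survives."""
    result = (1 << NUM_DOCS) - 1  # start with every document as a candidate
    for term in terms:
        result &= rows.get(term, 0)  # an unknown term matches no document
    return [d for d in range(NUM_DOCS) if (result >> d) & 1]
```

In this orientation a bit location stands for a document, whereas the claimed bit array sets bits at locations representing terms.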

The combination of Gipp, Shen, and Hopcroft does not appear to explicitly disclose compare the reference document content identifier with a plurality of archived document content identifiers, each associated with one of a plurality of archived documents stored in a document repository; 
generate a plurality of document similarity values based on comparisons of the reference document content identifier with the plurality of archived document content identifiers; and 
return, by the processor, a document list for presentation on the user interface, the document list listing a set of documents identified from the plurality of archived documents based on the plurality of document similarity values.

However, King discloses compare the reference document content identifier with a plurality of archived document content identifiers, each associated with one of a plurality of archived documents stored in a document repository (Figure 7, paragraph[0595], “In step 730, the system applies a constraint to the group of candidate documents. The system may apply the constraint to reduce the number of candidate documents. The system may constrain the group of documents based on context information, the level of match certainty, and so on…etc.”); 
generate a plurality of document similarity values based on comparisons of the reference document content identifier with the plurality of archived document content identifiers (Figure 7, paragraph[0595], “In step 740, the system identifies a document or documents that match the rendered document to a certain degree of certainty. The system may predict what document or documents are likely to match the rendered document, and attempts to disambiguate the group of candidate documents until an unambiguous choice remains…etc.”); and 
return, by the processor, a document list for presentation on the user interface, the document list listing a set of documents identified from the plurality of archived documents based on the plurality of document similarity values (figure 7, paragraph[0013], “The system may use the document identifier to generate a group of documents having similar document identifiers…etc.”, paragraph[0594]-paragraph[0597], the reference describes using a document identifier to obtain candidate documents with similar document identifiers. The reference describes documents as archived documents (e.g., paragraph[0502], “digital archive of any important documents…etc.”).). It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Gipp with the teachings of Shen, Hopcroft, and King to use a document identifier to search for similar candidate documents. The skilled artisan would have been motivated to improve the teachings of Gipp with the teachings of Shen, Hopcroft, and King to use a document identifier to generate a group of documents having similar document identifiers (King: paragraph[0013]). 
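The compare, score, and return limitations can be pictured with a short sketch. Neither the claims nor King, as quoted, names a similarity measure, so the Jaccard ratio over set bits used here is purely an assumption, as are the function names and the dictionary standing in for the document repository.

```python
def jaccard(a, b):
    """Similarity of two bit-array identifiers: shared set bits / all set bits."""
    union = a | b
    if union == 0:
        return 0.0  # two empty identifiers are treated as dissimilar
    return bin(a & b).count("1") / bin(union).count("1")


def rank_archive(reference_id, archive, top_n=3):
    """Score every archived identifier and return the top_n (name, score) pairs."""
    scored = [(name, jaccard(reference_id, ident)) for name, ident in archive.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # best score first
    return scored[:top_n]
```

For example, `rank_archive(0b1011, {"a": 0b1011, "b": 0b0011, "c": 0b0100}, 2)` lists document "a" first and document "b" second.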

Claims 3, 11, and 18
As to claims 3, 11, and 18, the combination of Gipp, Shen, Hopcroft, and King discloses all the elements in claim 9, as noted above, and Shen further discloses wherein encoding the identified frequently occurring terms includes applying a hash function to each of the identified frequently occurring terms (paragraph[0041], “As indicated, the WordID might include one of two variations: fixed mapping (from 318) or hash mapping (from 320). Generally, popular terms receive a WordID that is generated using a fixed mapping approach, such that each term maps to a pre-defined number in a space of [0 . . . 2^k-1]. Based on a popularity of the term, either 1 or 2 bytes are used to create a respective WordID…etc.”).

Claims 4 and 12
As to claims 4 and 12, the combination of Gipp, Shen, Hopcroft, and King discloses all the elements in claim 10, as noted above, and Shen further discloses wherein encoding the identified frequently occurring terms further comprises applying a modulo operation on each of the hashed values to generate a term identifier for each of the identified frequently occurring terms (Figure 2, paragraph[0032], “An alternative encoding mechanism in which the document identifier is interleaved with the term frequency takes into consideration the fact that many term frequencies are below a threshold (e.g., 3). Accordingly, a quantity of bits is reserved to store the encoded document information (i.e., the document identifier and the term frequency). Under this encoding mechanism, when the term frequency is less than a frequency-value threshold (e.g., 3), the quantity of bits is used to encode the term frequency and additional bytes are not used to encode a frequency field. In addition, pursuant to this mechanism, when the term frequency exceeds the frequency-value threshold, additional bytes are used to encode the frequency field. Such an encoding mechanism might allow for a better compression ratio, in exchange for reduced decoding performance. Accordingly, exemplary information might take the following forms when maintained in term index 234 and when a TF field is fixed at 2 bits...etc.”). 

Claims 5 and 13
As to claims 5 and 13, the combination of Gipp, Shen, Hopcroft, and King discloses all the elements in claim 10, as noted above, and Shen further discloses wherein encoding the identified frequently occurring terms includes applying an algorithm to each of the identified frequently occurring terms to generate output having a specified length (paragraph[0032]-paragraph[0033], “An alternative encoding mechanism in which the document identifier is interleaved with the term frequency takes into consideration the fact that many term frequencies are below a threshold (e.g., 3). Accordingly, a quantity of bits is reserved to store the encoded document information (i.e., the document identifier and the term frequency). Under this encoding mechanism, when the term frequency is less than a frequency-value threshold (e.g., 3), the quantity of bits is used to encode the term frequency and additional bytes are not used to encode a frequency field. In addition, pursuant to this mechanism, when the term frequency exceeds the frequency-value threshold, additional bytes are used to encode the frequency field. Such an encoding mechanism might allow for a better compression ratio, in exchange for reduced decoding performance. Accordingly, exemplary information might take the following forms when maintained in term index 234 and when a TF field is fixed at 2 bits…etc.”). 
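The interleaved encoding quoted from Shen paragraph[0032] can be approximated as below. The 2-bit TF field and the threshold of 3 come from the quoted passage; the use of the field value 3 as an escape marker, the 2-byte overflow field, and the function names are assumptions made only so the sketch is complete.

```python
TF_BITS = 2                      # "a TF field is fixed at 2 bits" per the quote
TF_ESCAPE = (1 << TF_BITS) - 1   # 3: assumed marker, frequency stored in extra bytes


def encode_posting(doc_id, tf):
    """Interleave a document identifier with its term frequency.

    Returns (packed_word, extra_bytes); extra_bytes is empty for small tf.
    """
    if doc_id < 0 or tf < 0:
        raise ValueError("doc_id and tf must be non-negative")
    if tf < TF_ESCAPE:
        return (doc_id << TF_BITS) | tf, b""
    # Assumed overflow layout: 2 big-endian bytes (tf above 65535 raises).
    return (doc_id << TF_BITS) | TF_ESCAPE, tf.to_bytes(2, "big")


def decode_posting(word, extra=b""):
    """Recover (doc_id, tf) from the packed word and any extra bytes."""
    doc_id = word >> TF_BITS
    field = word & TF_ESCAPE
    if field < TF_ESCAPE:
        return doc_id, field
    return doc_id, int.from_bytes(extra, "big")
```

Under these assumptions a frequency below 3 costs no additional bytes, which matches the trade-off the quoted passage describes.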

Claims 6, 14, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Gipp et al. U.S. Patent Publication (2011/0264672; hereinafter: Gipp) in view of Shen et al. U.S. Patent Publication (2012/0271813; hereinafter: Shen) and further in view of Hopcroft et al. U.S. Patent Publication (2016/0378796; hereinafter: Hopcroft) and further in view of King et al. U.S. Patent Publication (2011/0035656; hereinafter: King) and further in view of Al-Shammari U.S. Patent Publication (2010/0082333; hereinafter: Al-Shammari, in IDS dated February 14, 2020).

Claims 6, 14, and 19
As to claims 6, 14, and 19, the combination of Gipp, Shen, Hopcroft, and King discloses all the elements in claim 9, as noted above, but does not appear to explicitly disclose wherein converting the selected reference document content identifier further comprises, prior to identifying the frequently occurring terms in the content of the reference document content: removing stop words from the reference document content; and stemming the reference document content. 

However, Al-Shammari discloses wherein converting the selected reference document content identifier further comprises, prior to identifying the frequently occurring terms in the content of the reference document content: 
removing stop words from the reference document content (paragraph[0014], “stop words removal…etc.”); and 
stemming the reference document content (paragraph[0014]-paragraph[0015], “Stemming is a computational process for reducing words to their root (or stem), and it can be viewed as a recall-enhancing…etc.”). It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Gipp with the teachings of Shen, King, Hopcroft, and Al-Shammari to tokenize and stem text from a document. The skilled artisan would have been motivated to improve the teachings of Gipp with the teachings of Shen, King, Hopcroft, and Al-Shammari to create an effective search and retrieval tool for Arabic information (Al-Shammari: paragraph[0013]). 
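The two preprocessing steps, stop word removal followed by stemming, can be illustrated with a toy example. The stop word list and the English suffix-stripping rule below are hypothetical placeholders; they are not the Arabic stemming technique of Al-Shammari, which the quoted passages do not describe in enough detail to reproduce.

```python
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}  # tiny illustrative list
SUFFIXES = ("ing", "ed", "es", "s")  # toy rule set, checked in this order


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(tokens):
    """Remove stop words, then stem what remains."""
    return [stem(t) for t in tokens if t not in STOP_WORDS]
```

Running both steps before counting terms means that variants such as "indexing" and "indexed" are counted as the same term.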

Claims 7, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Gipp et al. U.S. Patent Publication (2011/0264672; hereinafter: Gipp) in view of Shen et al. U.S. Patent Publication (2012/0271813; hereinafter: Shen) and further in view of Hopcroft et al. U.S. Patent Publication (2016/0378796; hereinafter: Hopcroft) and further in view of King et al. U.S. Patent Publication (2011/0035656; hereinafter: King) and further in view of Sukhodolov et al. U.S. Patent Publication (2016/0275347; hereinafter: Sukhodolov, in IDS dated February 14, 2020).

Claims 7, 15, and 20
As to claims 7, 15, and 20, the combination of Gipp, Shen, Hopcroft, and King discloses all the elements in claim 9, as noted above, but does not appear to explicitly disclose wherein the plurality of archived document content identifiers are each organized in a binary search tree.

However, Sukhodolov discloses wherein the plurality of archived document content identifiers are each organized in a binary search tree (paragraph[0053], “With a trie, a string may be used to quickly get its identifier and search for triplets where it is object's value. Number identifiers may also be computed and stored using a trie (a binary tree having a key for each of its nodes)…etc.”). It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Gipp with the teachings of Shen, King, Hopcroft, and Sukhodolov to search a binary tree for document identifiers. The skilled artisan would have been motivated to improve the teachings of Gipp with the teachings of Shen, King, Hopcroft, and Sukhodolov to efficiently search a binary tree for document identifiers. 

Claims 8, 16, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Gipp et al. U.S. Patent Publication (2011/0264672; hereinafter: Gipp) in view of Shen et al. U.S. Patent Publication (2012/0271813; hereinafter: Shen) and further in view of Hopcroft et al. U.S. Patent Publication (2016/0378796; hereinafter: Hopcroft) and further in view of King et al. U.S. Patent Publication (2011/0035656; hereinafter: King) and further in view of Sukhodolov et al. U.S. Patent Publication (2016/0275347; hereinafter: Sukhodolov) and further in view of Chun et al. U.S. Patent (8,661,023; hereinafter: Chun, in IDS dated February 14, 2020).

Claims 8, 16, and 21
As to claims 8, 16, and 21, the combination of Gipp, Shen, Hopcroft, King, and Sukhodolov discloses all the elements in claim 15, as noted above, but does not appear to explicitly disclose wherein generating the plurality of document similarity values further comprises traversing the respective binary search trees to identify an archived document content identifier with a highest similarity to the reference document content identifier.

However, Chun discloses wherein generating the plurality of document similarity values further comprises traversing the respective binary search trees to identify an archived document content identifier with a highest similarity to the reference document content identifier (column 11, lines 59-67 and column 12, lines 1-25, “term node may return a document identifier of the first matching document at or after the passed-in document id rather than returning true/false…etc.”). It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Gipp with the teachings of Shen, Hopcroft, King, Sukhodolov, and Chun to search and return matching document identifiers in a tree. The skilled artisan would have been motivated to improve the teachings of Gipp with the teachings of Shen, Hopcroft, King, Sukhodolov, and Chun to improve processing time for a search query that uses a hierarchical query tree structure (Chun: Column 1, lines 35-37). 
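The behavior quoted from Chun, returning the first document identifier at or after a passed-in identifier, corresponds to a ceiling lookup when identifiers are kept in a binary search tree. The sketch below shows that lookup on a plain, unbalanced tree of integer identifiers. It is an assumption-laden illustration: the class and function names are hypothetical, and it does not model Chun's query tree or the claimed similarity-based traversal.

```python
class Node:
    """One binary search tree node holding a document identifier."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the tree rooted at root; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def first_at_or_after(root, doc_id):
    """Return the smallest stored identifier >= doc_id, or None if none exists."""
    best = None
    node = root
    while node is not None:
        if node.key >= doc_id:
            best = node.key   # candidate found; look left for a smaller one
            node = node.left
        else:
            node = node.right  # too small; only the right subtree can qualify
    return best
```

With identifiers 40, 10, 70, 25, and 55 inserted in that order, a lookup for 26 walks 40, 10, 25 and returns 40.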

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAWAUNE A CONYERS whose telephone number is (571)270-3552.  The examiner can normally be reached on M-F 8:00am-4:30pm EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Neveen Abel-Jalil can be reached on (408) 918-7548.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DAWAUNE A CONYERS/
Primary Examiner, Art Unit 2152
September 10, 2022