DETAILED ACTION
This office action is in response to Applicant’s arguments and amendments filed on May 13, 2021. The application contains claims 1-20: 
Claims 4, 9, 13, and 18 are cancelled.
Claims 1-3, 5-8, 10-12, 14-17, 19, and 20 are allowed.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) were submitted on April 14, 2021 and August 17, 2021. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Authorization for this examiner’s amendment was given by email on August 11, 2021 following a telephone interview with SHANG, PENGJU (Reg. No. 80050) on August 09, 2021. The examiner-initiated interview summary, the examiner’s amendment, and the email authorization for entry of the examiner’s amendment have all been attached to this office action.

The application has been amended as per the attached Examiner’s Amendment, the content of which has also been enclosed as follows: 

Examiner’s Amendment to the Claims
1. (Currently Amended) A non-transitory computer-readable storage medium for generating file fingerprints, the storage medium storing instructions, when executed by one or more processors, cause the one or more processors to perform operations comprising:
obtaining a string of characters within a file;
dividing the string of characters into a plurality of sequences;
generating a plurality of hashes respectively for the plurality of sequences, wherein generation of each of the plurality of hashes includes a first calculation of a hash function based on characters within the corresponding sequence;
for [[one]] each of the plurality of hashes, determining whether the hash is divisible by a predetermined number;
upon determining that the hash is divisible by the predetermined number, selecting a sequence corresponding to the hash from the plurality of sequences;  
determining a new sequence in the string of characters that is shifted from the sequence by one or two characters in a reverse direction, wherein the new sequence and the sequence have one or more overlapping characters;
determining a new hash [[(hk-2)]] based on the new sequence, wherein generation of the new hash includes a second calculation of the hash function based on characters within the new sequence and the second calculation of the hash function reuses the first calculation of the hash function on the one or more overlapping characters; 
adding at least the determined new hash [[(hk-2)]] and an index [[(k-2)]] of the determined new hash to a hash list;
generating a fingerprint for the file based on the hash list.
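
The operations recited in claim 1 can be illustrated with a minimal Python sketch. This is not part of the claim language or the record. The polynomial hash, the constants BASE and MOD, the choice of non-overlapping k-character sequences, and the use of the start position as the "index" are all assumptions made for illustration. For brevity the sketch recomputes the new hash from scratch and does not show the reuse of the first calculation.

```python
from typing import List, Tuple

BASE = 257            # assumed polynomial base (not specified in the claims)
MOD = 1_000_000_007   # assumed prime modulus (not specified in the claims)


def seq_hash(seq: str) -> int:
    """Polynomial hash over the characters of one sequence (assumed hash function)."""
    h = 0
    for ch in seq:
        h = (h * BASE + ord(ch)) % MOD
    return h


def fingerprint(text: str, k: int = 4, divisor: int = 3, shift: int = 2) -> List[Tuple[int, int]]:
    """Return a hash list of (new_hash, index) pairs for `text`.

    Assumptions: sequences are consecutive, non-overlapping k-character chunks;
    `shift` is 1 or 2; the index is the start position of the new sequence.
    """
    hash_list: List[Tuple[int, int]] = []
    for start in range(0, len(text) - k + 1, k):
        h = seq_hash(text[start:start + k])
        if h % divisor != 0:
            continue  # hash not divisible by the predetermined number
        new_start = start - shift  # shift in the reverse direction
        if new_start < 0:
            continue  # no room to shift before the start of the string
        new_hash = seq_hash(text[new_start:new_start + k])
        hash_list.append((new_hash, new_start))
    return hash_list  # the hash list serves as the fingerprint in this sketch
```

With `divisor=1` every sequence is selected, so `fingerprint("abcdefghijkl", k=4, divisor=1, shift=2)` yields entries for the shifted sequences starting at positions 2 and 6.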

2. (Currently Amended) The non-transitory computer-readable storage medium of claim 1, wherein the plurality of sequences comprise k-grams, the k-grams comprising sequences of k-characters from the string of characters.

3. (Currently Amended) A system for generating file fingerprints, the system comprising:
one or more processors; and
a memory storing instructions that, when executed by the one or more processors, cause the system to perform:
obtaining a string of characters within a file;
dividing the string of characters into a plurality of sequences;
generating a plurality of first hashes for the plurality of sequences, wherein generation of each of the plurality of first hashes includes a first calculation of a hash function based on characters within the corresponding sequence;
for [[one]] each of the plurality of first hashes, determining whether the first hash is divisible by a predetermined number;
upon determining that the first hash is divisible by the predetermined number, selecting a first sequence corresponding to the first hash from the plurality of sequences;
determining a second sequence that is shifted from the first sequence by one or two characters in a reverse direction, wherein the second sequence and the first sequence have one or more overlapping characters;
generating a second hash [[(hk-2)]] for the second sequence based on the first hash corresponding to the first sequence, wherein generation of the second hash includes a second calculation of the hash function based on characters within the second sequence and the second calculation of the hash function reuses the first calculation of the hash function on the one or more overlapping characters;
adding at least the second hash [[(hk-2)]] and an index [[(k-2)]] of the second hash to a hash list
generating a fingerprint for the file based on the hash list.

4. (Cancelled) 

5. (Currently Amended) The system of claim [[4]]3, wherein the hash function includes a rolling hash.
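
As an illustration only, a rolling hash can derive the hash of a sequence shifted one character in the reverse direction from the hash of the original sequence, so that the contribution of the overlapping characters is reused rather than recomputed. The Python sketch below assumes a polynomial hash with a prime modulus; BASE, MOD, and the function names are hypothetical and do not come from the claims.

```python
BASE = 257                           # assumed polynomial base
MOD = 1_000_000_007                  # assumed prime modulus
INV_BASE = pow(BASE, MOD - 2, MOD)   # modular inverse of BASE (Fermat's little theorem)


def poly_hash(seq: str) -> int:
    """Hash a sequence from scratch: sum of ord(c_j) * BASE**(k-1-j) mod MOD."""
    h = 0
    for ch in seq:
        h = (h * BASE + ord(ch)) % MOD
    return h


def roll_back(h: int, dropped: str, added: str, k: int) -> int:
    """Hash of the k-character window shifted one character toward the start.

    `h` is the hash of the current window, `dropped` is the last character of
    the current window, and `added` is the character just before the window.
    The overlapping k-1 characters are not rehashed.
    """
    overlap = ((h - ord(dropped)) * INV_BASE) % MOD   # hash of the shared k-1 characters
    return (ord(added) * pow(BASE, k - 1, MOD) + overlap) % MOD
```

Applying `roll_back` twice gives the hash of the sequence shifted by two characters, which corresponds to the "one or two characters" shift in this sketch.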

6. (Previously Presented) The system of claim 3, wherein the dividing the string of characters into a plurality of sequences comprises:
dividing the string of characters into a plurality of sequences that are continuous and equal in length. 

7. (Previously Presented) The system of claim 6, wherein the plurality of sequences comprise k-grams, the k-grams comprising sequences of k-characters from the string of characters.

8. (Previously Presented) The system of claim 3, wherein the generating a fingerprint for the file based on the hash list comprises:
returning the hash list as the fingerprint for the file.

9. (Cancelled) 

10. (Currently Amended) The system of claim 3, further comprising:
adding the first hash [[(hk)]] and an index [[(k)]] of the first hash to the hash list. 

11. (Previously Presented) The system of claim 3, wherein obtaining the string of characters within the file includes:
obtaining the file, the file including text;
extracting the text of the file; and
normalizing the extracted text of the file.
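
The claims do not specify how the extracted text is normalized. Purely as an illustration, the Python sketch below assumes that normalization means Unicode compatibility normalization, lowercasing, and removal of whitespace, punctuation, and underscores; the function name is hypothetical.

```python
import re
import unicodedata


def normalize_text(raw: str) -> str:
    """Normalize extracted text before it is divided into sequences.

    Assumed steps (not taken from the claims): NFKC normalization,
    lowercasing, and removal of every non-alphanumeric character.
    """
    text = unicodedata.normalize("NFKC", raw)
    text = text.lower()
    return re.sub(r"[\W_]+", "", text)
```

Under these assumptions, two files that differ only in case, spacing, or punctuation produce the same string of characters and therefore the same fingerprint.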

12. (Currently Amended) A method for generating file fingerprints, the method comprising:
obtaining a string of characters within a file;
dividing the string of characters into a plurality of sequences;
generating a plurality of first hashes for the plurality of sequences, wherein generation of each of the plurality of first hashes includes a first calculation of a hash function based on characters within the corresponding sequence;
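for [[one]] each of the plurality of first hashes, determining whether the first hash is divisible by a predetermined number;
upon determining that the first hash is divisible by the predetermined number, selecting a first sequence corresponding to the first hash from the plurality of sequences;
determining a second sequence that is shifted from the first sequence by one or two characters in a reverse direction, wherein the second sequence and the first sequence have one or more overlapping characters;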
generating a second hash [[(hk-2)]] for the second sequence based on the first hash corresponding to the first sequence, wherein generation of the second hash includes a second calculation of the hash function based on characters within the second sequence and the second calculation of the hash function reuses the first calculation of the hash function on the one or more overlapping characters; 
adding at least the second hash [[(hk-2)]] and an index [[(k-2)]] of the second hash into a hash list
generating a fingerprint for the file based on the hash list.

13. (Cancelled) 

14. (Currently Amended) The method of claim [[13]]12, wherein the hash function includes a rolling hash.

15. (Previously Presented) The method of claim 12, wherein the dividing the string of characters into a plurality of sequences comprises:
dividing the string of characters into a plurality of sequences that are continuous and equal in length. 

16. (Previously Presented) The method of claim 15, wherein the plurality of sequences comprise k-grams, the k-grams comprising sequences of k-characters from the string of characters.

17. (Previously Presented) The method of claim 12, wherein the generating a fingerprint for the file based on the hash list comprises:
returning the hash list as the fingerprint for the file. 

18. (Cancelled) 

19. (Currently Amended) The method of claim 12, wherein the second sequence is shifted from the first sequence by one or two characters in a forward direction.

20. (Original) The method of claim 12, wherein obtaining the string of characters within the file includes:
obtaining the file, the file including text;
extracting the text of the file; and
normalizing the extracted text of the file.

Reasons for Allowance
The following is an examiner’s statement of reasons for allowance:
Claims 1-3, 5-8, 10-12, 14-17, 19, and 20 are allowable over the prior art of record. The closest prior art of record, Wallace et al. (US 8756249 B1), teaches:
“a non-transitory computer-readable storage medium for generating file fingerprints, the storage medium storing instructions, when executed by one or more processors, cause the one or more processors to perform operations comprising: 
obtaining a string of characters within a file; 
dividing the string of characters into a plurality of sequences;
generating a plurality of hashes respectively for the plurality of sequences, wherein generation of each of the plurality of hashes includes a first calculation of a hash function based on characters within the corresponding sequence;
selecting, from the plurality of sequences, a subset of sequences with corresponding hashes;”

and Antoun et al. (US 9514312 B1) teaches:
“for each of the subset of sequences, determining a new sequence in the string of characters that is shifted from the each sequence by one or more characters, wherein the new sequence and the each sequence have one or more overlapping characters; 
determining a new hash (hk-2) based on the new sequence, wherein generation of the new hash includes a second calculation of the hash function based on characters within the new sequence and the second calculation of the hash function reuses the first calculation of the hash function on the one or more overlapping characters.”


However, the prior art of record, alone or in combination, does not teach or suggest:
“upon determining that the hash is divisible by the predetermined number, selecting a sequence corresponding to the hash from the plurality of sequences;
determining a new sequence in the string of characters that is shifted from the sequence by one or two characters in a reverse direction, wherein the new sequence and the sequence have one or more overlapping characters;
[…]
adding at least the determined new hash and an index of the determined new hash to a hash list;”

Dependent claims 2, 5-8, 10, 11, 14-17, 19, and 20 are allowable at least for the reasons recited above including all the limitations of the allowable independent base claim upon which they depend.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIAOQIN HU whose telephone number is (571)272-1792.  The examiner can normally be reached on Monday-Friday 7:00am-3:30pm.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fred Ehichioya, can be reached on (571) 272-4034.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/XIAOQIN HU/Examiner, Art Unit 2168               

/IRETE F EHICHIOYA/Supervisory Patent Examiner, Art Unit 2168