DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is in response to the application filed on 10/22/2020.
No priority date is claimed.  Therefore, the effective filing date of this application is 10/22/2020.
Claims 1-20 are pending.

Remarks

Regarding claim 1, claim 1 recites a method comprising a series of steps performed by data processing hardware (i.e., a machine), and is therefore directed to a process (i.e., a statutory category of invention). In addition, claim 1, which recites a method for identifying similar files, is not directed to any judicial exception, i.e., a law of nature, a natural phenomenon, or an abstract idea identified by the courts, as defined in the 2019 Revised Patent Subject Matter Eligibility Guidance (2019 PEG) and the October 2019 Update. Therefore, claim 1 and its dependent claims 2-10 are eligible under 35 U.S.C. §101 according to the 2019 PEG and the October 2019 Update.

Regarding claim 11, claim 11 recites a system comprising data processing hardware and memory hardware, and is therefore directed to a machine (i.e., a statutory category of invention). In addition, claim 11, which recites a system for identifying similar files, is not directed to any judicial exception, i.e., a law of nature, a natural phenomenon, or an abstract idea identified by the courts, as defined in the 2019 Revised Patent Subject Matter Eligibility Guidance (2019 PEG) and the October 2019 Update. Therefore, claim 11 and its dependent claims 12-20 are eligible under 35 U.S.C. §101 according to the 2019 PEG and the October 2019 Update.

Claim Objections

Claims 1-20 are objected to because of the following informalities:  

As to claim 1, the limitation “in file database” in line 14 should read “in the file database”, and the limitation “the database” in line 19 should read “the file database”, for consistency of claim language.

As to claim 11, the limitation “in file database” in line 17 should read “in the file database”, and the limitation “the database” in line 21 should read “the file database”, for consistency of claim language.

	Dependent claims 2-10 and 12-20 are objected to as incorporating the informalities of independent claims 1 and 11, respectively, from which they depend.

Appropriate correction is required.
Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 3 and 13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Regarding claim 3, it is unclear whether the limitation “the instructions” in line 2 refers to “the sequence of instructions” recited in line 2. It is also unclear what is meant by the recitation “the instructions determine whether to continue the sequence of instructions or transition to another portion of the instruction” (e.g., How is the sequence of instructions continued? How does the transition to another portion of the instructions occur? Which instructions are “the instructions”? What is “another portion of the instructions”?).

Regarding claim 13, it is unclear whether the limitation “the instructions” in line 2 refers to “the sequence of instructions” recited in line 2 or to “instructions” recited in line 4 of claim 11. It is also unclear what is meant by the recitation “the instructions determine whether to continue the sequence of instructions or transition to another portion of the instruction” (e.g., How is the sequence of instructions continued? How does the transition to another portion of the instructions occur? Which instructions are “the instructions”? What is “another portion of the instructions”?).

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claims 1, 4-6, 10, 11, 14-16 and 20 (effective filing date 10/22/2020) are rejected under 35 U.S.C. 103 as being unpatentable over Davis et al. (U.S. Patent No. 10,484,419, Patent date 11/19/2019), and further in view of Banerjee et al. (U.S. Publication No. 2019/0179937, Publication date 06/13/2019).

As to claim 1, Davis et al. teaches:
“A method” (see Davis et al., Abstract and Fig. 1) comprising:
“receiving, at a data processing hardware, a plurality of files” (see Davis et al., [column 11, lines 17-21] for receiving a collection of software modules; also see [column 5, lines 57-58] wherein each software module can be a binary executable file); 
“for each file of the plurality of files” (see Davis et al., [column 11, lines 17-21] for each software module):
“identifying, by the data processing hardware, executable portions of the respective file” (see Davis et al., [column 11, lines 18-20] for extracting a set of executable code fragments);
“dividing, by the data processing hardware, the identified executable portions of the respective file into code blocks” (see Davis et al., [column 11, lines 18-20] for extracting a set of executable code fragments from the software module, wherein each executable code fragment can be interpreted as a code block, and wherein one or more executable code fragments can be interpreted as an executable portion as recited); 
“for each code block of the respective file, generating, by the data processing hardware, a hash to represent the respective code block” (see Davis et al., [column 11, lines 18-21] for generating a fingerprint for each executable code fragment; also see [column 1, lines 44-46] wherein each fingerprint is equivalent to a hash as recited); and 
“storing, by the data processing hardware, the respective file in a file database as a respective sequence of the hashes generated to represent the code blocks divided from the identified executable portions of the respective file” (see Davis et al., [column 11, lines 17-21] for building a knowledge base (i.e., a file database) from a collection of known software modules; also see Fig. 1 and [column 3, lines 7-25] for the attack database);
“receiving, at the data processing hardware, a query to identify whether a first file of the plurality of files stored in the file database is similar to any other file stored in the file database” (see Davis et al., [column 11, lines 37-40] wherein a request/command for evaluating a new or unknown software module for similarity against a given knowledge base as disclosed can be interpreted as equivalent to the query as recited);
“determining, by the data processing hardware, whether any hash in the respective sequence of the hashes associated with the first file stored in the file database matches any of the hashes in the respective sequence of the hashes associated with each other file of the plurality of files stored in the database” (see Davis et al., [column 11, lines 53-54] for determining whether software module g and another software module f share one executable code fragment (i.e., have one matching executable code fragment); also see [column 6, lines 56-67] for determining similarity between software modules based on a match between one code fragment extracted from the first software module and one code fragment extracted from the second software module); and
“when one of the hashes in the respective sequence of the hashes associated with the first file matches one of the hashes in the respective sequence of the hashes associated with the second file of the plurality of files stored in the file database, generating, by the data processing hardware, a response to the query indicating that the second file is similar to the first file” (see Davis et al., [column 11, lines 45-55] for determining that two files are similar if they share one executable code fragment, based on comparing fingerprints of the executable code fragments associated with each file, and for labeling/classifying software module g based on a similar/matching software module fi (e.g., classifying software module g as malware if fi is labeled as malware), wherein the determination that software module g and software module fi are similar and/or the classification of software module g can be interpreted as a response to the request/command to evaluate software module g).
Thus, Davis et al. teaches processing and comparing a new software module against a database/collection of known software modules (see Davis et al., [column 11, lines 17-58]) and teaches comparing and calculating a similarity score between a first software module and at least one second software module (see Fig. 2 and [column 4, lines 40-57]).
However, Davis et al. does not explicitly teach comparing software modules/files stored in the same database (i.e., where the first software module and the at least one second software module are from the same database).
On the other hand, Banerjee et al. teaches a feature of comparing or identifying similar objects/files in the same database.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Banerjee et al.'s teaching into Davis et al.’s system by implementing a feature of identifying similar software modules/files stored in the same database. One of ordinary skill in the art would have been motivated to do so to provide Davis et al.’s system with an effective way to identify similar software modules within an existing database. In addition, the feature of identifying similar items/files within the same repository/database is well known and widely used in the art (e.g., identifying files similar to a selected file in a file system/database, or identifying products similar to a selected product in a product catalog/database).
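For illustration only, the steps of claim 1 as mapped above can be sketched in Python. This is a minimal sketch under stated assumptions, not the method of Davis et al. or of Applicant: it assumes the code blocks of each file have already been extracted, uses an in-memory dictionary as a stand-in for the file database, and uses SHA-256 as the block hash. The names FileDatabase, hash_block, store_file, and similar_files are hypothetical.

```python
import hashlib
from typing import Dict, List

# Hypothetical stand-in for the "file database": file name -> sequence of block hashes.
FileDatabase = Dict[str, List[str]]


def hash_block(block: bytes) -> str:
    """Represent one code block by a fixed-length hash (SHA-256 assumed here)."""
    return hashlib.sha256(block).hexdigest()


def store_file(db: FileDatabase, name: str, code_blocks: List[bytes]) -> None:
    """Store a file as the sequence of hashes of its code blocks, in block order."""
    db[name] = [hash_block(block) for block in code_blocks]


def similar_files(db: FileDatabase, first: str) -> List[str]:
    """Answer a query: which other stored files share at least one block hash with `first`?"""
    first_hashes = set(db[first])
    return [
        other
        for other, hashes in db.items()
        if other != first and not first_hashes.isdisjoint(hashes)
    ]


db: FileDatabase = {}
store_file(db, "a.bin", [b"\x55\x89\xe5", b"\xc3"])  # shares the b"\xc3" block with b.bin
store_file(db, "b.bin", [b"\x90\x90", b"\xc3"])
store_file(db, "c.bin", [b"\xcc"])
print(similar_files(db, "a.bin"))  # ['b.bin']
```

A single shared hash is enough to report similarity here, mirroring the “when one of the hashes ... matches” condition of the claim; a set is used for the first file only so that each comparison is a fast membership check.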

As to claim 11, Davis et al. teaches:
“A system” (see Davis et al., Abstract and Fig. 1) comprising:
“data processing hardware” (see Davis et al., Fig. 5 for processor); and
“memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising” (see Davis et al., Fig. 5 for memory and Fig. 1 for executing different modules in the threat detection and remediation system):
“receiving a plurality of files” (see Davis et al., [column 11, lines 17-21] for receiving a collection of software modules; also see [column 5, lines 57-58] wherein each software module can be a binary executable file); 
“for each file of the plurality of files” (see Davis et al., [column 11, lines 17-21] for each software module):
“identifying executable portions of the respective file” (see Davis et al., [column 11, lines 18-20] for extracting a set of executable code fragments);
“dividing the identified executable portions of the respective file into code blocks” (see Davis et al., [column 11, lines 18-20] for extracting a set of executable code fragments from the software module, wherein each executable code fragment can be interpreted as a code block, and wherein one or more executable code fragments can be interpreted as an executable portion as recited); 
“for each code block of the respective file, generating a hash to represent the respective code block” (see Davis et al., [column 11, lines 18-21] for generating a fingerprint for each executable code fragment; also see [column 1, lines 44-46] wherein each fingerprint is equivalent to a hash as recited); and 
“storing the respective file in a file database as a respective sequence of the hashes generated to represent the code blocks divided from the identified executable portions of the respective file” (see Davis et al., [column 11, lines 17-21] for building a knowledge base (i.e., a file database) from a collection of known software modules; also see Fig. 1 and [column 3, lines 7-25] for the attack database);
“receiving a query to identify whether a first file of the plurality of files stored in the file database is similar to any other file stored in the file database” (see Davis et al., [column 11, lines 37-40] wherein a request/command for evaluating a new or unknown software module for similarity against a given knowledge base as disclosed can be interpreted as equivalent to the query as recited);
“determining whether any hash in the respective sequence of the hashes associated with the first file stored in the file database matches any of the hashes in the respective sequence of the hashes associated with each other file of the plurality of files stored in the database” (see Davis et al., [column 11, lines 53-54] for determining whether software module g and another software module f share one executable code fragment (i.e., have one matching executable code fragment); also see [column 6, lines 56-67] for determining similarity between software modules based on a match between one code fragment extracted from the first software module and one code fragment extracted from the second software module); and
“when one of the hashes in the respective sequence of the hashes associated with the first file matches one of the hashes in the respective sequence of the hashes associated with the second file of the plurality of files stored in the file database, generating a response to the query indicating that the second file is similar to the first file” (see Davis et al., [column 11, lines 45-55] for determining that two files are similar if they share one executable code fragment, based on comparing fingerprints of the executable code fragments associated with each file, and for labeling/classifying software module g based on a similar/matching software module fi (e.g., classifying software module g as malware if fi is labeled as malware), wherein the determination that software module g and software module fi are similar and/or the classification of software module g can be interpreted as a response to the request/command to evaluate software module g).
Thus, Davis et al. teaches processing and comparing a new software module against a database/collection of known software modules (see Davis et al., [column 11, lines 17-58]) and teaches comparing and calculating a similarity score between a first software module and at least one second software module (see Fig. 2 and [column 4, lines 40-57]).
However, Davis et al. does not explicitly teach comparing software modules/files stored in the same database (i.e., where the first software module and the at least one second software module are from the same database).
On the other hand, Banerjee et al. teaches a feature of comparing or identifying similar objects/files in the same database.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Banerjee et al.'s teaching into Davis et al.’s system by implementing a feature of identifying similar software modules/files stored in the same database. One of ordinary skill in the art would have been motivated to do so to provide Davis et al.’s system with an effective way to identify similar software modules within an existing database. In addition, the feature of identifying similar items/files within the same repository/database is well known and widely used in the art (e.g., identifying files similar to a selected file in a file system/database, or identifying products similar to a selected product in a product catalog/database).

As to claims 4 and 14, these claims are rejected for the same reasons set forth above with respect to claims 1 and 11, respectively, and are further rejected for the following reasons:
Davis et al. as modified by Banerjee et al. teaches:
“wherein identifying the executable portions of the respective file comprises removing at least one non-executable portion of the respective file” (see Davis et al., [column 4, lines 40-41] for extracting one or more code fragments from a first software module, wherein a code fragment comprises executable code (see [column 5, lines 56-66]) (i.e., no non-executable code is extracted/included)).
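As a hypothetical illustration of the claim 4 limitation, the sketch below assumes each file has already been parsed into sections that carry an executable flag (binary formats record such a flag, e.g., SHF_EXECINSTR in ELF section headers). The Section type and the executable_portions function are illustrative names only; non-executable sections are simply dropped.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    """Hypothetical parsed section of a binary file."""
    name: str
    executable: bool
    data: bytes


def executable_portions(sections: List[Section]) -> List[bytes]:
    """Identify executable portions by removing every non-executable section."""
    return [s.data for s in sections if s.executable]


sections = [
    Section(".text", True, b"\x55\x89\xe5\xc3"),
    Section(".data", False, b"hello"),
    Section(".rodata", False, b"\x00\x01"),
]
print(executable_portions(sections))  # [b'U\x89\xe5\xc3']
```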

As to claims 5 and 15, these claims are rejected for the same reasons set forth above with respect to claims 1 and 11, respectively, and are further rejected for the following reasons:
Davis et al. as modified by Banerjee et al. teaches:
“wherein generating the hash to represent the respective code block comprises generating the hash having a fixed length” (see Davis et al., [column 1, lines 44-46] for generating each of the fingerprints by applying a fuzzy hash function to a given one of the code fragments, wherein a hash function converts data of arbitrary length into an output of fixed length).

As to claims 6 and 16, these claims are rejected for the same reasons set forth above with respect to claims 1 and 11, respectively, and are further rejected for the following reasons:
Davis et al. as modified by Banerjee et al. teaches:
“wherein the plurality of files comprise binary file” (see Davis et al., [column 7, lines 32-35] wherein software modules are binary files).

As to claims 10 and 20, these claims are rejected for the same reasons set forth above with respect to claims 1 and 11, respectively, and are further rejected for the following reasons:
Davis et al. as modified by Banerjee et al. teaches:
“wherein none of the code blocks include non-executable portions of the respective file” (see Davis et al., [column 4, lines 40-41] for extracting one or more code fragments from a first software module, wherein a code fragment comprises executable code (see [column 5, lines 56-66]) (i.e., no non-executable code is extracted/included)).

Claims 2, 3, 8, 9, 12, 13, 18 and 19 (effective filing date 10/22/2020) are rejected under 35 U.S.C. 103 as being unpatentable over Davis et al. (U.S. Patent No. 10,484,419, Patent date 11/19/2019), in view of Banerjee et al. (U.S. Publication No. 2019/0179937, Publication date 06/13/2019), and further in view of Topan et al. (U.S. Publication No. 2015/0326585, Publication date 11/12/2015).	

As to claims 2 and 12, Davis et al. as modified by Banerjee et al. teaches all limitations as recited in claims 1 and 11 respectively, including dividing the identified executable portions of the respective file into code blocks (see Davis et al., [column 5, lines 55-66] for extracting one or more code fragments (i.e., code blocks) from a first software module (i.e., binary executable file)).
However, Davis et al. as modified by Banerjee et al. does not explicitly teach:
“wherein dividing the identified executable portions of the respective file into code blocks comprises, for each executable portion of the identified executable portions of the respective file:
identifying one or more locations in a sequence of instructions for the corresponding executable portion of the respective file; and
at each location of the identified one or more locations in the sequence of instructions:
designating an end of a first code block; and
designating a start of a second code block”.
On the other hand, Topan et al. teaches:
“wherein dividing the identified executable portions of the respective file into code blocks comprises, for each executable portion of the identified executable portions of the respective file” (see Topan et al., [0059] for separating the target object into code blocks; also see [0058] for dividing a code function into code blocks):
“identifying one or more locations in a sequence of instructions for the corresponding executable portion of the respective file” (see Topan et al., [0056] wherein each code fragment/block can be a function block); and
“at each location of the identified one or more locations in the sequence of instructions:
designating an end of a first code block; and
designating a start of a second code block” (see Topan et al., [0056] wherein function blocks start with a PUSH EBP; MOV EBP, ESP instruction sequence and end with POP EBP).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Topan et al.'s teaching into Davis et al.’s system by implementing a feature of dividing a code portion into code blocks by identifying locations in the sequence of instructions for the code portion. One of ordinary skill in the art would have been motivated to do so to provide Davis et al.’s system with an alternative and effective way to identify and extract code fragments/blocks from the software module/file. In addition, both references (Davis et al. and Topan et al.) are analogous art directed to the same field of endeavor, e.g., extracting code blocks from a target file/object, calculating a hash for each code block, and comparing objects/files by comparing the sets of hashes associated with them. This close relationship between the references strongly suggests a reasonable expectation of success when they are combined.
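The dividing step of claims 2 and 12 can be illustrated with the following sketch. It assumes, purely for illustration, that the identified locations are the positions immediately after control-transfer instructions (jumps, calls, returns); this is one possible reading only. The mnemonic set CONTROL_TRANSFER and the function split_into_blocks are hypothetical, and the instructions are taken as already-disassembled text.

```python
from typing import List, Sequence

# Assumption (illustrative only): a block boundary falls immediately after any
# instruction that may transfer control instead of falling through.
CONTROL_TRANSFER = {"jmp", "je", "jne", "jz", "jnz", "call", "ret"}


def split_into_blocks(instructions: Sequence[str]) -> List[List[str]]:
    """Divide a sequence of assembly instructions into code blocks."""
    blocks: List[List[str]] = []
    current: List[str] = []
    for insn in instructions:
        current.append(insn)
        parts = insn.split()
        if parts and parts[0].lower() in CONTROL_TRANSFER:
            blocks.append(current)  # designate the end of the current code block
            current = []            # the next instruction starts a new code block
    if current:
        blocks.append(current)      # any trailing instructions form a final block
    return blocks


listing = [
    "push ebp",
    "mov ebp, esp",
    "cmp eax, 0",
    "je done",
    "add eax, 1",
    "pop ebp",
    "ret",
]
for block in split_into_blocks(listing):
    print(block)
# ['push ebp', 'mov ebp, esp', 'cmp eax, 0', 'je done']
# ['add eax, 1', 'pop ebp', 'ret']
```

Topan et al.'s function-block delimiters (PUSH EBP; MOV EBP, ESP ... POP EBP) would be a different choice of boundary locations; only the boundary test would change.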

As to claims 3 and 13, these claims are rejected for the same reasons set forth above with respect to claims 2 and 12, respectively, and are further rejected for the following reasons:
Davis et al. as modified by Banerjee et al. and Topan et al. teaches:
“wherein, at the identified one or more locations in the sequence of instructions, the instructions determine whether to continue the sequence of instructions or transition to another portion of the instruction” (see Davis et al., [column 4, lines 40-41] for extracting one or more code fragments from a first software module, wherein each code fragment must be defined by two locations in the first software module; also see Topan et al., [0079] for separating a target object into a multitude of code blocks, wherein each code block comprises a sequence of consecutive processor instructions (see [0058])).

As to claims 8 and 18, Davis et al. as modified by Banerjee et al. teaches all limitations as recited in claims 1 and 11, respectively, including generating the hash to represent the respective code block (see Davis et al., [column 6, lines 11-14] for generating a fingerprint to represent a particular code fragment, wherein each fingerprint as disclosed can be interpreted as equivalent to a hash as recited).
However, Davis et al. as modified by Banerjee et al. does not explicitly teach:
“wherein generating the hash to represent the respective code block comprises generating the hash using a cryptographic hash function”.
On the other hand, Topan et al. teaches a feature of using a cryptographic hash function to generate a hash as equivalently recited as follows:
“wherein generating the hash to represent the respective code block comprises generating the hash using a cryptographic hash function” (see Topan et al., [0036] wherein hash functions such as message digest (MD) or secure hashing (SHA) are examples of cryptographic hash functions).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Topan et al.'s teaching into Davis et al.’s system by implementing a feature of using a cryptographic hash function to generate a hash/fingerprint to represent a code fragment. One of ordinary skill in the art would have been motivated to do so to provide Davis et al.’s system with an alternative and effective way to generate a hash or fingerprint to represent a code fragment. In addition, both references (Davis et al. and Topan et al.) are analogous art directed to the same field of endeavor, e.g., extracting code blocks from a target file/object, calculating a hash for each code block, and comparing objects/files by comparing the sets of hashes associated with them. This close relationship between the references strongly suggests a reasonable expectation of success when they are combined.

As to claims 9 and 19, these claims are rejected for the same reasons set forth above with respect to claims 8 and 18, respectively, and are further rejected for the following reasons:
Davis et al. as modified by Banerjee et al. and Topan et al. teaches:
“wherein the hash generated using the cryptographic hash function comprise a 256-bit hash” (see Davis et al., [column 6, lines 11-14] for generating a fingerprint (i.e., a hash) to represent a particular code fragment using a fuzzy hash function; also see Topan et al., [0036] for using secure hashing (e.g., including the commonly used SHA-256) to generate a hash, wherein a hash generated using SHA-256 is a 256-bit hash).
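The fixed-length property relied on for claims 5 and 15, and the 256-bit size recited in claims 9 and 19, can be checked with Python's standard hashlib module: a SHA-256 digest is always 32 bytes (256 bits), regardless of the length of the code block hashed. The byte strings below are arbitrary examples, and block_hash is a hypothetical helper name.

```python
import hashlib


def block_hash(block: bytes) -> bytes:
    """Hash one code block with SHA-256, a cryptographic hash function."""
    return hashlib.sha256(block).digest()


# Code blocks of very different lengths (arbitrary example bytes).
short_block = b"\xc3"
long_block = b"\x55\x89\xe5" * 1000

for block in (short_block, long_block):
    digest = block_hash(block)
    # The digest length does not depend on the input length: always 32 bytes = 256 bits.
    print(len(block), "input bytes ->", len(digest) * 8, "bit hash")
# 1 input bytes -> 256 bit hash
# 3000 input bytes -> 256 bit hash
```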

Claims 7 and 17 (effective filing date 10/22/2020) are rejected under 35 U.S.C. 103 as being unpatentable over Davis et al. (U.S. Patent No. 10,484,419, Patent date 11/19/2019), in view of Banerjee et al. (U.S. Publication No. 2019/0179937, Publication date 06/13/2019), and further in view of Yang et al. (U.S. Publication No. 2015/0363198, Publication date 12/17/2015).	

As to claims 7 and 17, Davis et al. as modified by Banerjee et al. teaches all limitations as recited in claims 1 and 11 respectively.
However, Davis et al. as modified by Banerjee et al. does not explicitly teach a feature of disassembling as equivalently recited as follows:
“for each file of the plurality of files, disassembling, by the data processing hardware, the respective file from machine-executable code to assembly language source code”.
On the other hand, Yang et al. teaches a feature of disassembling as equivalently recited as follows:
“for each file of the plurality of files, disassembling, by the data processing hardware, the respective file from machine-executable code to assembly language source code” (see Yang et al., [0018]-[0019] for the disassembler generating assembly language code from machine-executable code for target modules).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Yang et al.'s teaching into Davis et al.’s system by implementing a feature of disassembling. One of ordinary skill in the art would have been motivated to do so to provide Davis et al.’s system with an effective way to generate assembly language source code from the machine-executable code of software modules for further analysis, as suggested by Yang et al. (see [0019]). In addition, both references (Davis et al. and Yang et al.) are analogous art directed to the same field of endeavor, e.g., managing software modules in a system. This close relationship between the references strongly suggests a reasonable expectation of success when they are combined.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHUONG THAO CAO whose telephone number is (571)272-2735. The examiner can normally be reached Monday - Friday: 9:00 am - 6:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ashish Thomas can be reached on 571-272-0631. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Phuong Thao Cao/Primary Examiner, Art Unit 2164