DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The following is a non-final, first Office action on the merits. 
Claims 1-20 are pending.

Specification
The specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.

Drawings
The Drawings filed on 7 November 2019 have been acknowledged. 

Information Disclosure Statement
An information disclosure statement (IDS) was submitted on 11/7/2019. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication US 20210202040 to Mauricio Antonio Chalita Williams et al., hereinafter “Williams”, in view of U.S. Patent Application Publication US 20190318807 to Niamh B. O’Hara et al., hereinafter “O’Hara”.

Regarding claim 1, Williams teaches a method (Williams, ¶ [0059], discloses the present invention relates to a method for identifying and classifying microbial species in a sample and a system for identifying and classifying microbial species in a sample, using an exact k-mer matching method and bacterial core genes), comprising: providing a database comprising k-mers of one or more organisms classified to a taxonomy (Williams, ¶ [0002], discloses taxonomic classification of microbes contained in a given sample could provide much insight into roles of the microbes in environments. Analysis of databases updated with new genomes publicized annually allows more ; dividing the database into two or more groups of k-mers, wherein each of the groups comprises a unique set of nodes of the taxonomy (At least, Williams, FIG. 2, discloses information in a k-mer database grouped according to different species. Further, at least, Williams, FIG. 4, discloses different species in a hierarchical node formation); wherein all k-mers of a given node reside in only one of the groups (Williams, FIG. 6, discloses multiple k-mer classification nodes. Williams, ¶ [0087], discloses the distinct k-mer refers to a k-mer sequence that is present as one or more copies including repeating k-mers and unique k-mers, but is counted as one copy. In Table 1, thus, the number of distinct k-mers is a sum of the number of the unique k-mers and the number of single copies selected from repeating k-mers. The total k-mer means a sum of all single k-mers in bacterial core genes of a reference microbe population). 
Williams teaches the limitations as identified above. Further, Williams discloses the creation of the sample k-mer dataset can be performed using a k-mer extractor.
Williams does not explicitly teach: removing k-mers common to two or more of the groups, thereby forming two or more modified groups, each of the modified groups containing a unique set of k-mers; wherein the modified groups are capable of serving as reference k-mers for computer queries and/or for taxonomic classifications of k-mers of a sample comprising taxonomically unclassified sequenced nucleic acids of one or more organisms.  
However, O’Hara teaches:
removing k-mers common to two or more of the groups, thereby forming two or more modified groups (O’Hara, ¶ [0005], discloses determining, by the control unit, a set of k-mers of one or more genomic DNA regions from the one or more microorganism populations; comparing, by the control unit, the set of k-mers to a reference database; using an ultra-fast filtering process to remove sequenced reads that do not map unambiguously to one and only one organism without loss of identification performance or accuracy. O’Hara, ¶ [0023], discloses by “catalog” or “database” is meant the structured container or set of elements, in which elements can be, for example, reference sequences, k-mers, etc., that can be inserted, removed and searched automatically thanks to prebuilt functions (e.g., insertion/removal function or query function) available to the user. Further, O’Hara, ¶ [0151], discloses once the program has processed all reference sequences, it reads all k-mers inserted (Step 3.2) and remove all k-mers from the database that do have multiplicity count equal to 1. For each reference sequence R, the program removes each k-mers from R that is present in database more than once. Finally, in step 3.4, the program saves and stores in disk the remaining k-mers in the database index for each taxa), each of the modified groups containing a unique set of k-mers (O’Hara, ¶ [0150], discloses the last steps produce a database containing taxa-specific and taxa-unique k-mers given the list of reference sequences provided that it is stored and saved in disk that can be later on accessed and loaded directly in memory); wherein the modified groups are capable of serving as reference k-mers for computer queries and/or for taxonomic classifications of k-mers of a sample comprising taxonomically unclassified sequenced nucleic acids of one or more organisms (O’Hara, ¶ [0151], discloses  the .  
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Williams (disclosing identifying and classifying microorganisms included in a sample by using an exact k-mer matching algorithm) to include the teachings of O’Hara (disclosing next generation technologies and analysis using a k-mer based approach which is depth-informed to classify microorganism) and arrive at functions for modifying k-mers. One of ordinary skill in the art would have been motivated to make this combination to improve the 

Regarding claim 2, the modification of Williams and O’Hara teaches the claimed invention substantially as claimed, and O’Hara further teaches the taxonomy is a self-consistent taxonomy which is independent of metadata associated with the k-mers (O’Hara, ¶ [0171], discloses the structure of the network denotes the assumption that each node in the network is conditionally independent of its non-descendants given its parents. To describe a probability distribution satisfying these assumptions, each node in the network is associated with a conditional probability table, which specifies the distribution over any given possible assignment of values to its parents. In this case a Bayesian classifier is a Bayesian network applied to a classification task of calculating the probability of each nucleotide provided by any sequencing system. At each decision point the Bayesian classifier can be combined with a version of shortest path graph algorithm such as Dijkstra's or Floyd's).  

Regarding claim 3, the modification of Williams and O’Hara teaches the claimed invention substantially as claimed, and Williams further teaches the taxonomy is based on calculated genetic distances (Williams, ¶ [0118], discloses if the k-mers exactly match the k-mers of the database, the unique IDs of the k-mers are listed for the genetic information (reads in metagenome data) of the input sample. For example, base .  

Regarding claim 4, the modification of Williams and O’Hara teaches the claimed invention substantially as claimed, and Williams further teaches the genetic distances are genome-genome distances calculated using the MinHash algorithm (Williams, ¶ [0099], discloses whenever a new k-mer occurs, a new space is allocated to the hash table and a unique numerical ID is stored).  

Regarding claim 5, the modification of Williams and O’Hara teaches the claimed invention substantially as claimed, and O’Hara further teaches the modified groups are stored on different computer nodes when used for said computer queries and/or for taxonomic classifications (O’Hara, ¶ [0134], discloses the data owner sets up one or several servers that are potentially untrusted and uses them to store taxonomic information (e.g., the database of taxa-specific and taxa-unique k-mers). However, the data owner assumes that these servers are indeed untrusted and thus, using a set of cryptographic techniques and a set of secret parameters and keys, he/she creates and stores in these servers an encrypted database of taxa-specific and taxa-unique k-mers in such a manner that only he/she and any authorized user with knowledge of the cryptographic techniques used for encryption and the set of secret parameters and keys can decipher the stored information. This encrypted database of .  

Regarding claim 6, the modification of Williams and O’Hara teaches the claimed invention substantially as claimed, and O’Hara further teaches the removed k-mers are stored on a computer node separate from the modified groups (O’Hara, ¶ [0132], discloses the database creation and abundance estimation algorithm are described in FIGS. 3 to 6. The database creation may include a filtering process by removing sequence data from over-represented taxa in order to alleviate the computational processing. O’Hara, ¶ [0135], discloses this method may include analyzing metagenomics sequence data collected from the environment or from patient samples and the database may be tailored to include sequences of organisms that are typically found in said source (e.g. species specific to certain environments or body locations for patient samples) as well as all closely related organisms).  

Regarding claim 7, the modification of Williams and O’Hara teaches the claimed invention substantially as claimed, and O’Hara further teaches the removed k-mers are used to confirm identification of an organism found in the queries and/or the classifications (O’Hara, ¶ [0173], discloses this data structure exploits hash functions, and the supported operations for example the insertion or removal of elements can be quickly performed. The current system may use and implement a Bloom filter for several specific operations. For example, a Bloom filter may be used to test whether or not an k-mer is present in the set of k-mers of a given taxa, or even to test whether this k-mers is present in the set of taxa-specific and taxa-unique k-mers of a given taxa).  
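For illustration only, the Bloom-filter membership test described in the cited passage of O’Hara can be sketched as follows. This is a minimal sketch and not O’Hara’s implementation: the class name `KmerBloomFilter`, the bit-array size, the number of hash functions, and the use of salted SHA-256 digests are all assumptions introduced here.

```python
import hashlib


class KmerBloomFilter:
    """Hypothetical Bloom filter for k-mer membership tests (illustrative only)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        # Both sizes are arbitrary assumptions chosen for the example.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, kmer):
        # Derive several bit positions by salting a single hash function.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        # False means definitely absent; True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(kmer)
        )


# Hypothetical set of k-mers for one taxon.
taxon_kmers = KmerBloomFilter()
for kmer in ("ACGTA", "CGTAC", "GTACG"):
    taxon_kmers.add(kmer)
```

A query such as `"ACGTA" in taxon_kmers` returns True for every inserted k-mer. A Bloom filter can report false positives but never false negatives, so a positive answer indicates only that the k-mer is possibly present in the set.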

Regarding claim 8, the modification of Williams and O’Hara teaches the claimed invention substantially as claimed, and Williams further teaches the one or more organisms are microorganisms selected from the group consisting of bacteria, fungi, viruses, protozoans, parasites, and combinations thereof (Williams, ¶ [0026], discloses creating a k-mer dataset including one or more k-mers and comparing with the reference k-mer database of reference microbial core genes (bacterial core gene) to select a k-mer whose nucleotide sequence is exactly matched, from the reference k-mer database).  

Regarding claim 9, the modification of Williams and O’Hara teaches the claimed invention substantially as claimed, and Williams further teaches the sample is selected from the group consisting of environmental samples, medical samples, and food samples (Williams, ¶ [0071], discloses the test sample may contain at least one .  

Regarding claim 10, Williams teaches a method (Williams, ¶ [0059], discloses the present invention relates to a method for identifying and classifying microbial species in a sample and a system for identifying and classifying microbial species in a sample, using an exact k-mer matching method and bacterial core genes), comprising: providing a database comprising k-mers of one or more organisms classified to a taxonomy (Williams, ¶ [0002], discloses taxonomic classification of microbes contained in a given sample could provide much insight into roles of the microbes in environments. Analysis of databases updated with new genomes publicized annually allows more accurate and specific classification. Williams, ¶ [0004], discloses with the increase of the number of publicly available genomes, the “exact k-mer matching” approach has become sufficiently reliable in recent years); assigning a taxonomic threshold level of the taxonomy (Williams, ¶ [0113], discloses the taxonomic profiling method or system for microbes according to the present invention may comprise the steps of (c) comparing the k-mers in the reference k-mer database with the k mers in the sample k-mer dataset according to an exact k-mer matching approach to select an exactly matched k-mers; and (d) using taxon information of the selected k-mers to identify and classify the bacterial species in the sample. Further, . 
Williams teaches the limitations as identified above. Further, Williams discloses the creation of the sample k-mer dataset can be performed using a k-mer extractor.
Williams does not explicitly teach: removing k-mers of the database that are classified to taxonomic levels above the threshold level, thereby forming a modified database having a size in bytes less than the database; wherein the modified database is capable of serving as a k-mer reference database for computer queries and/or for taxonomic classifications of k-mers of a sample comprising taxonomically unclassified sequenced nucleic acids of one or more organisms.  
However, O’Hara teaches:
removing k-mers of the database that are classified to taxonomic levels above the threshold level, thereby forming a modified database having a size in bytes less than the database (O’Hara, ¶ [0005], discloses determining, by the control unit, a set of k-mers of one or more genomic DNA regions from the one or more microorganism populations; comparing, by the control unit, the set of k-mers to a reference database; using an ultra-fast filtering process to remove sequenced reads that do not map unambiguously to one and only one organism without loss of ; wherein the modified database is capable of serving as a k-mer reference database for computer queries and/or for taxonomic classifications of k-mers of a sample comprising taxonomically unclassified sequenced nucleic acids of one or more organisms (O’Hara, ¶ [0015], discloses the new computational system outlined as described herein, “CLARICE” (from “clarus” in Latin meaning “bright”, “clear”), uses an approach which includes i) a depth-informed analysis of sequence reads over regions of genomes unique to specific taxa (e.g., species, sub-species or strains) from an extensive set of reference sequences, ii) an accurate and ultra-fast technique to detect and analyze sequenced reads, iii) probabilistic models to predict the abundance estimation of each organism detected, and iv) a secure protocol to query and retrieve taxonomic information located in one or several outsourced database(s) containing .  
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Williams (disclosing identifying and classifying microorganisms included in a sample by using an exact k-mer matching algorithm) to include the teachings of O’Hara (disclosing next generation technologies and analysis using a k-mer based approach which is depth-informed to classify microorganism) and arrive at functions for modifying k-mers. One of ordinary skill in the art would have been motivated to make this combination to improve the effectiveness of searching for and retrieving relevant data (see at least Williams, ¶ [0096] and/or O’Hara, ¶ [0069, 0144]). In addition, the references of Williams and O’Hara teach features that are directed to analogous arts and they are directed to the same field of endeavor related to bioinformatics.
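For illustration only, the limitation of removing k-mers classified to taxonomic levels above a threshold level, as recited in claim 10, can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from either reference: the rank ordering `RANKS`, the function name `prune_above_threshold`, and the dictionary layout of the database are all hypothetical, and "above" is assumed to mean closer to the root of the taxonomy.

```python
# Assumed rank ordering, from highest (closest to the root) to lowest.
RANKS = ["family", "genus", "species", "sub-species", "strain"]


def prune_above_threshold(kmer_db, threshold):
    """Keep only k-mers classified at the threshold rank or below it."""
    cutoff = RANKS.index(threshold)
    return {
        kmer: (taxon, rank)
        for kmer, (taxon, rank) in kmer_db.items()
        if RANKS.index(rank) >= cutoff
    }


# Hypothetical database: k-mer -> (taxon label, rank of that taxon).
kmer_db = {
    "ACGTA": ("Enterobacteriaceae", "family"),
    "CGTAC": ("Escherichia", "genus"),
    "GTACG": ("E. coli", "species"),
    "TACGT": ("E. coli K-12", "strain"),
}
pruned = prune_above_threshold(kmer_db, "species")
```

With a threshold of "species", the family-level and genus-level k-mers are dropped, so the pruned database holds fewer entries than the original.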

Regarding claim 11, the modification of Williams and O’Hara teaches the claimed invention substantially as claimed, and Williams further teaches the taxonomic threshold level is selected from the group consisting of family, genus, species, sub-species, and strain (Williams, ¶ [0190], discloses using the K-mer analyzer KRAKEN program, the sample k-mer dataset was compared to the k-mer database of reference bacterial core genes. KRAKEN, which is a command-line application program that performs exact match comparison between a database and an input data set, classifies all input reads using a taxonomic tree and the lowest common ancestor (LCA) technique. Through the LCA technique, KRAKEN selects a higher taxonomic rank for a read if the read shows an exact match with a different species).  
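For illustration only, the lowest common ancestor (LCA) technique referred to in the cited passage can be sketched as follows. This is a minimal sketch and not KRAKEN's actual code: the toy taxonomy in `PARENT` and the function names `lineage` and `lowest_common_ancestor` are assumptions introduced here.

```python
# Hypothetical toy taxonomy: child -> parent (the root maps to None).
PARENT = {
    "root": None,
    "Enterobacteriaceae": "root",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella": "Enterobacteriaceae",
    "E. coli": "Escherichia",
    "E. albertii": "Escherichia",
    "S. enterica": "Salmonella",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, starting with the taxon."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def lowest_common_ancestor(taxa):
    """Return the deepest node shared by the lineages of all given taxa."""
    taxa = list(taxa)
    common = set(lineage(taxa[0]))
    for taxon in taxa[1:]:
        common &= set(lineage(taxon))
    # Walking one lineage upward, the first shared node is the deepest one.
    return next(node for node in lineage(taxa[0]) if node in common)
```

In this toy tree, a read matching both "E. coli" and "E. albertii" would be assigned to the genus "Escherichia", while a read matching "E. coli" and "S. enterica" would be assigned to the family "Enterobacteriaceae".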

Regarding claim 12, the modification of Williams and O’Hara teaches the claimed invention substantially as claimed, and Williams further teaches the taxonomic threshold level is selected by a machine using artificial intelligence (Williams, ¶ [0119], discloses KRAKEN is a command-line application program that performs an exact match comparison of a database and an input data set and classifies all input reads using a taxonomic tree and the lowest common ancestor (LCA) technique. If one read shows an exact match between different species, KRAKEN selects a higher taxonomic rank for the read through the LCA technique. Williams, ¶ [0100], discloses Kraken is an open-source k-mer classifier and is compatible with the JELLYFISH built-in database).  

Regarding claim 13, Williams teaches a method (Williams, ¶ [0059], discloses the present invention relates to a method for identifying and classifying microbial species in a sample and a system for identifying and classifying microbial species in a sample, using an exact k-mer matching method and bacterial core genes), comprising: providing a database comprising k-mers of one or more organisms classified to a taxonomy (Williams, ¶ [0002], discloses taxonomic classification of microbes contained in a given sample could provide much insight into roles of the microbes in environments. Analysis of databases updated with new genomes publicized annually allows more accurate and specific classification. Williams, ¶ [0004], discloses with the increase of the number of publicly available genomes, the “exact k-mer matching” approach has become sufficiently reliable in recent years); assigning a taxonomic threshold level of the taxonomy (Williams, ¶ [0113], discloses the taxonomic profiling method or system for microbes according to the present invention may comprise the steps of (c) comparing the k-mers in the reference k-mer database with the k mers in the sample k-mer dataset according to an exact k-mer matching approach to select an exactly matched k-mers; and (d) using taxon information of the selected k-mers to identify and classify the bacterial species in the sample. Further, Williams, ¶ [0116], discloses by using the “exact k-mer match approach,” “exact k-mer alignment approach”, or “k-mer perfect match” and base sequences of bacterial core genes in combination, microbial classification can be performed faster and more accurately without bias. In this regard, among all of the k-mers generated from the input data, a search is made for k-mers that exactly match the k-mers in the database and indexes containing the taxon information of the k-mers can be listed).
Williams teaches the limitations as identified above. Further, Williams discloses the creation of the sample k-mer dataset can be performed using a k-mer extractor.
Williams does not explicitly teach: removing k-mers of the database that are classified to taxonomic levels above the threshold level, thereby forming a modified database; dividing the modified database into two or more groups of k-mers, wherein each of the two or more groups comprises a unique set of nodes of the taxonomy and all k-mers of a given node reside in one of the groups; removing k-mers common to the two or more groups, thereby forming two or more modified groups of k-mers; wherein the modified groups are capable of serving as reference k-mers for computer queries and/or for taxonomic classifications of k-mers of a sample comprising taxonomically unclassified sequenced nucleic acids of one or more organisms.
However, O’Hara teaches: removing k-mers of the database that are classified to taxonomic levels above the threshold level, thereby forming a modified database (O’Hara, ¶ [0005], discloses determining, by the control unit, a set of k-mers of one or more genomic DNA regions from the one or more microorganism populations; comparing, by the control unit, the set of k-mers to a reference database; using an ultra-fast filtering process to remove sequenced reads that do not map unambiguously to one and only one organism without loss of identification performance or accuracy. O’Hara, ¶ [0023], discloses by “catalog” or “database” is meant the structured container or set of elements, in which elements can be, for example, reference sequences, k-mers, etc., that can be inserted, removed and searched automatically thanks to prebuilt functions (e.g., insertion/removal function or query function) available to the user); dividing the modified database into two or more groups of k-mers, wherein each of the two or more groups comprises a unique set of nodes of the taxonomy and all k-mers of a given node reside in one of the groups (O’Hara, FIG. 3, discloses the steps associated with the creation of specifies ; removing k-mers common to the two or more groups, thereby forming two or more modified groups of k-mers (O’Hara, ¶ [0151], discloses once the program has processed all reference sequences, it reads all k-mers inserted (Step 3.2) and remove all k-mers from the database that do have multiplicity count equal to 1. For each reference sequence R, the program removes each k-mers from R that is present in database more than once. Finally, in step 3.4, the program saves and stores in disk the remaining k-mers in the database index for each taxa. 
O’Hara, ¶ [0150], discloses the last steps produce a database containing taxa-specific and taxa-unique k-mers given the list of reference sequences provided that it is stored and saved in disk that can be later on accessed and loaded directly in memory); wherein the modified groups are capable of serving as reference k-mers for computer queries and/or for taxonomic classifications of k-mers of a sample comprising taxonomically unclassified sequenced nucleic acids of one or more organisms (O’Hara, ¶ [0151], discloses the program checks whether the database exists for the provided parameters (k-mer length and list of reference sequences). If the database already exists then the program can terminate. Otherwise, the program will create it and in order to do so, it starts by creating an empty database index (referred as “H” in the figure), which can typically be an hash-table or any other key-value storage. In the context of a key-value storage, the key is the k-mer (represented by its string value or its binary/numerical value) and the storage value may be a taxa identifier along with a “multiplicity count” (indicating how many taxa the k-mer in the key has been found). Then, the program reads all reference sequences and populates the database with k-mers as they are found in the sequences: For each reference sequence referred as “R”, .
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Williams (disclosing identifying and classifying microorganisms included in a sample by using an exact k-mer matching algorithm) to include the teachings of O’Hara (disclosing next generation technologies and analysis using a k-mer based approach which is depth-informed to classify microorganism) and arrive at functions for modifying k-mers. One of ordinary skill in the art would have been motivated to make this combination to improve the effectiveness of searching for and retrieving relevant data (see at least Williams, ¶ [0096] and/or O’Hara, ¶ [0069, 0144]). In addition, the references of Williams and O’Hara teach features that are directed to analogous arts and they are directed to the same field of endeavor related to bioinformatics.

Regarding claim 14, the modification of Williams and O’Hara teaches the claimed invention substantially as claimed, and O’Hara further teaches the modified groups are used in parallel when performing a computer query and/or taxonomic classification (O’Hara, ¶ [0165], discloses the matching method of short length (n) sequences may continue in parallel with sequence information generation or collection. .  

Regarding claim 15, Williams teaches a computer program product, comprising a computer readable hardware storage device having a computer-readable program code stored therein, said program code configured to be executed by a processor of a computer system to implement a method (Williams, ¶ [0153], discloses the system includes at least one processor and one or more storage devices having stored computer-executable instructions. The instructions can be executed by one or more processors and receive a set of input data containing nucleotide sequences. The input sequence is compared to a k-mer database of reference bacterial core genes which is pre-built using a k-mer analyzer. Finally, the afore-mentioned k-mer analyzer can generate a taxonomic profile for the input data set) comprising: providing a database comprising k-mers of one or more organisms classified to a taxonomy (Williams, ¶ [0002], discloses taxonomic classification of ; dividing the database into two or more groups of k-mers, wherein each of the groups comprises a unique set of nodes of the taxonomy (At least, Williams, FIG. 2, discloses information in a k-mer database grouped according to different species. Further, at least, Williams, FIG. 4, discloses different species in a hierarchical node formation), wherein all k-mers of a given node reside in one of the groups (Williams, FIG. 6, discloses multiple k-mer classification nodes. Williams, ¶ [0087], discloses the distinct k-mer refers to a k-mer sequence that is present as one or more copies including repeating k-mers and unique k-mers, but is counted as one copy. In Table 1, thus, the number of distinct k-mers is a sum of the number of the unique k-mers and the number of single copies selected from repeating k-mers. The total k-mer means a sum of all single k-mers in bacterial core genes of a reference microbe population).
Williams teaches the limitations as identified above. Further, Williams discloses the creation of the sample k-mer dataset can be performed using a k-mer extractor.
Williams does not explicitly teach: removing k-mers common to two or more of the groups, thereby forming two or more modified groups, each of the modified groups containing a unique set of k-mers; wherein the modified groups are capable of serving as reference k-mers for computer queries and/or for taxonomic classifications of k-mers of a sample comprising taxonomically unclassified sequenced nucleic acids of one or more organisms.  
However, O’Hara teaches: removing k-mers common to two or more of the groups, thereby forming two or more modified groups (O’Hara, ¶ [0005], discloses determining, by the control unit, a set of k-mers of one or more genomic DNA regions from the one or more microorganism populations; comparing, by the control unit, the set of k-mers to a reference database; using an ultra-fast filtering process to remove sequenced reads that do not map unambiguously to one and only one organism without loss of identification performance or accuracy. O’Hara, ¶ [0023], discloses by “catalog” or “database” is meant the structured container or set of elements, in which elements can be, for example, reference sequences, k-mers, etc., that can be inserted, removed and searched automatically thanks to prebuilt functions (e.g., insertion/removal function or query function) available to the user. Further, O’Hara, ¶ [0151], discloses once the program has processed all reference sequences, it reads all k-mers inserted (Step 3.2) and remove all k-mers from the database that do have multiplicity count equal to 1. For each reference sequence R, the program removes each k-mers from R that is present in database more than once. 
Finally, in step 3.4, the program saves and stores in disk the remaining k-mers in the database index for each taxa), each of the modified groups containing a unique set of k-mers (O’Hara, ¶ [0150], discloses the last steps produce a database containing taxa-specific and taxa-unique k-mers given the list of reference sequences provided that it is stored and saved in disk that can be later on accessed and loaded directly in memory); wherein the modified groups are capable of serving as reference k-mers for computer queries and/or for taxonomic classifications of k-mers of a sample comprising taxonomically unclassified sequenced nucleic acids of one or more organisms (O’Hara, ¶ [0151], discloses the program checks whether the database exists for the provided parameters (k-mer length and list of reference sequences). If the database already exists then the program can terminate. Otherwise, the program will create it and in order to do so, it starts by creating an empty database index (referred as “H” in the figure), which can typically be an hash-table or any other key-value storage. In the context of a key-value storage, the key is the k-mer (represented by its string value or its binary/numerical value) and the storage value may be a taxa identifier along with a “multiplicity count” (indicating how many taxa the k-mer in the key has been found). Then, the program reads all reference sequences and populates the database with k-mers as they are found in the sequences: For each reference sequence referred as “R”, and for each k-mer referred as “w” existing in the reference sequences, the program checks (Step 3.1 in the figure) if w is already in the database. 
If it is, then the program reads what the taxa identifier found for w and if this taxa identifier is different from the identification of R then the program increments the multiplicity of w by 1, if w is not the database index the program creates a new value storage using w as the key and sets the taxa identifier to the identifier value from R and sets the multiplicity count to 1).  
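For illustration only, the database-creation procedure described in the cited passage of O’Hara, ¶ [0151], can be sketched as follows. This is a minimal sketch and not O’Hara’s code: the function name `build_taxa_unique_index` and the `(taxon_id, sequence)` input layout are assumptions. The sketch also assumes the reading under which k-mers found in more than one taxon are removed, consistent with the passage's statement that the program removes each k-mer that is present in the database more than once.

```python
def build_taxa_unique_index(references, k):
    """Build a k-mer -> taxon index, keeping k-mers with multiplicity count 1.

    `references` is an iterable of (taxon_id, sequence) pairs (hypothetical layout).
    """
    index = {}  # k-mer -> [taxon identifier first stored, multiplicity count]
    for taxon_id, sequence in references:
        for i in range(len(sequence) - k + 1):
            kmer = sequence[i:i + k]
            entry = index.get(kmer)
            if entry is None:
                # New k-mer: store the taxon identifier and set the count to 1.
                index[kmer] = [taxon_id, 1]
            elif entry[0] != taxon_id:
                # Seen under a different taxon identifier: increment the count.
                entry[1] += 1
    # Retain only k-mers whose multiplicity count is still 1.
    return {kmer: taxon for kmer, (taxon, count) in index.items() if count == 1}


# Hypothetical reference sequences for two taxa.
references = [("taxA", "ACGTAC"), ("taxB", "CGTACC")]
unique_index = build_taxa_unique_index(references, k=4)
```

Here the 4-mers "CGTA" and "GTAC" occur in both sequences and are dropped, leaving "ACGT" for taxA and "TACC" for taxB.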
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Williams (disclosing identifying and classifying microorganisms included in a sample by using an exact k-mer matching algorithm) to include the teachings of O’Hara (disclosing next generation technologies and analysis using a k-mer based approach which is depth-informed to 

Regarding claim 16, Williams teaches a computer program product, comprising a computer readable hardware storage device having a computer-readable program code stored therein, said program code configured to be executed by a processor of a computer system (Williams, ¶ [0153], discloses the system includes at least one processor and one or more storage devices having stored computer-executable instructions. The instructions can be executed by one or more processors and receive a set of input data containing nucleotide sequences. The input sequence is compared to a k-mer database of reference bacterial core genes which is pre-built using a k-mer analyzer. Finally, the afore-mentioned k-mer analyzer can generate a taxonomic profile for the input data set) to implement a method comprising: providing a database comprising k-mers of one or more organisms classified to a taxonomy (Williams, ¶ [0002], discloses taxonomic classification of microbes contained in a given sample could provide much insight into roles of the microbes in environments. Analysis of databases updated with new genomes publicized annually allows more accurate and specific classification. Williams, ¶ [0004], discloses with the increase of the number of publicly available genomes, the “exact k-mer ; assigning a taxonomic threshold level of the taxonomy (Williams, ¶ [0113], discloses the taxonomic profiling method or system for microbes according to the present invention may comprise the steps of (c) comparing the k-mers in the reference k-mer database with the k mers in the sample k-mer dataset according to an exact k-mer matching approach to select an exactly matched k-mers; and (d) using taxon information of the selected k-mers to identify and classify the bacterial species in the sample. 
Further, Williams, ¶ [0116], discloses by using the “exact k-mer match approach,” “exact k-mer alignment approach”, or “k-mer perfect match” and base sequences of bacterial core genes in combination, microbial classification can be performed faster and more accurately without bias. In this regard, among all of the k-mers generated from the input data, a search is made for k-mers that exactly match the k-mers in the database and indexes containing the taxon information of the k-mers can be listed). 
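For illustration only, the exact k-mer matching described in the cited Williams passages can be sketched as follows. This is a minimal sketch and not the implementation of Williams; the function names, the toy reference database, and the taxon labels are hypothetical and are assumed solely for the example.

```python
from collections import Counter


def extract_kmers(sequence, k):
    """Return every overlapping substring of length k in the sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def exact_match_profile(reads, reference_db, k):
    """Count exact k-mer hits per taxon across all reads.

    reference_db maps a k-mer string to a taxon label; a k-mer from a read
    counts only when it is found verbatim in the database (no mismatches).
    """
    profile = Counter()
    for read in reads:
        for kmer in extract_kmers(read, k):
            taxon = reference_db.get(kmer)
            if taxon is not None:
                profile[taxon] += 1
    return profile


# Hypothetical toy data: k-mer -> taxon label.
reference_db = {"ACGT": "species_A", "CGTA": "species_A", "TTGC": "species_B"}
reads = ["ACGTA", "GTTGC"]
profile = exact_match_profile(reads, reference_db, k=4)
```

In this sketch a plain dictionary lookup stands in for the database search, so a k-mer that differs by even one base from every reference k-mer contributes nothing to the profile.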
Williams teaches the limitations as identified above. Further, Williams discloses the creation of the sample k-mer dataset can be performed using a k-mer extractor.
Williams does not explicitly teach: removing k-mers of the database that are classified to taxonomic levels above the threshold level, thereby forming a modified database having a size in bytes less than the database; wherein the modified database is capable of serving as a k-mer reference database for computer queries and/or for taxonomic classifications of k-mers of a sample comprising taxonomically unclassified sequenced nucleic acids of one or more organisms.  
However, O’Hara teaches:
removing k-mers of the database that are classified to taxonomic levels above the threshold level, thereby forming a modified database having a size in bytes less than the database (O’Hara, ¶ [0005], discloses determining, by the control unit, a set of k-mers of one or more genomic DNA regions from the one or more microorganism populations; comparing, by the control unit, the set of k-mers to a reference database; using an ultra-fast filtering process to remove sequenced reads that do not map unambiguously to one and only one organism without loss of identification performance or accuracy. O’Hara, ¶ [0023], discloses by “catalog” or “database” is meant the structured container or set of elements, in which elements can be, for example, reference sequences, k-mers, etc., that can be inserted, removed and searched automatically thanks to prebuilt functions (e.g., insertion/removal function or query function) available to the user. O’Hara, ¶ [0132], discloses the database creation may include a filtering process by removing sequence data from over-represented taxa in order to alleviate the computational processing. Further, O’Hara, ¶ [0151], discloses once the program has processed all reference sequences, it reads all k-mers inserted (Step 3.2) and remove all k-mers from the database that do have multiplicity count equal to 1. For each reference sequence R, the program removes each k-mers from R that is present in database more than once. Finally, in step 3.4, the program saves and stores in disk the remaining k-mers in the database index for each taxa); wherein the modified database is capable of serving as a k-mer reference database for computer queries and/or for taxonomic classifications of k-mers of a sample comprising taxonomically unclassified sequenced nucleic acids of one or more organisms (O’Hara, ¶ [0015], discloses the new computational system outlined as .  
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Williams (disclosing identifying and classifying microorganisms included in a sample by using an exact k-mer matching algorithm) to include the teachings of O’Hara (disclosing next generation technologies and analysis using a k-mer based approach which is depth-informed to classify microorganism) and arrive at functions for modifying k-mers. One of ordinary skill in the art would have been motivated to make this combination to improve the effectiveness of searching for and retrieving relevant data (see at least Williams, ¶ 
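For illustration only, the claimed step of removing k-mers classified to taxonomic levels above a threshold level can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Williams or O’Hara: the rank ordering, the function name, and the toy database entries are hypothetical.

```python
# Assumed rank ordering, from most specific (lowest) to most general (highest).
RANKS = ["species", "genus", "family", "order", "class", "phylum", "kingdom"]


def prune_above_threshold(database, threshold):
    """Keep only k-mers whose assigned rank is at or below the threshold rank.

    database maps a k-mer to a (taxon name, rank) pair. K-mers classified to
    a rank above the threshold are dropped, so the result is never larger
    than the input.
    """
    limit = RANKS.index(threshold)
    return {
        kmer: (taxon, rank)
        for kmer, (taxon, rank) in database.items()
        if RANKS.index(rank) <= limit
    }


# Hypothetical toy database.
database = {
    "ACGT": ("Escherichia coli", "species"),
    "CGTA": ("Escherichia", "genus"),
    "GTAC": ("Enterobacteriaceae", "family"),
    "TACG": ("Proteobacteria", "phylum"),
}
modified = prune_above_threshold(database, "genus")
```

With the threshold set to genus, the family-level and phylum-level entries are removed and the modified database holds two of the original four k-mers.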

Regarding claim 17, Williams teaches a system (Williams, ¶ [0007], discloses a system for identification and classification of microbes in a sample) comprising one or more computer processor circuits (Williams, ¶ [0045], discloses the system comprising a reference k-mer database of reference microbial core genes, and a processor equipped with a k-mer extractor and a k-mer analyzer) configured and arranged to: provide a database comprising k-mers of one or more organisms classified to a taxonomy (Williams, ¶ [0002], discloses taxonomic classification of microbes contained in a given sample could provide much insight into roles of the microbes in environments. Analysis of databases updated with new genomes publicized annually allows more accurate and specific classification. Williams, ¶ [0004], discloses with the increase of the number of publicly available genomes, the “exact k-mer matching” approach has become sufficiently reliable in recent years); divide the database into two or more groups of k-mers, wherein each of the groups comprises a unique set of nodes of the taxonomy (At least, Williams, FIG. 2, discloses information in a k-mer database grouped according to different species. Further, at least, Williams, FIG. 4, discloses different species in a hierarchical node formation), wherein all k-mers of a given node reside in one of the groups (Williams, FIG. 6, discloses multiple k-mer classification nodes. Williams, ¶ [0087], discloses the distinct k-mer refers to a k-mer sequence that is present as one . 
Williams teaches the limitations as identified above. Further, Williams discloses the creation of the sample k-mer dataset can be performed using a k-mer extractor.
Williams does not explicitly teach: remove k-mers common to two or more of the groups, thereby forming two or more modified groups, each of the modified groups containing a unique set of k-mers; wherein the modified groups are capable of serving as reference k-mers for computer queries and/or for taxonomic classifications of k-mers of a sample comprising taxonomically unclassified sequenced nucleic acids of one or more organisms.  
However, O’Hara teaches: remove k-mers common to two or more of the groups, thereby forming two or more modified groups (O’Hara, ¶ [0005], discloses determining, by the control unit, a set of k-mers of one or more genomic DNA regions from the one or more microorganism populations; comparing, by the control unit, the set of k-mers to a reference database; using an ultra-fast filtering process to remove sequenced reads that do not map unambiguously to one and only one organism without loss of identification performance or accuracy. O’Hara, ¶ [0023], discloses by “catalog” or “database” is meant the structured container or set of elements, in which elements can be, for example, reference sequences, k-mers, etc., that can be inserted, removed and searched automatically thanks to prebuilt functions (e.g., insertion/removal function , each of the modified groups containing a unique set of k-mers (O’Hara, ¶ [0150], discloses the last steps produce a database containing taxa-specific and taxa-unique k-mers given the list of reference sequences provided that it is stored and saved in disk that can be later on accessed and loaded directly in memory); wherein the modified groups are capable of serving as reference k-mers for computer queries and/or for taxonomic classifications of k-mers of a sample comprising taxonomically unclassified sequenced nucleic acids of one or more organisms (O’Hara, ¶ [0151], discloses  the program checks whether the database exists for the provided parameters (k-mer length and list of reference sequences). If the database already exists then the program can terminate. Otherwise, the program will create it and in order to do so, it starts by creating an empty database index (referred as “H” in the figure), which can typically be an hash-table or any other key-value storage. 
In the context of a key-value storage, the key is the k-mer (represented by its string value or its binary/numerical value) and the storage value may be a taxa identifier along with a “multiplicity count” (indicating how many taxa the k-mer in the key has been found). Then, the program reads all reference sequences and populates the database with k-mers as they are found in the sequences: For each reference sequence referred as “R”, and for each k-mer referred as “w” .  
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Williams (disclosing identifying and classifying microorganisms included in a sample by using an exact k-mer matching algorithm) to include the teachings of O’Hara (disclosing next generation technologies and analysis using a k-mer based approach which is depth-informed to classify microorganism) and arrive at functions modifying k-mers. One of ordinary skill in the art would have been motivated to make this combination to improve the effectiveness of searching for and retrieving relevant data (see at least Williams, ¶ [0096] and/or O’Hara, ¶ [0069, 0144]). In addition, the references of Williams and O’Hara teach features that are directed to analogous arts and they are directed to the same field of endeavor related to bioinformatics.
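For illustration only, the database construction quoted above from O’Hara, ¶ [0151] (a key-value index keyed by k-mer, with a count of how many taxa contain each k-mer, followed by removal of k-mers shared across taxa) can be sketched as follows. This is a minimal sketch and not the implementation of O’Hara; the function name and the toy reference sequences are hypothetical, and the "multiplicity count" is assumed here to be the number of distinct taxa in which a k-mer occurs.

```python
def build_unique_kmer_db(references, k):
    """Build a k-mer -> taxon index that keeps only taxa-unique k-mers.

    references maps a taxon identifier to a list of its reference sequences.
    """
    index = {}  # k-mer -> set of taxa in which the k-mer was found
    for taxon, sequences in references.items():
        for seq in sequences:
            for i in range(len(seq) - k + 1):
                index.setdefault(seq[i:i + k], set()).add(taxon)
    # Keep a k-mer only when its multiplicity (number of taxa) equals 1.
    return {
        kmer: next(iter(taxa))
        for kmer, taxa in index.items()
        if len(taxa) == 1
    }


# Hypothetical toy reference sequences.
references = {"taxon_1": ["ACGTAC"], "taxon_2": ["CGTACC"]}
db = build_unique_kmer_db(references, k=4)
```

In the toy data, the k-mers CGTA and GTAC occur in both taxa and are dropped, leaving one taxa-unique k-mer per taxon.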

Regarding claim 18, the modification of Williams and O’Hara teaches the claimed invention substantially as claimed, and O’Hara further teaches the modified groups are located on a cloud platform of a computer network (O’Hara, ¶ [0071], discloses CLARICE can take advantage of a cloud server and outsource the database of taxa-specific and unique k-mers as well as all memory-intensive computations).  

Regarding claim 19, the modification of Williams and O’Hara teaches the claimed invention substantially as claimed, and Williams further teaches the system is configured and arranged to assign a taxonomic threshold level of the taxonomy and remove k-mers of the database that are classified to taxonomic levels above the threshold level (Williams, ¶ [0054-0055, 0060], discloses combining taxon information of the unique IDs corresponding to the taxonomic levels assigned to individual sequencing reads, and generating an entire unique ID list with collecting the unique IDs corresponding to the taxonomic levels obtained for the individual sequencing reads for entire sequencing reads included in the sample microbial genome … the method and the system for identifying and classifying the microbial species in a sample according to the present invention may comprise the steps of: providing (a) a sample k-mer dataset for a full genome of microbes in the sample, which is created by utilizing microbial genome information obtained from a sample, … (c) comparing the k-mers in the sample k-mer dataset (a) with the k-mers in the reference k-mer database (b) according to an exact k-mer matching method to select an exactly matched k-mers; and (d) identifying and classifying the microbial species in the sample using taxon information of the selected k-mers. Williams, ¶ [0119], discloses KRAKEN is a command-line application program that performs an exact match comparison of a database and an input data set and classifies all input reads using a taxonomic tree and the lowest common ancestor (LCA) technique. If one read shows an exact match between different species, KRAKEN selects a higher taxonomic rank for the read .  
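For illustration only, the lowest common ancestor (LCA) technique referenced in Williams, ¶ [0119], can be sketched as follows. This is a minimal sketch and not KRAKEN's implementation; the function names and the small parent map are hypothetical and assumed solely for the example.

```python
def lineage(taxon, parent):
    """Return the path from a taxon up to the root of the taxonomy."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lowest_common_ancestor(taxa, parent):
    """Return the most specific node shared by the lineages of all taxa."""
    paths = [lineage(t, parent) for t in taxa]
    common = set(paths[0]).intersection(*paths[1:])
    # Walk upward from the first taxon; the first shared node is the LCA.
    for node in paths[0]:
        if node in common:
            return node
    return None


# Hypothetical fragment of a taxonomy: child -> parent.
parent = {
    "Escherichia coli": "Escherichia",
    "Escherichia fergusonii": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
}
same_genus = lowest_common_ancestor(
    ["Escherichia coli", "Escherichia fergusonii"], parent)
cross_genus = lowest_common_ancestor(
    ["Escherichia coli", "Salmonella enterica"], parent)
```

A read whose k-mers match two species of the same genus is assigned the genus in this sketch, while a read matching species of different genera is pushed up to their shared family.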

Regarding claim 20, the modification of Williams and O’Hara teaches the claimed invention substantially as claimed, and Williams further teaches k-mers associated with mobile elements of genomes are removed from the database before the k-mers are classified to the taxonomy (Williams, ¶ [0149-0152], discloses the present invention provides a system of identifying and classifying a microbe in a sample, the system comprising: (a) a reference k-mer database of reference microbial core genes; and (b) a processor equipped with a k-mer extractor and a k-mer analyzer … wherein the reference k-mer database comprises at least one k-mer generated from DNA information of at least one reference microbial core gene, and the k-mer is assigned with microbial taxon information … wherein the k-mer extractor in the processor extracts at least one k-mer from metagenomic information obtained from the sample to generate k-mer database; and … wherein the k-mer analyzer in the processor selects a k-mer exactly identical in nucleic acid sequence information from the k-mers contained in the reference k-mer database of reference core genes with respect to the k-mer contained in a sample k-mer dataset, lists unique IDs accounting for taxon information of the selected k-mer, and identifies and classifies the microbe in the sample, based on the taxonomic information about the selected k-mer).



Conclusion
Examiner has cited particular columns and line and/or paragraph numbers in the references applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claim, other passages and figures may apply as well. Applicant is respectfully requested, in preparing responses, to fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.

The examiner requests, in response to this Office action, support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line number(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application. 

When responding to this Office action, applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or objections made. He or she must also show how the amendments avoid such references or objections. See 37 CFR 1.111(c).

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US PGPub 20210249102 (Hurwitz et al) discloses systems and methods for metagenomic analysis. A method of metagenome sequence analysis of two or more samples can include (i) counting the abundance of each k-mer deconstructed from sequencing reads of nucleic acids in each sample, and (ii) using a vector space model to compute the genetic distance between each of the two or more samples according to the abundance of the k-mers. In some embodiments, counting includes (a) constructing a k-mer histogram containing the distribution of k-mers for each sample, and (b) dividing k-mers into partitions having approximately an equal number of k-mers based on the histogram, preparing an inverted index of the k-mers in each partition, and assigning a weight to each k-mer according to its abundance. Methods of developing diagnostic and prognostic information using the methods of sequence analysis are also provided.
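For illustration only, comparing two samples by k-mer abundance in a vector space can be sketched as follows. This is a minimal sketch and not the method of Hurwitz: cosine distance is assumed here as one possible vector space measure, and the function names and toy reads are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(reads, k):
    """Count the abundance of every k-mer across the reads of one sample."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two k-mer abundance vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty sample shares nothing with any other sample
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical toy samples.
sample_a = kmer_counts(["ACGTACGT"], k=4)
sample_b = kmer_counts(["ACGTACGT"], k=4)
sample_c = kmer_counts(["CCCCCC"], k=4)
```

In this sketch identical samples have a distance of approximately zero, and samples with no k-mers in common have a distance of one.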


Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALICIA M ANTOINE whose telephone number is (571)431-0687.  The examiner can normally be reached on Mon - Fri: 9am - 3pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/ALICIA M ANTOINE/
Examiner, Art Unit 2162
10/23/2021