Detailed Action
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Herein, “the previous Office action” refers to the non-final rejection of March 28, 2021.
Priority
The application claims priority to U.S. Provisional Application Serial Number 62/402,873, entitled "EFFICIENT CLUSTERING OF NOISY POLYNUCLEOTIDE SEQUENCE READS," filed on September 30, 2016. This examination is conducted based on the priority date of September 30, 2016.

Amendments Received
Amendments to the claims were received and entered on June 28, 2022.

Election/Restrictions
Newly submitted claim 43 is directed to an invention that is independent or distinct from the invention originally claimed for the following reasons: claim 43 and claim 14 are directed to related products and related processes (MPEP § 806.05(j)). The differences are as follows:
Claims 43 and 14 do not overlap in scope: the system of claim 14 includes a sequencer, while the system of claim 43 does not; the system of claim 43 recites four “means” (functions), while the system of claim 14 recites only one clustering module performing two “grouping” steps (functions). Although the first “means” in claim 43 performs clustering (grouping) functions, the algorithms it implements are different from those implemented by the clustering module of claim 14. In claim 14, LSH and then signatures of the DNA reads are used, in that order, for clustering; in claim 43, this order is reversed in the first “means,” which corresponds to the clustering/grouping function.
These two systems are not obvious variants. 
The inventions as claimed are either not capable of use together or can have a materially different design, mode of operation, function, or effect.

Hence, the systems presented in claims 43 and 14 are related but distinct products/processes. They are not usable together. Restriction of the newly presented claim 43 is necessary.
Since applicant has received an action on the merits for the originally presented invention, this invention has been constructively elected by original presentation for prosecution on the merits. Accordingly, claim 43 is withdrawn from consideration as being directed to a non-elected invention. See 37 CFR 1.142(b) and MPEP § 821.03.

Status of the Claims
Added: claims 34-43
Amended: claims 14-16, 18, 26, and 31-33.
Cancelled: claims 17, 19-25, 29, and 30.
Restricted: claims 19-33 and 43
Examined herein: claims 14-16, 18, and 34-42.

Withdrawn Objections
The objections to claims 15, 16, and 18 are withdrawn in view of Applicant's amendments. 
Withdrawn Rejections
The rejections of claim 17 under 35 USC § 101 and under 35 USC § 103 are moot in view of Applicant's cancellation of claim 17.
The rejection of claim 14 under 35 USC § 103 over Rasheed and Bringer is hereby withdrawn in view of Applicant's amendments to claim 14. Specifically, Applicant adds a new element, "a polynucleotide sequencer configured to generate a plurality of DNA reads …", to the system of claim 14, which is not taught by Rasheed or Bringer.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 14-16, 18, and 34-42 are rejected under 35 USC§ 101 because the claimed inventions are directed to an abstract idea without significantly more. This rejection is maintained from the previous Office action. Minor revisions have been made to the rationale to address the newly-presented claim limitations.

The instant rejection reflects the Guidance published in the Federal Register notice titled 2019 Revised Patent Subject Matter Eligibility Guidance (Vol. 84, No. 4, Monday, January 7, 2019, at 50) and the October 2019 Update: Subject Matter Eligibility (hereinafter both referred to as the "Guidance"), as outlined in the MPEP at 2106.04.

Framework with which to Evaluate Subject Matter Eligibility: 
Step 1: Are the claims directed to a process, machine, manufacture, or composition of matter; 
Step 2A, Prong One: Do the claims recite a judicially recognized exception, i.e. a law of nature, a natural phenomenon, or an abstract idea;  
Step 2A, Prong Two: If the claims recite a judicial exception under Prong One, then is the judicial exception integrated into a practical application (Prong Two); and 
Step 2B: If the claims do not integrate the judicial exception, do the claims provide an inventive concept.

Framework Analysis as Pertains to the Instant Claims:
With respect to Step 1: yes, the claims are directed to a (sequencing plus computing) system for clustering sequence reads [Step 1: YES; See MPEP § 2106.03]. 
With respect to Step 2A, Prong One, the claims recite abstract ideas. The MPEP at 2106.04(a)(2) further explains that abstract ideas are defined as: 
•	mathematical concepts (mathematical formulas or equations, mathematical relationships and mathematical calculations);
•	certain methods of organizing human activity (fundamental economic practices or principles, managing personal behavior or relationships or interactions between people); and/or
•	mental processes (procedures for observing, evaluating, analyzing/ judging and organizing information).

With respect to the instant claims, under the Step 2A, Prong One evaluation, the claims are found herein to recite abstract ideas that fall into the grouping of mental processes (in particular procedures for observing, analyzing and organizing information) and mathematical concepts (in particular mathematical relationships and formulas).
 
The claim steps directed to abstract ideas of mental processes and mathematical concepts are as follows:
Mathematical concepts recited in the claims include:
“to calculate an edit distance between a first read of the plurality of DNA reads and a second read of the plurality of DNA reads” (claim 15);
Steps of evaluating, analyzing or organizing information recited in the claims include:
“to divide the plurality of DNA reads into clusters” (claim 14);
“grouping DNA reads having a same hash as determined by randomized locality-sensitive hashing (LSH) into buckets” (claim 14);
“grouping DNA reads in a same bucket into clusters based at least in part on similarity of signatures of the DNA reads that deterministically embed edit-distance space into Hamming space” (claim 14);
“determines the randomized LSH” (claim 16); 
“split a one of the plurality of DNA reads into sub-reads” (claim 18);
“find k-grams for the sub-reads of the plurality of DNA reads” (claim 18);
“encode the k-grams as bit strings” (claim 18);
“concatenate the bit strings into signatures” (claim 18);
“determines the randomized LSH based at least in part on nucleotides adjacent to an occurrence of a randomly selected anchor string within a DNA read” (claim 36);
“assigns two DNA reads in the same bucket to a same cluster based at least in part on a difference in the Hamming space between the two DNA reads being less than a threshold distance” (claim 37);

Hence, the claims explicitly recite numerous elements that, individually and in combination, constitute abstract ideas. The claims must therefore be examined further to determine whether they integrate those abstract ideas into a practical application (MPEP 2106.04(d)). (Step 2A Prong One: Yes).

Because the claims do recite judicial exceptions, direction under Step 2A, Prong Two, provides that the claims must be examined further to determine whether they integrate the abstract ideas into a practical application (MPEP 2106.04(d)). A claim can be said to integrate a judicial exception into a practical application when it applies, relies on, or uses the judicial exception in a manner that imposes a meaningful limit on the judicial exception. This is performed by analyzing the additional elements of the claim to determine if the abstract idea is integrated into a practical application (MPEP 2106.04(d)(I); MPEP 2106.05(a)-(h)). If the claim contains no additional elements beyond the abstract idea, the claim is said to fail to integrate the abstract idea into a practical application (MPEP 2106.04(d)(III)).

Claims 14, 35, and 38-39 recite additional elements that are not abstract ideas: 
“A system for … ” (claim 14); 
“a polynucleotide sequencer … ” (claim 14); 
“at least one processing unit” (claim 14); 
“a memory in communication with the processing unit” (claim 14); 
“a device interface” (claim 35);
“one processing unit comprises a central processing unit (CPU) with Single Instruction Multiple Data (SIMD) or Single Program Multiple Data (SPMD) architecture” (claim 38);
“the at least one processing unit comprises a multicore processing system” (claim 39);
“a single core of the multicore processing system” (claim 39);

Except for the “a polynucleotide sequencer … ” element recited in claim 14, all the other elements amount to mere instructions to apply the abstract idea using generic computers; therefore, the claims do not integrate the abstract idea into a practical application (see MPEP 2106.04(d)(I); and MPEP 2106.05(f)).
The claimed element “a polynucleotide sequencer … ” of claim 14 is insignificant extra-solution activity because it is (1) well known and (2) necessary data gathering, which satisfies two of the three criteria outlined in MPEP 2106.05(g). Hence, it is insignificant extra-solution activity that does not integrate the judicial exceptions into a practical application. (Step 2A Prong Two: No).

As such, the claims are lastly evaluated using the Step 2B analysis, wherein it is determined that because the claims recite abstract ideas which are not integrated into a practical application, the claims also lack a specific inventive concept. Applicant is reminded that the judicial exception alone cannot provide the inventive concept or the practical application and that the identification of whether the additional elements amount to such an inventive concept requires considering the additional elements individually and in combination to determine if they provide significantly more than the judicial exception (MPEP 2106.05(I)(A)(i)-(vi)).
None of the dependent claims recite any additional non-abstract elements; they are all directed to further aspects of the information being analyzed, the manner in which that analysis is performed, or the mathematical operations performed on the information. 
Because the claims recite an abstract idea, and do not integrate that abstract idea into a practical application, the claims are directed to that abstract idea. Claims that are directed to abstract ideas must be examined further to determine whether the additional elements besides the abstract idea render the claims significantly more than the abstract idea. Claims that are directed to abstract ideas and that raise a concern of preemption of those abstract ideas must be examined to determine what elements, if any, they recite besides the abstract idea, and whether these additional elements constitute inventive concepts that are sufficient to render the claims significantly more than the abstract idea (MPEP 2106.05). 
As explained above, the mere instructions to implement the abstract idea using a computer are, when considered individually, insufficient to constitute an inventive concept that would render the claims significantly more than an abstract idea (see MPEP 2106.05(f)). 
As explained above, the well-known generic step of data-gathering constitutes insignificant extra-solution activity, and when considered individually, is insufficient to constitute inventive concepts that would render the claims significantly more than an abstract idea (see MPEP 2106.05(g)). (Step 2B: No).

When the claims are considered as a whole, they do not integrate the abstract idea into a practical application; they do not confine the use of the abstract idea to a particular technology; they do not solve a problem rooted in or arising from the use of a particular technology; they do not improve a technology by allowing the technology to perform a function that it previously was not capable of performing; and they do not provide any limitations beyond generally linking the use of the abstract idea to a broad technological environment (i.e. computerized analysis of biological data). See MPEP 2106.05(a) and 2106.05(h).
For these reasons, the claims, when the limitations are considered individually and as a whole, are directed to an abstract idea and lack an inventive concept. Hence, the claimed invention does not constitute significantly more than the abstract idea, so the claims are rejected under 35 USC § 101 as being directed to non-statutory subject matter.

Response to Arguments - Rejections Under 35 USC § 101
In the reply filed 28 June 2022, Applicant argued “Thus, Applicant’s claimed invention both integrates any judicial exception into a practical application and improves the function of a system that generates clusters of DNA reads generated by a polynucleotide sequencer. This represents an improvement (demonstrated by evidence in the record) using computer-implemented rules” (Reply, para 2, line 1-4, page 9). The argument is not persuasive because the improvement to read clustering is merely improved data processing. It does not improve the sequencer or the computer in the system of claim 14, or any technical field. Clustering sequence reads is part of the abstract idea, and the additional elements do not integrate the abstract idea into a practical application. The rejection under 35 U.S.C. § 101 is maintained.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 14-16, 36-38, and 40-42 are rejected under 35 U.S.C. 103 as being unpatentable over Berlin (“Assembling large genomes with single-molecule sequencing and locality-sensitive hashing,” Nat Biotechnol 33, 623–630 (2015)), and further in view of Rasheed (“Efficient clustering of metagenomic sequences using locality sensitive hashing,” Proceedings of the 2012 SIAM International Conference on Data Mining (SDM), 2012) and Ostrovsky (“Low Distortion Embeddings for Edit Distance,” Journal of the ACM, Volume 54, Issue 5, October 2007, pp. 23–es, https://doi.org/10.1145/1284320.1284322).

Claim 14 is directed to a system composed of a sequencer and at least one computer configured to group the sequence reads into buckets by LSH and then into clusters based on the similarity of DNA read signatures. With respect to claim 14, Berlin discloses the Pacific Biosciences SMRT sequencer for sequencing (para 2, line 4-6, col 1, page 623) and the computerized MHAP system to group sequence reads into buckets (para 3, line 6-12, col 1, page 631). However, Berlin's MHAP system uses probabilistic locality-sensitive hashing (Abstract), rather than randomized locality-sensitive hashing, to group DNA reads into buckets. Pursuing DNA sequence clustering like Berlin, Rasheed discloses a computerized MC-LSH (Metagenomic Clustering using LSH) method (page 1025, col 1, section “Method”) that utilizes randomized locality-sensitive hashing (col 2, para 3, page 1023) and the Hamming distance to cluster sequence reads (para 1-3, col 2, page 1025). Computing the edit distance between reads is computationally expensive, whereas the Hamming distance is inexpensive to compute. Berlin offers no help in this respect, because he uses the Jaccard similarity (which comes from the Hamming distance, para 1, col 1, page 624) to estimate the similarity between two k-mer sets. Aiming to offer a low-distortion method for approximating the edit distance by the Hamming distance, Ostrovsky teaches embedding the edit distance space into Hamming distance space (“Abstract” section, page 23:1). Ostrovsky's “fingerprint” (para 2, page 23:3) reads on the “signatures” in the claim limitation, and Ostrovsky's signatures can be computed for DNA sequences, as Ostrovsky's method applies to any two strings over a finite character alphabet (para 2 under “Introduction”, page 23:2).

With respect to claim 15, Berlin and Rasheed are silent on calculating the edit distance. Ostrovsky teaches calculating the edit distance between strings (Section “3. The Embedding”, para 1-7, page 23:7 to page 23:8). Ostrovsky's strings read on the DNA reads in the claim limitation.
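For illustration only, the difference in cost between the two distance measures discussed for claims 14 and 15 can be sketched in Python. The function names are hypothetical, and the code is not taken from Ostrovsky, Rasheed, Berlin, or the instant application. It shows that the edit distance requires a dynamic program over both strings (time proportional to the product of their lengths), while the Hamming distance is a single linear pass, and that a one-position shift yields a small edit distance but a large Hamming distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            )
        prev = curr
    return prev[-1]


def hamming_distance(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


print(edit_distance("ACGTACGT", "ACGACGT"))      # 1 (one deletion)
print(edit_distance("ACGTACGT", "CGTACGTA"))     # 2 (delete first A, append A)
print(hamming_distance("ACGTACGT", "CGTACGTA"))  # 8 (every position shifted)
```

The last two lines illustrate why a direct Hamming comparison of raw reads is a poor substitute for the edit distance, and hence why an embedding of the kind Ostrovsky describes is needed before Hamming comparisons become meaningful.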

With respect to claim 16, Rasheed discloses a hashing algorithm utilizing LSH with w-mers (“we present a new, scalable metagenomic sequence clustering algorithm (MC-LSH) that utilizes an efficient locality sensitive based hashing function to approximate the pairwise sequence operations. The basic LSH-function was enriched to use gapless, subsequences of fixed length (w-mer), which lead to a reduction in the number of false positives and improvement of cluster accuracy.” page 1032, col 2, first 8 lines of “7 Conclusion”). A w-mer is a short string of adjacent nucleotides taken from a DNA read (Table 5, page 1033).

With respect to claim 36, Rasheed discloses gapless subsequences of fixed length w (w-mers), taken at each index from position 1 to n-w of a sequence read of length n, for the hash function (para 3, col 2, page 1026).
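To make the claim 36 limitation concrete, the following minimal Python sketch hashes each read to the nucleotides adjacent to an occurrence of a randomly selected anchor string and groups reads with the same hash into buckets. It is a hypothetical illustration of the claim language only, not the applicant's implementation or Rasheed's MC-LSH; the function names, the use of the first anchor occurrence, taking the w nucleotides that follow the anchor, and skipping reads without a usable hash are all assumptions.

```python
import random
from collections import defaultdict


def random_anchor(length: int, rng: random.Random) -> str:
    """Draw a random anchor string over the DNA alphabet."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def anchor_hash(read: str, anchor: str, w: int):
    """Return the w nucleotides immediately following the first occurrence
    of `anchor` in `read`, or None if the anchor is absent or the read
    ends before w nucleotides are available."""
    pos = read.find(anchor)
    if pos == -1:
        return None
    start = pos + len(anchor)
    window = read[start:start + w]
    return window if len(window) == w else None


def bucket_reads(reads, anchor: str, w: int) -> dict:
    """Group read indices by their anchor hash; reads with no hash are skipped."""
    buckets = defaultdict(list)
    for idx, read in enumerate(reads):
        key = anchor_hash(read, anchor, w)
        if key is not None:
            buckets[key].append(idx)
    return dict(buckets)


reads = ["TTACGGATC", "CACGGATA", "ACGCTTT"]

# One hashing round with a randomly drawn 3-nt anchor (seeded for repeatability).
anchor = random_anchor(3, random.Random(0))
print(anchor, bucket_reads(reads, anchor, 3))

# With the fixed anchor "ACG", the first two reads share the hash "GAT".
print(bucket_reads(reads, "ACG", 3))  # {'GAT': [0, 1], 'CTT': [2]}
```

In a randomized scheme of this kind, several rounds with independently drawn anchors would typically be run so that similar reads missed by one anchor can still collide under another.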

With respect to claim 37, Rasheed discloses an LSH-based Hamming distance filter for considering two reads to be equivalent (para 4, col 1, page 1025). The Hamming distance filter reads on the threshold distance in the claim limitation.
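The threshold test recited in claim 37 can be sketched as follows, for illustration only. The sketch assumes that signatures are fixed-width bit strings stored as Python integers and uses a greedy, representative-based assignment; both the representation and the greedy strategy are assumptions, and the names are hypothetical. Neither the claim nor Rasheed's filter, as characterized above, requires this particular strategy.

```python
def hamming(x: int, y: int) -> int:
    """Hamming distance between two bit-string signatures stored as integers."""
    return bin(x ^ y).count("1")


def cluster_bucket(signatures, threshold: int):
    """Greedy sketch: put each read into the first existing cluster whose
    representative signature differs from it by fewer than `threshold` bits;
    otherwise open a new cluster with that read as its representative."""
    clusters = []  # list of (representative signature, member indices)
    for idx, sig in enumerate(signatures):
        for rep, members in clusters:
            if hamming(rep, sig) < threshold:
                members.append(idx)
                break
        else:
            clusters.append((sig, [idx]))
    return [members for _, members in clusters]


print(cluster_bucket([0b10110000, 0b10110001, 0b01001111], threshold=2))
# [[0, 1], [2]]
```

The first two signatures differ in one bit, which is less than the threshold of 2, so they share a cluster; the third differs from the first representative in all eight bits and starts its own cluster.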

With respect to claim 38, Rasheed teaches implementing the software on a workstation featuring an Intel i5 processor (para -1, col 1, pg 1028). An i5 processor inherently has a SIMD architecture; all of Intel's Core processors support SIMD instructions.

With respect to claim 40, Rasheed teaches a fixed-length w-mer for the hash function (para 3, col 2, page 1025), in which w becomes the hash length of the LSH. Rasheed tested w = 1, 3, 5, and 7 in various tests of LSH parameters (Table 1, page 1028; Table 2, page 1029; Table 3, page 1030; Table 4, page 1031). Berlin teaches k = 16 as the largest value that can be effectively hashed into a 32-bit fingerprint while providing good sensitivity. Berlin also teaches that larger k-mer values would significantly degrade sensitivity, while slightly larger values would double memory usage without providing significant improvement. Together, the teachings of Rasheed and Berlin cover a range into which the claimed k = 10 falls.
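Berlin's statement that k = 16 is the largest value that fits a 32-bit fingerprint is consistent with a simple counting argument: there are 4^16 = 2^32 distinct 16-mers. The following Python sketch illustrates that arithmetic under the assumption of a lossless 2-bit-per-nucleotide encoding (A=00, C=01, G=10, T=11); MHAP itself hashes k-mers, so this code is an illustration of the counting only and is not taken from Berlin. The name `pack_kmer` is hypothetical.

```python
# Assumed 2-bit code per nucleotide (A=00, C=01, G=10, T=11).
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack_kmer(kmer: str) -> int:
    """Pack a k-mer into an integer, 2 bits per nucleotide,
    with the first base in the highest-order bits."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


for k in (10, 16, 17):
    print(k, "nucleotides ->", 2 * k, "bits; fits in 32 bits:", 2 * k <= 32)
# 10 nucleotides -> 20 bits; fits in 32 bits: True
# 16 nucleotides -> 32 bits; fits in 32 bits: True
# 17 nucleotides -> 34 bits; fits in 32 bits: False

print(pack_kmer("T" * 16) == 2**32 - 1)  # True: largest 16-mer code is the 32-bit maximum
```

Under this encoding, the claimed k = 10 occupies 20 bits, well within a 32-bit word.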

Claim 41 is equivalent to repeating the operation of claim 14 250 times. Simply repeating a prior art step multiple times is insufficient to patentably distinguish an invention from the prior art. Claim 41 is therefore rejected for the same reasons discussed above regarding claim 14.

Claim 42 is not given patentable weight because it simply expresses the intended result of a process step without a functional description of how to achieve that result; it merely describes the intended clustering efficiency of the method.

An invention would have been obvious to one of ordinary skill in the art if some motivation would have led that person to combine and modify prior art teachings to arrive at the claimed invention. Prior to the time of invention, such a person would have been motivated to combine Berlin's DNA read grouping (bucketing) with Rasheed's method of sequence clustering, and to modify the Hamming distance used by both Berlin (para 1 line 10, col 1, page 624) and Rasheed (para 1 line 1-3, col 2, page 1025) with Ostrovsky's teaching on embedding the edit distance space into the Hamming distance space. The motivation is twofold: read bucketing aids sequence clustering when dealing with large genomes, and Ostrovsky's embedding of the edit distance space into the Hamming distance space makes comparing the similarity of two strings more efficient, a benefit that Berlin's read grouping combined with Rasheed's read clustering method can exploit. Because each of these methods succeeded, there would have been a reasonable expectation of success.

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Berlin, Rasheed, and Ostrovsky, as applied to claim 14 above, and further in view of Kim ("Hobbes3: Dynamic generation of variable-length signatures for efficient approximate subsequence mappings," 2016 IEEE 32nd International Conference on Data Engineering (ICDE), 2016, pp. 169-180, doi: 10.1109/ICDE.2016.7498238; Date of Conference: 16-20 May 2016).

With respect to claim 18, Rasheed discloses gapless subsequences of fixed length w (w-mers), which read on the signatures of DNA reads as discussed regarding claims 14 and 16, but Rasheed is silent on encoding k-grams as bit strings to form signatures. Kim discloses generating signatures by combining q-grams (“we propose a flexible way to generate variable-length signatures using a fixed-length q-gram index. The proposed technique groups a few q-grams into a variable-length signature, and generates candidate positions for the variable-length signature using the inverted lists of the q-grams.” section “Abstract”). Further, Kim points out that q-gram processing is aided by bit vectors (“It generates candidate positions using inverted lists of non-overlapping q-grams with the help of bit vectors”, page 171, col 2, line 3-5 in para 2), and a bit vector is a bit string.
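The claim 18 steps (split a read into sub-reads, find k-grams for the sub-reads, encode the k-grams as bit strings, and concatenate the bit strings into signatures) can be made concrete with the following minimal Python sketch. The encoding chosen here, one presence bit per possible k-gram in each sub-read, is an assumption for illustration; neither the claim language nor Kim's bit vectors (which Kim uses for candidate position generation) dictates this particular scheme. The function names are hypothetical, and reads are assumed to contain only A, C, G, and T.

```python
from itertools import product


def kgram_index(k: int) -> dict:
    """Map every possible k-gram over ACGT to a bit position (lexicographic order)."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def signature(read: str, num_subreads: int, k: int) -> str:
    """Split a read into sub-reads, record which k-grams occur in each
    sub-read as a presence bit string, and concatenate the bit strings."""
    index = kgram_index(k)
    size = -(-len(read) // num_subreads)  # ceiling division: sub-read length
    parts = []
    for s in range(num_subreads):
        sub = read[s * size:(s + 1) * size]
        bits = ["0"] * len(index)
        for i in range(len(sub) - k + 1):
            bits[index[sub[i:i + k]]] = "1"  # mark this k-gram as present
        parts.append("".join(bits))
    return "".join(parts)


print(signature("ACGT", num_subreads=1, k=2))  # 0100001000010000 (AC, CG, GT set)
print(signature("AACC", num_subreads=2, k=1))  # 10000100 ("AA" -> A, "CC" -> C)
```

Signatures produced this way have a fixed width of num_subreads × 4^k bits, so they can be compared with a Hamming distance.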

An invention would have been obvious to one of ordinary skill in the art if some motivation would have led that person to modify prior art teachings to arrive at the claimed invention. Prior to the time of invention, such a person would have been motivated to modify the combined teachings of Berlin, Rasheed, and Ostrovsky on embedding edit-distance space into Hamming space with Kim's teaching on sequence mapping using signatures, and would have expected success, because Kim's method of mapping (similar) reads by signatures can enhance Berlin's bucketing method, Rasheed's sequence read clustering method (which needs to group reads), and Ostrovsky's string similarity comparison method, each of which succeeded.

Claims 34, 35, and 39 are rejected under 35 U.S.C. 103 as being unpatentable over Berlin, Rasheed, and Ostrovsky, as applied to claim 14 above, and further in view of Patro (“Data-dependent bucketing improves reference-free compression of sequencing reads,” Bioinformatics, 31(17), 2015, 2770–2777).

With respect to claim 34, Berlin, Rasheed, and Ostrovsky are all silent on the number of sequence reads their programs handle. Patro teaches compressing DNA sequence datasets with over 200,000 reads (Table 1, page 2774).

With respect to claim 35, Patro teaches receiving a plurality of data reads (Table 1, page 2774) with configured command lines (section “3. Source of datasets” in “Supplementary Material For: Data-dependent Bucketing Improves Reference-free Compression of Sequencing Reads”, page 3 of 4).

With respect to claim 39, Patro teaches the Mince software system, which buckets the data (Section “Local bucketing”, col 2, page 2771 to col 1, page 2772) and sub-buckets the data (para 3-5, Section “Sub-bucketing and bucket ordering”, col 2, page 2772). Patro further teaches that multi-core compression (a process that also requires clustering of DNA reads) is significantly more practical (para 4 line 7-9, col 1, page 2776).

An invention would have been obvious to one of ordinary skill in the art if some motivation would have led that person to modify prior art teachings to arrive at the claimed invention. Prior to the time of invention, such a person would have been motivated to modify the combined teachings of Berlin, Rasheed, and Ostrovsky on clustering sequence reads with Patro's teaching on handling large numbers of sequence reads using parallel processing, and would have expected success, because Patro's multi-core parallelism makes the program faster when handling large numbers of sequence reads.


Conclusion
No claims are allowed.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GUOZHEN LIU whose telephone number is (571)272-0224. The examiner can normally be reached Monday-Friday 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Karlheinz R Skowronek can be reached on (571)272-9047. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Soren Harward/Primary Examiner, Art Unit 1631
GUOZHEN LIU
Examiner
Art Unit 1631