DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
	Applicant’s response filed 5/25/2022 has been entered and carefully considered, but is not completely persuasive.
	The IDS filed 3/8/2022 has been entered and considered.
	Claims 21-24, 26-30, 32-33, 35-37, 40-45 are under examination.  Claims 41-45 are newly added. All other claims have been canceled.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 21-24, 26-30, 32-33, 35-37, 40-45 remain rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea of mental steps, mathematic concepts, organizing human activity, or a natural law without significantly more. Applicant’s arguments will be addressed below. The claims have been heavily amended, but the basis for the rejection is the same.
The MPEP, at 2106.03 and 2106.04, sets forth four steps for identifying eligible subject matter.
With respect to step (1): yes the claims recite systems for performing a method, methods of genotyping a sample, and computer program products comprising instructions for the same methods.
With respect to step  (2A)(1) The claims recite an abstract idea of comparing sample genetic sequence data, reference genetic sequence data and genetic variation of the reference sequence data by aligning and determining and scoring paths, to “determine a presence of the first genetic structural variation within the genetic sample”. The claims also embrace a natural law, particularly dependent claims drawn to diagnosing disease, or determining a species.  The natural law is the naturally occurring correlation between naturally occurring polynucleotide sequences or sequence variations, and naturally occurring phenotypes, such as disease, or a species distinction.  "Claims directed to nothing more than abstract ideas (such as a mathematical formula or equation), natural phenomena, and laws of nature are not eligible for patent protection" (MPEP 2106.04).  
Claims 21, 36 and 40 are independent.  Each independent claim recites similar limitations: only claim 21 is represented here for clarity.
	Mathematical concepts recited in claim 21 include:
	“determining a plurality of scores corresponding to a respective plurality of alignments between the first sequence read and the graph data structure, the plurality of scores including a first score corresponding to a first alignment between the first sequence read and at least a portion of the graph data structure, the first score being determined based on a degree of overlap between the first sequence read and the first string and a degree of overlap between the first sequence read and the second string;” (mathematical concept of calculating a score, or multiple scores)
	Mental processes recited in claim 21 include:
“aligning at least some of a plurality of sequence reads to the graph data structure, representing the genome of the species, the at least some of the plurality of sequence reads including a first sequence read” (mental step of comparing sequence data, where the alignment includes a graph of differing paths possible for different genetic variations in the sequence data);
“determining, based on results of the aligning, whether the first sequence read aligns to the second path through the graph data structure representing the genome of the species;” (mental step of comparing total numbers of reads aligned to each path).
“upon determining that the first sequence read aligns to the second path in the graph data structure representing the genome of the species, identifying a presence of the first genetic structural variation within the genetic sample” (mental step of judging a result)
The law of nature embraced by the claims includes:
A correlation between naturally occurring reference genetic sequence information, (including naturally occurring variations of the reference sequence information, naturally occurring sequence information of the sample) and naturally occurring phenotype information- a genotype/ phenotype relationship. The naturally occurring genetic sequences and variants exist whether or not they are measured, and any correlation between a naturally occurring sequence and a naturally occurring phenotype also exists whether or not they are measured and analyzed.
	Hence, the claims explicitly recite elements that, individually and in combination, constitute abstract ideas.  
With respect to step 2A(2):  The claims must therefore be examined further to determine whether they integrate that abstract idea into a practical application (MPEP 2106.04(d)).  The claimed additional elements are analyzed to determine if the abstract idea is integrated into a practical application (MPEP 2106.04(d).I.; MPEP 2106.05(a-h)).
Claims 21, 36 and 40 each recite the additional element that is not an abstract idea: obtaining or accessing or storing “a graph data structure representing a genome of a species and genetic variation of the genome”; and “a plurality of sequence reads” which are data gathering elements.
	Data gathering steps are not an abstract idea; they are extra-solution activity, as they collect the data needed to carry out the abstract idea.  Merely receiving previously determined data at a computer system, or in a computer-implemented method, does not affect any of the actual data obtained.  Receiving data does not change any elements of the judicial exceptions: the step merely provides data upon which the judicial exception(s) act.  Data gathering does not impose any meaningful limitation on the abstract idea, or how the abstract idea is performed.  Data gathering steps are not sufficient to integrate an abstract idea into a practical application (MPEP 2106.05(g)).
	Claims 21, 36 and 41 each recite an additional element that is not an abstract idea: the graph data structure itself comprising paths, representing known genetic variant sequence information aligned with reference sequence data, having at least two paths at the first position.
	This data structure does not implement the judicial exception(s).  The graph data structure is not a computing structure, but can be represented by a drawing made with paper and pencil, word processing, or simple drawing software (see Fig 1A-B and Fig 2 for a simple example).  Similarly, a data table can be constructed in simple row and column format as the “graph data structure” which provides the various paths, and sequence read counts of each alignment required to make the final determinations (see Fig 3A-B).  The inclusion of the “graph data structure” in the claim only generally links the judicial exception(s) to a highly generic technological environment: graphing sequence read alignments to encompass variant sequence information, known or unknown (see MPEP 2106.05(h)).
	Claims 21, 36 and 40 each also recites the additional non-abstract element of certain computing elements: at least one processor, storage media, tangible non-transitory memory, and other general unstated systems elements such as input, output, display etc.
	The claims do not describe any specific computational steps by which the computing elements perform or carry out the judicial exception(s).  The claims require nothing more than a generic computer to perform the functions that constitute the judicial exception(s).  Hence, these are mere instructions to apply the judicial exception(s) using a general purpose computer, and therefore the claims do not integrate that abstract idea into a practical application (see MPEP 2106.05(f)).
	The claims do not provide a particular additional element which practically applies a result of the methods performed by the claims.  The presence or absence of a variation is not further applied or integrated into a real world application or process.  To integrate a judicial exception into a practical application, the additional limitation must be specifically identified, and not merely instructions to apply the judicial exception.  The additional element must have more than a nominal or insignificant relationship to the identified judicial exception (MPEP 2106.04(d)(2)).
	Dependent claims 22-24, 28-35, 37-39 have been analyzed with respect to the integration of the judicial exception(s) into a practical application.  Dependent claims 22-23, 24, 29-30, 32-33, 35, 40, 43-44 each are directed to further abstract limitations.  Abstract limitations include further mental steps, or mathematic concepts: mathematic calculations, further data comparison, identifying rare variants, identifying additional paths, genotyping and diagnosing a disease.  Additional abstract limitations cannot integrate a judicial exception into a practical application as they are a part of it.  Dependent claims 26-28, 37, 41-42, 45 are related to the data gathering elements, providing additional descriptions of the data. Data gathering steps or elements are insufficient to provide a practical application to a judicial exception as they merely describe aspects of the input data, but these aspects do not affect how the judicial exception is performed.
	None of these dependent claims recite additional elements which would integrate a judicial exception into a practical application.
	Finally, the (2B) analysis. Because the claims recite an abstract idea, and do not integrate that abstract idea into a practical application, the claims are probed for a specific inventive concept.  The judicial exception alone cannot provide that inventive concept or practical application (MPEP 2106.05).  Identifying whether the additional elements beyond the abstract idea amount to such an inventive concept requires considering the additional elements individually and in combination to determine if they provide significantly more than the judicial exception. (MPEP 2106.05.A i-vi).
	With respect to claims 21, 36 and 40: The additional element of data gathering does not rise to the level of significantly more than the judicial exception(s). Zeng et al (2013) provide stored data which comprises sample genetic data, reference genetic data, and known variations of reference genetic data (reference genome data, resequencing genetic data of the reference genome and sample sequencing data). Homer (2010), Iqbal (2013) and Leggett (2013) also disclose stored sequence read data which can be reference, reference variant, or sample based.  The prior art sets forth that datasets representing reference genetic data, known reference variations, and sample genetic data for certain samples are freely available in public databases such as the 1000 Genomes Project, or the Human Genome Project databases.  As such, this data gathering element is routine, well understood and conventional in the art.  The specification also notes, at page 2, that a human genome reference sequence such as GRCh37 is publicly available.  Elements related to data gathering do not improve the functioning of a computer, or comprise an improvement to any other technical field; they do not require or set forth a particular machine, they do not effect a transformation of matter, nor do they provide a non-conventional or unconventional step. Data gathering steps constitute a general link to a technological environment which is insufficient to constitute an inventive concept which would render the claims significantly more than the judicial exception (MPEP 2106.05(g) and (h)).
	With respect to claims 21, 36 and 40: the additional limitation of a data graph structure fails to rise to significantly more than the judicial exception(s).  Zeng et al provide sequence alignments in a data graph structure format, with nodes and paths through variants to identify genotypes, SNP, or structural variations such as insertions and deletions.  Homer et al (2010) show an example of a data graph structure having nodes and paths to identify genetic variants in Figure 6 and its discussions. Iqbal et al (2013) provide a de Bruijn graph data structure comprising nodes and paths of sequence variations to determine a genotype or to verify the presence of a sequence variation (see Fig 1). Leggett et al (2013) provide data graph structures for identifying and verifying the presence of sequence variations, structural or SNP, where nodes (vertices) and paths (edges) are formed through sequence data information (see Figure 1).  As such, this data graph structure element is routine, well understood and conventional in the art.  This limitation does not improve the functioning of a computer, or comprise an improvement to any other technical field. It does not require or set forth a particular machine, it does not effect a transformation of matter, nor does it provide a non-conventional or unconventional step. As such, this limitation fails to rise to the level of significantly more.
	With respect to claims 21, 36 and 40: the computer related elements or the general purpose computer do not rise to the level of significantly more than the judicial exception(s).  Zeng, Homer, Iqbal and Leggett each provide computer systems comprising a processor, storage media, stored sequence data, and instructions.  As such, these computing elements are routine, well understood and conventional in the art.  The additional elements are set forth at such a high level of generality that they can be met by a general purpose computer.  Therefore, the computer components constitute no more than a general link to a technological environment, which is insufficient to constitute an inventive concept that would render the claims significantly more than an abstract idea (see MPEP 2106.05(b)I-III).
	In combination, providing a graph data structure, comprising reference sequence data, and known reference sequence variant data, and providing sequence read data in combination with the judicial exception(s) provides no more than what is normally practiced in the scientific method.  Providing data, and providing a visualization structure which can be drawn by paper and pencil, or through the use of simple drawing software or word processing, then performing the judicial exception is the logical process of analyzing data.  The combination provides no non-routine elements which clearly provide any improvement to a technology.  No non-conventional steps or elements are clearly recited or required which would provide that inventive aspect.
	Dependent claims 22-24, 28-35, 37-39 have been analyzed with respect to step 2B. Dependent claims 26-28, 37, 41-42, 45 relate to the data gathering discussed above, and cannot provide significantly more than a judicial exception, as they merely describe aspects of the data provided to the exception(s).  Dependent claims 22-23, 24, 29-30, 32-33, 35, 40, 43-44 relate to additional abstract limitations.  Additional abstract elements cannot provide significantly more than a judicial exception as they are a part of that exception.  None of these claims provide a specific inventive concept, as they all fail to rise to the level of significantly more than the identified judicial exception.
	For these reasons, the claims, when the limitations are considered individually and as a whole, are rejected under 35 USC § 101 as being directed to non-statutory subject matter.
NOTE: this claim, while computer-implemented, is a data analysis type claim: data is obtained, analyzed, and new data is output. In response to this rejection, Applicant is encouraged to consider the following:
It is the integration of the judicial exception with a practical application that takes a judicial exception into the realm of being patent-eligible.  If the claim is sufficiently computer-related and/or improves its otherwise relevant field, at least the following Federal Circuit opinions may be relevant to an argument in this context: Enfish/TLI, McRO, BASCOM and Synopsys, but also see In re... Stanford (CAFC 3/11/2021, precedential).  
Since several of these opinions relate to inventions which were to some extent computer-related, arguments related to these opinions should clearly identify the particular field in which asserted improvement occurs in the claims.  These arguments generally rely on there being an "improvement" clearly on the record and in the independent claims.  One approach to clearly placing an improvement argument on the record is to show that: 1) a particular improvement is identified (assertion of general "improvement" cannot suffice); 2) there is a clear difference, apparent through comparison with the most relevant conventional technology (since there can be no "improvement" without a difference); and 3) any improvement is either explicitly recited or is inherent to the claims, but in either case must apply to all claimed embodiments within the recited claim scope.  
As further examples, argument may explain cause and effect leading to improvement or may include evidence comparing a claimed result to conventional results.  Arguments and evidence may be extrinsic to the original disclosure, including references available after the priority date, as long as it is clear that an argument applies to all embodiments of a properly supported claim.
Applicant’s arguments:
	Applicant’s arguments with respect to previous training examples are not persuasive.  Those examples are no longer considered to be instructive in determining patent-eligible subject matter.  All relevant examples have been incorporated into the MPEP 2106.
	With respect to applicant’s arguments that the claims do not comprise mathematical concepts these arguments are not persuasive.  The MPEP clearly states that mathematical concepts do not need to be a formula to be considered a mathematic concept under the abstract idea categorization.  
	“MPEP 2106.04: It is important to note that a mathematical concept need not be expressed in mathematical symbols, because "[w]ords used in a claim operating on data to solve a problem can serve the same purpose as a formula." In re Grams,. See, e.g., SAP America, Inc. v. InvestPic, LLC, (holding that claims to a ‘‘series of mathematical calculations based on selected information’’ are directed to abstract ideas); Digitech Image Techs., LLC v. Elecs. for Imaging, Inc., (holding that claims to a ‘‘process of organizing information through mathematical correlations’’ are directed to an abstract idea); and Bancorp Servs., LLC v. Sun Life Assurance Co. of Can. (U.S.), (identifying the concept of ‘‘managing a stable value protected life insurance policy by performing calculations and manipulating the results’’ as an abstract idea).”

	The step of determining a plurality of scores is not a minor aspect of the claimed invention, but is critical to the process of genotyping a sample.  The score is determined by calculating a “degree of overlap” which is a mathematic calculation based on how much of the test or sample sequence is identical to one or more reference sequences or known variants of reference sequences.
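	For illustration only: the record cited here does not give a formula for the “degree of overlap,” so the sketch below assumes one simple reading, namely the fraction of aligned positions at which the sequence read and a candidate string carry the same symbol. The function name, the example strings, and that definition are assumptions introduced for this example and are not taken from the claims or the specification.

```python
def degree_of_overlap(read: str, candidate: str) -> float:
    """Fraction of positions at which read and candidate carry the same symbol.

    Assumption: the two strings are already aligned position by position,
    and the comparison covers the shorter of the two.
    """
    length = min(len(read), len(candidate))
    if length == 0:
        return 0.0
    matches = sum(1 for a, b in zip(read, candidate) if a == b)
    return matches / length


# Hypothetical strings standing in for a reference allele and a known variant.
first_string = "ACGTTGCA"
second_string = "ACGTAGCA"
read = "ACGTAGCA"

# One score per candidate string; the highest score picks the best match.
scores = {
    "first_string": degree_of_overlap(read, first_string),
    "second_string": degree_of_overlap(read, second_string),
}
best = max(scores, key=scores.get)
```

	Under this assumed definition the read scores 0.875 against the first string and 1.0 against the second, so the comparison selects the second string. The calculation is a count of matching symbols divided by a length.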
	With respect to the argument that the steps identified as mental steps, either in a computing environment, or applied using paper and pen, cannot be performed in the human mind, those arguments are not persuasive.  These steps do not set forth limitations for which the human mind is not equipped, such as requiring a GPS receiver, use of network packets and network monitors, or a specific multi-step encryption of data for communication between computers.  The steps require observation of data, evaluating data, judging data, all of which fall within the definition of mental processes.  It is noted that there is no limitation to the claim that the first graph data structure obtained comprises an entire genome.  It represents some subset or portion of a genome.  Similarly, the sample data sequence reads do not require unlimited amounts of data.  
	Even if the claims did require data structures representing an entire genome, or unusually large amounts of data, those elements are not sufficient.  
The courts have repeatedly set forth that the amount of data, in and of itself, is not sufficient: “We acknowledge that such computations performed mentally, or with paper and pencil, would take considerable time and effort, but that is, of course, the singular purpose of computers and computer networks, to perform large numbers of calculations, via algorithms, rapidly, and without error (assuming no error in user input). Although a general-purpose computer can perform calculations at a rate and accuracy that can far outstrip the mental performance of a skilled artisan, the nature of the activity is essentially the same, and constitutes an abstract idea. See Bancorp Servs., L.L.C. v. Sun Life Assur. Co. of Canada (U.S.), 687 F.3d 1266, 1278 (Fed. Cir. 2012) (holding that “the fact that the required calculations could be performed more efficiently via a computer does not materially alter the patent eligibility of the claimed subject matter”); see also SiRF Tech., Inc. v. Int’l Trade Comm’n, 601 F.3d 1319, 1333 (Fed. Cir. 2010) (holding that: In order for the addition of a machine to impose a meaningful limit on the scope of a claim, it must play a significant part in permitting the claimed method to be performed, rather than function solely as an obvious mechanism for permitting a solution to be achieved more quickly, i.e., through the utilization of a computer for performing calculations).”

	The limitations identified as mental steps require applying some subset of the plurality of sequence reads of the sample to the graph data structure. This is a mental step of matching up the sequence of the sequence read with the data structure, which represents reference and known variant data visually. Figures 6 and 7 of the instant application are an example.  Figure 6 sets up a reference sequence construct with known reference sequence variants. SEQ ID NO 12 is the base reference sequence, and SEQ ID NO 13 and 18 represent known variant reference sequences.

    [Image: media_image1.png]

	This figure discloses a graph data structure, obtained by the computer system, which provides a reference sequence and at least 2 known variant sequences, as strings.  In traversing the strings, any allelic differences are illustrated as nodes and edges (allele 1 = reference, alleles 2-3 = known variants).  The graph represents a portion of a genome of a species, and it represents both a structural variant and a variant of a single nucleotide or single symbol.  The graph data structure is not generated in any particular way: it is merely retrieved from another source.

    [Image: media_image2.png]

	With a plurality of sequence reads from the sample, any given read is compared with the reference sequence construct #2, as illustrated in Fig 7.  Traversing each path of the reference construct with each sequence read (SEQ ID NO 14-15 and 19) allows for the genotyping of that allele.  The portion of the sequencing reads applied to or compared with the reference construct can easily be genotyped by the human mind, or in a computing context using basic matching properties.  As labeled in Fig 7, SEQ ID NO 15, Read #2, represents matching to the base reference sequence, while reads #1 and #5 (SEQ ID NO: 14 and 19) represent matching to the known variants of the base reference sequence.  In the context of genotyping, the sample comprising read #2 would be considered/genotyped as wild-type or reference, the sample comprising read #1 would be considered/genotyped as allele #2, and a sample comprising read #5 would be considered/genotyped as having allele #3.
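	The comparison described above can be sketched in a few lines, offered only as an illustration under stated assumptions. The construct below is hypothetical: its sequences are invented for the example and are not SEQ ID NO 12-15, 18 or 19 of record, and the names `construct`, `enumerate_paths` and `genotype_read` are not drawn from the application. A read is assigned to an allele when it occurs exactly within the string spelled by a path through the construct.

```python
from itertools import product

# Hypothetical reference construct: an ordered list of segments. A segment
# with one entry is shared by every path; a segment with several entries is
# a branch point where the alleles diverge.
construct = [
    {"shared": "ACGT"},
    {"allele 1 (reference)": "TT", "allele 2": "TAAT", "allele 3": "A"},
    {"shared": "GCAT"},
]


def enumerate_paths(segments):
    """Yield (allele labels, spelled sequence) for every path."""
    options = [list(segment.items()) for segment in segments]
    for choice in product(*options):
        labels = tuple(label for label, _ in choice if label != "shared")
        sequence = "".join(symbols for _, symbols in choice)
        yield labels, sequence


def genotype_read(read, segments):
    """Return the allele labels of every path that contains the read exactly."""
    return [labels for labels, sequence in enumerate_paths(segments)
            if read in sequence]


# Three hypothetical reads, each spanning the branch point.
calls = {read: genotype_read(read, construct)
         for read in ("GTTTGC", "GTTAATG", "CGTAGC")}
```

	In this sketch each of the three reads matches exactly one path, so each is assigned a single allele. A read that falls entirely within a shared segment (for example `"ACGT"`) matches every path and therefore does not distinguish the alleles.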
	No aspect of this comparison requires something for which the human mind is not equipped. With respect to the computing environment of a generic computer, “the courts do not distinguish between mental processes that are performed entirely in the human mind and mental processes that require a human to use a physical aid (e.g., pen and paper or a slide rule) to perform the claim limitation. See, e.g., Benson, 409 U.S. at 67, 65, 175 USPQ at 674-75, 674 (noting that the claimed "conversion of binary-coded decimal numerals to pure binary numerals can be done mentally," i.e., "as a person would do it by head and hand").”  MPEP 2106.04.  “Nor do the courts distinguish between claims that recite mental processes performed by humans and claims that recite mental processes performed on a computer. As the Federal Circuit has explained, "[c]ourts have examined claims that required the use of a computer and still found that the underlying, patent ineligible invention could be performed via pen and paper or in a person’s mind." Versata Dev. Group v. SAP Am., Inc., 793 F.3d 1306, 1335, 115 USPQ2d 1681, 1702 (Fed. Cir. 2015). See also Intellectual Ventures I LLC v. Symantec Corp., 838 F.3d 1307, 1318, 120 USPQ2d 1353, 1360 (Fed. Cir. 2016) ("[W]ith the exception of generic computer-implemented steps, there is nothing in the claims themselves that foreclose them from being performed by a human, mentally or with pen and paper."); Mortgage Grader, Inc. v. FirstChoice Loan Servs. Inc., 811 F.3d 1314, 1324, 117 USPQ2d 1693, 1699 (Fed. Cir. 2016) (holding that computer-implemented method for "anonymous loan shopping" was an abstract idea because it could be "performed by humans without a computer").”
The mental steps of comparing sequence read data from a sample to the reference sequence construct which also comprises known variants and genotyping the allele within the construct, can be practically performed by the human mind, or by a person with pen and paper, or using a general purpose computer as a tool. There are no data manipulations in those steps which cannot be performed as such. “Another example is Berkheimer v. HP, Inc., 881 F.3d 1360, 125 USPQ2d 1649 (Fed. Cir. 2018), in which the patentee claimed methods for parsing and evaluating data using a computer processing system. The Federal Circuit determined that these claims were directed to mental processes of parsing and comparing data, because the steps were recited at a high level of generality and merely used computers as a tool to perform the processes. 881 F.3d at 1366, 125 USPQ2d at 1652-53.”
	Applicant has provided no evidence as to the practicality or impracticality of meeting embodiments over the scope of the data from the sample.
Further, with respect to arguments regarding any alleged improvement, it is unclear that the independent claims recite all the necessary and sufficient steps required to achieve that improvement. MPEP 2106.05(a): “An important consideration in determining whether a claim improves technology is the extent to which the claim covers a particular solution to a problem or a particular way to achieve a desired outcome, as opposed to merely claiming the idea of a solution or outcome. McRO, 837 F.3d at 1314-15, 120 USPQ2d at 1102-03; DDR Holdings, 773 F.3d at 1259, 113 USPQ2d at 1107.”
The MPEP sets forth that “if the examiner concludes the disclosed invention does not improve technology, the burden shifts to applicant to provide persuasive arguments supported by any necessary evidence to demonstrate that one of ordinary skill in the art would understand that the disclosed invention improves technology. Any such evidence submitted under 37 CFR 1.132 must establish what the specification would convey to one of ordinary skill in the art and cannot be used to supplement the specification.” Applicant’s arguments cannot take the place of evidence.
New Grounds of Rejection
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 21-24, 26-30, 32-33, 35-37, 40-45 is/are rejected under 35 U.S.C. 103 as being unpatentable over Iqbal (2012-A) in view of Iqbal (2012-B).  I will refer to each reference by IA or IB below.
Iqbal, Z. et al. High-throughput microbial population genomics using the Cortex variation assembler. Bioinformatics, vol 29, number 2 p275-276, and supplemental material (published online 11/19/2012). IA
Iqbal, Z. et al. De novo assembly and genotyping of variants using colored de Bruijn graphs.  Nature Genetics, vol 44 number 2, p226-233, and supplemental material (published online 01/08/2012). IB


The claims have been heavily amended: 
	The claims are drawn to computer systems, computer program products and computer-implemented methods for genotyping a genetic sample. A graph data structure already comprising a reference sequence and a corresponding known variant of the particular reference sequence “representing a genome of a species” is obtained.  This data structure has at least two paths, where any divergence between the reference and the known variant is represented in the string of symbols.  The difference or divergence is the second path and is the variant allele.  Figure 6 of the application is a sample of such a graph data construct obtained by the computer system.  Some subset of a plurality of sequence reads from a sample are aligned to the graph data structure, wherein a plurality of scores are determined for each symbol between the first sequence read and the reference construct.  This is expressed as a degree of overlap of the sequence read to either the reference sequence, or the known variant sequence.  Based on the alignment, any allele identified within the reference construct can provide a genotype at that allele. Figure 7 illustrates applying one or more sequence reads to the reference construct.
	IA provides the Cortex program, specifically designed for the analysis of genetic sequence variations with or without known reference sequences.  As set forth in the abstract, “these also enable the construction of a graph of known sequence and variation in a species against which new samples can be compared rapidly.”  
	With respect to claim 21, computer systems, comprising I/O, processors, data storage, memory and display elements are disclosed throughout.  IA discloses a computer program (Cortex), which provides graph data structures.  IA provides workflow pipelines for genotyping a sample, which can comprise reference microbial genomic sequences and known variant sequences of the reference genome. Each sequence in the reference construct is represented by a string of symbols.  The sequencing reads also comprise a string of symbols.  Where the sequence of the known variant diverges, a break in the path is inserted in order to incorporate the divergence as the beginning of a second path.  Fig 1 of IA is a simplified cartoon of such a graph: the graph data structure comprises the reference sequence as the top black line.  The known variants are shown in red or blue lines below the black line, and show the divergence from the original string path as squares or rectangular elements.  The “node” is the first divergent symbol identified; the edges illustrate the length of the divergence, up to a second node prior to sequence re-convergence. The graph has nodes, and traversing the nodes by a pathway along the symbol strings can provide the sequence or the genotype at identified alleles.

    [Image: media_image3.png]

	Section 4 “a repository of variation” sets forth reference sequence data, as well as known variants of the reference sequence data.  These are assembled into a reference graph data construct comprising the reference sequence and the known variants of that same sequence.  “Assembled graphs and callsets are a repository of known sequence and variation.” P276. 
	Once the “assembled graphs” comprising the reference symbol string and the known variant strings are obtained, sequence reads from a sample can be aligned against the assembled graph.  IA provides “a three-way alignment between the assemblies and a reference, called variants and removed alignment errors” identifying SNPs present on the chromosome represented in the assembled graph. (p267).
	As set forth in the supplemental information, the use of de Bruijn graph data structures, sequencing depth, read length, variant length and error rates are all important.  Some elements of a complex genome having a complex repeat structure cannot be genotyped effectively within those repeat regions.  “Therefore, to obtain a maximum sensitivity set of variant calls, one would ideally try a range of kmer values, call variants for each, and then take a union. The new Cortex workflows enable precisely this.” Supplemental information p1. In this section, IA further notes “we emphasise that this benefit of combining results from different kmers-values is in sharp contrast to common practise in de Bruijn-based whole-genome consensus assembly, where it is standard to try to assemble at different kmers, and then choose the “best”. As we have described above, in variant discovery there is no need to find a single parameter which optimises the results.” Supplemental information p2. Cortex is able to detect some collapsed copies of a repeat, through use of the population filter “which looks at allele balance and coverage statistics to determine if the putative site is behaving like a polymorphism, repeat, or error” Supplemental Information p2. This is one type of scoring, where the symbols of the sequence read are scored at each position, in matching either the reference sequence or the known variant sequence, to identify the presence of an allele or polymorphism.  
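	The “take a union” step quoted above can be sketched in a few lines of Python. This is a hedged illustration only: the variant record layout, the function name union_of_calls, and the toy caller are hypothetical and do not reproduce the Cortex workflow or its command-line interface.

```python
# Illustrative sketch only; the record layout and caller are hypothetical.
from typing import Callable, Dict, Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, reference allele, alternate allele)


def union_of_calls(call_variants: Callable[[int], Set[Variant]],
                   kmer_sizes: Iterable[int]) -> Set[Variant]:
    """Call variants once per k-mer size and keep every variant found by any run."""
    merged: Set[Variant] = set()
    for k in kmer_sizes:
        merged |= call_variants(k)
    return merged


# Toy stand-in for a variant caller: precomputed call sets keyed by k-mer size.
toy_calls: Dict[int, Set[Variant]] = {
    21: {("chr1", 100, "A", "G")},
    31: {("chr1", 100, "A", "G"), ("chr1", 250, "C", "CT")},
    41: set(),
}
merged = union_of_calls(lambda k: toy_calls[k], [21, 31, 41])
```

	The union keeps a variant found at any k-mer size, which corresponds to the quoted goal of maximum sensitivity rather than selecting a single “best” parameter.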
	This is described in section 6 of the supplemental information: 
“Having discovered a variant, in the form of 5-prime flank, allele, allele, 3-prime flank, calls are typically mapped to a reference genome to get coordinates, and alleles are aligned with each other to classify the variant type. Results are presented in VCF format. If variants are within k base pairs of each other, and if Cortex is able to call them, then it will call them as a combined “long bubble” - this is an advantage for discovery, as Cortex is not confounded in the way a mapper would be. However it does mean it is necessary to compare the two called alleles for the purpose of deciding what type of variant(s) they represent - this is all wrapped automatically by the run calls script; two VCF files are produced for each callset. The primary call file is the “.raw” call file, containing the calls. The secondary one (“.decomp”) VCF splits complex variants and phased SNPs into constituent SNPs and clean indels where possible... If the goal is to produce a VCF with respect to a specific reference genome, then it is mandatory that the reference genome is present in the graph. Cortex therefore allows a lot of flexibility in how it is run.” (p2-3, supplemental information).
	Section 13, p5-6 of the supplemental information, discloses how to generate graph data structures for references and sequences having known variations, then comparing a sequencing read from a new sample to the graph structure.  This section also provides the pseudocode to carry out the programmed workflow to make the reference constructs and the comparison to the test sample sequence read.  This meets the limitation of instructions which when carried out by the processor, perform the steps of the method.
	IA does not provide additional figures or details as to complex genotyping using the Cortex program and the de Bruijn graph data structures. IA does not specifically speak to determining an overlap score.  IA does specifically refer to IB as the source of the Cortex program description and refers readers to this document at several places.
	IB discloses “an efficient software implementation, Cortex, the first de novo assembler capable of assembling multiple eukaryotic genomes simultaneously.” (abstract). Cortex focuses on detecting and characterizing genetic variation in one or more individuals.  “This approach accommodates information from multiple samples, including one or more reference sequences and known variants.” P227.  Cortex is demonstrated to improve accuracy of genotyping known variants in a new sample.
	Graph data structures which comprise a reference, and at least one known variant sequence of the reference are shown in Fig 1a-d, and supplemental Fig 1a-d (provided below).  

    [Image: media_image4.png]

	Creating a graph data structure which comprises the reference genome sequence and known sequence variants is set forth in the results sections named “Bubble Calling” and “Path divergence” page 227.  As defined in the supplemental information, “a bubble is a pair of supernodes with the same start and end nodes.  This generalizes straightforwardly to multiallelic sites.” P1 supplemental information.  The supplemental information continues to provide detailed information as to how genome complexity is estimated, developing a Poisson model incorporating read length, coverage, kmer size and sequencing errors. The effects of error cleaning, and tip clipping (removing a branch with only a single node) are provided.  
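	The bubble structure described above can be made concrete with a short Python sketch. It is a simplified illustration under stated assumptions: the functions build_graph and find_bubbles are hypothetical, strand and reverse complements are ignored, and the search handles only a two-way branch whose arms are unbranched chains, which is narrower than the algorithms of IB.

```python
# Illustrative sketch only; a simplified bubble search, not the Cortex Bubble caller.
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> List[str]:
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_graph(seqs: List[str], k: int) -> Dict[str, Set[str]]:
    """Nodes are k-mers; an edge joins consecutive k-mers of any input sequence."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for seq in seqs:
        ks = kmers(seq, k)
        for a, b in zip(ks, ks[1:]):
            graph[a].add(b)
        if ks:
            graph.setdefault(ks[-1], set())  # make sure the final k-mer is a node
    return graph


def find_bubbles(graph: Dict[str, Set[str]]) -> List[Tuple[str, str, List[str], List[str]]]:
    """Return (start, end, arm_a, arm_b) where two unbranched arms leave start and rejoin at end."""
    bubbles = []
    for start, successors in graph.items():
        if len(successors) != 2:
            continue
        arms = []
        for first in sorted(successors):
            path = [first]
            while len(graph.get(path[-1], ())) == 1:
                nxt = next(iter(graph[path[-1]]))
                if nxt in path:  # guard against walking around a cycle
                    break
                path.append(nxt)
            arms.append(path)
        arm_a, arm_b = arms
        in_b = set(arm_b)
        shared = [node for node in arm_a if node in in_b]
        if shared:
            end = shared[0]
            bubbles.append((start, end, arm_a[:arm_a.index(end)], arm_b[:arm_b.index(end)]))
    return bubbles


# A reference and a known variant differing by one substituted symbol (C -> T).
graph = build_graph(["AACGTCAGG", "AACGTTAGG"], k=3)
bubbles = find_bubbles(graph)
```

	With these toy inputs the two sequences share the node CGT, diverge for three k-mers each, and rejoin at AGG, so the two arms of the single bubble correspond to the two alleles.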
	A graph data structure of a hash table is used in the Bubble caller, and the pseudocode for traversing that structure is provided at page 7 of the supplemental information.  Next the Path divergence caller algorithm is provided beginning at page 8.  “for complex variants…the path of at least one allele is unlikely to generate a clean contig…It is possible to identify such (variants) cases by following the path of the reference through the joint graph…” The joint graph comprises the reference genome sequence and the known variant sequence.  P8 provides the pseudocode for the Path divergence algorithm.  This is illustrated in supplemental figure 2. 

    [Image: media_image5.png]

	Section 4 specifically provides genotyping using the graph data structure. “the following algorithm assumes there is a multicolored de Bruijn graph with one color for each known allele, one color for the reference genome… and one color for the sample…” (supplemental information p9).  For any pair of paths, the likelihood function on p10 is provided and specifically calculates overlaps of the test sequence to either the reference genome string, or the known variant sequence string.  
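	To illustrate what a genotype likelihood over a pair of paths can look like, the following Python sketch compares three diploid genotypes using Poisson-distributed coverage. This is expressly not the likelihood function of IB p10: the model, the function names, the depth_per_copy and error_rate parameters, and the toy counts are assumptions chosen only to show the general shape of such a calculation.

```python
# Illustrative sketch only; a simplified Poisson model, not the likelihood function of IB.
import math
from typing import Dict


def poisson_logpmf(count: int, rate: float) -> float:
    """Log-probability of observing `count` events under a Poisson mean `rate` (rate > 0)."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)


def genotype_log_likelihoods(ref_cov: int, alt_cov: int, depth_per_copy: float,
                             error_rate: float = 0.01) -> Dict[str, float]:
    """Log-likelihood of coverage on allele-specific k-mers under each diploid genotype."""
    noise = error_rate * depth_per_copy  # assumed coverage arising from sequencing error
    expected = {
        "ref/ref": (2 * depth_per_copy, noise),
        "ref/alt": (depth_per_copy, depth_per_copy),
        "alt/alt": (noise, 2 * depth_per_copy),
    }
    return {
        genotype: poisson_logpmf(ref_cov, ref_rate) + poisson_logpmf(alt_cov, alt_rate)
        for genotype, (ref_rate, alt_rate) in expected.items()
    }


def call_genotype(ref_cov: int, alt_cov: int, depth_per_copy: float) -> str:
    """Return the genotype with the highest log-likelihood."""
    scores = genotype_log_likelihoods(ref_cov, alt_cov, depth_per_copy)
    return max(scores, key=scores.get)
```

	For example, with an assumed depth of 10 per chromosome copy, call_genotype(10, 10, 10.0) favors the heterozygous genotype, while call_genotype(20, 0, 10.0) favors the homozygous reference genotype.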
	Traversal of the two strings of the reference construct, to identify a genotype is discussed at pages 11-12 of the supplemental information.  “Thus, given a kmer, functions are provided to allow fetching the corresponding node (and hence coverage and edges), to traverse the graph in various ways, and to apply global operations such as cleaning the graph based on coverage and/or topology.”  IB specifically notes two performance benefits of Cortex at section 5.1: “Cortex is the only assembler capable of simultaneously handling multiple eukaryotic genomes.  Secondly performance (speed) is improved…Furthermore by introducing a compact binary file format for de Bruijn graphs, we allow the graph building process to be parallelized across a compute cluster, in a manner that scales well.” P11 supplemental information. A detailed simulation of Cortex for a single diploid genome is provided beginning at page 12 of the supplemental information.  The application of the population filter is discussed at page 13, which discusses identifying true variants as compared to known variants and the reference.  
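	The idea of fetching a node by its kmer to obtain per-color coverage and edges can be sketched as a hash table in Python. The sketch is an assumption-laden illustration: the Node and ColoredGraph classes and their methods are hypothetical, both strands are not handled, and nothing here reflects the actual Cortex data layout or binary file format.

```python
# Illustrative sketch only; class and method names are hypothetical.
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Node:
    # coverage[color] counts occurrences of this k-mer in sequences of that color
    coverage: Dict[str, int] = field(default_factory=lambda: defaultdict(int))
    # edges[color] holds the successor k-mers observed in sequences of that color
    edges: Dict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))


class ColoredGraph:
    def __init__(self, k: int) -> None:
        self.k = k
        self.nodes: Dict[str, Node] = {}  # hash table keyed by k-mer

    def add_sequence(self, seq: str, color: str) -> None:
        """Insert every k-mer of seq, recording coverage and edges under the given color."""
        prev: Optional[str] = None
        for i in range(len(seq) - self.k + 1):
            kmer = seq[i:i + self.k]
            node = self.nodes.setdefault(kmer, Node())
            node.coverage[color] += 1
            if prev is not None:
                self.nodes[prev].edges[color].add(kmer)
            prev = kmer

    def fetch(self, kmer: str) -> Optional[Node]:
        """Return the node for a k-mer, or None if the k-mer is absent from the graph."""
        return self.nodes.get(kmer)


g = ColoredGraph(k=3)
g.add_sequence("AACGTCAGG", "reference")
g.add_sequence("AACGTTAGG", "variant")
g.add_sequence("CGTTAG", "sample")
branch = g.fetch("CGT")  # node at which the reference and variant colors diverge
```

	Keeping coverage and edges separately for each color is what lets a single table hold the reference, the known variants and the sample while preserving which sequence contributed each k-mer.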
	The de Bruijn graphs of IB can be used to genotype samples at known loci, even when coverage is insufficient to enable variant assembly.  “we construct a colored de Bruijn graph of the reference sequence, known allelic variants, and data from the sample.  The likelihood of each possible genotype is calculated, accounting for the graph structure of both the local and genome-wide sequence.  This approach generalizes to multiple allelic types and because the algorithm doesn’t require variants to form simple bubble structures, it is possible to genotype complex and compound variants such as those at classical HLA loci.” IB p228. Cases 1-4 in the results section detail use of Cortex for varying genotyping tasks.  
	IB concludes: “the key advance is the development of a highly efficient de Bruijn graph implementation. This efficiency enables data from multiple samples, as well as reference sequences and known variants, to be included in a single graph structure that preserves sample identity through the use of colors. For single high-coverage genomes, the algorithms provide power to detect and genotype simple and complex variants. However, the main strength of the approach lies in the simultaneous analysis of multiple genomes, which enables powerful and accurate approaches to variant detection without the need for a reference genome. This makes possible HTS analysis of genetic variation in any species. It could also provide an approach for detecting changes between highly related genomes, as in tumor-normal pairs in cancer genomics or bacteria in transmission chains.” P231.  
	In KSR Int'l v. Teleflex, the Supreme Court, in rejecting the rigid application of the teaching, suggestion, and motivation test by the Federal Circuit, indicated that: “The principles underlying [earlier] cases are instructive when the question is whether a patent claiming the combination of elements of prior art is obvious. When a work is available in one field of endeavor, design incentives and other market forces can prompt variations of it, either in the same field or a different one. If a person of ordinary skill can implement a predictable variation, § 103 likely bars its patentability.” KSR Int'l v. Teleflex Inc., 127 S. Ct. 1727, 1740 (2007).
	Applying the KSR standard of obviousness to IA and IB, the examiner concludes that the combination of IA and IB represents applying a known technique to a known method.  IA set forth a particular application of the Cortex program, which provided the same graph data structures comprising a reference sequence as a symbol string, and a known variant sequence shown as a second symbol string, where the point of divergence is split at the variant to provide two paths through the graphed strings.  Applying sample sequence reads to the graph reference construct allows for genotyping of the allele within that reference graph construct.  IA does not detail the scoring process within its paper, but specifically refers to IB, where full details of how Cortex performs each step of its genotyping process are disclosed.  The scoring provides a degree of overlap of the test string with each of the two strings in the reference construct, by providing a plurality of scores at each symbol or node.  One of skill in the art would have been motivated to have looked to IB to select the scoring steps within, as the discussion of the steps executed by Cortex provides specific improvements over the prior art, in accuracy and speed.  One of skill would have had a reasonable expectation of success at performing such genotyping methods, as both IA and IB provide specific pseudocode, algorithms, and the Cortex program itself.  

	With respect to claim 22, when the plurality of the sample sequence reads align to the second path, the instructions report out the presence of the first structural variation (IA Fig 1; IB Fig 1, supplemental fig 1).
	With respect to claims 23-24, Iqbal provides a directed acyclic graph: a graph with direction, that is not cyclic.  De Bruijn graphs are DAGs.  (IA Fig 1, IB Fig 1, Supplemental Fig 1). 
	With respect to claim 26, at least some of the sequence reads from the sample include the first genetic structural variation, in order to perform the analysis of that variation (IA Fig 1, IB Fig 1, supplemental fig 1). 
	With respect to claims 27-28, 3 nucleotide deletions, insertions or polymorphisms are identified and are a part of the second path. (IA Fig 1, Section 2 and Section 4, IB Fig 1, supplemental fig 1).
	With respect to claims 29, 32, 37, 41-42, 45,  IA comprises sequencing reads of a variety of lengths which can be specified as desired, which comprise sequence variations of small or large lengths which are then analyzed. IB, in the supplemental information, notes that the distance between any two structural variations can be controlled or set as desired.  
With respect to confidence values in claims 32, 43, 44, IB discloses confidence values based on overlapping alignment scores, which then can drive the identification of the structural variation, or additional new variations “near” a known rare variant.  (IB supplemental, Bubble caller, Path Divergence).
	With respect to claims 30 and 33, the graph structure comprises two or more paths at a second or other position of the graph data structure. (IA Fig 1; IB throughout). This includes a third path.
With respect to claims 35, 38, 39, when the presence of a particular variant or set of variants is determined it is a genotype.  When this genotype is known to be associated with a species phenotype, or a disease phenotype, the species or disease can be diagnosed.  
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA  as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA . A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA /25, or PTO/AIA /26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 21-24, 26-30, 32-33, 35-37, 40-45  remain rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-19 of U.S. Patent No. 10,078,724. Although the claims at issue are not identical, they are not patentably distinct from each other because both are drawn to methods of genotyping a sample, utilizing directed acyclic graphs to provide alternative paths for alignment of sequencing reads to a reference sequence which comprises known variants.  The patent claims directed to specific overlap scoring for genotyping of two alleles are now specifically recited by the claims of the instant application.

Claims 21-24, 26-30, 32-33, 35-37, 40-45 remain provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 21-40 of copending Application No. 16/443,402 (reference application). Although the claims at issue are not identical, they are not patentably distinct from each other because both sets of claims recite instructions for addressing genomic reference graph structures, which include variant structure pathways, for identification of additional genomic variants or pathways. Claims 21-40 of ‘402 specifically obtain reference graph structures, which comprise reference genome sequence symbol strings, and known variants of the reference genome as symbol strings, align one or more reads from a sample to the reference construct, determine overlap scores, and identify new variants.  The computer program product of claim 21 of ‘402 is equivalent to the computer program product of claim 40 of the instant application, providing the same steps.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented. It is noted this application has been ALLOWED, and once it has issued, it will no longer be a provisional rejection.

Claims 21-24, 26-30, 32-33, 35-37, 40-45 remain provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 21-40 of copending Application No. 17/087,300 (reference application). Although the claims at issue are not identical, they are not patentably distinct from each other because both sets of claims recite instructions for addressing reference graph structures, which include variant structure pathways, for identification of additional variants or pathways. The ‘300 claims utilize “support values” for the nodes which are encompassed within the calculations of the degree of overlap of the sequence read with the reference graph data structures of the instant application.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.  At the time of writing, this application appears unexamined.

Claims 21-24, 26-30, 32-33, 35-37, 40-45 remain provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 21-40 of copending Application No. 17/087,385 (reference application). Although the claims at issue are not identical, they are not patentably distinct from each other because both sets of claims recite instructions for obtaining DAG genomic reference graph structures, which include variant structure pathways, for identification of additional genomic variants or pathways.  The ultimate purpose is for genotyping the sample. The computer system of ‘385 claim 21 is equivalent to claim 21 of the instant application. The method of ‘385 claim 28 is equivalent to claim 36 of the instant application.  The program of ‘385 claim 35 is equivalent to claim 40 of the instant application.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented. At the time of writing, this application appears unexamined.

Claims 21-24, 26-30, 32-33, 35-37, 40-45 remain provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 23-42 of copending Application No. 17/095,206 (reference application). Although the claims at issue are not identical, they are not patentably distinct from each other because both sets of claims recite instructions for creating a DAG for genomic reference graph structures, which include specific known variant structure pathways, for identification of additional genomic variants or pathways. This provides a nucleotide sequence of at least a portion of a sample genome. The ‘206 application further specifies more than one alternative sequence per position at multiple positions, which is generically encompassed by the claims of the instant application.  The ‘206 application specifically notes the DAG corresponds to “at least one substantially entire sequence of at least one chromosome” which is encompassed by the claims of the instant application.  
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented. At this time, this application appears unexamined.

Claims 21-24, 26-30, 32-33, 35-37, 40-45 remain provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 21-40 of copending Application No. 16/106,996 (reference application). Although the claims at issue are not identical, they are not patentably distinct from each other because both sets of claims recite instructions for addressing genomic reference graph structures, which include variant structure pathways, for identification of additional genomic variants or pathways. In the ‘996 application the known reference variants are known to be from a cancerous sample, and the reference sequence from a noncancerous sample.  The instant application encompasses diagnosing disease by identifying the presence or absence of structural variations present in a DAG.  The claims of the instant application do not specify cancer or tumor sequence as the sample, but they are encompassed within the generic recitation of the sample.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented. It is noted that this application has been ALLOWED.  When this issues, the rejection will no longer be provisional.

Claims 21-24, 26-30, 32-33, 35-37, 40-45 remain rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 10,053,736. Although the claims at issue are not identical, they are not patentably distinct from each other because both sets of claims recite instructions for obtaining genomic reference graph structures, which include variant structure pathways, for identification of additional genomic variants or pathways present in sequence reads from a sample. In the patent the known reference variants are known to be from cancerous samples, and the reference sequence from a noncancerous sample.  The instant application encompasses diagnosing disease by identifying the presence or absence of structural variations present in a DAG. The sample of the instant claims includes cancer or tumor related sequences within its scope.
Claims 21-24, 26-30, 32-33, 35-37, 40-45 remain rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 9,116,866. Although the claims at issue are not identical, they are not patentably distinct from each other because both sets of claims recite instructions for analyzing sample genetic sequence reads against genomic reference graph structures, which include variant structure pathways, for identification of a new mutation. In the patent the known reference variants are within a certain distance of known variations.  The instant application sets forth dependent claims wherein a new mutation or variant is “near” a known structural variant, and within 100bp or fewer.
Claims 21-24, 26-30, 32-33, 35-37, 40-45 remain rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-13 of U.S. Patent No. 9,390,226. Although the claims at issue are not identical, they are not patentably distinct from each other because both sets of claims recite instructions for analyzing sample genetic sequence reads against genomic reference graph structures, which include variant structure pathways, for identification of a new mutation. In the patent the known reference variants are within a certain distance of known variations.  The instant application sets forth a dependent claim wherein a new mutation or variant is “near” or within 100bp of a known structural variant. The patent refers to optimized scoring, which is equivalent to the scoring provided in the instant application.
Claims 21-24, 26-30, 32-33, 35-37, 40-45 remain rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 10,325,675. Although the claims at issue are not identical, they are not patentably distinct from each other because both sets of claims recite instructions for analyzing sample genetic sequence reads against genomic reference graph structures, which include variant structure pathways, for the purpose of genotyping. In the patent certain scoring and overlap values are determined with a backtracking step, which are included in and equivalent to the scoring now recited within the claims of the instant application.  
Applicant’s arguments
	Applicant’s arguments with respect to the obviousness type double patenting rejections are not persuasive.  Each of the referenced applications or patents 1) obtains reference sequence graph constructs which provide additional paths for known variants; and 2) aligns one or more reads from a sample to that graph construct, for the purpose of genotyping the sample, identifying a new variant, and/or diagnosing a disease.  The scoring recited in multiple cited applications and patents is the same overlap scoring now recited in the instant claims, or encompasses that type of scoring.  Various lengths of the structural variations are specified in multiple applications and patents, and are now specifically claimed in the instant application.  Various lengths of the reference construct comprising the reference sequence and paths for separate known variations discussed in each application and patent are encompassed by the instant claims (from a portion of a genome, up to essentially a complete chromosome).  Each identified allele along the construct can be combined to represent a genotype.  As such, these rejections remain.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARY K ZEMAN whose telephone number is 5712720723.  The examiner can normally be reached on 8am-2pm M-F.  Email may be sent to mary.zeman@uspto.gov if the appropriate permissions have been filed.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Karl Skowronek, can be reached on 571 272 9047.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

	/MARY K ZEMAN/            Primary Examiner, Art Unit 1631