DETAILED ACTION
Applicant’s response, filed 16 May 2022, has been fully considered. The following rejections and/or objections are either reiterated or newly applied. They constitute the complete set presently being applied to the instant application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Status of Claims
Claim 2 is cancelled.
Claim 20 is newly added.
Claims 1 and 3-20 are pending.
Claims 1 and 3-20 are rejected.
Claims 1, 6-7, 11, 14, 16-18, and 20 are objected to.

Priority
The effective filing date of the claimed invention is 17 Sept. 2020.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 14 Feb. 2022 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the list of cited references was considered in full by the examiner.

Claim Objections
The objection to claims 1-2 and 11 in the Office action mailed 14 Feb. 2022 has been withdrawn in view of claim cancellations and/or amendments received 16 May 2022.
Claims 1, 6-7, 11, 14, 16-18, and 20 are objected to because of the following informalities. Any newly recited portion herein is necessitated by claim amendment.
Claims 1, 16, and 18 recite “…modifying an estimated position…based on supplementary alignment (SA) tag associated with the first…”, and then later recite “…the SA tag associated with the first structural variant”. Accordingly, to fix a grammatical error and clarify that there is one SA tag associated with the first structural variant, the claims should be amended to recite “…based on a supplementary alignment (SA) tag associated with…”.
Claim 6 recites “…an average insert size of across both the tumor…and the normal…”, which appears to be a typographical error and should recite “…an average insert size across both…”.
Claim 7 recites “…wherein the features comprises…”, which is a grammatical error and should recite “…wherein the features comprise…”.
Claim 11 recites “…wherein the features comprises…”, which is a grammatical error and should recite “…wherein the features comprise…”.
Claim 14 recites “The method of claim 1, further comprises:….”; to clarify that claim 14 requires the method of claim 1 and further comprises the recited steps, the claim should be amended to recite “The method of claim 1, further comprising…”.
Claim 17 recites “The method of claim 16, further comprise:”, which, as discussed above for claim 14, should be amended to recite “The method of claim 16, further comprising…”.
Claim 20 recites “if the region included in the SA tag and region mapped by the alignment tool is determined to be different and not continuous with each other”, which is a grammatical error and should recite “…by the alignment tool are determined to be different…”.
Appropriate correction is required.

Response to Arguments
Applicant's arguments filed 16 May 2022 regarding the claim objections have been fully considered but they are not persuasive. 
Applicant remarks that claims 14 and 17 have been amended to address the alleged informalities and thus the claim objections are overcome by the claim amendments (Applicant’s remarks at pg. 10, para. 3-5).
This argument is not persuasive. Claims 14 and 17 recite “The method of claim 1/16, further comprises:…”, which does not serve to clarify that claims 14 and 17 require the method of claims 1 and 16, respectively, and further require an additional step. To overcome the objections, claims 14 and 17 should be amended to recite “The method of claim 1, further comprising…” and “The method of claim 16, further comprising…”, respectively.

Claim Interpretation
Claims 1, 16, and 18-19 recite “…modifying an estimated position of a first structural variant candidate…based on [a] supplementary alignment tag associated with the first structural variant…., wherein the SA tag is data that is outputted by an alignment tool that maps each read of split reads to a reference sequence,….and the alignment tool determines that the corresponding read maps to the first region on the reference sequence, and records and outputs the SA tag including information about the second region”. The limitation regarding the alignment tag being outputted by an alignment tool is interpreted to be a product by process limitation that serves to define the process in which the supplementary alignment tag was previously generated; however, a step of outputting the supplementary alignment tag using an alignment tool is not required within the metes and bounds of the claim. See MPEP 2113.
Claim 20 recites “…if the region included in the SA tag and region mapped by the alignment tool is determined to be different and not continuous with each other, determining the read to be the split read, and modifying the position of a breakpoint associated with the first structural variant based on the position of the split read”, which has been interpreted to require determining the read is a split read if regions of the read are mapped to different, not continuous regions by the alignment tool, as discussed in the 112(b) rejection below. The broadest reasonable interpretation of a method (or process) claim having contingent limitations requires only those steps that must be performed and does not include steps that are not required to be performed because the condition(s) precedent are not met. See MPEP 2111.04 I. Because the claim does not require that the read is determined to be a split read, the broadest reasonable interpretation of the claim does not require modifying the position of a breakpoint associated with the first structural variant based on the position of the split read.

Claim Rejections - 35 USC § 112(a)
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

Claims 6-7, 11, and 20 are rejected under 35 U.S.C. 112(a) as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor at the time the application was filed, had possession of the claimed invention. This rejection is newly recited and necessitated by claim amendment.
Claim 6, and claims dependent therefrom, recite “…wherein the first length is an average insert size across both the tumor genome data and the normal tissue data, included in the whole-genome sequencing data”. Claim 5, from which claim 6 depends, recites “wherein the extracting comprises: obtaining…first data that has a first length or more…”, such that the features for each structural variant are extracted by obtaining data within a region of a first length or more. Applicant’s specification at para. [0016]-[0017] discloses that features may be extracted of a first length or more in a direction, and that the first length or more may be an average insert size of the whole-genome sequencing data. Applicant’s specification further discloses at para. [0060] that the feature extractor extracts the features of the tumor genome data different from the normal tissue genome data by examining the breakpoints of each candidate structural variant in the tumor genome data and the normal genome data, and discloses a list of features in Table 1 that are specific to the tumor genome data or the normal genome data. While Applicant’s specification at para. [0012] discloses the whole-genome sequencing data includes a pair of tumor genome data and normal tissue genome data, Applicant’s specification does not disclose any process in which the features for the tumor genome data and the features for the normal genome data, which are determined separately as discussed above, are extracted based on an average insert size of reads across both the tumor genome data and the normal tissue genome data. Instead, Applicant’s specification provides support for using an average insert size of the tumor genome data (e.g. for determining tumor-specific features) or an average insert size of the normal genome data (e.g. for determining normal features).
For the reasons discussed above, the specification does not provide a sufficient disclosure of the limitation of “…wherein the first length is an average insert size across both the tumor genome data and the normal tissue data, included in the whole-genome sequencing data” recited in claim 6, to demonstrate to one of ordinary skill in the art that the inventor possessed the invention at the time the application was filed. THIS IS A NEW MATTER REJECTION. For more information regarding the written description requirement, see MPEP §2161.01- §2163.07(b).


Claims 7 and 11 recite “…wherein the features comprises at least one of…a number of split reads which do not have SA tags”. In this case, Applicant’s specification at para. [0018], [0022], [0070]-[0071] discloses the features can include a number of split reads and a number of split reads with supplementary alignment (SA) tags. However, Applicant’s specification does not provide support for extracting a feature of a number of split reads without supplementary alignment tags. Given the feature of a number of split reads is interpreted to include split reads with both supplementary alignment tags and without supplementary alignment tags, this does not provide support for extracting a different feature of the number of split reads without supplementary alignment tags. 
For the reasons discussed above, the specification does not provide a sufficient disclosure of the limitation of “…wherein the features comprises at least one of…a number of split reads which do not have SA tags” recited in claims 7 and 11 to demonstrate to one of ordinary skill in the art that the inventor possessed the invention at the time the application was filed. THIS IS A NEW MATTER REJECTION. For more information regarding the written description requirement, see MPEP §2161.01- §2163.07(b).

Claim 20 recites “…determining a read of the first structural variant candidate has the SA tag, if the read has the SA tag, determining whether the region included in the SA tag and region mapped by the alignment tool is different and not continuous with each other, if the region included in the SA tag and region mapped by the alignment tool is determined to be different and not continuous with each other, determining the read to be the split read…”. Applicant’s specification at para. [0056]-[0057] discloses that a particular read may be split, and thus may be mapped to two different positions or regions that are not continuous with each other on the reference sequence, that it can be analyzed that most sequences of the read are mapped to the first region of the reference sequence and the remainder sequences of the read are mapped to the second region that is not continuous with the first region, and in this case, the alignment tool determines the read maps to the first region, but records information about the second region as the SA tag and outputs it. Accordingly, Applicant’s specification discloses analyzing whether a read maps to two regions that are not continuous with each other, determining that the read is a split read if it does, and outputting the SA tag for that split read. However, Applicant’s specification does not provide support for first determining a read has the SA tag, and then determining the read with the SA tag is a split read (e.g. the SA tags are outputted after determining the read is a split read).
For the reasons discussed above, the specification does not provide a sufficient disclosure of the limitation of “…determining a read of the first structural variant candidate has the SA tag, if the read has the SA tag, determining whether the region included in the SA tag and region mapped by the alignment tool is different and not continuous with each other, if the region included in the SA tag and region mapped by the alignment tool is determined to be different and not continuous with each other, determining the read to be the split read…” recited in claim 20, to demonstrate to one of ordinary skill in the art that the inventor possessed the invention at the time the application was filed. THIS IS A NEW MATTER REJECTION. For more information regarding the written description requirement, see MPEP §2161.01- §2163.07(b).

Claim Rejections - 35 USC § 112(b)
The rejection of claims 1, 3-11, and 13-19 in the Office action mailed 14 Feb. 2022 has been withdrawn in view of claim amendments received 16 May 2022.
The rejection of claim 2 in the Office action mailed 14 Feb. 2022 has been withdrawn in view of the cancellation of this claim received 16 May 2022.
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


Claims 5, 8-9, 12, and 20 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention. Any newly recited portion herein is necessitated by claim amendment.
Claims 5 and 8, and claims dependent therefrom, are indefinite for recitation of “…obtaining…first data that has a first length or more in a first direction…” and “…obtaining…second data that has a second length or less in a second direction….”. It’s unclear if claims 5 and 8 intend to require obtaining data that is specifically a particular length (e.g. a vector of a first/second length), or if the claims intend to require obtaining data in a region of a first or second length in a first or second direction. If Applicant intends for the claims to require obtaining data that is a particular length, then it’s further unclear in what way the data that is the first/second length also has a direction relative to a breakpoint. As such, the metes and bounds of the claims are unclear. It is noted that Applicant’s specification at para. [0016] discloses that extracting features may be based on data located within a predetermined range from each structural variant by obtaining data of a first length or more in a direction, which suggests the extracted features are based on data obtained in a region of a first length or more or a second length or less, respectively, from the breakpoint. Therefore, for purpose of examination, the limitations are interpreted to mean data is obtained in a region of a first length or more in a direction, and in a region of a second length or less in a second direction.
Claim 5 is indefinite for recitation of “…first data…in a first direction, in which the corresponding read to the first structural variant candidate is located, from a breakpoint…”. Claim 1, from which claim 5 depends, recites that the corresponding read is located at both a first region and a second region that are different and not continuous. Furthermore, Applicant’s specification at para. [0057] discloses a split read located at breakpoints may be mapped to two different positions or regions that are not continuous with each other on the reference sequence. This is also illustrated by Averbuj (Applying machine learning to detect somatic structural variation in bulk whole genome sequencing, 2020, San Diego State University, pg. 1-59; Pub. Date: Aug. 2020; previously cited) at Figure 5, which discloses the split read separated by break points with a portion of the split read at each breakpoint. Accordingly, it’s unclear which direction the first data is intended to be from the break point, given the split read (i.e. the corresponding read) is located in two positions and can be in either direction from either breakpoint (see Averbuj Figure 5). As such, the metes and bounds of the claims are unclear. For purpose of examination, the limitation is interpreted to mean “…first data…in a first direction…from a break point”, such that the first data can be in any direction from the break point.
Claim 8, and claims dependent therefrom, are indefinite for recitation of “…second data…in a second direction, which is opposite to a first direction in which the corresponding read to the first structural variant candidate is located, from a breakpoint…”. As discussed above for claim 5, it’s unclear which direction the second data is intended to be from the break point, given the split read (i.e. the corresponding read) is located at two positions and can be in either direction from either breakpoint (see Averbuj Figure 5); as such, it’s also unclear which direction is opposite to a first direction in which the corresponding read is located. Therefore, the metes and bounds of the claim are unclear. For purpose of examination, the limitation is interpreted to mean “…second data…in a second direction opposite to a first direction…from a break point”, such that the second data can be in any direction opposite to a different direction from the break point.
Claim 12 is indefinite for recitation of “…wherein the features comprises at least one of tumor histology, whole-genome duplication status, tumor purity in a sample, and tumor ploidy of a tumor, from the tumor genome data”. Claim 1, from which claim 12 depends, recites “….extracting features of each structural variant candidate based on data located within a predetermined range from each structural variant candidate…”, and thus involves that the features are extracted for each individual variant. Given each of the listed features in claim 12 corresponds to a genome/sample level feature (i.e. a tumor histology corresponds to the entire tumor, tumor purity in a sample, etc.), and claim 12 recites the features are “from the tumor genome data”, but “the features of claim 1” are extracted specifically for each structural candidate variant, it’s unclear if Applicant intended to extract such sample-wide features for each structural variant within a sample, or if Applicant intended to extract a single set of these sample-wide features for the structural variant candidates. As such, the metes and bounds of the claims are unclear. Applicant’s specification at para. [0062] discloses that among the 45 features, tumor histology, whole-genome duplication status, tumor purity, and tumor ploidy of the tumor genome may be extracted from clinical data of the sample, and for the remaining 41 features, the feature extractor 110 examines the surrounding regions of each structural variant candidate. Therefore, for purpose of examination, claim 12 is interpreted to require extracting a single set of the recited features for the structural candidate variants from the tumor genome data. 
To overcome the rejection, claim 12 can be amended to recite “The method of claim 1, further comprising extracting features of tumor histology,…, and tumor ploidy of a tumor genome, from the tumor genome data.”, to clarify that this set of features is different from the features extracted for each candidate variant.
Claim 20 is indefinite for recitation of “…determining whether the region included in the SA tag and region mapped by the alignment tool is…”. Claim 1, from which claim 20 depends, recites “…the SA tag is data that is outputted by an alignment tool that maps each read of a split read...is information indicating another position…in addition to a main position…, the another position of a second region and the main position of a first region…and outputs SA tag including information about the second region”. Thus claim 1 recites that reads with a supplementary alignment tag align to both a first region and a second region. Accordingly, it’s unclear which region “the…region mapped by the alignment tool” is intended to refer to (e.g. the first or second region). As such, the metes and bounds of the claims are unclear. For purpose of examination, the limitation is interpreted to mean it is determined whether the second region included in the SA tag is different and not continuous with the first region.
Claim 20 is indefinite for recitation of “…determining the read to be the split read” in line 6 of the claim. There is insufficient antecedent basis for “the split read” in the claim because claim 1, from which claim 20 depends, does not recite “a split read”. Instead, claim 1 recites “…the SA tag is data that is outputted by an alignment tool that maps each read of split reads…”, but does not recite a single split read. Therefore, it’s unclear which split reads claim 20 is referring to, and the metes and bounds of the claims are unclear. To overcome the rejection, the claim can be amended to recite “…determining the read to be a split read”.
Claim 20 is indefinite for recitation of “…determining a read of the first structural variant candidate has the SA tag, if the read has the SA tag, determining whether the region included in the SA tag and region mapped by the alignment tool is different and not continuous with each other, if the region included in the SA tag and region mapped by the alignment tool is determined to be different and not continuous with each other, determining the read to be the split read…”. Claim 1, from which claim 20 depends, recites “the SA tag is data that is outputted by an alignment tool that maps each read of split reads to a reference sequence…the another position of a second region and the main position of a first region are two different positions that are not continuous with each other on the reference sequence…the alignment tool determines that the corresponding read maps to the first region on the reference sequence, and records and outputs the SA tag including information about the second region…”. It’s unclear if the SA tag is outputted for reads determined to be split reads (e.g. reads mapping to two different, not continuous regions), as suggested by independent claim 1, or if an SA tag is outputted for a read prior to determining the read is a split read, as suggested by claim 20. As such, the metes and bounds of the claims are unclear. For purpose of examination, claim 20 is interpreted to require determining whether regions the read maps to are different and not continuous to each other, and determining the read is a split read if they are different and not continuous.
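For illustration only, the comparison that claim 20 is interpreted to require (whether the second region recorded in the SA tag is different from, and not continuous with, the first region) can be sketched as follows. This is a minimal sketch and not Applicant’s disclosed implementation. It assumes the SA tag value follows the SAM optional-field layout “rname,pos,strand,CIGAR,mapQ,NM;”; the function names parse_sa_tag, reference_span, and is_split_read are hypothetical and do not appear in the claims or the specification.

```python
import re

def parse_sa_tag(sa_value):
    """Parse an SA tag value (SAM layout: rname,pos,strand,CIGAR,mapQ,NM;)."""
    entries = []
    for field in sa_value.strip(";").split(";"):
        rname, pos, strand, cigar, _mapq, _nm = field.split(",")
        entries.append({"rname": rname, "pos": int(pos),
                        "strand": strand, "cigar": cigar})
    return entries

def reference_span(cigar):
    """Count reference bases consumed by a CIGAR string (ops M, D, N, =, X)."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(length) for length, op in ops if op in "MDN=X")

def is_split_read(main_rname, main_pos, main_cigar, sa_value):
    """Hypothetical check: is any SA region different from, and not
    continuous with, the region of the main alignment?"""
    main_end = main_pos + reference_span(main_cigar)  # exclusive end
    for sa in parse_sa_tag(sa_value):
        sa_end = sa["pos"] + reference_span(sa["cigar"])  # exclusive end
        if sa["rname"] != main_rname:
            return True  # different reference sequence
        # A gap of at least one base separates the two regions.
        if sa["pos"] > main_end or sa_end < main_pos:
            return True
    return False
```

Under these assumptions, a read whose main alignment starts at position 1000 with CIGAR 60M40S and whose SA tag points to position 5000 on the same chromosome would be treated as a split read, whereas an SA region beginning at position 1060 (directly adjacent to the main alignment) would not.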

Response to Arguments
Applicant's arguments filed 16 May 2022 regarding 35 U.S.C. 112(b) have been fully considered but they are not persuasive. 
First, Applicant’s remarks do not pertain to the new grounds of rejection under 35 U.S.C. 112(b) of claims 5, 8, and 20, set forth above.
Applicant remarks that claims 1, 5-8, 11, 12, 14, 16, 18, and 19 are amended to address the alleged indefinite expression (Applicant’s remarks at pg. 10, para. 6 to pg. 11, para. 1).
This argument is not persuasive. While Applicant’s claim amendments overcome the 112(b) rejection of claims 1, 3-11, and 13-19 as indicated above, regarding claim 12, it’s still unclear whether Applicant intends for the recited features to be determined for each structural candidate variant, given the features in claim 1 are specific to each structural candidate variant, or if Applicant intends for the recited features to be determined sample-wide. A suggested amendment to overcome the rejection has been provided above.


Claim Rejections - 35 USC § 101
The rejection of claim 2 under 35 U.S.C. 101 in the Office action mailed 14 Feb. 2022 has been withdrawn in view of the cancellation of this claim received 16 May 2022.
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1 and 3-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
The Supreme Court has established a two-step framework for this analysis, wherein a claim does not satisfy § 101 if (1) it is “directed to” a patent-ineligible concept, i.e., a law of nature, natural phenomenon, or abstract idea, and (2), if so, the particular elements of the claim, considered “both individually and as an ordered combination,” do not add enough to “transform the nature of the claim into a patent-eligible application.” Elec. Power Grp., LLC v. Alstom S.A., 830 F.3d 1350, 1353 (Fed. Cir. 2016) (quoting Alice, 134 S. Ct. at 2355). Applicant is also directed to the 2019 Revised Patent Subject Matter Eligibility Guidance published in the Federal Register (84 FR 50) on January 7, 2019.
Step 1: The instantly claimed invention (claims 1, 16, and 18-19 being representative) is directed to a method, system and computer readable medium for identifying structural variants. Therefore, the instantly claimed invention falls into one of the four statutory categories. [Step 1: YES]
Step 2A: First it is determined in Prong One whether a claim recites a judicial exception, and if so, then it is determined in Prong Two if the recited judicial exception is integrated into a practical application of that exception.
Step 2A, Prong 1: Under the MPEP § 2106.04, the Step 2A (Prong 1) analysis requires determining whether a claim recites an abstract idea, law of nature, or natural phenomenon.
Claims 1 and 19-20 recite the following steps which fall under the mathematical concepts and/or mental processes groupings of abstract ideas:
obtaining structural variant candidates identified from whole-genome sequencing data including a pair of tumor genome data and normal tissue genome data;
modifying an estimated position of a first structural variant candidate by determining a position of a breakpoint associated with the first structural variant candidate based on [a] supplementary alignment (SA) tag associated with the first structural variant candidate among the structural variant candidates, wherein the SA tag is data that is outputted by an alignment tool that maps each read of split reads to a reference sequence, the SA tag associated with the first structural variant candidate is information indicating another position on the reference sequence, to which a corresponding read is mapped, in addition to a main position, to which the corresponding read is mapped on the reference sequence, the another position of a second region and the main position of a first region are two different positions that are not continuous with each other on the reference sequence, the first region is at which most sequences of the corresponding read are located, and the second region is at which remainder sequences of the corresponding read are located, and the alignment tool determines that the corresponding read maps to the first region on the reference sequence, and records and outputs the SA tag including information about the second region;
extracting features of each structural variant candidate based on data located within a predetermined range from each structural variant candidate in the whole-genome sequencing data;
labeling each structural variant candidate with classification information based on a list of known structural variants; and
training a machine learning model by the features of each structural variant candidate and the labeled classification information are annotated, which are included in the stored dataset, wherein the machine learning model receives an identification target structural variant candidate and outputs a classification of the identification target structural variant candidate.
Claim 16 recites the following steps which fall under the mathematical concepts and/or mental processes groupings of abstract ideas:
obtaining structural variant candidates from whole-genome sequencing data of a sample;
modifying an estimated position of a first structural variant candidate based on supplementary alignment (SA) tag associated with the first structural variant candidate among the structural variant candidates, wherein the SA tag is data that is outputted by an alignment tool that maps each read of split reads to a reference sequence, the SA tag associated with the first structural variant candidate is information indicating another position on the reference sequence, to which a corresponding read is mapped, in addition to a main position, to which the corresponding read is mapped on the reference sequence, the another position of a second region and the main position of a first region are two different positions that are not continuous with each other on the reference sequence, the first region is at which most sequences of the corresponding read are located, and the second region is at which remainder sequences of the corresponding read are located, and the alignment tool determines that the corresponding read maps to the first region on the reference sequence, and records and outputs the SA tag including information about the second region;
extracting features of each structural variant candidate based on data located within a predetermined range from each structural variant candidate in the whole-genome sequencing data; and
and inputting the extracted features of each structural variant candidate into a trained machine learning model to identify each structural variant candidate as a negative structural variant candidate or a positive structural variant candidate, wherein the machine learning model is trained by using features of structural variant candidates for training obtained from whole-genome sequencing data including a pair of tumor genome data and normal tissue genome data, and classification information of the structural variant candidate for training.
The identified claim limitations falls into the groups of abstract ideas of mathematical concepts and/or mental processes for the following reasons. In this case, the steps of obtaining structural variant candidates from sequencing data involves analyzing the sequencing data to determine a potential structural variant (i.e. determining a region with very high coverage may be a copy number amplification), which can be practically performed in the mind. The step of modifying an estimated position of a first structural variant candidate by determining a position of a breakpoint associated with the first structural variant based on a supplementary alignment (SA) tag associated with the first structural variant candidate involves analyzing reads to determine that a read maps to two different regions in a reference sequence and thus is associated with an SA tag that was previously outputted by an alignment tool, determining that the position at which that read splits is a breakpoint of a structural variant, and estimating the position of a structural variant using the determined breakpoint (e.g. the breakpoint is associated with the edge of the structural variant), which amounts to a mere analysis of data that can be practically performed in the mind. Extracting features from each structural variant candidate involves analyzing the region surrounding the variant to determine, for example, a number of reads in the region, which can be practically performed in the mind. Labelling each variant with classification information based on a list of known structural variants involves performing data comparisons between the variant with the list to determine information associated with the variant. 
Furthermore, the broadest reasonable interpretation of training a machine learning model using the extracted features and the labelled classification information to output a classification of the structural variant candidate includes training a linear regression classifier to output a classification, which involves inputting numbers into the line regression model, determining an output (e.g. by the addition of the weighted variables), and adjusting parameters in the model to optimize a cost function for the model, which amounts to a mere analysis of data that can be practically performed in the mind aided with pen and paper. Similarly, the step of inputting extracted features of each structural variant into a trained machine learning model can include inputting numbers into a linear regression classifier and performing multiplication and addition to determine the output, which can be practically performed in the mind. That is, other than reciting the above limitations are carried out by a computing device in claims 18 and 19, nothing in the claims precludes the steps from being practically performed in the mind. See MPEP 2106.04(a)(2)
Furthermore, the steps of training a machine learning model by the extracted features of each structural variant candidate and the labeled classification information and inputting the extracted features of each structural variant candidate into a trained machine learning model to identify each structural variant candidate as a negative structural variant candidate or a positive structural variant candidate recite a mathematical concept. As discussed above, the broadest reasonable interpretation of the limitations includes embodiments in which the machine learning model is a linear regression classifier, which requires the addition of weighted variables in training and using the linear regression model, such that the limitations amount to a textual equivalent of performing mathematical calculations. See MPEP 2106.04(a)(2) I. Therefore, these limitations further recite a mathematical concept.
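For illustration only, the arithmetic referred to above (a weighted sum of features, with parameters adjusted to reduce a squared-error cost) may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the claims or the specification: the function names, the two example features (a count of variant-supporting reads and a count of split reads with SA tags), the learning rate, the number of passes, and the 0.5 cutoff are all hypothetical.

```python
# Hypothetical sketch: a linear model used as a two-class classifier.
# Feature choice, learning rate, epochs, and cutoff are assumptions.

def predict(weights, bias, features):
    """Weighted sum of the features plus a bias term."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def train(samples, labels, learning_rate=0.01, epochs=2000):
    """Adjust weights and bias step by step to reduce squared error."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

def classify(weights, bias, features, cutoff=0.5):
    """Label a candidate by comparing the model output to a cutoff."""
    return "positive" if predict(weights, bias, features) >= cutoff else "negative"

# Each row: [variant-supporting reads, split reads with SA tags] (made-up values).
samples = [[8, 4], [9, 5], [1, 0], [0, 1]]
labels = [1, 1, 0, 0]  # 1 = positive candidate, 0 = negative candidate

weights, bias = train(samples, labels)
result = classify(weights, bias, [7, 3])
```

Each update in `train` is a multiplication and a subtraction on a handful of numbers, which is the kind of calculation the above analysis characterizes as a mathematical concept.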
Dependent claims 3-15, 17, and 20 further recite an abstract idea. Dependent claim 3 further recites the mental process of identifying first and second non-adjacent regions to which the first structural variant candidate is mapped. Dependent claim 4 further recites the mental process of determining a position of a breakpoint associated with the first structural variant candidate. Dependent claim 5 further recites the mental process of obtaining first data that has a first length or more in a direction, in which the corresponding read to the first structural variant candidate is located, from a breakpoint associated with the first structural variant candidate. Dependent claim 6 further recites the mental process of obtaining data of an average insert size across both the tumor genome data and the normal tissue data in the whole-genome sequencing data. Dependent claim 7 further recites the mental process of extracting features including at least one of a number of variant-supporting reads, a number of split reads without SA tags, a number of split reads with SA tags, and a number of reads having the same clipped sequences as split reads within normal tissue genome data. Dependent claim 8 further recites the mental process of obtaining second data that has a second length or less in a direction opposite to a first direction in which the corresponding read to the first structural variant candidate is located, from a breakpoint associated with the first structural variant candidate. Dependent claim 9 further recites the mental process of obtaining data of 200 base pairs or less in a direction opposite to variant-supporting reads. Dependent claim 10 further recites the mental process of labeling a first structural variant as positive and labeling a second structural variant as negative.
Dependent claim 11 further recites the mental process of extracting, for each structural variant candidate, a number of variant-supporting reads, a number of split reads with supplementary alignment tag, a number of split reads without SA tags, mapping quality, read depth change, a number of background noise reads, and a number of samples, in which the same variant is detected among normal tissue genome data of a panel of normal samples. Dependent claim 12 further recites the mental process of extracting, for the tumor genome data, tumor histology, whole-genome duplication status, tumor purity in a sample, and tumor ploidy of a tumor genome. Dependent claim 13 further recites the mental process and mathematical concept of the machine learning model receiving features of the identification target structural variant candidate and outputting a probability value for classifying the identification target structural variant as positive or negative. Dependent claim 14 further recites the mental process and mathematical concept of evaluating classification performance of the machine learning model using validation samples and determining a cutoff value for determining whether a probability value output from the machine learning model indicates positive or negative based on the classification performance evaluation. Dependent claim 15 further recites the mental process of obtaining structural variant candidates output by the structural variant search tool (i.e. analyzing the output of the structural variant search tool). Dependent claim 17 further recites the mental process of generating a list of true structural variants by removing structural variant candidates which are identified as the negative structural variants from among the obtained structural variant candidates. 
Dependent claim 20 further recites the mental process of determining a read of the first structural variant has the SA tag, determining whether the region in the SA tag and region mapped by the alignment tool are different, determining the read to be a split read if the regions are different, and then modifying the position of a breakpoint based on the position of the split read. Therefore, claims 1 and 3-20 recite an abstract idea.  [Step 2A, Prong 1: YES]
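For illustration only, the determination described for claim 20 may be sketched as follows. The sketch assumes the SA tag value follows the SAM convention of semicolon-terminated entries of the form `rname,pos,strand,CIGAR,mapQ,NM`; the function names, the example coordinates, and the rule for deriving a breakpoint from a clipped CIGAR string are hypothetical and are not taken from the claims or the specification.

```python
import re

# Hypothetical sketch: deciding whether a read is a split read from its SA tag,
# then deriving a breakpoint position from the read's clipped alignment.

def parse_sa_tag(sa_value):
    """Parse SA entries of the assumed form 'rname,pos,strand,CIGAR,mapQ,NM;'."""
    alignments = []
    for entry in sa_value.strip(";").split(";"):
        rname, pos, strand, cigar, mapq, nm = entry.split(",")
        alignments.append({"rname": rname, "pos": int(pos),
                           "strand": strand, "cigar": cigar})
    return alignments

def is_split_read(primary_rname, primary_pos, sa_value):
    """A read is treated as split if an SA region differs from the mapped region."""
    if not sa_value:
        return False
    return any(a["rname"] != primary_rname or a["pos"] != primary_pos
               for a in parse_sa_tag(sa_value))

def breakpoint_from_clip(pos, cigar):
    """Assumed rule: a trailing clip puts the breakpoint just past the aligned
    segment; a leading clip puts it at the alignment start; otherwise None."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    if ops and ops[-1][1] in "SH":
        return pos + sum(int(n) for n, op in ops if op in "MDN=X")
    if ops and ops[0][1] in "SH":
        return pos
    return None

# Made-up read: mapped at chr1:1000 with 60 aligned and 40 clipped bases,
# and an SA tag pointing to a second, non-adjacent region at chr1:5000.
estimated_breakpoint = 1050
sa_value = "chr1,5000,+,60S40M,60,0;"

modified_breakpoint = estimated_breakpoint
if is_split_read("chr1", 1000, sa_value):
    candidate = breakpoint_from_clip(1000, "60M40S")
    if candidate is not None:
        modified_breakpoint = candidate
```

Every step is a comparison of two region coordinates or a short addition, consistent with the characterization of the limitation above as an analysis of data.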
Step 2A: Prong 2: Under the MPEP § 2106.04, the Step 2A, Prong 2 analysis requires identifying whether there are any additional elements recited in the claim beyond the judicial exception(s), and evaluating those additional elements to determine whether they integrate the exception into a practical application of the exception. This judicial exception is not integrated into a practical application for the following reasons.
Claims 3-14, 16-17, and 20 do not recite any elements in addition to the recited judicial exception.
The additional elements of claim 15 include: 
inputting the whole genome sequencing data into a structural variant search tool (i.e. data input).
The additional elements of claims 1, 18, and 19 include:
a computing device (claim 1);
a computer readable non-transitory storage medium (claim 18); 
a processor and memory (claim 19); and
storing a dataset, in which the features of each structural variant candidate are labeled with the classification information (claims 1 and 18-19).
The additional elements of a computing device, processor, memory, a computer readable non-transitory storage medium, data input, and storing data, are generic computer components and/or processes. The courts have found the use of a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data) or simply adding a general purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation) does not integrate a judicial exception into a practical application. See MPEP 2106.05(f).
Therefore, the additionally recited elements merely invoke computers as a tool to perform an existing process, and, as such, the claims as a whole do not integrate the abstract idea into a practical application. Thus, claims 1 and 3-20 are directed to an abstract idea. [Step 2A, Prong 2: NO]
Step 2B: In the second step it is determined whether the claimed subject matter includes additional elements that amount to significantly more than the judicial exception. See MPEP § 2106.05.
The claims do not include any additional steps appended to the judicial exception that are sufficient to amount to significantly more than the judicial exception for the following reasons. Claims 3-14, 16-17, and 20 do not recite any elements in addition to the recited judicial exception.
The additional elements of claim 15 include: 
inputting the whole genome sequencing data into a structural variant search tool (i.e. data input).
The additional elements of claims 1, 18, and 19 include:
a computing device (claim 1);
a computer readable non-transitory storage medium (claim 18); 
a processor and memory (claim 19); and
storing a dataset, in which the features of each structural variant candidate are labeled with the classification information (claims 1 and 18-19).
The additional elements of a computing device, processor, memory, a computer readable non-transitory storage medium, data input, and storing data are conventional computer components and/or processes. The courts have found the use of a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data) or simply adding a general purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation) does not provide significantly more. See Affinity Labs v. DirecTV, 838 F.3d 1253, 1262, 120 USPQ2d 1201, 1207 (Fed. Cir. 2016) (cellular telephone); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 613, 118 USPQ2d 1744, 1748 (Fed. Cir. 2016) (computer server and telephone unit).
Therefore, taken alone, the additional elements do not amount to significantly more than the above-identified judicial exception(s). Even when viewed as a combination, the additional elements fail to transform the exception into a patent-eligible application of that exception. Thus, the claims as a whole do not amount to significantly more than the exception itself. [Step 2B: NO] 
Therefore, the instantly rejected claims are not drawn to eligible subject matter as they are directed to an abstract idea without significantly more. For additional guidance, applicant is directed generally to the MPEP § 2106.

Response to Arguments
Applicant's arguments filed 16 May 2022 regarding 35 U.S.C. 101 have been fully considered but they are not persuasive. 
Applicant remarks that independent claims 1, 18, and 19 should not be considered as mathematical concepts and/or mental process groupings of abstract ideas because claim 1 explicitly recites that the claimed subject matter is “a method performed by a computing device for filtering false positive structural variants in genomes data”, and further because claim 1 recites “storing a dataset, in which the features of each structural variant are labeled…” and “training a machine learning model by the features…, which are included in the stored dataset, wherein the machine learning model receives an identification target structural variant candidate and outputs a classification…”, and that these features are neither a mathematical concept nor a mental process, but a computing device for filtering false positive structural variants (Applicant’s remarks at pg. 12, para. 2-4). Applicant further remarks that because amended independent claim 1 recites the above features, which are neither the alleged mathematical concepts nor mental processes groupings of abstract ideas, the pending claims as a whole are not directed to mathematical concepts or mental processes (Applicant’s remarks at pg. 12, para. 5 to pg. 13, para. 3).
This argument is not persuasive. The courts do not distinguish between claims that recite mental processes performed by humans and claims that recite mental processes performed on a computer. As the Federal Circuit has explained, "[c]ourts have examined claims that required the use of a computer and still found that the underlying, patent-ineligible invention could be performed via pen and paper or in a person’s mind." Versata Dev. Group v. SAP Am., Inc., 793 F.3d 1306, 1335, 115 USPQ2d 1681, 1702 (Fed. Cir. 2015). See also Intellectual Ventures I LLC v. Symantec Corp., 838 F.3d 1307, 1318, 120 USPQ2d 1353, 1360 (Fed. Cir. 2016). Accordingly, simply because claim 1 recites that the method is performed by a computing device does not preclude the claim from reciting an abstract idea. Instead, both the computing device and act of storing information (e.g. the extracted features with labeled classification information) are treated as additional elements analyzed under step 2A, Prong 2 and step 2B. As discussed in the above rejection, the courts have found the use of a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data) or simply adding a general purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation) does not integrate a judicial exception into a practical application or provide significantly more. See Affinity Labs v. DirecTV, 838 F.3d 1253, 1262, 120 USPQ2d 1201, 1207 (Fed. Cir. 2016) (cellular telephone). Accordingly, simply adding a computing device to an abstract idea and storing information in a dataset within the computing device are not sufficient to integrate a judicial exception into a practical application or provide significantly more under Step 2A, prong 2 and Step 2B. 

Applicant remarks that the present Application recognized problems of inefficiency and inaccuracy in the field of sequencing technology, as described in para. [0002]-[0004] of the specification, and provides technical solutions to address such problems as described in para. [0005]-[0011], and that the feature of modifying an estimated position of a first structural variant candidate by determining a position of a breakpoint associated with the first structural variant candidate based on an SA tag associated with the first structural variant provides efficiency and accuracy in filtering false positive structural variants in genomes data (Applicant’s remarks at pg. 13, para. 4 to pg. 14, para. 1).
This argument is not persuasive. The judicial exception alone cannot provide the improvement. The improvement can be provided by one or more additional elements. See the discussion of Diamond v. Diehr, 450 U.S. 175, 187 and 191-92, 209 USPQ 1, 10 (1981), in MPEP 2106.05(a), subsection II. In addition, the improvement can be provided by the additional element(s) in combination with the recited judicial exception. Furthermore, it is important to keep in mind that an improvement in the abstract idea itself (e.g. a recited fundamental economic concept) is not an improvement in technology. See MPEP 2106.05(a). In this case, the alleged improvement in filtering false positive structural variants in genomes data would amount to an improved abstract idea (e.g. false positive identification). Similarly, regarding the alleged improvement to sequencing technology, claim 1 only requires “obtaining structural variant candidates identified from whole-genome sequencing data”, but does not require the additional element of performing sequencing; accordingly, the alleged improvement to sequencing technology is not reflected within or provided by an additional element of the claim.

Applicant remarks that similar to the claims in DDR Holdings, the amended claims in the present application may provide a method of efficient work assignment scheme, that is not known from the pre-Internet world, and as such, the pending claims recite technical solutions to Internet-centric problems (Applicant’s remarks at pg. 14, para. 2-4).
This argument is not persuasive. A technical explanation for how the invention improves upon technology should be present in the specification, and must provide sufficient details such that one of ordinary skill in the art would recognize the claimed invention as providing an improvement. See MPEP 2106.05(a). In this case, Applicant has not pointed out what the alleged improvement to internet technology is, nor how the invention improves upon such technology. Furthermore, Applicant has not provided a sufficient explanation of the “work assignment scheme” that is not known from the pre-internet world, how the present application provides such a work assignment scheme, and how the work assignment scheme is related to the internet. Because claim 1 only recites “a computing device” for performing the claimed method, but does not require that any of the method steps utilize the internet or that the computing device is connected to the internet, the argument relating to the claims improving internet-centric problems is not persuasive. 

Applicant remarks that the feature of “modifying an estimated position of a first structural variant by determining a position of a breakpoint associated with the first structural variant by based on [a] supplementary alignment (SA) tag…” (including the full limitation), when combined with the other features recited in the pending claims, recite an inventive concept that is significantly more than any alleged abstract idea (Applicant’s remarks at pg. 15, para. 2-3).
This argument is not persuasive. Under Step 2B, the additional elements are evaluated to determine whether they amount to an inventive concept. See MPEP 2106.05 I. In this case, the limitation regarding “modifying an estimated position…” is part of the abstract idea, as discussed in the above rejection, and therefore, is not considered under Step 2B. The additional elements of the claims include a computing device, processor, memory, a computer readable non-transitory storage medium, data input, and storing data, which are conventional computer components and/or processes. As discussed in the above rejection, the courts have found the use of a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data) or simply adding a general purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation) does not provide significantly more. See Affinity Labs v. DirecTV, 838 F.3d 1253, 1262, 120 USPQ2d 1201, 1207 (Fed. Cir. 2016) (cellular telephone); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 613, 118 USPQ2d 1744, 1748 (Fed. Cir. 2016) (computer server and telephone unit). Therefore, the claims do not amount to significantly more than the recited judicial exception.

Applicant remarks that the USPTO has issued patents upon determining that inventions in technology fields similar to the subject matter of independent claim 1 are patentable subject matter, that U.S. Patent Nos. 10,984,887 B2, 10,878,938 B2, and 10,354,747 B1 are exemplary patents in technology fields similar to the subject matter of independent claim 1, and more particularly that claim 1 of U.S. 10,354,747 B1 recites the features of “determining by providing the generated image to a trained neural network, a likelihood…”, which is using a machine learning model, and thus the subject matter of amended independent claim 1 should also be considered as patentable subject matter (Applicant’s remarks at pg. 15, para. 4 to pg. 16, para. 2).
This argument is not persuasive. First, regarding the feature of  “determining by providing the generated image to a trained neural network, a likelihood…” in U.S. 10,354,747 B1, while a trained neural network would be considered an additional element, the instant claims do not require that the machine learning model is a neural network; therefore, in contrast to U.S. 10,354,747 B1, the machine learning model encompasses models such as linear regression classifiers, and is part of the abstract idea as discussed in the above rejection. Regarding the remaining cited patents, Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the facts between the cited patents and the instant claims are analogous. Furthermore, each Application is considered based on its own merits.

Applicant remarks that dependent claims 3-17 depend from amended claim 1, and that independent claims 18 and 19 recite similar features to amended claim 1, and therefore claims 2-19 are patent-eligible for the same reasons discussed for amended independent claim 1 (Applicant’s remarks at pg. 16, para. 3-4).
This argument is not persuasive for the same reasons discussed above for claim 1.

Claim Rejections - 35 USC § 103
The rejection of claims 1, 3-5, 8-10, 13, and 15-19 under 35 U.S.C. 103 as being unpatentable over Averbuj (Applying machine learning to detect somatic structural variation in bulk whole genome sequencing, 2020, San Diego State University, pg. 1-59; Pub. Date: Aug. 2020) in the Office action mailed 14 Feb. 2022 has been withdrawn in view of claim amendments received 16 May 2022.
The rejection of claim 2 under 35 U.S.C. 103 as being unpatentable over Averbuj (Applying machine learning to detect somatic structural variation in bulk whole genome sequencing, 2020, San Diego State University, pg. 1-59; Pub. Date: Aug. 2020) in the Office action mailed 14 Feb. 2022 has been withdrawn in view of the cancellation of this claim received 16 May 2022.
The rejection of claims 6-7 under 35 U.S.C. 103 as being unpatentable over Averbuj as applied to claim 5 above, and further in view of Chu et al. (GINDEL: Accurate Genotype Calling of Insertions and Deletions from Low Coverage Population Sequence reads, 2014, PLOS ONE, pg. 1-22) in the Office action mailed 14 Feb. 2022 has been withdrawn in view of claim amendments received 16 May 2022.
The rejection of claim 11 under 35 U.S.C. 103 as being unpatentable over Averbuj as applied to claim 1 above, and further in view of Kamal-Reid et al. (A classification system for clinical relevance of somatic variants identified in molecular profiling of cancer, 2016, Genetics in Medicine, 18(2), pg. 128-136) in the Office action mailed 14 Feb. 2022 has been withdrawn in view of claim amendments received 16 May 2022.
The rejection of claim 12 under 35 U.S.C. 103 as being unpatentable over Averbuj as applied to claim 1 above, and further in view of Kamal-Reid et al. (A classification system for clinical relevance of somatic variants identified in molecular profiling of cancer, 2016, Genetics in Medicine, 18(2), pg. 128-136) and Cmero et al. (Inferring structural variant cancer cell fraction, 2020, Nature Communications, pg. 1-15; Pub. Date: 05 Feb. 2020) in the Office action mailed 14 Feb. 2022 has been withdrawn in view of claim amendments received 16 May 2022.
The rejection of claim 14 under 35 U.S.C. 103 as being unpatentable over Averbuj as applied to claim 1 above, and further in view of Brownlee (A Gentle Introduction to Threshold-Moving for Imbalanced Classification, 2020, Machine Learning Mastery, pg. 1-24) in the Office action mailed 14 Feb. 2022 has been withdrawn in view of claim amendments received 16 May 2022.
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 3-5, 8-11, 13, and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Averbuj (Applying machine learning to detect somatic structural variation in bulk whole genome sequencing, 2020, San Diego State University, pg. 1-59; Pub. Date: Aug. 2020; previously cited) in view of Arthur et al. (Detection of complex structural variation from paired-end sequencing data, 2018, bioRxiv, pg. 1-17; newly cited), as evidenced by BWA (BWA-Manual Reference pages, 2013, pg. 1-9; newly cited). This rejection is newly recited and necessitated by claim amendment.
Regarding claims 1, 18, and 19, Averbuj discloses a method for identifying structural variants in genomes (Abstract) comprising the following steps.
Averbuj discloses obtaining structural variants (SV) from whole-genome sequencing data of germline (i.e. normal) genome data and somatic genome data (pg. 9, para. 2; pg. 13, para. 2; pg. 23, para. 2).
Averbuj discloses modifying a position of a first candidate SV by merging overlapping variants determined by using split reads that align to the reference genome in at least two locations (i.e. modifying the position based on reads with a supplementary alignment for the overlapping variants) (pg. 9, para. 2; FIG. 5). Averbuj further discloses such split reads include information indicating a second region on the reference sequence in addition to a first region to which the read maps, wherein the first and second positions are two different positions that are not continuous with each other on the reference sequence (Figure 5, e.g. “(b) SR” indicates split read at two locations on reference), and wherein the two regions correspond to a longer and shorter region (Figure 5, e.g. see Inversion).
 Averbuj further discloses the split reads were determined using the BWA-MEM aligner (i.e. alignment tool) that maps the split read to a reference sequence (Table 2; pg. 9, para. 2).
Averbuj discloses extracting features within a predetermined range from each SV (pg. 9, para. 2; Table 1, features extracted in "Flank", "SV Body", and "Overlap", which define particular regions).
Averbuj discloses labeling the SV candidates with known labels, including labeling with previously identified (i.e. known) SV calls from the human genome structural variation consortium (i.e. based on a list of known structural variants)  (pg. 5, para. 2; pg. 12, para. 1-3, e.g. training data of the SVs are labeled based on known SV calls for a list of structural variants).
Averbuj discloses creating a training dataset comprising the extracted features of the SV candidates and the class labels (i.e. a dataset in which the features of each SV candidate are labeled with classification information) (pg. 12, para. 1 to pg. 13, para. 3; pg. 15, para. 1-2).
Averbuj discloses training a machine learning model using the extracted features of the SV candidates and the class labels (pg. 12, para. 1 to pg. 13, para. 3; pg. 15, para. 1-2), wherein the machine learning model takes a structural variant candidate as input and outputs a classification for the structural variant candidate (pg. 17, para. 1; Figure 4).
Further regarding claims 1, 18, and 19, Averbuj discloses the method is computer-implemented (pg. 20, para. 3, e.g. run time and memory requirements), which necessarily requires a computing device comprising a processor and memory and a computer-readable non-transitory storage medium for performing the method, and similarly requires that the created training dataset was stored in memory for the machine learning model to have used the training data. Furthermore, broadly providing an automatic or mechanical means to replace a manual activity which accomplished the same result is not sufficient to distinguish over the prior art. See MPEP 2144.04 III.
Regarding claim 3, Averbuj discloses determining the position of the first candidate SV comprises identifying a split read (pg. 9, para. 2), which involves identifying two non-adjacent regions to which the read aligns (Fig. 5, e.g. SR aligned to non-adjacent regions).
Regarding claim 4, Averbuj discloses modifying the position of the first structural variant comprises determining a breakpoint associated with the structural variant (pg. 9, para. 2).
Regarding claim 5, Averbuj discloses obtaining features in a region of the structural variant body between the two breakpoints of the SV (i.e. data from a region of a first length or more in a direction from a breakpoint in which variant-supporting reads are located, or a first direction) (Table 1).
Regarding claim 8, Averbuj discloses obtaining features in a region within 1000 base pairs outside of the left or right breakpoints (i.e. from a region of a second length or less in a direction opposite to variant-supporting reads, or a second direction opposite a first direction).
Regarding claim 9, Averbuj discloses the features are obtained in a region that is 1000 base pairs, which necessarily includes obtaining data in a region of 200 base pairs (Table 1).
Regarding claim 10, Averbuj discloses the labeling with classification information comprises labeling a first SV, which includes a position, as a true positive (i.e. positive) and labeling a second SV as a false positive (i.e. negative), wherein the first SV is marked as positive in the list of known structural variants (pg. 13, para. 3 to pg. 14, para. 1; Table 6), and the second SV did not overlap regions of known SVs (i.e. a position of the second SV candidate is not in the list of known structural variants) (pg. 13, para. 3 to pg. 14, para. 1, labels categorizing false positive SVs created using filters, including its position overlapping with known SVs).
Regarding claim 11, Averbuj discloses extracting the features comprises extracting for each structural variant candidate, a number of variant-supporting reads (e.g. sf_ratio), mapping quality (e.g. sf_mapq_mean), read depth change (e.g. sv_doc_fc), background noise reads (e.g. a read count in null regions, average chromosome depth) (Table 1, refer to feature types indicated above).
Regarding claim 13, Averbuj discloses the machine learning model takes the features of a structural variant candidate as input and outputs a probability value for classifying the structural variant (pg. 10, para. 1-2; pg. 14, para. 2; Figure 6).
Regarding claim 15, Averbuj discloses inputting the whole genome sequencing data into a search tool, and using the search tool to identify the initial SV predictions (pg. 9, para. 2, e.g. chonk breakpoints makes initial SV predictions; Figure 4).
Regarding claim 20, Averbuj discloses determining a read is a split-read by determining whether a read maps to two different, discontinuous regions (pg. 9, para. 2; pg. 11, para. 2; Figure 5), and using the determined split read to determine a confidence region of the breakpoints of a structural variant  (i.e. modifying the position of a breakpoint associated with the first structural variant candidate based on the position of the split read) (pg. 9, para. 2). However, as discussed above in claim interpretation, the step of determining the read to be a split read, and thus modifying the position of a breakpoint based on the position of the split read, is not required under the broadest reasonable interpretation of the claim. 

Regarding claim 16, Averbuj discloses a method for identifying structural variants in genomes (Abstract) comprising the following steps.
Averbuj discloses obtaining structural variants (SV) from whole-genome sequencing data (pg. 9, para. 2; pg. 13, para. 2; pg. 23, para. 2).
Averbuj discloses modifying a position of a first candidate SV by merging overlapping variants determined by using split reads that align to the reference genome in at least two locations (i.e. modifying the position based on reads with a supplementary alignment for the overlapping variants) (pg. 9, para. 2; FIG. 5). Averbuj further discloses such split reads include information indicating a second region on the reference sequence in addition to a first region to which the read maps, wherein the first and second positions are two different positions that are not continuous with each other on the reference sequence (Figure 5, e.g. “(b) SR” indicates split read at two locations on reference), and wherein the two regions correspond to a longer and shorter region (Figure 5, e.g. see Inversion).
Averbuj discloses extracting features based on data within a predetermined range from each SV (pg. 9, para. 2; Table 1, features extracted in "Flank", "SV Body", and "Overlap", which define particular regions).
Averbuj discloses inputting the extracted features of each structural variant into a trained machine learning model to classify each variant as either a somatic structural variant (positive) or not a somatic structural variant (i.e. negative) (pg. 14, para. 1-2; Figure 4 and 6), wherein the machine learning model is trained using a training data set comprising extracted features of SV candidates from whole genome sequencing data, including somatic and germline genome data, and the class labels (i.e. classification information) (pg. 12, para. 1 to pg. 13, para. 3; pg. 15, para. 1-2; Figure 4).
Further regarding claim 16, Averbuj discloses the method is computer-implemented (pg. 20, para. 3, e.g. run time and memory requirements), which necessarily requires a computing device for performing the method. Furthermore, broadly providing an automatic or mechanical means to replace a manual activity which accomplished the same result is not sufficient to distinguish over the prior art. See MPEP 2144.04 III.
Regarding claim 17, Averbuj discloses outputting a file comprising the structural variants classified as somatic structural variants (i.e. a list of true structural variants), such that variants identified as not somatic (i.e. negative structural variants) are removed (pg. 10, para. 2).

Averbuj et al. does not disclose the following limitations:
Regarding claims 1, 16, and 18-19, Averbuj does not disclose that whole-genome sequencing data for training the machine learning model includes tumor genome data. However, as discussed above, Averbuj discloses the whole-genome sequencing data includes somatic genome data, and that somatic genome data can include tumor genome data (pg. 3, para. 2; pg. 4, para. 2). Averbuj further discloses the method for calling somatic structural variants is high throughput and can aid researchers to understand the basis for human disease (pg. 24, para. 3).
It would have been prima facie obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the method shown by Averbuj to have used whole genome sequencing data including tumor and normal tissue genome data, as suggested by Averbuj (pg. 3, para. 2; pg. 4, para. 2). One of ordinary skill in the art would have been motivated to modify the method of Averbuj by using tumor genome data, as suggested by Averbuj, in order to identify somatic structural variants caused by the cancer to aid researchers in understanding the cancer, as shown by Averbuj (pg. 24, para. 3). This modification would have had a reasonable expectation of success given Averbuj discloses cancer is the most well-known example that has been understood to be caused by somatic structural variants (pg. 3, para. 2). 

Further regarding claims 1, 16, and 18-19, while Averbuj discloses modifying a position of a first candidate SV based on split reads that align to the reference genome in a first and second region of different lengths (pg. 9, para. 2; FIG. 5) and that these split reads were determined using the BWA-MEM aligner (i.e. alignment tool) that maps the split read to a reference sequence (Table 2; pg. 9, para. 2), Averbuj does not explicitly disclose that modifying the position is based on a supplementary alignment (SA) tag that is associated with a second region that is shorter than a first region of the split read (e.g. Averbuj doesn’t disclose tagging each portion of a split read).
Regarding claim 20, Averbuj does not disclose determining a read of the first structural variant has the SA tag.
However, these limitations were known in the art, before the effective filing date of the claimed invention, as shown by Arthur et al., as evidenced by BWA. 
Regarding claims 1, 16, and 18-20, Arthur et al. discloses a method for detecting complex structural variation (Abstract) that comprises determining a breakpoint interval (i.e. modifying a position of a structural variant candidate) based on mapped positions of primary and secondary (i.e. supplementary) alignments of split reads identified with the BWA-MEM aligner (pg. 5, para. 5), also used by Averbuj. Furthermore, using such secondary alignments necessarily involves using supplemental alignment tags and determining the split read has a supplemental alignment tag, as evidenced by BWA, which overviews the BWA-MEM aligner used in both Averbuj et al. and Arthur et al., and discloses that BWA-MEM can determine split alignments, and the option -M can be used to flag shorter split hits as secondary (i.e. a supplemental alignment tag associated with a remainder of the sequences of a read), in addition to a primary alignment (pg. 1, Description, pg. 2, para. 6; pg. 3, “-M”). Accordingly, Arthur discloses modifying the position of a structural variant based on a supplemental alignment tag associated with the second region which is shorter than the first region of a primary alignment and determining a split read has a supplemental alignment tag.
It would have been further prima facie obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the split reads used in modifying the estimated position of the first structural variant of Averbuj to have used supplemental alignment tags, as shown by Arthur et al. (pg. 5, para. 5), as evidenced by BWA (pg. 1, Description, pg. 2, para. 6; pg. 3, “-M”). One of ordinary skill in the art would have been motivated to combine the method of Averbuj with the method of Arthur et al. and BWA, to use the secondary alignments with supplemental alignment tags to determine the position of a breakpoint of the structural variant, as shown by Arthur et al. (pg. 5, para. 5), given Averbuj also discloses using split-reads to make predictions regarding the position of a structural variant (pg. 9, para. 2). This modification would have had a reasonable expectation of success because both Averbuj and Arthur et al. use the BWA-MEM aligner, which is capable of flagging the short split hits as secondary (i.e. creating supplemental alignment tags), such that the method of Arthur et al. is applicable to the aligned sequencing data of Averbuj. 
Therefore, the invention is prima facie obvious.
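For illustration only, the relationship discussed above between a primary alignment and the shorter, discontinuous region recorded for a split read can be sketched as follows. The sketch assumes the optional-field layout that the SAM format defines for the SA tag (semicolon-delimited entries of rname,pos,strand,CIGAR,mapQ,NM). The names Alignment, parse_sa_tag, aligned_query_length, reference_end, and is_discontinuous, and the example read, are hypothetical and are not taken from Averbuj, Arthur et al., or BWA.

```python
import re
from typing import List, NamedTuple


class Alignment(NamedTuple):
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapped position
    strand: str  # "+" or "-"
    cigar: str
    mapq: int
    nm: int      # edit distance


_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_sa_tag(sa_value: str) -> List[Alignment]:
    """Parse the value of an SA:Z: field into its alignment entries."""
    entries = []
    for entry in sa_value.strip().split(";"):
        if not entry:
            continue  # the field ends with a trailing semicolon
        rname, pos, strand, cigar, mapq, nm = entry.split(",")
        entries.append(Alignment(rname, int(pos), strand, cigar, int(mapq), int(nm)))
    return entries


def aligned_query_length(cigar: str) -> int:
    """Count read bases that take part in the alignment (M, I, =, X)."""
    return sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in "MI=X")


def reference_end(aln: Alignment) -> int:
    """1-based last reference position covered by the alignment (M, D, N, =, X)."""
    span = sum(int(n) for n, op in _CIGAR_OP.findall(aln.cigar) if op in "MDN=X")
    return aln.pos + span - 1


def is_discontinuous(first: Alignment, second: Alignment) -> bool:
    """True when the two regions are on different references or are separated by a gap."""
    if first.rname != second.rname:
        return True
    return second.pos > reference_end(first) + 1 or reference_end(second) + 1 < first.pos


# Hypothetical 100-base read: 70 bases map at chr1:1000, the remaining 30 at chr1:5000.
primary = Alignment("chr1", 1000, "+", "70M30S", 60, 0)
supplementary = parse_sa_tag("chr1,5000,+,70S30M,60,0;")[0]
shorter = min((primary, supplementary), key=lambda a: aligned_query_length(a.cigar))
```

In this hypothetical read, the SA entry describes the shorter 30-base region, which lies at a position not continuous with the 70-base region of the primary alignment.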

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Averbuj in view of Arthur et al., as evidenced by BWA, as applied to claim 1 above, and further in view of Cmero et al. (Inferring structural variant cancer cell fraction, 2020, Nature Communications, pg. 1-15; Pub. Date: 05 Feb. 2020; previously cited). This rejection is newly recited and necessitated by claim amendment.
Regarding claim 12, Averbuj in view of Arthur et al., as evidenced by BWA, as applied to claim 1 above, does not disclose the features comprise at least one of tumor histology, whole-genome duplication status, tumor purity, and tumor ploidy from clinical data. However, this limitation was known in the art before the effective filing date of the claimed invention, as shown by Cmero et al.
Regarding claim 12, Cmero et al. discloses a method for analyzing structural variants in whole-genome sequencing data from tumors (Abstract), which includes determining the tumor ploidy and purity of a sample and using the tumor ploidy and purity to adjust the allele frequency of non-supporting reads for a structural variant (pg. 10, col. 1, para. 4). Furthermore, given Cmero et al. discloses extracting ploidy information, this necessarily involves extracting information regarding whether there is a whole-genome duplication (e.g. a tumor ploidy equal to 3, rather than 2). 
It would have been prima facie obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the method made obvious by Averbuj in view of Arthur et al., as evidenced by BWA, as applied to claim 1 above, to have further extracted a feature of tumor histology, as shown by Kamal-Reid et al. (pg. 130, para. 1; Figure 2). One of ordinary skill in the art would have been motivated to combine the method made obvious by Averbuj in view of Arthur et al., as evidenced by BWA with the method of Kamal-Reid et al. in order to obtain additional features that distinguish between a somatic and germline variant, as shown by Kamal-Reid et al. (pg. 130, para. 1; Figure 2), given Averbuj discloses obtaining features for classifying a variant as somatic vs germline (Abstract). This modification would have had a reasonable expectation of success because both Averbuj and Kamal-Reid utilize features of variants for classification. 
It would have been prima facie obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the method made obvious by Averbuj in view of Arthur et al., as evidenced by BWA, to have extracted features of at least one of purity, tumor ploidy, and whole-genome duplication status, as shown by Cmero et al. (pg. 10, col. 1, para. 4). One of ordinary skill in the art would have been motivated to combine the method made obvious by Averbuj in view of Arthur et al., as evidenced by BWA, with the method of Cmero et al. to have adjusted the allele frequency of non-supporting reads of a structural variant, as shown by Cmero et al. (pg. 10, col. 1, para. 4), given Averbuj discloses extracting features relating to a number of non-supporting alignments (Table 1, e.g. sf_ratio, a ratio of supporting to non-supporting alignments). This modification would have had a reasonable expectation of success given both Averbuj and Cmero involve extracting features from structural variants. Therefore, the invention is prima facie obvious.

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Averbuj in view of Arthur et al., as evidenced by BWA, as applied to claim 1 above, and further in view of Brownlee (A Gentle Introduction to Threshold-Moving for Imbalanced Classification, 2020, Machine Learning Mastery, pg. 1-24; previously cited). This rejection is newly recited and necessitated by claim amendment.
Regarding claim 14, Averbuj discloses evaluating the performance of the machine learning model using a validation training set (pg. 15, para. 2).
Regarding claim 14, Averbuj in view of Arthur et al., as evidenced by BWA, as applied to claim 1 above, does not disclose determining a cutoff value for determining whether a probability value output from the machine learning model indicates positive or negative based on the classification performance evaluation result of the machine learning model. However, this limitation was known in the art, before the effective filing date of the claimed invention, as shown by Brownlee.
Regarding claim 14, Brownlee overviews methods for optimizing thresholds for classification when converting probability outputs to a classification (pg. 1, para. 3-5), which includes evaluating machine learning models by determining ROC curves using different threshold values and then selecting an optimal threshold for classification based on the evaluation (pg. 6, para. 1-6; pg. 9, para. 1; pg. 10, para. 1-4). Brownlee further discloses optimizing the threshold for classification finds the optimal balance between false positive and true positive classification rates and improves the performance of a classifier (pg. 1, para. 3; pg. 8, para. 1).
It would have been prima facie obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the method made obvious by Averbuj in view of Arthur et al., as evidenced by BWA, as applied to claim 1 above, to have determined a cutoff value for determining whether a probability value output from the machine learning model indicates positive or negative based on the classification performance evaluation of the machine learning model, as shown by Brownlee (pg. 1, para. 3-5; pg. 6, para. 1-16; pg. 9, para. 1; pg. 10, para. 1-4). One of ordinary skill in the art would have been motivated to modify the method made obvious by Averbuj in view of Arthur et al., as evidenced by BWA, with the method of Brownlee in order to find the optimal balance between false positive and true positive classification rates and improve the performance of the machine learning model, as shown by Brownlee (pg. 1, para. 3; pg. 8, para. 1). This modification would have had a reasonable expectation of success because both Averbuj and Brownlee disclose using a machine learning model that outputs a probability and uses a threshold for classification and further disclose evaluating the machine learning model using an ROC analysis (Averbuj: Figure 6; Brownlee: pg. 1, para. 2-6). Therefore, the invention is prima facie obvious.
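For illustration only, selecting a cutoff value from an ROC-style evaluation, as discussed above, can be sketched as follows. The sketch assumes Youden's J statistic (true positive rate minus false positive rate) as the selection criterion, which is only one of several possible criteria and is not asserted to be the procedure of Averbuj or the only procedure of Brownlee. The function names and the validation labels and probabilities are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], y_prob: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, false positive rate, true positive rate) for each candidate cutoff.

    Candidate cutoffs are the distinct predicted probabilities, from highest to lowest.
    A sample is called positive when its probability is >= the cutoff.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present to evaluate a cutoff")
    points = []
    for threshold in sorted(set(y_prob), reverse=True):
        tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
        fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
        points.append((threshold, fp / n_neg, tp / n_pos))
    return points


def best_cutoff(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Pick the cutoff maximizing Youden's J = TPR - FPR (highest cutoff wins on ties)."""
    return max(roc_points(y_true, y_prob), key=lambda point: point[2] - point[1])[0]


# Hypothetical validation set: 1 = somatic (positive), 0 = not somatic (negative).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
probabilities = [0.1, 0.2, 0.45, 0.5, 0.4, 0.7, 0.8, 0.9]
cutoff = best_cutoff(labels, probabilities)
```

With these hypothetical values, the selected cutoff is 0.7, at which three of four positives are retained and no negative is called positive.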

Response to Arguments
Applicant's arguments filed 16 May 2022 regarding 35 U.S.C. 103 have been fully considered but they are not persuasive. 
Applicant remarks that the combination of Averbuj, Chu, Kamal-Reid, and Brownlee fails to disclose or render obvious the claimed combination of features set forth in amended independent claim 1, including “modifying an estimated position of a first structural variant candidate by determining a position of a breakpoint associated with the first structural variant candidate based on supplementary alignment (SA) tag associated with the first structural variant candidate…, wherein the SA tag is data that is outputted by an alignment tool…, the SA tag..is information indicating another position on the reference sequence…in addition to a main position, to which the corresponding read is mapped….., the another position…and the main position of a first region are two different positions that are not continuous with each other…., and the alignment tool determines that the corresponding read maps to the first region on the reference sequence, and records and outputs the SA tag including information about the second region…” (Applicant’s remarks at pg. 17, para. 2 to pg. 18, para. 1).
This argument is not persuasive because it does not consider the newly cited references Arthur et al. and BWA, which disclose the above newly recited limitation as discussed in the above rejection.

Applicant remarks that each of independent claims 18 and 19 recites features similar to the above features of claim 1, and thus is also allowable, and further remarks that claims 3-17 are allowable by virtue of their dependency on claim 1 (Applicant’s remarks at pg. 18, para. 2 to 4).
This argument is not persuasive for the same reasons discussed above for claim 1.

Conclusion
No claims are allowed.
Claims 6-7 are free of the art.
Claim 6 recites “…wherein the first length is an average insert size across both the tumor genome data and the normal tissue data, included in the whole-genome sequencing data”, and thus requires that the extracting features of each structural variant obtains data in a region that is an average insert size across both the tumor genome data and the normal tissue data or more, in a first direction from a breakpoint. Chu et al. (GINDEL: Accurate Genotype Calling of Insertions and Deletions from Low Coverage Population Sequence reads, 2014, PLOS ONE, pg. 1-22; previously cited), discloses a method for classifying structural variants from sequencing data (Abstract), including extracting features by collecting reads (i.e. obtaining data) relevant to a particular indel (i.e. structural variant) by collecting reads falling within a first length or more of a breakpoint associated with the indel (pg. 6, para. 3; pg. 7, para. 1), such that variant supporting reads are located in this region, and further discloses the first length or more is dmax = m + 3v + 2r, where m is the mean insert size, such that the first length is an average insert size of the sequencing data (pg. 6, para. 3). However, Chu et al. does not disclose that this mean insert size is an average insert size across two different types of sequencing data, including sequencing data of a tumor genome and a normal genome, as claimed. Therefore, claim 6 and dependent claim 7 are free of the art. However, it is noted that Applicant’s specification does not provide adequate written description for the above limitation, as discussed in the above 35 U.S.C. 112(a) rejection of the claims.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Inquiries
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KAITLYN L MINCHELLA whose telephone number is (571)272-6485.  The examiner can normally be reached on 7:00 - 4:00 M-Th.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Karlheinz Skowronek, can be reached on (571) 272-9047.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/K.L.M./Examiner, Art Unit 1631

/OLIVIA M. WISE/Primary Examiner, Art Unit 1631