DETAILED ACTION

Comments
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claims 1, 4-11, and 14-22 are pending and examined in the instant Office action.

Information Disclosure Statement
The IDS of 8/1/2022 has been considered.

Withdrawn Rejections
The rejections under 35 U.S.C. 112(b) are withdrawn in view of amendments filed to the instant set of claims on 5 July 2022.
The prior art rejections are withdrawn in view of arguments on pages 8-14 of the Remarks.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

The following rejection is reiterated:
Claims 1, 4-11, and 14-22 are rejected under 35 U.S.C. 101 because the claimed invention is directed to judicial exceptions without significantly more.  According to MPEP Section 2106.03, claims 1, 4-10, and 21 recite methods, and claims 11, 14-20, and 22 recite systems comprising computers.  The claims recite the judicial exceptions of obtaining a training set, obtaining saliency values, computationally producing a modified biological sequence, associating the modified biological sequence with the original biological sequence, generating elements in biological sequences, and adding the modified biological sequences to the original training set.  The claims recite mathematical calculations of probabilities.  The claims recite associating a label with a biological sequence.  Claims 4 and 14 limit the type of machine learning.  Claims 5 and 15 recite determining generator parameters.  Claims 6 and 16 further limit the type of biological sequence.  Claims 7 and 17 recite the mathematics of using a null symbol to represent deleted elements in the modified biological sequence.  Claims 9-10 and 19-20 recite mathematical expressions regarding probabilities in terms of saliency.  These judicial exceptions are not integrated into a practical application because the fact pattern of the instant set of claims is not analogous to any fact pattern of claims cited in MPEP Section 2106.04 that have been found to be patent eligible.  The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exceptions (i.e. MPEP Section 2106.05) because the prior art of Tabibiazar et al. [US PGPUB 2008/0300797 A1; on IDS] teaches that using computers to conduct calculations on biological sequences is routine and conventional in the prior art.  In addition, the document of Bernardes et al. [Recent Patents on Biotechnology, volume 7, 2013, pages 122-141] is a review that demonstrates machine learning with proteins is routine and conventional in the prior art.

Response to arguments:
Applicant's arguments filed 5 July 2022 have been fully considered but they are not persuasive.
Applicant argues that the amendments to the claims make the claims analogous to the fact pattern in Example 39 of the Guidelines.  This argument is not persuasive because, while the core of the claim of Example 39 is directed to machine learning, the instant claims only recite actual machine learning in the preamble and conclusion.  The bulk of the claims is drawn to judicial exceptions, and the prior art review article of Bernardes et al. is added to demonstrate that machine learning on proteins is routine and conventional in the prior art.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

The following rejection is newly applied:
Claims 1, 4-6, 11, and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Cope [US PGPUB 2014/0214391 A1; on IDS] in view of Cristofaro et al. [Biochemistry, volume 41, 2002, pages 10968-10975].
Claim 1 recites a computer-implemented method for training a supervised machine learning model with an expanded training set using biological sequences.  The method comprises obtaining an original training set.  The original training set comprises original biological sequences.  The method comprises obtaining saliency values corresponding to elements in each of the biological sequences.  Each saliency value corresponds to an element of the one or more elements and indicates a degree of pertinency of the element to a biological function.  The method requires, for each of the one or more original biological sequences, producing modified biological sequences.  The method comprises associating the modified biological sequences with the original biological sequence.  Each of the one or more original sequences has an associated label.  Each of the one or more modified biological sequences is associated with the same label as the associated original biological sequence.  The method comprises generating elements in each of the modified biological sequences using elements in the associated original biological sequence and corresponding saliency values.  A probability that an element in each of the one or more modified biological sequences is the same as the element in the associated original biological sequence is higher for larger corresponding saliency values.  The biological function of the one or more modified sequences is maintained relative to the associated original biological sequence.  The method comprises adding the modified biological sequences for each of the one or more original biological sequences to the original training set to form an expanded training set.  The method comprises training the supervised machine learning model using the expanded training set.  For the purpose of this prior art rejection, saliency is interpreted to be equivalent to activity.
Claim 11 is drawn to similar subject matter as claim 1, except claim 11 is drawn to a system.
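For illustration only, and not as a construction of the claims, the augmentation steps summarized above for claim 1 can be sketched as follows.  The sketch assumes a nucleotide alphabet, saliency values scaled to the range 0 to 1, and a linear mapping from saliency to the probability of keeping an element; none of these choices is taken from the claims or the specification, and all function and parameter names (keep_probability, make_modified_sequence, expand_training_set, floor, n_variants) are hypothetical.  The final training step is omitted because any supervised learner could consume the expanded set.

```python
import random

# Assumed alphabet for illustration; the claims are not limited to nucleotides.
ALPHABET = "ACGT"


def keep_probability(saliency, floor=0.5):
    """Probability of copying an element unchanged.

    Assumption: saliency lies in [0, 1] and the mapping is linear, so a
    larger saliency always gives a larger keep probability.
    """
    return floor + (1.0 - floor) * saliency


def make_modified_sequence(sequence, saliencies, rng, alphabet=ALPHABET):
    """Generate one modified sequence element by element."""
    out = []
    for element, saliency in zip(sequence, saliencies):
        if rng.random() < keep_probability(saliency):
            out.append(element)  # high-saliency elements are usually kept
        else:
            # Substitute a different symbol chosen uniformly at random.
            out.append(rng.choice([a for a in alphabet if a != element]))
    return "".join(out)


def expand_training_set(training_set, saliency_map, n_variants=3, seed=0):
    """Return the original (sequence, label) pairs plus modified copies.

    Each modified sequence inherits the label of its original sequence.
    """
    rng = random.Random(seed)
    expanded = list(training_set)
    for sequence, label in training_set:
        for _ in range(n_variants):
            modified = make_modified_sequence(sequence, saliency_map[sequence], rng)
            expanded.append((modified, label))
    return expanded
```

Under these assumptions, an element with saliency 1.0 is always copied unchanged, while an element with saliency 0.0 is replaced about half of the time; the label is carried over in either case.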
The document of Cope studies methods, systems, and software for identifying biomolecules with interacting components [title].  Figure 8 of Cope illustrates the computer limitations of the claims.  Paragraphs 5-6 and 112 of Cope teach techniques for generating and using sequence-activity models, such as in directed evolution of protein libraries to identify proteins with desired biological activities and properties, and that avoid overfitting.  Paragraph 8 of Cope teaches a method for identifying amino acid residues to be modified in a protein variant library.  Paragraphs 8, 114, and 240 of Cope teach a plurality of biological molecules constituting a training set of a protein variant library with an initial training set.  Paragraph 5 of Cope teaches sequence-activity models that describe activities, characteristics, or properties of biological molecules as functions of various biological sequences.  Paragraph 114 of Cope teaches variant peptides produced during a round of directed evolution are assayed for activity.  Paragraphs 115 and 120 of Cope teach that observations are data provided in a training set for a model.  Figure 1 and paragraph 124 of Cope teach that data from the training set typically include complete or partial residue sequence information together with an activity value for each protein in the library.  
Paragraphs 120 and 175 of Cope teach variant libraries with a plurality of variants.  Paragraphs 120 and 175 of Cope teach next generation sequencing tools making it possible to include low activity and high activity variants in a training set.  Paragraph 127 of Cope teaches a base sequence activity model.  Paragraph 116 of Cope teaches sequence-activity models now generated using a training set that includes not only top performing peptides from a round, but also some peptides that would not be of interest for further rounds of evolution; the sequence-activity information could be applied to produce a more robust sequence-activity model.  Paragraph 118 of Cope teaches an iterative loop of modeling and exploring.  Paragraph 120 of Cope teaches inclusion of variants having a range of activity levels resulting in production of models that perform better and/or are better at predicting activity over a wider range of sequence and activity space.  Paragraph 287 of Cope teaches that the sequence variants of round n+1 provide an expanded training set for new models.
Paragraphs 85-87 of Cope teach types of probabilities to determine likely corresponding activities of biological sequences.
Paragraphs 79-85, 128, and 197-225 of Cope teach machine learning methods comprising neural networks, random forests, SVMs, and linear regression models.
Cope does not teach labels associated with the biological sequences.
The document of Cristofaro et al. studies how mutations in the ribonuclease H active site of HIV-RT reveal a role for the site in stabilizing enzyme-primer-template binding [title].  The function and phenotypic label associated with RNase H is RNase H activity [abstract].  In other words, the higher the activity of the RNase H, the greater the probability that RNase H will bind to the target site.  The abstract of Cristofaro et al. teaches that mutating the RNase H protein with site directed mutagenesis results in mutated proteins with the same label as the wild type RNase H (i.e. the ability to bind to the target site), but with a greater activity (i.e. saliency) than the wild type protein.

With regard to claims 4 and 14, paragraphs 79-85, 128, and 197-225 of Cope teach machine learning methods comprising neural networks, random forests, SVMs, and linear regression models.

With regard to claims 5 and 15, paragraphs 165 and 235-236 of Cope teach defining the number of factors (i.e. variable positions), the number of levels (i.e. choices at each position), and the number of experiments to run to provide an output matrix.
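As a hedged illustration of the terms used above, the sketch below builds a matrix of experiments from a list of factors (variable positions) and their levels (choices at each position).  It assumes the matrix is a random subset of the full factorial design; that sampling scheme is an assumption made for illustration and is not asserted to be the design technique of Cope.  The name design_matrix and its parameters are hypothetical.

```python
import itertools
import random


def design_matrix(levels_per_factor, n_experiments, seed=0):
    """Return a list of experiments, one choice per factor in each row.

    levels_per_factor: one list of allowed choices per variable position.
    n_experiments: number of rows requested; capped at the full factorial size.
    """
    # Every combination of one level per factor (the full factorial design).
    full = list(itertools.product(*levels_per_factor))
    if n_experiments >= len(full):
        return full
    # Assumption: a uniform random subset stands in for a designed subset.
    rng = random.Random(seed)
    return rng.sample(full, n_experiments)
```

For example, two positions with two and three allowed residues give six possible combinations, from which a smaller number of experiments can be drawn.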

With regard to claims 6 and 16, paragraph 113 of Cope teaches that the algorithms are applicable to both nucleic acid sequences and proteins.

It would have been obvious to someone of ordinary skill in the art at the time of the effective filing date of the instant application to modify the protein activity studies of Cope by use of the site directed mutagenesis of Cristofaro et al. wherein the motivation would have been that a subset of mutant proteins in Cristofaro et al. have the same label/function of wild type protein, but with greater activity toward the binding site than the original protein [abstract of Cristofaro et al.].

Response to arguments:
Applicant's arguments filed 5 July 2022 have been fully considered but they are not persuasive.
	Applicant argues that Cope does not teach the label limitation recited in the claims.  In response, the document of Cristofaro et al. is added to make obvious this limitation.
	Applicant argues that Cope teaches away from the recited limitations regarding labels because Cope is drawn to a plurality of methods and techniques for sequence-activity models.  However, Cope does not disparage site directed mutagenesis (i.e. the technique in Cristofaro et al.).  There would have been a reasonable expectation of success in combining Cope and Cristofaro et al. because the techniques of Cope are robust and generally applicable to sequence-activity models, including the site directed mutagenesis study of Cristofaro et al.
	While Cope teaches assaying, the site directed mutagenesis study of Cristofaro assays mutants for activity toward the target.

Related Prior Art
The prior art of Tabibiazar et al. [US PGPUB 2008/0300797 A1; on IDS] teaches biomarkers for diagnosis and monitoring of atherosclerotic cardiovascular disease.  Paragraphs 239-253 of Tabibiazar et al. teach analogous manipulations of training sets of biological data.
The document of Fox [US PGPUB 2005/0084907 A1; on IDS] teaches structure to function relationship statistical and probability analysis of functional biomolecules.

E-mail Communications Authorization
Per updated USPTO Internet usage policies, Applicant and/or applicant’s representative is encouraged to authorize the USPTO examiner to discuss any subject matter concerning the above application via Internet e-mail communications.  See MPEP 502.03. To approve such communications, Applicant must provide written authorization for e-mail communication by submitting the following statement via EFS-Web (using PTO/SB/439) or Central Fax (571-273-8300):
Recognizing that Internet communications are not secure, I hereby authorize the USPTO to communicate with the undersigned and practitioners in accordance with 37 CFR 1.33 and 37 CFR 1.34 concerning any subject matter of this application by video conferencing, instant messaging, or electronic mail. I understand that a copy of these communications will be made of record in the application file.

Written authorizations submitted to the Examiner via e-mail are NOT proper. Written authorizations must be submitted via EFS-Web (using PTO/SB/439) or Central Fax (571-273-8300). A paper copy of e-mail correspondence will be placed in the patent application when appropriate. E-mails from the USPTO are for the sole use of the intended recipient, and may contain information subject to the confidentiality requirement set forth in 35 USC § 122. See also MPEP 502.03.

Conclusion
No claim is allowed.
Claims 7-10 and 17-22 are free of the prior art because the prior art does not teach or suggest the mathematical manipulations recited in the claims.  The prior art also does not teach that saliency values are determined from conservation across two different species and allele frequency in a human population of at least two humans.
	Any inquiry concerning this communication or earlier communications from the Examiner should be directed to Russell Negin, whose telephone number is (571) 272-1083.  This Examiner can normally be reached from Monday through Thursday from 9:00 am to 5 pm and variable hours on Fridays.
	If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s Supervisor, Karlheinz Skowronek, Supervisory Patent Examiner, can be reached at (571) 272-9047.
	Information regarding the status of the application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only. 

/RUSSELL S NEGIN/
Primary Examiner, Art Unit 1631
28 August 2022