DETAILED ACTION
Claims 1-20 are presented for examination.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings received on 24 August 2020 are accepted.
Specification
Applicant is reminded of the proper language and format for an abstract of the disclosure.
The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words in length. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details.
The language should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, “The disclosure concerns,” “The disclosure defined by this invention,” “The disclosure describes,” etc.  In addition, the form and legal phraseology often used in patent claims, such as “means” and “said,” should be avoided.
The abstract of the disclosure is objected to because it recites phrases which can be implied. Examiner suggests amending the abstract to recite “Systems and methods for controllable protein generation. The systems and methods use or employ models implemented with transformer architectures developed for language modeling and apply the same to generative modeling for protein engineering.”
Correction is required.  See MPEP § 608.01(b).

The disclosure is objected to because of the following informalities:
The Specification contains several apparent typographical errors:
Specification [0035] “homosapiens”
Specification [0035] “vector. [[p]]Protein generation model 120 then”
Specification [0062] “receives a
Appropriate correction is required.
Claim Objections
Claims 1, 8, and 15 are objected to because of the following informalities:
Claims 1 and 15 are each missing a semicolon at the end of the selecting step (“sequences; and”). Compare with claim 8.
Claim 8 recites “a memory configured to store a protein engineering model” and “generating, via a protein engineering model, one or more….” It appears the second recitation may intend to refer to the same protein engineering model. Examiner suggests amending claim 8 to recite “generating, via [[a]] the protein engineering model, one or more….”
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 7 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for pre-AIA, the applicant) regards as the invention.
Claim 7 recites “the language model.” There is insufficient antecedent basis for this limitation in the claim. Claim 1, from which claim 7 depends, recites “a protein engineering model” but does not recite a language model.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 3, 4, 6-8, 10, 11, 13-15, 17, 18, and 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by US 2022/0270711 A1 Feala, et al. [herein “Feala”].
Claim 1 recites “1. A method for quality control in protein engineering.” Feala abstract discloses “for engineering amino acid sequences configured to have specific protein functions or properties.” Engineering amino acid sequences to configure specific proteins is protein engineering.
Feala paragraphs 89-98 further discuss minimization of a loss function, which corresponds to quality control as discussed in more detail below with respect to the body of the claim.
Claim 1 further recites “the method comprising: generating an input sequence composed of a data sequence of amino acids and a set of target protein properties.” Feala paragraph 56 discloses:
receive input data such as a primary amino acid sequence and generating a modified amino acid sequence corresponding to one or more functions or features of the resulting polypeptide or protein defined at least in part by the amino acid sequence. The input data can include additional information such as contact maps of amino acid interactions, tertiary protein structure, or other relevant information relating to the structure of the polypeptide.
An input sequence of an amino acid sequence is an input sequence composed of amino acids. The desired functions or features are a set of target protein properties.
Feala paragraph 66 discloses “to predict the desired function based on a set of labeled training data.” The labeled training data corresponding to the desired function are indications of the set of target protein properties.
Claim 1 further recites “generating, via a protein engineering model, one or more output data sequences of amino acids representing protein variants in response to the input sequence.” Feala paragraph 99 discloses “the last 208 layer is the final layer that outputs the amino acid sequence that is ‘decoded’ from the embedding.” The output amino acid sequence corresponds with an output data sequence of amino acids in response to the input.
The trained encoder-decoder machine learning model corresponds with the protein engineering model. Feala paragraph 125 discloses:
the machine learning method(s) comprise unsupervised machine learning. Unsupervised machine learning includes clustering, autoencoding, variational autoencoding, protein language model (e.g., wherein the model predicts the next amino acid in a sequence when given access to the previous amino acids), and association rules mining.
In particular, an unsupervised machine learning protein language model which has been trained corresponds with the protein engineering model.
Feala paragraph 237 discloses “generated variants” and “designed variants” indicating the resulting engineered proteins represent protein variants.
Claim 1 further recites “selecting a first output data sequence of amino acids with a lowest perplexity value from the one or more output data sequences.” Examiner is interpreting perplexity in light of Specification paragraph 85 (“For example, the perplexity may be a metric for language models, which is the exponentiated cross-entropy loss computed over each token in a dataset.”).
Feala paragraph 89 discloses “The decoder can be trained to minimize the loss, reside-wise categorical cross-entropy, to reconstruct the sequence which maps to a given embedding (also referred to as reconstruction loss).” See also Feala paragraphs 96-98. The residue-wise cross-entropy loss function is a perplexity value of the amino acid sequence. Minimizing this cross-entropy loss function is selecting a lowest perplexity value for the output data sequence of the final decoder layer.
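For illustration only, the perplexity definition quoted from Specification paragraph 85 (the exponentiated cross-entropy over the tokens) and the selection of the lowest-perplexity candidate can be sketched as follows. This is a minimal sketch assuming per-token natural-log probabilities are already available for each candidate; the candidate sequences and probabilities are hypothetical, and the code is not asserted to be the implementation of Feala or of the present application.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood), i.e., exponentiated mean cross-entropy."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

def select_lowest_perplexity(candidates):
    """candidates: dict mapping a sequence to its per-token natural-log probabilities."""
    return min(candidates, key=lambda seq: perplexity(candidates[seq]))

# Hypothetical candidates: each token of "MKTAY" has probability 0.5, each of "MKTAF" 0.25.
candidates = {
    "MKTAY": [math.log(0.5)] * 5,
    "MKTAF": [math.log(0.25)] * 5,
}
best = select_lowest_perplexity(candidates)
print(best, perplexity(candidates[best]))  # MKTAY, with a perplexity of approximately 2.0
```

A lower perplexity means the model assigned higher average probability to the tokens of that candidate.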
Claim 1 further recites “and in response to determining that the selected output data sequence of amino acids yields a fitness value greater than a threshold, outputting the data sequence of amino acids for protein synthesis.” Feala abstract discloses “generate as output an optimized sequence having the desired function or property.” Outputting an optimized sequence is outputting the data sequence of amino acids. Feala paragraph 27 and claim 90 disclose “synthesizing an improved biopolymer sequence.”
Feala claim 1 clause (c) discloses “upon reaching a desired level of the function within a particular threshold.” The threshold is a fitness threshold of the respective protein function.
Furthermore, Feala paragraph 89 discloses “The decoder can be trained to minimize the loss, reside-wise categorical cross-entropy, to reconstruct the sequence which maps to a given embedding (also referred to as reconstruction loss).” Minimizing the cross-entropy is ensuring a certain fitness value. Feala paragraph 26 lines 18-19 disclose “the protein function comprises a degree of protein stability.” The degree of protein stability is a threshold fitness of the corresponding protein.
Claim 3 further recites “3. The method of claim 1, further comprising: evaluating a quality of the selected output data sequence of amino acids by a three-level structure including: assessing a primary sequence similarity, a secondary structure accuracy, and a conformational energy analysis.” Feala paragraph 96 lines 7-9 disclose “Similar x and x' values and/or similar y' and y* values indicate that the decoder is working effectively.” Similar values at sequence locations is an assessment of primary sequence similarity.
Feala paragraph 116 discloses:
Amino acid sequences can be predicted or mapped based on protein stability, which can include various metrics such as, for example, thermostability, oxidative stability, or serum stability. In some embodiments, an encoder is configured to incorporate information relating to one or more structural features such as, for example, secondary structure, tertiary protein structure, quaternary structure, or any combination thereof. Secondary structure can include a designation of whether an amino acid or a sequence of amino acids in a polypeptide is predicted to have an alpha helical structure, a beta sheet structure, or a disordered or loop structure. Tertiary structure can include the location or positioning of amino acids or portions of the polypeptide in three-dimensional space. Quaternary structure can include the location or positioning of multiple polypeptides forming a single protein.
Incorporating secondary and tertiary structure information is assessing secondary and tertiary structure.
Feala paragraph 204 discloses “A ground-truth fitness is defined as the free energy of an amino acid chain with respect to a fixed conformation.” The free energy with respect to a fixed conformation is a conformational energy. Using conformational energy in evaluation of a fitness of an amino acid chain is a conformational energy analysis.
Claim 4 further recites “4. The method of claim 3, wherein the assessment of the primary sequence similarity includes determining a global and pairwise sequence alignment score.” Examiner is interpreting the claim language “global” in light of Specification paragraph 5 (“a local (e.g., secondary) and a global (e.g., tertiary) structure.”).
Feala paragraph 116 discloses:
Amino acid sequences can be predicted or mapped based on protein stability, which can include various metrics such as, for example, thermostability, oxidative stability, or serum stability. In some embodiments, an encoder is configured to incorporate information relating to one or more structural features such as, for example, secondary structure, tertiary protein structure, quaternary structure, or any combination thereof. Secondary structure can include a designation of whether an amino acid or a sequence of amino acids in a polypeptide is predicted to have an alpha helical structure, a beta sheet structure, or a disordered or loop structure. Tertiary structure can include the location or positioning of amino acids or portions of the polypeptide in three-dimensional space. Quaternary structure can include the location or positioning of multiple polypeptides forming a single protein.
Incorporating tertiary structure information is assessing a tertiary (e.g., global) structure.
Feala paragraph 179 discloses “FIG. 9 shows a pairwise amino acid sequence alignment 900 of avGFP against the GED-engineered GFP sequence.” A pairwise amino acid sequence alignment used in validating the engineered protein sequence is assessing the sequence with a pairwise sequence alignment.
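For illustration only, one conventional way to compute a score for a pairwise sequence alignment such as the one shown in Feala FIG. 9 is the Needleman-Wunsch global alignment algorithm, sketched below. The scoring values (match, mismatch, gap) are hypothetical placeholders rather than a substitution matrix such as BLOSUM, and neither Feala nor the present application is asserted to use this particular algorithm.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]

print(global_alignment_score("MKTAY", "MKTY"))  # 2 (best alignment: MKTAY vs MKT-Y)
```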
Claim 6 further recites “6. The method of claim 3, wherein the assessment of the conformational energy includes performing a Monte Carlo optimization of a conformational energy over a space of amino acid types and rotamers.” Feala paragraphs 223-224 disclose “Greedy Monte Carlo Search Optimization [0224] The method optimizes objectives 2 and 3 by a greedy monte carlo search algorithm.” Feala paragraph 204 discloses “A ground-truth fitness is defined as the free energy of an amino acid chain with respect to a fixed conformation.” The conformations are the respective rotamers. Thus, considering the conformations in the free-energy fitness calculation is considering rotamers.
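For illustration only, a greedy Monte Carlo search over a joint space of amino acid types and rotamer states can be sketched as follows. The energy function, the number of rotamer states, and all names are hypothetical assumptions chosen to keep the example self-contained; the sketch does not reproduce the energy function of Feala or of the present application.

```python
import random

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
N_ROTAMERS = 3  # hypothetical number of rotamer states per residue

def toy_energy(state):
    """Hypothetical conformational energy (lower is better); state is a list of (amino_acid, rotamer)."""
    energy = 0.0
    for i, (aa, rot) in enumerate(state):
        energy += 0.1 * AMINO_ACIDS.index(aa) + 0.5 * rot
        if i > 0 and state[i - 1][0] == aa:
            energy += 1.0  # illustrative penalty for identical neighboring residues
    return energy

def greedy_monte_carlo(state, steps, rng):
    """Randomly mutate one residue's type and rotamer per step; keep only non-worsening moves."""
    current = list(state)
    current_e = toy_energy(current)
    for _ in range(steps):
        pos = rng.randrange(len(current))
        proposal = list(current)
        proposal[pos] = (rng.choice(AMINO_ACIDS), rng.randrange(N_ROTAMERS))
        e = toy_energy(proposal)
        if e <= current_e:  # greedy acceptance rule
            current, current_e = proposal, e
    return current, current_e

start = [("W", 2)] * 5
final, energy = greedy_monte_carlo(start, 200, random.Random(0))
```

A Metropolis-style variant would also accept some energy-increasing moves with a temperature-dependent probability; the greedy rule shown here never does.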
Claim 7 further recites “7. The method of claim 1, wherein each of the one or more output data sequences of amino acids representing protein variants is generated by: forming the input sequence of tokens by prepending the set of target protein properties to the data sequence of amino acids; generating, via the language model, a set of scores indicating conditional distributions of next-token prediction corresponding to the input sequence of tokens.” Feala paragraph 125 discloses:
the machine learning method(s) comprise unsupervised machine learning. Unsupervised machine learning includes clustering, autoencoding, variational autoencoding, protein language model (e.g., wherein the model predicts the next amino acid in a sequence when given access to the previous amino acids), and association rules mining.
The protein language model is a language model. The prediction of a next amino acid in a sequence is prepending a next amino acid to the data sequence of amino acids.
Feala paragraph 101 discloses “a probabilistic biopolymer sequence 390 produced by a decoder. … sequences can be randomly generated by sampling each position according to the amino acid probabilities.” The amino acid probabilities at each position is a scoring indicating a conditional distribution of the next amino acid.
Claim 7 further recites “sequentially determining a constituent amino acid from the data sequence of amino acids based on the set of scores.” Feala paragraph 101 discloses “a probabilistic biopolymer sequence 390 produced by a decoder. … sequences can be randomly generated by sampling each position according to the amino acid probabilities.” Sampling probabilities for each position is sequentially determining respective amino acids based on the probabilities/scoring.
Claim 7 further recites “and forming an output data sequence of amino acids representing a protein from the sequentially determined constituent amino acids.” Feala paragraph 101 discloses “a probabilistic biopolymer sequence 390 produced by a decoder.” Feala paragraph 99 discloses “the last 208 layer is the final layer that outputs the amino acid sequence that is ‘decoded’ from the embedding.” The output amino acid sequence corresponds with an output data sequence of amino acids in response to the input.
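For illustration only, the generation steps recited in claim 7 (prepending property tokens, obtaining next-token distributions, sequentially sampling amino acids, and forming the output sequence) can be sketched as follows. The property tag format and the stand-in "model" `toy_next_token_probs` are hypothetical assumptions; a real language model would replace that function, and the sketch is not asserted to be the method of Feala or of the present application.

```python
import random

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")

def toy_next_token_probs(prefix):
    """Hypothetical stand-in for a language model: P(next amino acid | prefix).
    It simply favors repeating the last amino acid, purely for illustration."""
    last = prefix[-1] if prefix and prefix[-1] in AMINO_ACIDS else None
    weights = {aa: (5.0 if aa == last else 1.0) for aa in AMINO_ACIDS}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}

def generate(property_tags, seed_sequence, length, rng):
    # Form the input sequence of tokens by prepending the target-property tags.
    tokens = list(property_tags) + list(seed_sequence)
    generated = []
    for _ in range(length):
        probs = toy_next_token_probs(tokens)  # conditional next-token distribution
        aas = list(probs)
        next_aa = rng.choices(aas, weights=[probs[a] for a in aas], k=1)[0]
        tokens.append(next_aa)
        generated.append(next_aa)
    # Output data sequence: the amino acid portion only, without the property tags.
    return seed_sequence + "".join(generated)

variant = generate(["<stability=high>"], "MK", 10, random.Random(42))
```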
Claim 8 recites “8. A system for quality control in protein engineering.” Feala abstract discloses “for engineering amino acid sequences configured to have specific protein functions or properties.” Engineering amino acid sequences to configure specific proteins is protein engineering.
Feala paragraphs 89-98 further discuss minimization of a loss function, which corresponds to quality control as discussed in more detail below with respect to the body of the claim.
Claim 8 further recites “the system comprising: a memory configured to store a protein engineering model; a processor.” Feala paragraph 157 lines 2-3 discloses “operatively coupled to a storage and/or memory device.” Feala paragraph 153 line 3 discloses “system comprises a plurality of processing units.” A processing unit is a processor.
Claim 8 further recites “configured to: generate an input sequence composed of a data sequence of amino acids and a set of target protein properties.” Feala paragraph 56 discloses:
receive input data such as a primary amino acid sequence and generating a modified amino acid sequence corresponding to one or more functions or features of the resulting polypeptide or protein defined at least in part by the amino acid sequence. The input data can include additional information such as contact maps of amino acid interactions, tertiary protein structure, or other relevant information relating to the structure of the polypeptide.
An input sequence of an amino acid sequence is an input sequence composed of amino acids. The desired functions or features are a set of target protein properties.
Feala paragraph 66 discloses “to predict the desired function based on a set of labeled training data.” The labeled training data corresponding to the desired function are indications of the set of target protein properties.
Claim 8 further recites “generating, via a protein engineering model, one or more output data sequences of amino acids representing protein variants in response to the input sequence.” Feala paragraph 99 discloses “the last 208 layer is the final layer that outputs the amino acid sequence that is ‘decoded’ from the embedding.” The output amino acid sequence corresponds with an output data sequence of amino acids in response to the input.
The trained encoder-decoder machine learning model corresponds with the protein engineering model. Feala paragraph 125 discloses:
the machine learning method(s) comprise unsupervised machine learning. Unsupervised machine learning includes clustering, autoencoding, variational autoencoding, protein language model (e.g., wherein the model predicts the next amino acid in a sequence when given access to the previous amino acids), and association rules mining.
In particular, an unsupervised machine learning protein language model which has been trained corresponds with the protein engineering model.
Feala paragraph 237 discloses “generated variants” and “designed variants” indicating the resulting engineered proteins represent protein variants.
Claim 8 further recites “select a first output data sequence of amino acids with a lowest perplexity value from the one or more output data sequences.” Examiner is interpreting perplexity in light of Specification paragraph 85 (“For example, the perplexity may be a metric for language models, which is the exponentiated cross-entropy loss computed over each token in a dataset.”).
Feala paragraph 89 discloses “The decoder can be trained to minimize the loss, reside-wise categorical cross-entropy, to reconstruct the sequence which maps to a given embedding (also referred to as reconstruction loss).” See also Feala paragraphs 96-98. The residue-wise cross-entropy loss function is a perplexity value of the amino acid sequence. Minimizing this cross-entropy loss function is selecting a lowest perplexity value for the output data sequence of the final decoder layer.
Claim 8 further recites “and in response to determining that the selected output data sequence of amino acids yields a fitness value greater than a threshold, outputting the data sequence of amino acids for protein synthesis.” Feala abstract discloses “generate as output an optimized sequence having the desired function or property.” Outputting an optimized sequence is outputting the data sequence of amino acids. Feala paragraph 27 and claim 90 disclose “synthesizing an improved biopolymer sequence.”
Feala claim 1 clause (c) discloses “upon reaching a desired level of the function within a particular threshold.” The threshold is a fitness threshold of the respective protein function.
Furthermore, Feala paragraph 89 discloses “The decoder can be trained to minimize the loss, reside-wise categorical cross-entropy, to reconstruct the sequence which maps to a given embedding (also referred to as reconstruction loss).” Minimizing the cross-entropy is ensuring a certain fitness value. Feala paragraph 26 lines 18-19 disclose “the protein function comprises a degree of protein stability.” The degree of protein stability is a threshold fitness of the corresponding protein.
Dependent claims 10, 11, 13, and 14 are substantially similar to claims 3, 4, 6, and 7 above and are rejected for the same reasons.
Claim 15 recites “15. A non-transitory processor-executable storage medium storing processor-executable instructions for quality control in protein engineering.” 
Claim 15 further recites “the processor-executable instructions being executable by a processor.” 
Claim 15 further recites “to perform: generating an input sequence composed of a data sequence of amino acids and a set of target protein properties.” 
Claim 15 further recites “generating, via a protein engineering model, one or more output data sequences of amino acids representing protein variants in response to the input sequence.” 
Claim 15 further recites “selecting a first output data sequence of amino acids with a lowest perplexity value from the one or more output data sequences.” 
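Claim 15 further recites “and in response to determining that the selected output data sequence of amino acids yields a fitness value greater than a threshold, outputting the data sequence of amino acids for protein synthesis.” Each of these limitations of claim 15 corresponds to a limitation of claims 1 and 8, and claim 15 is rejected over Feala for the same reasons set forth for claims 1 and 8 above.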
Dependent claims 17, 18, and 20 are substantially similar to claims 3, 4, and 6 above and are rejected for the same reasons.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 9, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Feala as applied to claims 1, 8, and 15 above, and further in view of US 2021/0193259 A1 Bikard, et al. [herein “Bikard”].
Claim 2 further recites “2. The method of claim 1, further comprising: determining a mean hard accuracy over each token of the input sequence representing each corresponding amino acid error; and determining a mean soft accuracy over each token of the input sequence, wherein the mean soft accuracy penalizes incorrect amino acid predictions.” Feala paragraph 139 lines 6-12 disclose:
These performance metrics include classification accuracy, specificity, sensitivity, positive predictive value, negative predictive value, measured area under the receiver operator curve (AUROC), mean squared error, false discover rate, and Pearson correlation between the predicted and actual values which are determined for a model by testing it against a set of independent cases.
Classification accuracy is a measure of hard accuracy. See also Feala paragraph 181 disclosing “The residue-wise accuracy of the decoder.”
Feala does not explicitly disclose a soft accuracy; however, in analogous art of generating protein sequences using machine learning techniques, Bikard paragraph 152 teaches:
Amino acids having similar properties (e.g. charge, polarity, or hydrophobicity) are more likely to preserve the activity and folding of the protein when swapped in a sequence. These similarities are usually represented in the form of a BLOSUM matrix
Considering amino acid relatedness for a predicted protein using a BLOSUM matrix is determining a soft accuracy.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Feala and Bikard. One having ordinary skill in the art would have found motivation to incorporate a BLOSUM matrix for amino acid relatedness into the system of polypeptide design for the advantageous purpose of considering similar properties of related amino acids. See Bikard paragraph 152.
Dependent claims 9 and 16 are substantially similar to claim 2 above and are rejected for the same reasons.
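For illustration only, the distinction between a hard accuracy (exact residue matches) and a soft accuracy (partial credit for substitutions between similar amino acids, as in the BLOSUM-based reasoning above) can be sketched as follows. The set of "similar" pairs and the partial-credit value are hypothetical assumptions and are not actual BLOSUM matrix entries; the sketch is not asserted to match the definitions in the present application.

```python
# Hypothetical pairs of similar residues (illustrative only, not an actual BLOSUM matrix).
SIMILAR_PAIRS = {frozenset(p) for p in [("I", "L"), ("I", "V"), ("L", "V"),
                                        ("K", "R"), ("D", "E"), ("F", "Y"), ("S", "T")]}

def _check(pred, target):
    if len(pred) != len(target) or not target:
        raise ValueError("sequences must be non-empty and of equal length")

def hard_accuracy(pred, target):
    """Fraction of positions where the predicted residue exactly matches the target."""
    _check(pred, target)
    return sum(p == t for p, t in zip(pred, target)) / len(target)

def soft_accuracy(pred, target, partial_credit=0.5):
    """Exact matches earn 1, similar substitutions earn partial credit, dissimilar ones earn 0."""
    _check(pred, target)
    score = 0.0
    for p, t in zip(pred, target):
        if p == t:
            score += 1.0
        elif frozenset((p, t)) in SIMILAR_PAIRS:
            score += partial_credit
    return score / len(target)

print(hard_accuracy("MRLDA", "MKIDS"), soft_accuracy("MRLDA", "MKIDS"))  # 0.4 0.6
```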
Dependent Claims 5, 12, and 19
Claims 5, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Feala as applied to claims 3, 10, and 17 above, and further in view of US 2018/0068053 A1 Barakat, et al. [herein “Barakat”].
Claim 5 further recites “5. The method of claim 3, wherein the assessment of the secondary structure accuracy includes computing a per-residue for predicted secondary structures with a confidence greater than a threshold.” Feala paragraph 116 discloses:
Secondary structure can include a designation of whether an amino acid or a sequence of amino acids in a polypeptide is predicted to have an alpha helical structure, a beta sheet structure, or a disordered or loop structure.
The predicted secondary structure of an amino acid within a polypeptide is a computed residue prediction for secondary structure. Each amino acid is a respective residue.
Feala does not explicitly disclose a confidence threshold on predicted secondary structure; however, in analogous art of modeling proteins, Barakat paragraph 257 teaches “A 90% score for the secondary structure alignment and an iteration threshold of 0.2 Å was employed.” A 90% score is a confidence threshold.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Feala and Barakat. One having ordinary skill in the art would have found motivation to incorporate a secondary structure score into the system of polypeptide design for the advantageous purpose of assessing overall structural quality. See Barakat paragraph 257 last sentence.
Dependent claims 12 and 19 are substantially similar to claim 5 above and are rejected for the same reasons.
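For illustration only, a per-residue secondary structure accuracy restricted to predictions whose confidence exceeds a threshold can be sketched as follows. The one-letter state codes (H for helix, E for strand, C for coil), the confidence values, and the 0.9 threshold are hypothetical assumptions chosen for the example; the sketch is not asserted to be the computation of Barakat or of the present application.

```python
def confident_ss_accuracy(predicted, confidences, reference, threshold=0.9):
    """Per-residue accuracy computed only over positions with confidence above threshold.
    Returns None when no position passes the threshold."""
    if not (len(predicted) == len(confidences) == len(reference)):
        raise ValueError("inputs must have equal length")
    kept = [(p, r) for p, c, r in zip(predicted, confidences, reference) if c > threshold]
    if not kept:
        return None
    return sum(p == r for p, r in kept) / len(kept)

acc = confident_ss_accuracy("HHHEEC", [0.95, 0.97, 0.5, 0.92, 0.99, 0.3], "HHCEHC")
print(acc)  # 0.75 (3 of the 4 confident positions match)
```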
Conclusion
Prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Costello, Z. & Martin, H., “How to Hallucinate Functional Proteins,” arXiv:1903.00458v1 preprint (4 March 2019), teaches the BioSeqVAE variational autoencoder.
US 20210174903 A1 Rothberg; Jonathan M. et al., “Enhanced Protein Structure Prediction Using Protein Homolog Discovery and Constrained Distograms.”
US 20200273541 A1 Costello; Zachary et al., “Unsupervised Protein Sequence Generation.” Paragraphs 30-31 teach optimizing with an ELBO loss function based on KL divergence. Paragraph 37 teaches an unsupervised variational autoencoder for targeted design of protein function. Paragraph 38 teaches generating enzymes with desired properties. Paragraph 44 teaches using BLAST for sequence homology against the UniProt database.
US 20210174909 A1 Rothberg; Jonathan M. et al., “Generative Machine Learning Models for Predicting Functional Protein Sequences.”
US 20220122692 A1 Feala; Jacob D. et al., “Machine Learning Guided Polypeptide Analysis.”
WO 2021041199 A1 ALVAREZ LEONARDO et al., “Predicting Proteins.” Paragraphs 140-145 teach local and global similarity with secondary and 3D structure.


Examiner respectfully requests that, in response to this Office action, support be shown for language added to any original claims on amendment and for any new claims. Indicate support for newly added claim language by specifically pointing to page(s) and line number(s) in the specification and/or drawing figure(s).
When responding to this Office Action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections.  See 37 CFR 1.111(c).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jay B Hann whose telephone number is (571)272-3330. The examiner can normally be reached M-F 10am-7pm EDT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Rehana Perveen, can be reached at (571)272-3676. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Jay Hann/ Primary Examiner, Art Unit 2148, 21 November 2022