Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
The application claims the benefit of U.S. Provisional Patent Application 62/632,169, filed on February 19, 2018, which is incorporated herein by reference. This application is related to PCT/US2019/018221, filed on February 15, 2019, which is incorporated herein by reference. This examination is conducted based on the priority date of February 19, 2018.

Specification
The listing of references in the specification is not a proper information disclosure statement.  37 CFR 1.98(b) requires a list of all patents, publications, or other information submitted for consideration by the Office, and MPEP § 609.04(a) states, "the list may not be incorporated into the specification but must be submitted in a separate paper."  Therefore, unless the references have been cited by the examiner on form PTO-892, they have not been considered.

Claim Objections
Claims 11, 14, and 19 are objected to because of the following informalities: “Energy,” one category of protein functional data, is repeated eight times in claims 11, 14, and 19; “IC50” appears twice in claims 11, 14, and 19; and “EC50” appears twice in claims 11, 14, and 19. Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 11, 14, 16, and 19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claims 11, 14, and 19 recite the term “concentration” in line 9. It is unclear what kind of concentration is intended. Similarly, the claim limitations “Energy” (line 9) and “Activity” (line 2) are unclear because the claims do not specify what kind of energy or what kind of activity the inventor means.
Claim 16 recites the limitation “derived value” in line 3. It is unclear what “derived value” means in this context.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-19 are rejected under 35 U.S.C. 101 because the claimed inventions are directed to non-statutory subject matter. 
“Claims directed to nothing more than abstract ideas (such as a mathematical formula or equation), natural phenomena, and laws of nature are not eligible for patent protection” (MPEP 2106.04 § 1). Abstract ideas include mathematical concepts, certain methods of organizing human activity, and mental processes (MPEP 2106.04(a)(2)). The claims as a whole, considering all claim elements both individually and in combination, do not amount to significantly more than the abstract idea of protein engineering through machine learning.

Mathematical concepts recited in the claims include:
“encoding an input tensor comprising the amino acid sequences of the plurality of full length mutant protein sequences from the AI training set; encoding an output tensor comprising of one or more of the plurality of characteristic data associated with the plurality of full length mutant protein sequences from the AI training set” (claim 1);
“encoding individual amino acid characteristics, partial sequences characteristics, or local behavior characteristics” (claim 2).

Mental processes recited in the claims include:
“generating an AI training set comprising one or more of the full length mutant protein sequences from the plurality of full length mutant protein sequences in the database” (claim 1);
“generating a machine learning model using a machine learning framework configured to input the input tensor and the output tensor, and to generate the machine learning model” (claim 1);
“receiving a protein identifier and protein functional data” (claim 8);
“matching the identifier to one or more full length mutant protein sequences stored in the database” (claim 8);
“creating the AI training set with the matched full length mutant protein sequences” (claim 8);
“generating a plurality of synthetic sequences” (claim 8);
“applying the machine learning model to the plurality of synthetic sequences to generate predicted protein functional data for each synthetic sequence” (claim 8);
“outputting one or more of the synthetic sequences and associated predicted protein functional data” (claim 8);
“generating a subset of synthetic sequences in which the predicted protein functional data is within a predetermined range of the received protein functional data” (claim 9);
“comparing the full length protein sequence of the protein identifier to full length mutant protein sequences in the database and returning a match when the sequences are at least 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or more than 99% similar” (claim 13);
“comparing the full length protein sequence of the protein identifier to the full length mutant protein sequences in the database and returning a match when the sequences are at least 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or more than 99% similar” (claim 18).

Hence, the claims explicitly recite numerous elements that, individually and in combination, constitute abstract ideas. The claims must therefore be examined further to determine whether they integrate the abstract ideas into a practical application (MPEP 2106.04(d)). (Step 2A Prong One: Yes).

In addition, claim 1 recites “a database.” The recited database is a generic database that is necessary for data storage. Therefore, the database recited in claim 1 does not integrate the abstract idea into a practical application (see MPEP 2106.04(d) § 1; and MPEP 2106.05(f)). (Step 2A Prong Two: No).

None of the dependent claims (of independent claims 1 and 15) recites any additional non-abstract elements; they are all directed to further aspects of the information being analyzed, the manner in which that analysis is performed, or the mathematical operations performed on the information.
Because the claims recite an abstract idea, and do not integrate that abstract idea into a practical application, the claims are directed to that abstract idea. Claims that are directed to abstract ideas must be examined further to determine whether the additional elements amount to significantly more than the judicial exception. Claims that are directed to abstract ideas and that raise a concern of preemption of those abstract ideas must be examined to determine what elements, if any, they recite besides the abstract idea, and whether these additional elements constitute inventive concepts that are sufficient to render the claims significantly more than the abstract idea (MPEP 2106.05).
As explained above, the mere instructions to implement the abstract idea using a computer are, when considered individually, insufficient to constitute an inventive concept that would render the claims significantly more than an abstract idea (see MPEP 2106.05(f)). 
As explained above, merely indicating a field of use or technological environment in which to apply a judicial exception does not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application (see MPEP 2106.05(h)). (Step 2B: No).

When the claims are considered as a whole, they do not integrate the abstract idea into a practical application; they do not confine the use of the abstract idea to a particular technology; they do not solve a problem rooted in or arising from the use of a particular technology; they do not improve a technology by allowing the technology to perform a function that it previously was not capable of performing; and they do not provide any limitations beyond generally linking the use of the abstract idea to a broad technological environment (i.e., computerized analysis of biological data). See MPEP 2106.05(a) and 2106.05(h).
For these reasons, the claims, when the limitations are considered individually and as a whole, are directed to an abstract idea and lack an inventive concept. Hence, the claimed invention does not constitute significantly more than the abstract idea, so the claims are rejected under 35 USC § 101 as being directed to non-statutory subject matter.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-7, 9-10, and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Szalkai (“Near perfect protein multi-label classification with deep neural Networks”, Methods, Volume 132, January 1 2018, Pages 50–56) in view of Capriotti (“A neural-network-based method for predicting protein stability changes upon single point mutations”, Bioinformatics, Volume 20, Issue suppl_1, 4 August 2004, Pages i63–i68).

Claim 1 is directed to a computerized system for engineering proteins based on mutational comprising: 
a storage repository comprising: a database comprising: 
a plurality of full length mutant protein sequences, each full length mutant protein sequence comprising a string representing an amino acid sequence; and 
a plurality of characteristic data sets, wherein each characteristic data set has an associated full length mutant protein sequence from the plurality of full length mutant protein sequences and wherein the characteristic data set includes data from assays done with a protein of the associated full length mutant protein sequence; 
an AI Platform comprising computer executable instructions for execution by the processor, the computer executable instructions performing steps comprising: 
generating an AI training set comprising one or more of the full length mutant protein sequences from the plurality of full length mutant protein sequences in the database; 
encoding an input tensor comprising the amino acid sequences of the plurality of full length mutant protein sequences from the AI training set; 
encoding an output tensor comprising of one or more of the plurality of characteristic data associated with the plurality of full length mutant protein sequences from the AI training set; and 
generating a machine learning model using a machine learning framework configured to input the input tensor and the output tensor, and to generate the machine learning model.
With respect to claim 1, Szalkai discloses a computerized system executing a neural network system for protein engineering comprising:
a. a storage repository comprising:
a) a database of full length peptide sequences (“We have applied the SwissProt subset of the UniProt protein database, acquired from http://uniprot.org as starting point (using the query ‘goa:(*) AND reviewed:yes’), containing 526,526 sequences having Gene Ontology IDs at the date of download of 15 February 2017. The sequences were downloaded along with their assigned UniProt families.” (page 52, col 2, last paragraph));
b) a plurality of characteristic datasets (“We encoded each amino acid as a 26-dimensional vector, where the first 20 components comprised a one-hot vector (all components zero except the one uniquely identifying the amino acid in question), while the other 6 components encoded various properties of the amino acids: charge (±1 or 0.1 in the case of Histidine which is positive about 10% of the time and neutral 90% of the time), hydrophobicity, and the binary attributes isPolar, isAromatic, hasHydroxyl and hasSulfur.” (page 52, col 2, last paragraph));
b. an AI system with computer executable instructions for:
Generating an AI training set comprising full length protein sequences (“This set was shuffled and then divided into training and test sets using the bash commands head -5000 and tail -n +5001. Since the data had headers, the test set contained 4,999 protein sequences, and the training set had the rest (521,527 sequences).” (page 52, col 1, last 3 lines)). Szalkai is silent on mutant proteins.
Encoding the input data as 3D arrays (“The input sequences were encoded as two arrays: one 3-dimensional array inputSeq with dimensions [batch_size, max_length, dims] and another array inputSeqLen encoding the length of the individual sequences with dimension [batch_size]. Here batch_size means the number of sequences in a minibatch and was set to 32. max_length was the maximum allowed length of a sequence: sequences longer than this were omitted in the training phase and cropped to the first max_length amino acids in the testing phase.” (page 52, col 2, last paragraph, lines 1-9)). A tensor is a multi-dimensional array.
Encoding the output tensor (“The deep neural network had a primarily convolutional architecture with 1D spatial pyramid pooling and fully connected layers at the end. The architecture is shown in Table 1. The network had 6 one-dimensional convolution layers with kernel sizes [6,6,5,5,5,5] and depths (filter counts) [128,128,256,256,512,512], with PReLU (parametric rectified linear unit) activation. We used max pooling with kernel size and stride 2 after each convolutional layer, except the first one. Max pooling was omitted after the first layer so that the network can conserve details about the fine structure of the protein. Each max pooling layer was followed by a batch normalization layer to help normalize the statistics of the heatmaps. After SPP, the network state could be represented as an array of shape [batch_size, 21, 512]. The output of the spatial pyramid pooling layer was fed into a fully-connected layer with 1024 units and PReLU activation, followed by a dropout layer with p = 0.5 to avoid overfitting, and a batch normalization layer to normalize the mean and standard deviation. Then a second fully connected layer with sigmoid activation assigned numerical values (likelihoods) between 0 and 1 for each class, yielding the output array y with shape [batch_size, n_classes]. Note that softmax activation cannot be used because the network had to perform a multi-label classification task.” (page 53, col 2, last paragraph; page 53, col 1, first 2 paragraphs)).
Generating a machine learning model (“The deep neural network had a primarily convolutional architecture with 1D spatial pyramid pooling and fully connected layers at the end. The architecture is shown in Table 1. The network had 6 one-dimensional convolution layers with kernel sizes [6,6,5,5,5,5] and depths (filter counts) [128,128,256,256,512,512], with PReLU (parametric rectified linear unit) activation. We used max pooling with kernel size and stride 2 after each convolutional layer, except the first one. Max pooling was omitted after the first layer so that the network can conserve details about the fine structure of the protein. Each max pooling layer was followed by a batch normalization layer to help normalize the statistics of the heatmaps.” (page 53, col 1, last paragraph)).
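As an illustration of the array layout quoted above, the following minimal Python sketch builds `inputSeq` with shape [batch_size, max_length, dims] and `inputSeqLen` with shape [batch_size] from raw sequence strings, omitting over-length sequences in training and cropping them in testing, as Szalkai describes. It assumes a plain 20-letter one-hot alphabet (dims = 20 rather than Szalkai's 26) and zero-padding of short sequences; the helper name `encode_batch` and the `training` flag are hypothetical, and nested lists stand in for a tensor library.

```python
# Assumed alphabet: the 20 standard amino acids, one-hot encoded (dims = 20).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_batch(seqs, max_length, training=True):
    """Return (inputSeq, inputSeqLen) as nested lists.

    inputSeq has shape [batch_size, max_length, dims]; inputSeqLen has shape
    [batch_size]. Sequences longer than max_length are omitted when training
    and cropped to the first max_length residues when testing. Positions past
    the end of a shorter sequence stay zero (assumed padding scheme).
    """
    if training:
        seqs = [s for s in seqs if len(s) <= max_length]
    else:
        seqs = [s[:max_length] for s in seqs]
    dims = len(AMINO_ACIDS)
    input_seq = [[[0.0] * dims for _ in range(max_length)] for _ in seqs]
    input_seq_len = [len(s) for s in seqs]
    for b, seq in enumerate(seqs):
        for pos, aa in enumerate(seq):
            input_seq[b][pos][AA_INDEX[aa]] = 1.0  # one-hot residue at this position
    return input_seq, input_seq_len


x, lengths = encode_batch(["MKV", "MKVLAAG", "AC"], max_length=5)
print(len(x), len(x[0]), len(x[0][0]), lengths)  # 2 5 20 [3, 2]
```

In training mode the 7-residue sequence is dropped, leaving a 2 x 5 x 20 array; in testing mode the same sequence would instead be cropped to 5 residues.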

Szalkai discloses a computerized system executing a neural network system for protein engineering, but Szalkai is silent on using mutant proteins for model training (specifically, the full length mutant protein sequences recited in claim 1). With respect to this limitation, Capriotti teaches using mutant proteins in a neural network-based machine learning study (“Our dataset is derived from the current release (July 2003) of the Thermodynamic Database for Proteins and Mutants [ProTherm by Gromiha et al. (2000)]. We considered two datasets: the first for training/testing our neural network system (S1615), and the second (a subset of the first), to be used in a testing phase with cross-validation procedure for comparison with other available predictors, considering mutations only at physiological conditions (S388)” (page i64, col 1, 1st paragraph in section “System and Methods/the protein database”)).
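Capriotti's ProTherm-derived data describe single point mutations relative to a wild-type protein. The following minimal Python sketch shows how such a record could be expanded into the full length mutant protein sequence string recited in claim 1. It assumes the common wild-type/position/mutant notation (e.g. "K2A") with 1-based positions; both the notation and the helper `apply_point_mutation` are illustrative assumptions, not taken from either reference.

```python
import re

# One-letter codes for the 20 standard amino acids.
_AA = "ACDEFGHIKLMNPQRSTVWY"
# Assumed notation: wild-type residue, 1-based position, mutant residue.
MUTATION_RE = re.compile(rf"([{_AA}])(\d+)([{_AA}])")


def apply_point_mutation(wild_type, mutation):
    """Return the full length mutant sequence for a single point mutation."""
    match = MUTATION_RE.fullmatch(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wt_aa, pos, mut_aa = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(wild_type):
        raise ValueError(f"position {pos} outside sequence of length {len(wild_type)}")
    # Guard against a mutation string that does not fit the wild-type sequence.
    if wild_type[pos - 1] != wt_aa:
        raise ValueError(f"expected {wt_aa} at position {pos}, found {wild_type[pos - 1]}")
    return wild_type[:pos - 1] + mut_aa + wild_type[pos:]


print(apply_point_mutation("MKVLAAG", "K2A"))  # MAVLAAG
```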

With respect to claim 2, Szalkai teaches encoding individual amino acids (“We encoded each amino acid as a 26-dimensional vector, where the first 20 components comprised a one-hot vector (all components zero except the one uniquely identifying the amino acid in question), while the other 6 components encoded various properties of the amino acids: charge (±1 or 0.1 in the case of Histidine which is positive about 10% of the time and neutral 90% of the time), hydrophobicity, and the binary attributes isPolar, isAromatic, hasHydroxyl and hasSulfur.” (page 52, col 2, last paragraph, lines 13-20)).
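The 26-dimensional residue encoding quoted above can be sketched as follows. This is a minimal Python illustration only: Szalkai's exact hydrophobicity scale and polarity assignments are not quoted, so the Kyte-Doolittle values and the POLAR, AROMATIC, HYDROXYL, and SULFUR sets below are assumptions chosen for illustration, and `encode_residue` is a hypothetical name.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed property tables (illustrative; the reference's exact scales are not quoted).
CHARGE = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0, "H": 0.1}  # H: positive ~10% of the time
KYTE_DOOLITTLE = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8, "G": -0.4, "H": -3.2,
    "I": 4.5, "K": -3.9, "L": 3.8, "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5,
    "R": -4.5, "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}
POLAR = set("DEHKNQRSTY")
AROMATIC = set("FWY")
HYDROXYL = set("STY")
SULFUR = set("CM")


def encode_residue(aa):
    """Return 20 one-hot components followed by 6 property components."""
    one_hot = [1.0 if aa == other else 0.0 for other in AMINO_ACIDS]
    properties = [
        CHARGE.get(aa, 0.0),       # index 20: charge
        KYTE_DOOLITTLE[aa],        # index 21: hydrophobicity
        float(aa in POLAR),        # index 22: isPolar
        float(aa in AROMATIC),     # index 23: isAromatic
        float(aa in HYDROXYL),     # index 24: hasHydroxyl
        float(aa in SULFUR),       # index 25: hasSulfur
    ]
    return one_hot + properties


v = encode_residue("H")
print(len(v), v[20])  # 26 0.1
```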

With respect to claim 3, Szalkai teaches that the data from assays comprise an experimental assay type, a numeric value, and a unit (“amino acids: charge (±1 or 0.1 in the case of Histidine which is positive about 10% of the time and neutral 90% of the time)” (page 52, col 2, last paragraph, lines 17-19)).

With respect to claim 4, Szalkai teaches that the data sets comprise protein structure data (“hydrophobicity, and the binary attributes isPolar, isAromatic, hasHydroxyl and hasSulfur” (page 52, col 2, last paragraph, lines 19-20)).

With respect to claim 5, Szalkai teaches that the input tensor depends on the protein structure data, as discussed above regarding claim 4.

With respect to claim 6, Szalkai teaches that the input tensor comprises one or more of charge, hydrophobicity, and volume associated with amino acids in the amino acid sequences of the plurality of full length protein sequences from the AI training set, as discussed above regarding claim 4. Although Szalkai is silent on using mutant proteins, Capriotti teaches using mutant proteins in the training set, as discussed above regarding claim 1.

With respect to claim 7, Szalkai teaches the neural network for machine learning, as discussed above regarding claim 1.
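Szalkai's output layer, quoted above for claim 1, applies an independent sigmoid to each class rather than a softmax because a protein may carry several labels at once. The minimal Python sketch below contrasts the two activations and shows how a label set becomes one multi-hot row of an output tensor of shape [batch_size, n_classes]. The 0.5 decision threshold and all helper names are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(logits):
    """Softmax over one row; outputs sum to 1, so classes compete."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multi_hot(label_ids, n_classes):
    """One row of the output tensor [batch_size, n_classes] for a multi-label target."""
    row = [0.0] * n_classes
    for i in label_ids:
        row[i] = 1.0
    return row


def predict_labels(logits, threshold=0.5):
    """Independent per-class likelihoods; every class at or above threshold is predicted."""
    probs = [sigmoid(z) for z in logits]
    return probs, [i for i, p in enumerate(probs) if p >= threshold]


probs, labels = predict_labels([2.0, 1.5, -3.0])
print(labels)  # [0, 1]
```

With these logits both of the first two classes exceed 0.5 under the sigmoid, whereas the softmax of the same logits sums to 1, so at most one class could exceed 0.5.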
With respect to claim 9, Szalkai teaches using a testing set, as discussed above regarding claim 8, which is equivalent to the claim limitation “generating a subset of synthetic sequences,” and applying the machine learning model to predict the function of the testing dataset. It is well known in the art to split input data randomly into a training set and a testing set, so that the predicted protein functional data is within a predetermined range of the received protein functional data.
With respect to claim 10, Szalkai teaches using a testing set, as discussed above regarding claim 8, which is equivalent to the claim limitation “generating a subset of synthetic sequences.”
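The claim 9 limitation, a subset of synthetic sequences whose predicted functional data lies within a predetermined range of the received functional data, amounts to a simple filter. The minimal Python sketch below assumes an absolute tolerance around a single numeric target; the function name and the toy sequences and values are hypothetical, not drawn from the application or the references.

```python
def within_range_subset(predictions, target, tolerance):
    """Return the synthetic sequences whose predicted value is within +/- tolerance of target.

    predictions: dict mapping a synthetic sequence string to its predicted value.
    """
    low, high = target - tolerance, target + tolerance
    return {seq: value for seq, value in predictions.items() if low <= value <= high}


# Toy, hypothetical predictions for three synthetic sequences.
predicted = {"MAVL": 1.2, "MKVA": 2.9, "MKGL": 0.8}
print(within_range_subset(predicted, target=1.0, tolerance=0.25))
# {'MAVL': 1.2, 'MKGL': 0.8}
```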

With respect to claim 12, Szalkai teaches protein identifier as discussed above regarding claim 8.

With respect to claim 13, Szalkai teaches alignment-based similarity search (“One possible approach is the sequence alignment-based similarity search between the input residue sequence x and a properly chosen and functionally annotated reference sequence database D. For the sequence alignment one may use the exact Smith-Waterman algorithm, or the popular BLAST or its clones, or a more advanced, hidden Markov-model based HMMER search.” (page 50, col 2, paragraph 3, lines 1-7)). Such an alignment-based search can return a match at any similarity level in the range of 20%-100%.
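To illustrate how an alignment-based search yields a percent similarity that can be compared with the thresholds recited in claim 13 (20% through more than 99%), the following minimal Python sketch implements Smith-Waterman local alignment with a linear gap penalty and reports percent identity over the aligned region. The scoring values (match +2, mismatch -1, gap -2) and the helper names are assumptions for illustration; tools such as BLAST use substitution matrices and affine gap penalties, and identity over a short local alignment can overstate global similarity.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best-scoring cell until a zero cell is reached.
    aligned_a, aligned_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


def percent_identity(query, candidate):
    """Percent of aligned columns with identical residues (0.0 if nothing aligns)."""
    _, qa, ca = smith_waterman(query, candidate)
    if not qa:
        return 0.0
    identical = sum(1 for x, y in zip(qa, ca) if x == y)
    return 100.0 * identical / len(qa)


def is_match(query, candidate, threshold):
    """Return True when percent identity meets the threshold (e.g. 20, 50, 95)."""
    return percent_identity(query, candidate) >= threshold


print(round(percent_identity("MKVLAAG", "MKVLSAG"), 1))  # 85.7
```

Under these assumptions, 6 of 7 aligned positions are identical, so `is_match` returns True at an 80% threshold and False at 90%.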


It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Szalkai’s machine learning pipeline, which uses non-mutant proteins for training, with Capriotti’s teaching of incorporating mutant proteins into the training data, with a reasonable expectation of success. This conclusion rests on the teaching-suggestion-motivation rationale (“Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention” (MPEP § 2143 I.G.)). Both Szalkai and Capriotti are directed to predicting unlabeled protein function or stability using neural network-based machine learning trained on protein datasets of known function, and both reported successful results.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Szalkai and Capriotti as applied to claims 1-7 above, and further in view of Leinonen (“UniProt archive”, Bioinformatics, Volume 20, Issue 17, 22 November 2004, Pages 3236–3237).
With respect to claim 8, Szalkai teaches acquiring protein data from the UniProt database, as discussed above regarding claim 1, but does not mention a protein identifier explicitly. Leinonen teaches that a unique protein identifier is always available with the protein sequence (“each unique sequence is stored only once and assigned a UniParc identifier. These identifiers are stable and, once created, are never deleted or reassigned. Consequently, UniParc identifiers can be used to uniquely identify protein sequences in any protein database. The format of UniParc identifiers is UPI followed by 10 hexadecimal numbers, e.g. UPI000000000A.” (page 3236, col 2, para 1)). Szalkai teaches creating an AI training set using full length protein sequences, as discussed above regarding claim 1. Szalkai teaches using a testing set (“This set was shuffled and then divided into training and test sets using the bash commands head -5000 and tail -n +5001.” (page 52, col 1, last 3 lines)), which is equivalent to the claim limitation “generating a plurality of synthetic sequences,” and applying the machine learning model to predict the function of the testing dataset. Szalkai also teaches outputting the prediction results for a variety of validation and evaluation purposes (Section “Result and discussion”).
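Leinonen's identifier format (UPI followed by 10 hexadecimal characters) can be checked mechanically, and the claim 8 step of matching an identifier to stored full length mutant sequences can be sketched as a lookup. The minimal Python sketch below assumes uppercase hexadecimal digits, as in the quoted example UPI000000000A, and a hypothetical record layout in which each stored mutant carries a `parent_id` field; neither the layout nor the helper names come from the references.

```python
import re

# UPI followed by 10 hexadecimal characters (uppercase assumed, e.g. UPI000000000A).
UNIPARC_ID = re.compile(r"UPI[0-9A-F]{10}")


def is_uniparc_id(identifier):
    """True only if the whole string is a UniParc-style identifier."""
    return UNIPARC_ID.fullmatch(identifier) is not None


def match_identifier(identifier, records):
    """Return stored mutant sequences linked to the given identifier.

    records: list of dicts with hypothetical keys "parent_id" and "sequence".
    """
    if not is_uniparc_id(identifier):
        raise ValueError(f"not a UniParc identifier: {identifier!r}")
    return [r["sequence"] for r in records if r["parent_id"] == identifier]


records = [
    {"parent_id": "UPI000000000A", "sequence": "MAVLAAG"},
    {"parent_id": "UPI000000000A", "sequence": "MKVLSAG"},
    {"parent_id": "UPI000000000B", "sequence": "MKVLAAV"},
]
print(match_identifier("UPI000000000A", records))  # ['MAVLAAG', 'MKVLSAG']
```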

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Szalkai’s machine learning pipeline, which uses non-mutant proteins for training, with Capriotti’s teaching of incorporating mutant proteins into the training data and with Leinonen’s teaching of a unique protein identifier, with a reasonable expectation of success. This conclusion rests on the teaching-suggestion-motivation rationale (“Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention” (MPEP § 2143 I.G.)). Both Szalkai and Capriotti are directed to predicting unlabeled protein function or stability using neural network-based machine learning trained on protein datasets of known function, Leinonen’s unique protein identifier makes searching and indexing the features associated with a protein easy and convenient, and each reference reported successful results.

Claims 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Szalkai, Capriotti and Leinonen. 

Claim 15 is directed to a computerized method of engineering proteins comprising:
storing a plurality of full length mutant protein sequences, each full length mutant protein sequence comprising a string representing an amino acid sequence; 
storing a plurality of characteristic data sets, wherein each characteristic data set has an associated full length mutant protein sequence from the plurality of full length mutant protein sequences and wherein the characteristic data set includes data from assays done with a protein of the associated full length mutant protein sequence; 
receiving a protein identifier and protein functional data; 
matching the protein identifier to one or more full length mutant protein sequences stored in the database; 
generating an AI training set with the matching full length mutant protein sequences; 
training a machine learning model using the AI training dataset; 
employing the machine learning model to design one or more synthetic protein sequences and calculate each synthetic proteins predicted functional data; 
outputting the one or more synthetic protein sequences and predicted functional data.

With respect to claim 15, Szalkai discloses a computerized system executing a neural network system for protein engineering comprising:
a database of full length peptide sequences (“We have applied the SwissProt subset of the UniProt protein database, acquired from http://uniprot.org as starting point (using the query ‘goa:(*) AND reviewed:yes’), containing 526,526 sequences having Gene Ontology IDs at the date of download of 15 February 2017. The sequences were downloaded along with their assigned UniProt families.” (page 52, col 2, last paragraph));
a plurality of characteristic datasets (“We encoded each amino acid as a 26-dimensional vector, where the first 20 components comprised a one-hot vector (all components zero except the one uniquely identifying the amino acid in question), while the other 6 components encoded various properties of the amino acids: charge (±1 or 0.1 in the case of Histidine which is positive about 10% of the time and neutral 90% of the time), hydrophobicity, and the binary attributes isPolar, isAromatic, hasHydroxyl and hasSulfur.” (page 52, col 2, last paragraph));
Generating an AI training set comprising full length protein sequences (“This set was shuffled and then divided into training and test sets using the bash commands head -5000 and tail -n +5001. Since the data had headers, the test set contained 4,999 protein sequences, and the training set had the rest (521,527 sequences).” (page 52, col 1, last 3 lines)). Generating a machine learning model (“The deep neural network had a primarily convolutional architecture with 1D spatial pyramid pooling and fully connected layers at the end. The architecture is shown in Table 1. The network had 6 one-dimensional convolution layers with kernel sizes [6,6,5,5,5,5] and depths (filter counts) [128,128,256,256,512,512], with PReLU (parametric rectified linear unit) activation. We used max pooling with kernel size and stride 2 after each convolutional layer, except the first one. Max pooling was omitted after the first layer so that the network can conserve details about the fine structure of the protein. Each max pooling layer was followed by a batch normalization layer to help normalize the statistics of the heatmaps.” (page 53, col 1, last paragraph)). Szalkai is silent on mutant proteins.
Encoding the input data as 3D arrays (“The input sequences were encoded as two arrays: one 3-dimensional array inputSeq with dimensions [batch_size, max_length, dims] and another array inputSeqLen encoding the length of the individual sequences with dimension [batch_size]. Here batch_size means the number of sequences in a minibatch and was set to 32. max_length was the maximum allowed length of a sequence: sequences longer than this were omitted in the training phase and cropped to the first max_length amino acids in the testing phase.” (page 52, col 2, last paragraph, lines 1-9)). A tensor is a multi-dimensional array.
Encoding the output tensor (“The deep neural network had a primarily convolutional architecture with 1D spatial pyramid pooling and fully connected layers at the end. The architecture is shown in Table 1. The network had 6 one-dimensional convolution layers with kernel sizes [6,6,5,5,5,5] and depths (filter counts) [128,128,256,256,512,512], with PReLU (parametric rectified linear unit) activation. We used max pooling with kernel size and stride 2 after each convolutional layer, except the first one. Max pooling was omitted after the first layer so that the network can conserve details about the fine structure of the protein. Each max pooling layer was followed by a batch normalization layer to help normalize the statistics of the heatmaps. After SPP, the network state could be represented as an array of shape [batch_size, 21, 512]. The output of the spatial pyramid pooling layer was fed into a fully-connected layer with 1024 units and PReLU activation, followed by a dropout layer with p = 0.5 to avoid overfitting, and a batch normalization layer to normalize the mean and standard deviation. Then a second fully connected layer with sigmoid activation assigned numerical values (likelihoods) between 0 and 1 for each class, yielding the output array y with shape [batch_size, n_classes]. Note that softmax activation cannot be used because the network had to perform a multi-label classification task.” (page 53, col 2, last paragraph; page 53, col 1, first 2 paragraphs)).
Output synthetic proteins and predicted functional data (“From our results and previous work we can conclude that the Gene Ontology functional classification task seems to be harder for artificial neural networks than the UniProt family classification task, probably because the assignment of UniProt families depends heavily on sequence similarity, and thus it is easier to classify proteins into UniProt families instead of functional classes based purely on the amino acid sequence data.” (page 54, col 2, para 9)).
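The sequence counts quoted from Szalkai follow directly from the head/tail split when the file carries one header line: `head -5000` keeps the header plus 4,999 records, and `tail -n +5001` keeps every line from line 5001 onward, so 526,526 - 4,999 = 521,527 training sequences. The minimal Python sketch below emulates that split on an in-memory list; keeping the header as the first line after shuffling is an assumption, and `shuffle_and_split` is a hypothetical name.

```python
import random


def shuffle_and_split(records, header="header", n_head=5000, seed=0):
    """Shuffle records, prepend one header line, then emulate the two bash commands.

    head -5000     -> lines 1..5000  (header + 4,999 records = test set)
    tail -n +5001  -> line 5001 on   (remaining records = training set)
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    lines = [header] + shuffled   # assumed: header stays on line 1
    head_part = lines[:n_head]    # what `head -5000` prints
    tail_part = lines[n_head:]    # what `tail -n +5001` prints
    test_set = head_part[1:]      # the header occupies one of the 5000 lines
    return test_set, tail_part


test_set, train_set = shuffle_and_split(range(526_526))
print(len(test_set), len(train_set))  # 4999 521527
```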

Szalkai discloses a computerized system executing a neural network system for protein engineering, but Szalkai is silent on using mutant proteins for model training (more specifically, steps a), b), and e) of claim 15). With respect to steps a), b), and e) of claim 15, Capriotti teaches using mutant proteins in a neural network-based machine learning study (“Our dataset is derived from the current release (July 2003) of the Thermodynamic Database for Proteins and Mutants [ProTherm by Gromiha et al. (2000)]. We considered two datasets: the first for training/testing our neural network system (S1615), and the second (a subset of the first), to be used in a testing phase with cross-validation procedure for comparison with other available predictors, considering mutations only at physiological conditions (S388)” (page i64, col 1, 1st paragraph in section “System and Methods/the protein database”)).

Szalkai is silent on “receiving a protein identifier and protein functional data” and “matching the protein identifier to one or more full length mutant protein sequences stored in the database,” and does not mention a protein identifier explicitly. Leinonen teaches that a unique protein identifier is always available with the protein sequence (“each unique sequence is stored only once and assigned a UniParc identifier. These identifiers are stable and, once created, are never deleted or reassigned. Consequently, UniParc identifiers can be used to uniquely identify protein sequences in any protein database. The format of UniParc identifiers is UPI followed by 10 hexadecimal numbers, e.g. UPI000000000A.” (page 3236, col 2, para 1)).

With respect to claim 16, Szalkai teaches that the assay data comprise an experimental assay type, a numeric value, and a unit (“amino acids: charge (±1 or 0.1 in the case of Histidine which is positive about 10% of the time and neutral 90% of the time)”, page 52, col 2, last paragraph, lines 17-19).

With respect to claim 17, Szalkai teaches the neural network for machine learning, as discussed above regarding claim 15.

With respect to claim 18, Szalkai teaches an alignment-based similarity search (“One possible approach is the sequence alignment-based similarity search between the input residue sequence x and a properly chosen and functionally annotated reference sequence database D. For the sequence alignment one may use the exact Smith-Waterman algorithm, or the popular BLAST or its clones, or a more advanced, hidden Markov-model based HMMER search”, page 50, col 2, paragraph 3, lines 1-7), which can return a match in the range of 20%-100%.


It would have been prima facie obvious to one of ordinary skill in the art at the time of the invention, under the teaching-suggestion-motivation rationale (“Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention”, MPEP § 2143 I.G.), to modify Szalkai’s machine learning pipeline, which uses non-mutant proteins for training, with Capriotti’s teaching of incorporating mutant proteins into the training data and with Leinonen’s teaching of a unique protein identifier, with a reasonable expectation of success. Szalkai and Capriotti both concern predicting the function or stability of unlabeled proteins using neural network-based machine learning trained on datasets of proteins with known function, and Leinonen’s unique protein identifier makes searching and indexing the features associated with each protein easy and convenient. Each of these references reports success with its respective approach.



Conclusion
No claims are allowed.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GUOZHEN LIU whose telephone number is (571)272-0224. The examiner can normally be reached Monday-Friday 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Karlheinz R. Skowronek, can be reached at (571)272-9047. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Soren Harward/Primary Examiner, Art Unit 1631
GUOZHEN LIU
Examiner
Art Unit 1631