DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Herein, "the previous Office action" refers to the final rejection of 13 Jun 2022.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 13 Sep 2022 has been entered.

Amendments Received
Amendments to the claims were received and entered on 13 Sep 2022.

Status of the Claims
Examined herein: 1–22

Priority
Applicant’s claim under 35 USC § 119(e) for the benefit of prior-filed Provisional Application No. 16/278611 is acknowledged.
In this action, all claims are examined as though they had an effective filing date of 17 Feb 2018.  In future actions, the effective filing date of one or more claims may change, due to amendments to the claims, or further analysis of the disclosure of the priority application.

Withdrawn Rejections
The rejections of claim 22 under 35 USC §§ 112(a) and 112(b) are hereby withdrawn in view of Applicant's amendments.
The rejection of claims 1–12 and 15–22 under 35 USC § 103 over Calimeri, Somasundaram and Vang is hereby withdrawn in view of Applicant's amendments, and persuasive argument that none of these references teaches "the positive simulated polypeptide-MHC-I interaction data samples are generated based on … binding patterns mimicked by the amino acid distribution matrix".  Consequently, the rejection of claims 13 and 14 under 35 USC § 103 over Calimeri, Somasundaram and Vang is also hereby withdrawn.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1–22 are rejected under 35 USC § 101 because the claimed inventions are directed to non-statutory subject matter.
This rejection is maintained from the previous Office action.  Minor revisions have been made to address the newly-presented limitations of claim 1.
"Claims directed to nothing more than abstract ideas (such as a mathematical formula or equation), natural phenomena, and laws of nature are not eligible for patent protection" (MPEP 2106.04 § I).  Abstract ideas include mathematical concepts, and procedures for evaluating, analyzing or organizing information, which are a type of mental process (MPEP 2106.04(a)(2)).  The claims as a whole, considering all claim elements both individually and in combination, do not amount to significantly more than the abstract idea of "training a generative adversarial network".

Step 1: The Four Categories of Statutory Subject Matter (MPEP 2106.03)
The claims are directed to a method, which is one of the categories of statutory subject matter.

Step 2A, Prong One: Whether the Claims Set Forth or Describe a Judicial Exception (MPEP 2106.04 § II.A.1)
Mathematical concepts recited in the claims include "generating … via a GAN generator, positive simulated polypeptide-MHC-I interaction data samples …"; "training … a convolutional neural network (CNN) by presenting, based on a set of CNN parameters, the positive simulated polypeptide-MHC-I interaction data samples … to the CNN"; and "presenting … the positive real polypeptide-MHC-I interaction data samples and the negative real polypeptide-MHC-I interaction data samples to the CNN to generate prediction scores".
Steps of evaluating, analyzing or organizing information recited in the claims include "determining … whether the prediction scores are indicative of the GAN being trained or not trained".
Hence, the claims explicitly recite numerous elements that, individually and in combination, constitute abstract ideas.  The claims must therefore be examined further to determine whether they integrate that abstract idea into a practical application (MPEP 2106.04(d)).

Step 2A, Prong Two: Whether the Claims Contain Additional Elements that Integrate the Judicial Exception(s) into a Practical Application (MPEP 2106.04 § II.A.2)
Claim 1 recites an additional element that is not an abstract idea: that the abstract idea steps are performed "by a computing device".  The claims do not describe any specific computational steps by which the computer performs or carries out the abstract idea, nor do they provide any details of how specific structures of the computer are used to implement these functions.  The claims state nothing more than that a generic computer performs the functions that constitute the abstract idea.  Hence, these are mere instructions to apply the abstract idea using a computer, and therefore the claim does not integrate that abstract idea into a practical application (see MPEP 2106.04(d) § I; and MPEP 2106.05(f)).
Claims 7 and 20 recite "outputting the GAN and the CNN".  Outputting the results of the abstract idea is quintessential insignificant extrasolution activity, which does not integrate the abstract idea into a practical application (see MPEP 2106.04(d) § I; and MPEP 2106.05(g)).
Claim 13 recites an additional element that is not an abstract idea: "synthesizing the polypeptide".  The claims do not describe any specific synthetic procedure, nor do they even specify what polypeptide is being synthesized.  This claim element is nothing more than a mere instruction to apply the abstract idea using a generic synthesis procedure.  The claim therefore does not integrate that abstract idea into a practical application (see MPEP 2106.04(d) § I; and MPEP 2106.05(f)).
None of the other dependent claims recites any additional non-abstract elements; they are all directed to further aspects of the information being analyzed, the manner in which that analysis is performed, or the mathematical operations performed on the information.
Because the claims recite an abstract idea, and do not integrate that abstract idea into a practical application, the claims are directed to that abstract idea.  Claims that are directed to abstract ideas must be examined further to determine whether the additional elements besides the abstract idea render the claims significantly more than the abstract idea.  Claims that are directed to abstract ideas and that raise a concern of preemption of those abstract ideas must be examined to determine what elements, if any, they recite besides the abstract idea, and whether these additional elements constitute inventive concepts that are sufficient to render the claims significantly more than the abstract idea (MPEP 2106.05).

Step 2B: Whether the Claims Contain Additional Elements that Amount to an Inventive Concept (MPEP 2106.05)
As explained above, the mere instructions to implement the abstract idea using a computer are, when considered individually, insufficient to constitute an inventive concept that would render the claims significantly more than an abstract idea (see MPEP 2106.05(f)).
As explained above, the mere instructions to synthesize a polypeptide are, when considered individually, insufficient to constitute an inventive concept that would render the claims significantly more than an abstract idea (see MPEP 2106.05(f)).
As also explained above, the generic steps of outputting the GAN and CNN resulting from the abstract idea constitute insignificant extrasolution activity, and when considered individually, are insufficient to constitute inventive concepts that would render the claims significantly more than an abstract idea (see MPEP 2106.05(g)).
When the claims are considered as a whole, they do not integrate the abstract idea into a practical application; they do not confine the use of the abstract idea to a particular technology; they do not solve a problem rooted in or arising from the use of a particular technology; they do not improve a technology by allowing the technology to perform a function that it previously was not capable of performing; and they do not provide any limitations beyond generally linking the use of the abstract idea to a broad technological environment (i.e. computerized analysis of biological sequence data; polypeptide synthesis).  See MPEP 2106.05(a) and 2106.05(h).

Conclusion: Claims are Directed to Non-statutory Subject Matter
For these reasons, the claims, when the limitations are considered individually and as a whole, are directed to an abstract idea and lack an inventive concept.  Hence, the claimed invention does not constitute significantly more than the abstract idea, so the claims are rejected under 35 USC § 101 as being directed to non-statutory subject matter.

Response to Arguments - Rejections Under 35 USC § 101
In the reply filed 13 Sep 2022, Applicant asserts that "claim 1 may be based on or involve mathematical concepts … but it does not recite a mathematical concept" (p. 9).
This is not a reasonable interpretation of the claimed subject matter.  As explained above, steps such as "generating … via a GAN generator" synthetic data, and "training … a convolutional neural network (CNN)" are both mathematical operations.  Somasundaram (p. 4), Kusner (p. 3 § "Generative adversarial modeling") and Calimeri (p. 627 § 2.1) all describe the mathematical procedure of generating data using GANs.  Vang (p. 2661, col. 1) describes the mathematical procedure of training a convolutional neural network.  These claimed steps do more than merely involve mathematical procedures.  They are themselves mathematical procedures.
Applicant further asserts that features of the claims "clearly indicate that the alleged judicial exception has been integrated into a practical application" (p. 11) and that "here, the improvements relate to the field of machine learning" (p. 12).
All of the claim features identified by Applicant as indicating a practical application are parts of the abstract idea.  And "machine learning", in this context, is not a technology or technological application.  It is a class of mathematical procedures.  Abstract ideas cannot be additional elements that integrate the abstract idea into a practical application, or that impart an inventive concept to the claims.
The arguments are therefore unpersuasive, so the rejection is maintained.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1–12 and 15–22 are rejected under 35 U.S.C. 103 as being unpatentable over Vang, et al. (Bioinformatics 2017; ref. A on IDS of 17 May 2019); Somasundaram, et al. (in 2nd International Conference on Information Technology Research 2017; ref. C on IDS of 6 Nov 2019); and Kusner, et al. ("GANS for Sequences of Discrete Elements with the Gumbel-softmax Distribution" 2016; previously cited).
The explanation of the correspondence among Vang, Somasundaram and the claim limitations is substantially similar to that presented in the previous Office action.
With respect to claim 1, Vang "propose a deep convolutional neural network architecture, name HLA-CNN, for the task of HLA class I-peptide binding prediction" (Abstract), comprising:
(a)	—
(b)	"the input into HLA-CNN network is the character string of the peptide, a 9-mer peptide in this example" which is then encoded into an amino acid matrix (p. 2660 § 2.3); "indicators of binding were given as either binary values of or ic50 (half maximal inhibitory concentration) measurements. Binary indicators were used directly while values given in ic50 measurements were denoted as binding if ic50 < 500nM" (p. 2659 § 2.1); the "binary indicators" constitute "positive real data, and negative real data"; the CNN is trained until a loss criterion is reached (p. 2661, bot. of col. 1)
(c)	presenting an evaluation set of real positive and negative examples to the CNN (p. 2662 § 3.2)
(d)	—
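By way of illustration, the labeling of real training examples described in item (b) can be sketched as follows.  Only the 500 nM threshold is quoted from Vang; the function names, the record layout and the example peptides are hypothetical and are not taken from any cited reference.

```python
# Threshold quoted from Vang (p. 2659 § 2.1): IC50 < 500 nM denotes binding.
IC50_BINDING_THRESHOLD_NM = 500.0


def to_binary_label(indicator, is_ic50):
    """Return 1 for a binder and 0 for a non-binder.

    Binary indicators are used directly; IC50 measurements (in nM) are
    converted with the 500 nM threshold.
    """
    if is_ic50:
        return 1 if indicator < IC50_BINDING_THRESHOLD_NM else 0
    return int(indicator)


def split_real_samples(records):
    """Split (peptide, indicator, is_ic50) records into positives and negatives."""
    positives, negatives = [], []
    for peptide, indicator, is_ic50 in records:
        if to_binary_label(indicator, is_ic50) == 1:
            positives.append(peptide)
        else:
            negatives.append(peptide)
    return positives, negatives


# Hypothetical 9-mer records, for illustration only (not experimental data).
example_records = [
    ("AAAAAAAAL", 1, False),      # binary indicator: binder
    ("GGGGGGGGV", 0, False),      # binary indicator: non-binder
    ("KLLLLLLLV", 120.0, True),   # IC50 below the threshold
    ("DDDDDDDDD", 5000.0, True),  # IC50 above the threshold
]
positives, negatives = split_real_samples(example_records)
```

The two resulting lists correspond to the "positive real data, and negative real data" referred to in item (b).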
Vang teaches that "the lack of training data is a well-known weakness of deep neural networks as the model may not converge to a solution or worst yet, may overfit to the small training set" (p. 2659, bot. of col. 1).
Somasundaram teaches that "data mining and machine learning is typically associated with solving real world problems that are characterized by a large amount of data. However, in practice, collecting large amounts of data in medical field is infeasible" (p. 2 § III.B).  Somasundaram further teaches that "GANs are neural networks that learn to create synthetic data similar to some known input data" (p. 3 § VII), and that the synthetic data are created by inputting a noise vector into the Generator element of the GAN (p. 4, col. 1).  Somasundaram teaches that "after the remarkable success of GAN, it's widely used in many industries to generate things. GAN used to generate images, text, music and many more things" (p. 7 § IX).
Somasundaram provides a general solution to the "lack of training data" noted as a problem by Vang: a GAN can be used to generate synthetic training data to augment the real examples.  In this case, the training data of Vang are peptide sequences with their corresponding binding affinities for HLA class I; i.e. "polypeptide-MHC-I interaction data".  To generate synthetic training data as taught by Somasundaram, a GAN must be created such that the generator transforms a noise vector into polypeptide-MHC-I interaction data, which is the input needed by Vang.
Kusner teaches a GAN that can generate sequences of discrete elements.  Kusner teaches that such a generator transforms a noise vector into a multinomial probability distribution that approximates the distribution of sequence elements at each position in the sequence (p. 3 § "Generative adversarial modeling").  The sequences are then generated by sampling one element from the corresponding multinomial probability distribution for each position in the sequence (p. 3 § "Generative adversarial modeling").  Kusner provides an example of training a sequence generator over 20,000 mini-batch iterations (top of p. 5), which constitutes satisfying a stop criterion.  Kusner teaches that "we believe that these results, as a proof of concept, show strong promise for training GANs to generate discrete sequence data" (top of p. 6).
Since the input to the CNN of Vang includes an amino acid sequence, the generator must transform the noise vector into an amino acid sequence.  Kusner teaches that a GAN can generate a sequence of discrete elements by transforming a noise vector into a multinomial probability distribution of the possible elements (i.e. amino acids), and then sampling an element from the distribution for each position in the sequence.  To generate polypeptide sequences that bind to HLA class I, the GAN must generate "an amino acid distribution matrix that mimics binding patterns of a positive polypeptide MHC-I interaction", and then sample a sequence from that distribution matrix, as in claimed step (a).  An HLA-CNN, as taught by Vang, is then trained using both real and synthetic data, as in claimed step (b).  If the HLA-CNN performs sufficiently well on the evaluation set, then the synthetic data generated by the GAN are sufficiently representative of real binding peptides; if not, then the GAN performance is inadequate and further training is performed by repeating the GAN and CNN training, as in claimed step (d).
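For illustration, the sampling step described above, in which one element is drawn per position from a per-position multinomial distribution, can be sketched as follows.  This is a minimal sketch under stated assumptions: the matrix layout (one row per sequence position, one column per amino acid), the alphabet ordering and the function name are hypothetical, and the sketch does not reproduce the generator network of Kusner or of the claims.

```python
import random

# The 20 standard amino acids in one-letter code; the column order is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sample_peptide(distribution_matrix, rng):
    """Draw one residue per position from that position's probability row."""
    residues = []
    for row in distribution_matrix:
        if len(row) != len(AMINO_ACIDS):
            raise ValueError("each row needs one probability per amino acid")
        # random.Random.choices draws k items with the given relative weights.
        residues.append(rng.choices(AMINO_ACIDS, weights=row, k=1)[0])
    return "".join(residues)


# Hypothetical distribution matrix for a 9-mer: uniform at every position.
uniform_row = [1.0 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
uniform_matrix = [uniform_row for _ in range(9)]
peptide = sample_peptide(uniform_matrix, random.Random(0))
```

A matrix whose rows concentrate probability on particular residues at particular positions would yield sequences that follow that positional pattern.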
With respect to claim 2, Somasundaram and Vang both teach that the GAN and CNN operate on biological data.
With respect to claim 3, Vang teaches that "the focus of this article is on HLA class I proteins" (p. 2658, bot. of col. 2) and "we apply machine learning techniques from the natural language processing (NLP) domain to tackle the task of MHC-peptide binding prediction" (p. 2659, mid. of col. 1).
With respect to claims 4 and 18, Somasundaram teaches that "in GAN the training data will be in 2 parts. One is the real data pdata(x) and another one is the generated data distribution pg(x)" (p. 4, mid. of col. 1).  GAN training includes adjusting the parameters of the Generator and the decision boundary of the Discriminator (p. 4, col. 2).  Kusner teaches that "the discriminator takes as input any real d-dimensional vector (this could be a generated input G(z) or a real one x) and predicts the probability that the input is actually drawn from the real distribution p(x). It will be trained to take samples G(z) and real inputs x and accurately distinguish them" (p. 3 § "Generative adversarial modeling").  As explained above, in the combination of Kusner, Somasundaram and Vang, the training data are polypeptide-HLA interaction data.
With respect to claim 5, Vang teaches that "the input into HLA-CNN network is the character string of the peptide" (p. 2660 § 2.3), the peptide being one that does or does not bind to HLA class I.  Since the combination of Kusner and Somasundaram teaches using a GAN to generate synthetic positive training examples, synthetic training examples for the HLA-CNN of Vang must be peptide sequences predicted to bind to HLA class I, as in claimed step (j).  These synthetic training examples are combined with real training examples, as in claimed step (k).  The HLA-CNN is then trained until convergence, as in steps (l)–(o).
With respect to claim 6, Vang teaches that the HLA-CNN outputs a binary prediction of whether the peptide binds to HLA class I (p. 2661, col. 1; Fig. 1).
With respect to claims 7 and 8, Vang teaches evaluating the classification accuracy of the CNN (p. 2661, col. 1).  Kusner teaches that the synthetic examples generated by the GAN should be indistinguishable from real examples (p. 3 § "Generative adversarial modeling").  If a classifier is trained with synthetic training examples that are distinguishable from real training examples, then the classifier will have poor performance.  Hence, poor performance of a classifier (e.g. the HLA-CNN of Vang) indicates that the GAN is insufficiently trained; conversely, good performance of the classifier indicates that the GAN is sufficiently trained.  Kusner, Somasundaram and Vang all teach computerized training of their respective models, which necessitates that the models themselves were outputted in some form.
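For illustration, the inference described above, from classifier performance on real examples to whether the GAN is sufficiently trained, can be sketched as follows.  The use of AUC as the performance measure follows Vang's evaluation; the function names and the 0.9 cut-off are hypothetical and are not taken from the cited references or the claims.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count one half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def gan_is_trained(positive_scores, negative_scores, min_auc=0.9):
    """Treat the GAN as trained when the classifier's AUC on real
    examples reaches the (hypothetical) cut-off min_auc."""
    return auc(positive_scores, negative_scores) >= min_auc
```

With prediction scores of 0.9 and 0.3 for two real binders and 0.5 and 0.1 for two real non-binders, three of the four pairs are ranked correctly, so the AUC is 0.75 and the sketch would call for further training.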
With respect to claim 9, Vang teaches that the input to the HLA-CNN is a peptide 9-mer (p. 2660 § 2.3).  Hence, the GAN must generate 9-mer peptide sequences; i.e. "allele length".  Kusner teaches that the GAN includes the parameter τ, which is a learning rate (mid. of p. 2).  Kusner also teaches training the GAN with a chosen learning rate and batch size (p. 4 § "Optimization details").
With respect to claim 10, Vang teaches that the HLA-CNN model predicts HLA class I binding.  HLA-A, HLA-B and HLA-C are HLA class I proteins.
With respect to claims 11 and 12, Vang teaches "a 9-mer peptide" (p. 2660 § 2.3).
With respect to claims 15 and 16, Vang teaches training HLA-CNN models with specific HLA alleles, including A*02:01, A*02:03, B*27:03 and B*27:05 (p. 2663, Table 2).
With respect to claim 17, Somasundaram teaches that "the weights and biases in the discriminator and the generator are trained through back propagation" (p. 4, top of col. 1), which necessarily includes "evaluating a gradient descent expression".
With respect to claim 19, Somasundaram teaches that "the optimization of GAN can be formulated as a minimax problem" (p. 4, mid. of col. 2); i.e. an evaluation of an MSE function.  Vang teaches that "the loss function used is the binary cross entropy function" (p. 2661, mid. of col. 1), which is the equivalent of MSE for binary outputs.  Vang further teaches that the HLA-CNN model is evaluated using AUC (p. 2661, top of col. 2).
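For reference, the two loss functions named above can be written out as follows, so that both can be evaluated on the same binary labels and prediction scores.  These are the standard textbook definitions; the function names and the clipping constant eps are assumptions made for this sketch and are not taken from Vang or Somasundaram.

```python
import math


def binary_cross_entropy(labels, scores, eps=1e-12):
    """Mean binary cross entropy between 0/1 labels and scores in [0, 1]."""
    total = 0.0
    for y, p in zip(labels, scores):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def mean_squared_error(labels, scores):
    """Mean squared difference between labels and scores."""
    return sum((y - p) ** 2 for y, p in zip(labels, scores)) / len(labels)
```

Both functions are zero for perfect binary predictions and increase as the prediction scores move away from the labels.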
With respect to claim 20, Kusner, Somasundaram and Vang all teach computerized training of their respective models, which necessitates that the models themselves were outputted in some form.
With respect to claim 21, Somasundaram teaches that "z is sampled from the prior distribution pz(z) such as uniform or Gaussian distribution", z being the noise vector (p. 4, mid. of col. 1).
With respect to claim 22, Kusner teaches that a GAN transforms a noise vector into a multinomial probability distribution that approximates the distribution of sequence elements at each position in the sequence (p. 3 § "Generative adversarial modeling").  A multinomial probability distribution is, by definition, a normalized matrix.
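For illustration, the sense in which a matrix of per-position multinomial probabilities is "normalized" can be sketched as follows.  A plain row-wise softmax is used here as an assumption chosen for simplicity; it is not asserted to be the normalization used by Kusner or recited in the claims, and the function name is hypothetical.

```python
import math


def normalize_rows(score_matrix):
    """Convert each row of unnormalized scores into probabilities
    that are non-negative and sum to 1 (row-wise softmax)."""
    normalized = []
    for row in score_matrix:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(value - peak) for value in row]
        total = sum(exps)
        normalized.append([e / total for e in exps])
    return normalized
```

After this step every row is a valid probability distribution over the possible sequence elements at that position.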
An invention would have been obvious to one of ordinary skill in the art if some motivation in the prior art would have led that person to modify prior art reference teachings to arrive at the claimed invention.  Prior to the time of invention, said practitioner would have been motivated to modify the HLA classification method of Vang to include synthetic training data generated by a GAN, because Somasundaram teaches that GANs can successfully generate synthetic training data for a classifier, overcoming a problem noted by Vang.  Given that Somasundaram teaches that GANs can be used to generate any kind of biomedical data, including sequences — as in Kusner and Vang — said practitioner would have readily predicted that the modification would successfully result in a method of generating a classifier for HLA-binding sequences, trained on a combination of real HLA binding data and synthetic training data generated by a GAN.  The invention is therefore prima facie obvious.

Claims 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Vang, Somasundaram and Kusner as applied to claims 1–12 and 15–22 above, and further in view of Carr, et al. (WO 2017/184590).
Save for the replacement of Calimeri with Kusner, the rationale is substantially identical to that presented in the previous Office action.
The combination of Vang, Somasundaram and Kusner teaches a method of predicting HLA binding for a peptide sequence, but does not teach "synthesizing the polypeptide from the candidate polypeptide-MHC-I interaction classified as a positive polypeptide-MHC-I interaction".
Carr teaches "methods for improved prediction of HLA-peptide binding, datasets for predicting HLA-peptide binding and selection of HLA-binding peptides and compositions comprising HLA-binding peptides obtained by these methods" (0004).  Carr teaches "HLA-peptides sequenced by mass spectrometry along with a set of random decoys were used to build binary classifiers (one classifier per HLA allele) to predict whether a given peptide will bind to a specific HLA allele" (00471); classifiers can include "generative models" and "deep convolutional neural networks" (00114).  Carr further teaches that "a subset of [predicted] peptides were synthesized … and tested for binding to HLA molecules" (00470); the peptides synthesized for experimental validation were those predicted to bind to at least one HLA (00483).
With respect to claim 14, Vang teaches training HLA-CNN models to generate peptides that bind to specific HLA alleles, including A*02:01, A*02:03, B*27:03 and B*27:05 (p. 2663, Table 2).
An invention would have been obvious to one of ordinary skill in the art if some teaching in the prior art would have led that person to combine prior art reference teachings to arrive at the claimed invention.  Prior to the time of invention, said practitioner would have followed the teachings of Carr — synthesize peptides that are predicted to bind to HLA, to experimentally validate the prediction — and combined this experimental validation step with the method of Vang, Somasundaram and Kusner.  Given that both Carr and the combination of Vang, Somasundaram and Kusner are directed to generating peptide sequences predicted to bind to HLAs, and that peptides of any sequence can be readily synthesized using customary techniques, said practitioner would have readily predicted that the combination would successfully result in a method of generating predicted HLA-binding peptides, followed by synthesizing those peptides for experimental validation.  The invention is therefore prima facie obvious.

Conclusion
No claim is allowable.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Soren Harward whose telephone number is (571)270-1324. The examiner can normally be reached M-Th 8am-5pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Karl Skowronek, can be reached on 571-272-9047. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Soren Harward/Primary Examiner, Art Unit 1631