DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
EXAMINER’S AMENDMENT
Authorization for this examiner’s amendment was given in an interview with Li Hua Weng on May 26th, 2022.
The application has been amended as follows: 
1.	(Currently Amended) A method of characterizing molecules having sequences, the method comprising: 
	preparing a first group of sequences;
	creating a first dataset comprising identities and experiment outcome of the sequences in the first group;
	creating a second dataset comprising identities of sequences from a second group of sequences obtained from publicly available datasets;
	training a first neural network using the first dataset to extract first sequence features from the first dataset;
	training a second neural network using the second dataset to obtain a pre-trained model;
	extracting second sequence features using the pre-trained model and the first dataset;
	predicting an outcome of a new sequence based on a trained model obtained from the training of the first neural network;
	outputting a visualization report including the first and second sequence features and the predicted outcome of the new sequence;
	redesigning a more focused library based on the first and second sequence features; and
	conducting an experiment on the more focused library.

	2.	(Cancelled) 

	3.	(Currently Amended) The method according to claim [[2]] 1, wherein the second sequence features are different from the first sequence features.

	4-6.	(Cancelled) 

7.	(Currently Amended) The method according to claim [[5]] 1, wherein the second dataset includes sequences that are related to at least one of cell toxicity, cell membrane binding, metal binding, DNA binding, RNA binding, and non-specific binding to any molecules. 

8.	(Currently Amended) The method according to claim [[2]] 1, wherein the second sequence features are selected from a group consisting of cell toxicity sequence feature/motif, cell membrane binding motif, DNA binding motif, RNA binding motif, non-specific binding motif.

	9-10.	(Cancelled) 

11.	(Original) The method according to claim 1, wherein the first sequence features learned by the first neural network include at least one of 2D and 3D structural information, motif feature, physical and chemical property score of each sequence.

12.	(Original) The method according to claim 1, wherein the sequences are selected from a group consisting of DNA, RNA, protein amino acids. 

13.	(Original) The method according to claim 1, wherein the sequences are protein amino acids.

14.	(Original) The method according to claim 1, further comprising determining identities of the sequences by a sequencing process. 

15.	(Original) The method according to claim 1, wherein the experiment outcome includes information about interaction or non-interaction of the sequences of the molecules with a target of interest.

	16.	(Original) The method according to claim 1, wherein the experiment outcome relates to protein binding or gene editing efficiency.

17.	(Original) The method according to claim 1, further comprising performing both a supervised machine learning and an unsupervised machine learning, wherein the supervised machine learning uses the first dataset and the unsupervised machine learning uses both the first dataset and a second dataset that is different from the first dataset.

18-20.	(Cancelled) 
Allowable Subject Matter

Claims 1, 3, 7, 8, 11-17 are allowed.
The following is an examiner’s statement of reasons for allowance: 
Applicant's invention is drawn to a method of characterizing biological sequences that includes: preparing a library of sequences; subjecting the sequences in the library to at least one screening experiment to obtain an experiment outcome of each of the sequences; creating a first dataset comprising identities of the sequences and the experiment outcomes of the sequences; and training a first neural network using the first dataset to extract first sequence features from the sequences in the first dataset. A second neural network may additionally be trained using a second dataset based on an external database to generate a pre-trained model, which is used to extract additional features from the first dataset.
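For illustration only, the data flow summarized above can be sketched in Python. This is a minimal sketch under stated assumptions, not the applicant's implementation: the toy DNA sequences, the function names, and the hyperparameters are all hypothetical; a single sigmoid unit trained by gradient descent stands in for the first neural network; and a k-mer frequency table stands in for the pre-trained second network.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: toy DNA alphabet

# First dataset: sequence identities plus an experiment outcome (1 = hit, 0 = miss).
first_dataset = [("ACGTAC", 1), ("ACGTCA", 1), ("ACGACG", 1),
                 ("TTTTGA", 0), ("GATTTT", 0), ("TTGATT", 0)]
# Second dataset: sequence identities only, standing in for publicly available data.
second_dataset = ["ACGACG", "ACGTAC", "CGTACG", "TACGTA"]


def composition(seq):
    """Fraction of each alphabet letter in seq (input to the first model)."""
    return [seq.count(ch) / len(seq) for ch in ALPHABET]


def predict(model, seq):
    """Predicted probability of a positive outcome for seq."""
    weights, bias = model
    z = sum(w * x for w, x in zip(weights, composition(seq))) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_first_model(dataset, epochs=1000, lr=0.5):
    """Supervised stand-in for the first network: one sigmoid unit, per-sample gradient steps."""
    weights, bias = [0.0] * len(ALPHABET), 0.0
    for _ in range(epochs):
        for seq, outcome in dataset:
            err = predict((weights, bias), seq) - outcome
            weights = [w - lr * err * x for w, x in zip(weights, composition(seq))]
            bias -= lr * err
    return weights, bias


def pretrain_second_model(sequences, k=3):
    """Unsupervised stand-in for the second network: relative k-mer frequencies."""
    counts = Counter(s[i:i + k] for s in sequences for i in range(len(s) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}, k


def second_features(pretrained, seq):
    """Mean pre-trained frequency of the k-mers in seq."""
    freqs, k = pretrained
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return sum(freqs.get(km, 0.0) for km in kmers) / len(kmers)


model = train_first_model(first_dataset)            # trained on the first dataset
pretrained = pretrain_second_model(second_dataset)  # pre-trained on the second dataset
for seq, outcome in first_dataset:
    # Second features are extracted from first-dataset sequences with the pre-trained model.
    print(seq, outcome, round(predict(model, seq), 3), round(second_features(pretrained, seq), 3))
```

The sketch only mirrors the division of labor described above: the supervised model sees sequences with outcomes, while the pre-trained model sees only outside sequences and is then applied to the first dataset.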
	The closest prior art of record fails to teach the limitation of “preparing a first group of sequences; creating a first dataset comprising identities and experiment outcome of the sequences in the first group; creating a second dataset comprising identities of sequences from a second group of sequences obtained from publicly available datasets; training a first neural network using the first dataset to extract first sequence features from the first dataset; training a second neural network using the second dataset to obtain a pre-trained model; extracting second sequence features using the pre-trained model and the first dataset; predicting an outcome of a new sequence based on a trained model obtained from the training of the first neural network; outputting a visualization report including the first and second sequence features and the predicted outcome of the new sequence; redesigning a more focused library based on the first and second sequence features; and conducting an experiment on the more focused library”.
	Applicant’s independent claim 1 comprises a particular combination of elements, which is neither taught nor suggested by the prior art.
Dependent claims are deemed allowable for the same reasons as corresponding independent claims.
Cope et al. Pub. No. US 20170211206 A1 teaches methods for rapidly and efficiently searching biologically-related data space. More specifically, the present invention provides methods for identifying bio-molecules with desired properties, or which are most suitable for acquiring such properties, from complex bio-molecule libraries or sets of such libraries. The present invention also provides methods for modeling sequence-activity relationships, including but not limited to stepwise addition or subtraction techniques, Bayesian regression, ensemble regression and other methods. The present invention further provides digital systems and software for performing the methods provided herein.
CN 112837747 A teaches a protein binding site prediction method based on attention twinning network, using a neural network with two layers of convolution layer to perform feature extraction, then estimating binding probability according to the extracted characteristic, obtaining the probability of the predicted RNA sequence binding protein. The invention adopts deep neural network paired measurement learning to effectively enhance the network capability of capturing the mutual information between circRNA, and using the available mark data from other RBP for pre-training, so as to obviously improve the prediction precision.
Azab et al. Pub. No. US 20190371429 A1 teaches methods, processes, machines and apparatuses for non-invasive assessment of genetic alterations. In particular, a method is provided that includes obtaining a set of sequence reads. The sequence reads each include a single molecule barcode (SMB) sequence that is a non-random oligonucleotide sequence. The method further includes assigning the sequence reads to read groups according to a read group signature. The read group signature comprises an SMB sequence and a start and end position of a nucleic acid fragment from the circulating cell free sample nucleic acid. The sequence reads comprising start and end positions and an SMB sequence similar to the read group signature are assigned to a read group. The method further includes generating a consensus for each read group, and determining the presence or absence of a genetic alteration based on the consensus for each read group.
Balac Sipes et al. Pub. No. US 20090082975 A1 teaches a method of identifying a predictor of antisense oligonucleotide activity by identifying properties of oligonucleotides, evaluating oligonucleotide activity of the oligonucleotides, and correlating oligonucleotide activity with the properties. A high correlation between oligonucleotide activity and a property indicates that the property is a predictor of oligonucleotide activity.
	However, the cited references, alone or in combination, neither disclose nor suggest the combination of features, specifically: preparing a first group of sequences; creating a first dataset comprising identities and experiment outcome of the sequences in the first group; creating a second dataset comprising identities of sequences from a second group of sequences obtained from publicly available datasets; training a first neural network using the first dataset to extract first sequence features from the first dataset; training a second neural network using the second dataset to obtain a pre-trained model; extracting second sequence features using the pre-trained model and the first dataset; predicting an outcome of a new sequence based on a trained model obtained from the training of the first neural network; outputting a visualization report including the first and second sequence features and the predicted outcome of the new sequence; redesigning a more focused library based on the first and second sequence features; and conducting an experiment on the more focused library.
	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Cope et al. Pub. No. US 20170211206 A1 - METHODS, SYSTEMS, AND SOFTWARE FOR IDENTIFYING BIO-MOLECULES WITH INTERACTING COMPONENTS
Azab et al. Pub. No. US 20190371429 A1 - METHODS FOR NON-INVASIVE ASSESSMENT OF GENETIC ALTERATIONS
Balac Sipes et al. Pub. No. US 20090082975 A1 - METHOD OF SELECTING AN ACTIVE OLIGONUCLEOTIDE PREDICTIVE MODEL
Guyon et al. Pub. No. US 20180321245 A1 - METHODS FOR SCREENING, PREDICTING AND MONITORING PROSTATE CANCER
Theofilatos et al. Pub. No. US 20170076036 A1 - PROTEIN FUNCTIONAL AND SUB-CELLULAR ANNOTATION IN A PROTEOME
Quake et al. Pub. No. US 20160333405 A1 - Measurement and Comparison of Immune Diversity by High-Throughput Sequencing
Horn et al. Pub. No. US 20130332133 A1 - Classification of Protein Sequences and Uses of Classified Proteins
Barnhill et al. Pub. No. US 20100256988 A1 - SYSTEM FOR PROVIDING DATA ANALYSIS SERVICES USING A SUPPORT VECTOR MACHINE FOR PROCESSING DATA RECEIVED FROM A REMOTE SOURCE
Guyon Pub. No. US 20110184896 A1 - METHOD FOR VISUALIZING FEATURE RANKING OF A SUBSET OF FEATURES FOR CLASSIFYING DATA USING A LEARNING MACHINE
Guyon et al. Pub. No. US 20080097939 A1 - DATA MINING PLATFORM FOR BIOINFORMATICS AND OTHER KNOWLEDGE DISCOVERY

Gough et al. Pub. No. US 20050053999 A1 - Method for predicting G-protein coupled receptor-ligand interactions
Gustafsson et al. Pub. No. US 20040161796 A1 - Methods, systems, and software for identifying functional biomolecules
Vissing et al. Pub. No. US 20030022200 A1 - Systems for analysis of biological materials
Cui et al. Pub. No. US 20110224913 A1 - METHODS AND SYSTEMS FOR PREDICTING PROTEINS THAT CAN BE SECRETED INTO BODILY FLUIDS
CN 112837747 A - Protein binding site prediction method based on attention twinning network
	CN 111048151 A - Method for identifying virus subtype by electronic device, involves obtaining output result of subtype virus classification neural network model, and determining virus to-be identified according to output result of neural network model
	WO 2019099716 A1 - CLUSTERING METHODS USING A GRAND CANONICAL ENSEMBLE
	Deep Neural Network Based Predictions of Protein Interactions Using Primary Sequences – July 2018
	On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach – 2017
	RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach – 2017
	A Novel Approach for Protein Structure Prediction – 2010
	Applications of ANN and RULES-3 to DNA Sequence Analysis – 2009
	Bioinformatics With Soft Computing – 2006
	De novo profile generation based on sequence context specificity with the long short-term memory network – 2018
	Gene prediction using Deep Learning – July 2018
	High-throughput discovery of functional disordered regions: investigation of transactivation domains – 2018
	How Will Bioinformatics Impact Signal Processing Research? – 2013
	INTERPRETABLE MACHINE LEARNING METHODS FOR REGULATORY AND DISEASE GENOMICS – June 2018
	Machine Learning in Bioinformatics: A Novel Approach for DNA Sequencing – 2015
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NIZAR N SIVJI whose telephone number is (571)270-7462.  The examiner can normally be reached on Monday-Friday 7-4.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Srilakshmi K. Kumar can be reached on (571) 272-7769.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NIZAR N SIVJI/
Primary Examiner, Art Unit 2647