Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicants' Amendment filed on August 5, 2022, in which claims 1, 3, and 11 are currently amended. Claims 1-20 are currently pending. 

Response to Arguments
Applicant’s arguments with respect to the rejection of claims 1-20 under 35 U.S.C. 103, as applied to the amended claims, have been considered but are not deemed persuasive.
With respect to Applicant's argument that Lyu does not teach identifying a set of domains in a genome sequence, Examiner respectfully disagrees. The named entities identified in the JNLPBA corpus are each grouped into one of five genome sequence domains ([p. 5 Col. 1] "the JNLPBA corpus consists of 22,402 sentences (18,546 training sentences and 3856 test sentences) from MEDLINE abstracts. The manual annotated entities in JNLPBA corpus contains five types, namely DNA, RNA, protein, cell line, and cell type.").
With respect to Applicant's argument that one of ordinary skill in the art would not think to train a seq2seq network with genome sequences, Examiner respectfully disagrees. In light of the amendments, which are seen as an attempt to distinguish the claimed invention, as a genome-specific text processor, from a standard text processor, additional prior art by Rajkumar et al. has been provided as evidence that this would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention.
With respect to Applicant's argument that the claimed invention does not use NLP techniques to classify BGCs, Examiner respectfully disagrees. The instant specification at [¶0053] specifically uses the term pfam2vec, which is a clear and obvious derivation of the well-known word2vec technique. Lyu explicitly cites word2vec as inspiration on p. 4. Therefore, the networks are interpreted as analogously using text string inputs representing gene sequences. Lyu further explicitly teaches that the model is trained using a word2vec CBOW model.
With regard to the argument that the attention model (also commonly known as an attention layer, such that, if isolated, it could be considered a shallow network) is not the same as the shallow network in the claimed invention, Examiner respectfully submits that the claim language does not contain sufficient detail to differentiate the two. The citation from Indeed.com does not differentiate the attention model cited from Lyu from the shallow network in the claimed invention. A shallow network, by definition, is simply a network with one or two hidden layers. The article at https://machinelearningmastery.com/how-does-attention-work-in-encoder-decoder-recurrent-neural-networks/ shows that attention layers are expected to output context vectors.
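As an illustration of the point that an attention layer outputs a context vector, the following is a minimal sketch assuming simple dot-product scoring. It is not asserted to be the specific attention formulation used in Lyu, and the function names and toy inputs are hypothetical. The sketch contains a single score, normalize, and weighted-sum step rather than a stack of hidden layers.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(query: List[float], states: List[List[float]]) -> List[float]:
    """Hypothetical dot-product attention: score each state against the query,
    normalize the scores with softmax, and return the weighted sum of states."""
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    return [sum(w * state[d] for w, state in zip(weights, states)) for d in range(dim)]


# Toy example: three 2-dimensional encoder states and one query vector.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = attention_context([1.0, 0.0], states)
print(context)  # one context vector with the same dimension as each state
```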

Bull teaches applying the NLP method to gene sequencing at ¶0050 ("For example, in order to determine what gene mutations are involved in a specific disease or what population groups are those genetic markers tied to a disease"). A biosynthetic gene cluster is interpreted as a population group to which genetic markers belong.
With regard to Purcell, see [p. 562 Col. 1] ("On the basis of the genome wide average proportion of alleles shared identical by state (IBS) between any two individuals, PLINK offers tools to (a) cluster individuals into homogeneous subsets"), which is interpreted as directed toward merging candidate genes.

Examiner further notes that the biosynthetic gene clusters fed into the network are represented as text; processing them is therefore fully analogous to NLP.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

	Claims 1, 3-6, 11, and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Lyu (“Long short-term memory RNN for biomedical named entity recognition”, 2017) in view of Bull (US20180314704A1) and Xu (“Seq2seq Fingerprint: An Unsupervised Deep Molecular Embedding for Drug Discovery”, 2017), and further in view of Rajkumar (“sequence-prediction-using-CNN-and-LSTMs”, 2018).

	Regarding claim 1, Lyu teaches A method comprising: identifying, in a genome sequence, a set of domains, ([Abstract] "Our neural network architecture can be successfully used for BNER without any manual feature engineering. Experimental results show that domain-specific pre-trained word embeddings and character-level representation can improve the performance of the LSTM-RNN models" [p. 1 Col. 1] "Biomedical named entity recognition (BNER), which recognizes important biomedical entities (e.g. genes and proteins) from text, is an essential step in biomedical information extraction" word is interpreted as synonymous with sequence.  Word embeddings interpreted as identifiers in the biomedical domain.  See FIG. 4, sentence interpreted as synonymous with genome sequence, biomedical entity interpreted as synonymous with domain.)
	each identified domain corresponding to a set of domain identifiers; ([p. 2 Col. 1] "We regard BNER as a sequence labeling problem following previous work. The commonly used BIEOS tagging schema (B-beginning, I-inside, E-end, O-outside and S-the single word entity) is used to identify the boundary information of the entities." B, I, E, O, and S interpreted as a set of domain identifiers.)
	applying a shallow neural network block to each set of domain identifiers to produce a set of vectors, each vector corresponding to a set of domain identifiers; ([p. 2 Col. 1] "to improve the performance, and (2) they have enabled more effective training of RNNs by representing words with low dimensional dense vectors. which can capture distributional syntactic and semantic information...To capture morphological and orthographic information of words, we first use an attention model to encode character information of a word into its character-level representation" Attention model interpreted as synonymous with shallow neural network.)
	applying a RNN block to the set of vectors to produce a BGC class score for each domain, wherein the RNN block was trained by: ([p. 2 Col. 1] "Then we combine character- and word-level representations and then feed them into the LSTM-RNN layer to model context information of each word. On top of the neural network architecture, we use a CRF layer to jointly decode labels for the whole sentence" [p. 4 Col. 1] "For an input sentence x = x_1, ..., x_T, the corresponding hidden sequence h = h_1, ..., h_T is output by the above neural networks. We consider the matrix F of scores f_θ([h]_1^T) and θ is a model parameter of the CRFs. In the matrix F, the element f_{i,t} represents the score for the t-th word with the i-th tag. We introduce a transition score [A]_{j,k}, which is also a model parameter, to model the transition from the j-th tag to the k-th tag. The score of the sentence [x]_1^T along with a label sequence [y]_1^T is computed by summing the transition scores and network output scores:" Matrix F of scores interpreted as class scores for each domain. See also FIG. 3 and 4.)
	identifying a set of positive vectors representing known BGCs; ([p. 4 Col. 2] "Word embeddings are distributed representations and capture distributional syntactic and semantic information of the word. Several types of word embeddings trained from different external sources are used in our LSTM RNN models. Here we will give a brief description of these pre-trained word embeddings." [p. 4 Col. 2] "The main idea for the neural network is to output high scores for positive examples and low scores for negative examples" Positive examples interpreted as synonymous with positive vectors. Obtaining pre-trained word embeddings including subsets of positive and negative vectors interpreted as synonymous with identifying a set of positive vectors.)
	synthesizing a set of negative vectors unlikely to represent BGCs; ([p. 4 Col. 2] "The main idea for the neural network is to output high scores for positive examples and low scores for negative examples" Determining negative examples interpreted as synonymous with synthesizing a set of negative vectors.  Obtaining pre-trained word embeddings including subsets of positive and negative vectors interpreted as synonymous with identifying a set of negative vectors.)
	applying the RNN block to the positive and negative sets of vectors to generate predictions of whether each vector is a positive or negative vector; and ([p. 4 Col. 2] "The main idea for the neural network is to output high scores for positive examples and low scores for negative examples" Outputting high scores for positive examples and low scores for negative examples interpreted as synonymous with generating predictions of whether each vector is a positive or negative vector.)
	updating weights of the RNN block based on the predictions; ([p. 4 Col. 1] "Max likelihood objective are used to train our model. The parameters...is the parameter set in our model. It consists of the parameters W and b of each neural layer" [p. 4 Col. 2] "To maximum the objective, we use online learning to train our model, and the AdaGrad algorithm [35] is used to update the model parameters" Updating parameters of each neural layer interpreted as updating weights of RNN block.).
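The sentence-level score quoted from Lyu at p. 4 can be summarized as s(x, y) = Σ_t ([A]_{y_(t-1), y_t} + f_{y_t, t}), that is, the sum over words of the transition score into each tag plus the network output score for that tag. The following minimal sketch computes that sum for a single label sequence. It omits the transition into the first tag (a start-state convention that implementations handle differently), and the function name and toy matrices are hypothetical.

```python
from typing import List, Sequence


def crf_sentence_score(
    F: Sequence[Sequence[float]],
    A: Sequence[Sequence[float]],
    y: List[int],
) -> float:
    """Score one label sequence y for one sentence.

    F[i][t] is the network output score of tag i at word t (tags x words).
    A[j][k] is the transition score from tag j to tag k.
    The transition into the first tag is omitted in this sketch.
    """
    score = F[y[0]][0]
    for t in range(1, len(y)):
        score += A[y[t - 1]][y[t]] + F[y[t]][t]
    return score


# Two tags and three words (toy values).
F = [[1.0, 0.5, 0.2],
     [0.1, 2.0, 1.5]]
A = [[0.3, -0.2],
     [0.4, 0.6]]
print(crf_sentence_score(F, A, [0, 1, 1]))
```

In a full CRF layer this score would additionally be normalized over all possible label sequences during training; that step is not shown.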
	However, Lyu does not explicitly teach selecting candidate BGCs by averaging BGC class scores across genes within a domain and
	comparing the average BGC class scores to a threshold;
	predicting a molecular activity of biosynthetic products derived from the selected BGCs; and
	providing for display, on a user interface, the candidate BGCs and predicted molecular activity.

Bull, in the same field of endeavor, teaches selecting candidate BGCs by averaging BGC class scores across genes within a domain and ([¶0037] "Next, at 206, the word relationship extraction program 110A, 110B smooths the transition vectors. According to the present embodiment, the word relationship extraction program 110A, 110B may apply a smoothing equation that averages examples of similar types of transition vectors from the training data 118 as depicted in FIG. 2C")
	comparing the average BGC class scores to a threshold; and ([¶0044] "According to present embodiment, the word relationship extraction program 110A, 110B may apply the “type” filter to the candidate answers using the method of removing b_2 candidate answer where: p_2(ν(b_2)) < T_type, and p_2 = MVN fit to data ν(A*_2), where ν(A*_2) represents smoothed transition vectors based on an answer type transition vector from the multi-part analogies training data, and T_type is a first threshold value that is determined by the user while calibrating the word relationship extraction program 110A, 110B.")
	providing for display, on a user interface, the candidate BGCs and [predicted molecular activity] ([¶0047] "Next, at 226, the word relationship extraction program 110A, 110B displays the candidate answers. According to the present embodiment, the word relationship extraction program 110A, 110B may display the filtered candidate answers in a text format on a graphical user interface of a display monitor 344"). 
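To make the averaging-and-threshold limitation concrete, the following sketch assumes that the class scores to be averaged are already grouped under a key (here, one group per gene) and that a group is selected when its average score meets or exceeds a threshold. The grouping, the 0.5 threshold, and the function name are hypothetical and are not taken from Bull or from the claims.

```python
from statistics import mean
from typing import Dict, List


def select_candidates(
    scores: Dict[str, List[float]], threshold: float = 0.5
) -> Dict[str, float]:
    """Average each group's class scores and keep only the groups whose
    average is at or above the threshold."""
    averages = {key: mean(vals) for key, vals in scores.items() if vals}
    return {key: avg for key, avg in averages.items() if avg >= threshold}


# Toy per-gene class scores (hypothetical values).
scores = {"gene_1": [0.9, 0.7, 0.8], "gene_2": [0.2, 0.4], "gene_3": [0.6]}
print(select_candidates(scores))
```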

	Lyu and Bull are both directed towards using known NLP machine learning methods to classify biomedical data including genomes. Therefore, Lyu and Bull are analogous art in the same field of endeavor. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Lyu with the teachings of Bull by averaging class scores and classifying them relative to a threshold prior to displaying them through a user interface. Because most computers in circulation require a display to output relevant information to the user, posting a notification through a user interface would have been obvious to one of ordinary skill in the art, and this is reinforced by the teachings of Bull. Bull further teaches as a motivation to combine ([¶0013] “the present embodiment has the capacity to improve the technical field of word relationship extractions using word embedding by lessening of an overhead and training period due to a lessening of extensive testing.”). While neither Lyu nor Bull explicitly mentions biosynthetic gene clusters, the inclusion of biosynthetic gene clusters in the scope of the claim language is seen as merely an intended use achieved by methods that are well known in the art. It would have been obvious to one of ordinary skill in the art that systems which have been demonstrated to be effective at predicting and classifying gene sequences could be leveraged to predict biosynthetic gene clusters.

	However, the combination of Lyu and Bull does not explicitly teach predicting a molecular activity of biosynthetic products derived from the selected BGCs.

Xu, in the same field of endeavor, teaches predicting a molecular activity of biosynthetic products derived from the selected BGCs ([p. 291 §4.3] "We report the accuracy means and standard deviations of 5-fold classification cross validation on both LogP and PM2-10k data, in Table 2 and 3 respectively. All results are the 100-run averages to reduce the randomness. We also show the impact of seq2seq fingerprint length on the accuracy in Figure 5" LogP interpreted as synonymous with molecular activity of biosynthetic products. Organic molecules represented by SMILES data in Xu interpreted as synonymous with biosynthetic products derived from BGCs.).

	Lyu, Bull, and Xu are all directed towards using known NLP machine learning methods to classify biomedical data including genomes.  Therefore, Lyu, Bull, and Xu are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Lyu and Bull with the teachings of Xu by using the output of the seq2seq decoder to classify genes (sequences of molecules) based on their molecular activity.   It would be obvious to one of ordinary skill in the art that the sequence decoder output could be used advantageously for classification tasks, which is further reinforced by Xu.  Xu provides as an additional motivation for combination ([p. 292 §5] “The experiments on classification task demonstrate its superior performance. Also, the nature of our data-driven label-free model brings us even more benefits”). 
	
It would have been obvious to one of ordinary skill in the art that language-based processing systems, particularly seq2seq neural networks, can be easily adapted to processing genomes, which are regularly represented as text strings (similar to written text). It would further have been obvious to one of ordinary skill in the art that, because SMILES string representations such as those in Xu can be used to represent molecules, including proteins encoded by genomes and, by extension, by biosynthetic gene clusters, one could likewise feed string representations of genomes into a similar seq2seq system to produce expected outcomes. The disclosure of Rajkumar is relied upon merely to reinforce this obviousness. In the provided capture of the GitHub repository from Rajkumar [p. 6-7 l. 20-34], Rajkumar parses an input CSV file containing genome sequences [p. 11] and further merges said sequences to predict longer microRNA sequences using a typical seq2seq network similar to those of Lyu and Xu. Lyu, Bull, Xu, and Rajkumar are all directed towards using known NLP machine learning methods to classify biomedical data including genomes. Therefore, Lyu, Bull, Xu, and Rajkumar are all analogous art in the same field of endeavor. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Lyu, Bull, Xu, and Rajkumar by replacing the SMILES representations in Xu with the genes in Rajkumar. For the reasons described, this would lead to obvious and expected results. This motivation for combination also applies to the remaining claims rejected over this combination.

	Regarding claim 3, the combination of Lyu, Bull, Xu, and Rajkumar teaches The method of claim 1, further comprising: merging consecutive candidate BGC genes that are adjacent in the genome sequence. (Rajkumar [p. 4 l. 169-170] "sampled_char = reverse_target_char_index[sampled_token_index]         decoded_sentence += sampled_char" Appending genes to the predicted microRNA sequence interpreted as synonymous with merging consecutive candidate BGC genes that are adjacent in the genome sequence.). 
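For comparison with the cited decoding loop, the following sketch shows one straightforward way to merge candidate genes that are adjacent in genomic order into contiguous regions. It assumes each gene has already been flagged as a candidate or not (for example, by a threshold step); the function name and the flag list are hypothetical.

```python
from typing import List, Optional, Tuple


def merge_consecutive(flags: List[bool]) -> List[Tuple[int, int]]:
    """Merge runs of adjacent candidate genes (in genomic order) into
    (start, end) index ranges, with end inclusive."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, is_candidate in enumerate(flags):
        if is_candidate and start is None:
            start = i  # open a new region at the first candidate in a run
        elif not is_candidate and start is not None:
            regions.append((start, i - 1))  # close the region before this gene
            start = None
    if start is not None:
        regions.append((start, len(flags) - 1))  # run extends to the last gene
    return regions


print(merge_consecutive([False, True, True, False, True, True, True]))
```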

	Regarding claim 4, the combination of Lyu, Bull, Xu, and Rajkumar teaches The method of claim 1, wherein the RNN block is a bi-directional long short-term memory (LSTM) block. (Lyu [p. 6 Col. 1] "When we compare the uni-directional LSTM-RNNs with their bidirectional counterparts, we can see that the bidirectional improves the performance. BLSTM significantly outperforms"). 

	Regarding claim 5, the combination of Lyu, Bull, Xu, and Rajkumar teaches The method of claim 1, wherein the domain identifiers are maintained in genomic order. (Lyu [p. 2 Col. 1] "recent advances in word embedding induction methods [12, 23–25] have benefited researchers in two ways: (1) Intuitively, word embeddings can be used as extra word features in existing natural language processing (NLP) systems, including the general domain [26] and biomedical domain [27, 28], to improve the performance, and (2) they have enabled more effective training of RNNs by representing words with low dimensional dense vectors. which can capture distributional syntactic and semantic information [29, 30]." Capturing distributional syntactic information interpreted as synonymous with maintaining genomic order.). 

	Regarding claim 6, the combination of Lyu, Bull, Xu, and Rajkumar teaches The method of claim 1, wherein each vector in the set of vectors comprises one hundred elements, each element a real number, each element representing a property of the domain based on its genomic context. (Lyu [p. 8 FIG. 3] "Fig. 3 Feature representation of our model. Each column indicates the feature representation from BLSTM for each token. Each grid in the column indicates each dimension of the feature representation. The dimension of the feature representation is 100" The dimension of feature representation being 100 interpreted as synonymous with each vector in the set comprising 100 elements, each element representing a property of the domain based on its genomic context.). 

Claims 11 and 14-16 are directed towards a non-transitory computer-readable storage medium comprising instructions capable of performing the method of claims 1 and 3-6.  Therefore, the rejections applied to claims 1 and 3-6 also apply to claims 11 and 14-16.  Bull discloses executing the method in this way ([¶0020] “The present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.”).

	Claims 2 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Lyu, Bull, Xu, and Rajkumar and in further view of Purcell (“PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses”, 2007).

	Regarding claim 2, the combination of Lyu, Bull, Xu, and Rajkumar teaches The method of claim 1.
	However, the combination of Lyu, Bull, Xu, and Rajkumar does not explicitly teach processing candidate BGCs, wherein processing includes merging and filtering candidate BGCs based on at least one of: a presence of known BGCs, a cluster length, or a distance between candidate BGCs.  

Purcell, in the same field of endeavor, teaches The method of claim 1, further comprising: processing candidate BGCs, wherein processing includes merging and filtering candidate BGCs based on at least one of: a presence of known BGCs, a cluster length, or a distance between candidate BGCs. ([p. 562 Col. 2] "Finally, one can also combine multiple external categorical and quantitative matching criteria (such as age, sex, other environmental variables, or QC measures such as the genotype call rate for each individual) alongside the genetic matching. Categorical criteria can be either “positive” or “negative,” such that only similarly categorized or differently categorized individuals can be merged. It is also possible to select only a single individual from a particular prespecified group. The complete algorithm is as follows: the IBS distance between individual k (belonging to cluster i) and individual l (belonging to cluster j) is denoted").

	Lyu, Bull, Xu, Rajkumar, and Purcell are all directed towards using machine learning for genome classification and analysis.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Lyu, Bull, Xu, and Rajkumar with the teachings of Purcell by merging and filtering based on candidate distance. Distance based filtering is well-known in the art of clustering which is reinforced in the art specific disclosure of Purcell. Purcell gives as motivation for combination ([p. 564 Col. 2] “Despite the necessity of permutation, one advantage is that nonnormal and dichotomous phenotypes can be appropriately analyzed. Whereas the basic test is of total association, the between and within components can also be tested separately").

Claim 12 is directed towards a non-transitory computer-readable storage medium comprising instructions capable of performing the method of claim 2.  Therefore, the rejection applied to claim 2 also applies to claim 12.  Bull discloses executing the method in this way ([¶0012] “The present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.”).
Regarding claim 13, the combination of Lyu, Bull, Xu, and Rajkumar teaches The method of claim 11.
However, the combination of Lyu, Bull, Xu, and Rajkumar does not explicitly teach further comprising: merging consecutive candidate BGC genes.
	
Purcell, who teaches a related art of using machine learning for genome classification and analysis, teaches merging consecutive candidate BGC genes. ([p. 562] "3. For every pair between i and j, test the following (optional) constraints: ...Pass identity-by-missingness threshold? 4. Satisfies constraints? Merge clusters." [p. 565 Col. 1] "PLINK has a simple procedure to find extended stretches of homozygosity in whole-genome data (regions spanning more than a certain number of SNPs and/or kilobases, allowing for a certain amount of missing genotypes and/or occasional heterozygote calls) that occur relatively frequently, and it can provide a powerful approach to map recessive disease genes. 33,34 Via permutation, an empirical P value can be calculated for each SNP on the basis of a test for whether there is a higher rate of homozygous segments spanning that position in cases versus controls" Identity by missingness threshold is interpreted as consecutivity constraint.).

Lyu, Bull, Xu, Rajkumar, and Purcell are all directed towards using machine learning for genome classification and analysis.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Lyu, Bull, Xu, and Rajkumar with the teachings of Purcell by merging and filtering based on candidate distance. Distance based filtering is well-known in the art of clustering which is reinforced in the art specific disclosure of Purcell. Purcell gives as motivation for combination ([p. 564 Col. 2] “Despite the necessity of permutation, one advantage is that nonnormal and dichotomous phenotypes can be appropriately analyzed. Whereas the basic test is of total association, the between and within components can also be tested separately").

	Claims 7-9 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Lyu, Bull, Xu, and Rajkumar and in further view of Brito (“The gill-associated microbiome is the main source of wood plant polysaccharide hydrolases and secondary metabolite gene clusters in the mangrove shipworm Neoteredo reynei”, 2018).  

	Regarding claim 7, the combination of Lyu, Bull, Xu, and Rajkumar teaches The method of claim 1.
	However, the combination of Lyu, Bull, Xu, and Rajkumar does not explicitly teach predicting, for each candidate BGC, with a classifier, a secondary metabolite class based on a biosynthetic product and molecular activity of the candidate BGC.  

Brito, in the same field of endeavor, teaches The method of claim 1, further comprising: predicting, for each candidate BGC, with a classifier, a secondary metabolite class based on a biosynthetic product and molecular activity of the candidate BGC. ([p. 13] "In addition, we also annotated the secondary metabolome of all genomes in the Cellvibrionaceae family included in our genome-wide phylogeny (Fig 2) and resolved the detected BGCs diversity by supervisioned random forest combined with multidimensional scaling" [p. 13-14] "The Resistome is defined as the collection of all antibiotic resistance genes present in a micro organism’s genome, including precursor genes which encode proteins retaining weak antibiotic resistance or binding activity, and that under selective pressure can evolve into a new resistance marker [50]. Here we investigated the resistome coded in Teredinibacter genomes, including gills.bin.1 and gills.bin.4, and other representative genomes"). 

	Lyu, Bull, Xu, Rajkumar, and Brito are all directed towards using machine learning for genome classification and analysis.  Therefore, Lyu, Bull, Xu, Rajkumar, and Brito are all analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Lyu, Bull, Xu, and Rajkumar with the teachings of Brito by using the machine learning systems taught by Lyu and Bull for the classification of biosynthetic gene clusters while incorporating other known machine learning techniques such as random forest.  It would have been obvious to one of ordinary skill in the art to use known machine learning techniques to predict BGC, which is reinforced by Brito ([p. 4] “Taxonomical and functional signatures were submitted to comparative metagenomics using R packages and the Statistical Analysis of Metagenomic Profiles (STAMP) software package, version 2.1.3[22]. Multivariate statistical analyzes were performed with R using the following unsupervised learning techniques: i) hierarchical clustering, with Ward grouping method on a Euclidean distance matrix, ii) PCA biplot, and iii) supervised/unsupervised random forest”).  

	Regarding claim 8, the combination of Lyu, Bull, Xu, Rajkumar, and Brito teaches The method of claim 7, wherein the classifier is a random forest classifier. (Brito [p. 13] "In addition, we also annotated the secondary metabolome of all genomes in the Cellvibrionaceae family included in our genome-wide phylogeny (Fig 2) and resolved the detected BGCs diversity by supervisioned random forest combined with multidimensional scaling"). 

	Regarding claim 9, the combination of Lyu, Bull, Xu, and Rajkumar teaches The method of claim 1, wherein the set of negative vectors are synthesized by: modifying the genome sequence by replacing a portion of the genes within the known BGCs with random genes of similar length; (Lyu [p. 4 Col. 2] "the negative examples are the windows where one word is replaced by a random word")
	generating a set of identifiers for each domain in the modified genome sequence; and (Lyu [p. 2 Col. 1] "We regard BNER as a sequence labeling problem following previous work. The commonly used BIEOS tagging schema (B-beginning, I-inside, E-end, O-outside and S-the single word entity) is used to identify the boundary information of the entities." B, I, E, O, and S interpreted as a set of identifiers for each domain in the modified sequence.)
	applying a shallow neural network block to each domain in the modified genome sequence to produce a negative set of vectors. (Lyu [p. 7 Col. 1] "The softmax classifier layer calculates the probability distribution over all labels and chooses the label with highest probability for each word." [p. 8 Col. 2] "On the GM corpus, our model achieves 4.68% improvements of F1 score over Li et al. (2015) [30], which is a neural network model using used softmax function to predict which tag the current token belongs to" Lyu explicitly teaches tokens being classified as negative or positive, such that using the softmax layer to predict the tag the token belongs to is interpreted as synonymous with applying a shallow NN block to produce a negative set of vectors.).
	However, the combination of Lyu, Bull, Xu, and Rajkumar does not explicitly teach that the set of negative vectors are synthesized by: retrieving a genome sequence with known BGCs;

Brito, in the same field of endeavor, teaches the set of negative vectors are synthesized by: retrieving a genome sequence with known BGCs; ("Eighteen of the putative BGCs from gills.bin.1 and additional fourth contig undetected by the antiSMASH server could be mapped to all 13 BGCs from T. turnerae T7901 secondary metabolome with high pairwise identity (~99%) and BGCs coverage (~88%) (S4 Table). BLASTp inspection showed that the type 1 PKS route from gills.bin.1 genome bin lacking in T7901 genome is actually conserved in other T. turnerae genomes, as the ones from strains T8412, T8413 and T8415 (E value = 0, 100% query coverage, ~99% sequence identity)" Brito teaches retrieving a genome sequence with known BGC for classification.). 

	Lyu, Bull, Xu, Rajkumar, and Brito are all directed towards using machine learning for genome classification and analysis.  Therefore, Lyu, Bull, Xu, Rajkumar, and Brito are all analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Lyu, Bull, Xu, and Rajkumar with the teachings of Brito by using the machine learning systems taught by Lyu and Bull for the classification of biosynthetic gene clusters while incorporating other known machine learning techniques such as random forest.  It would have been obvious to one of ordinary skill in the art to use known machine learning techniques to predict BGC, which is reinforced by Brito ([p. 4] “Taxonomical and functional signatures were submitted to comparative metagenomics using R packages and the Statistical Analysis of Metagenomic Profiles (STAMP) software package, version 2.1.3[22]. Multivariate statistical analyzes were performed with R using the following unsupervised learning techniques: i) hierarchical clustering, with Ward grouping method on a Euclidean distance matrix, ii) PCA biplot, and iii) supervised/unsupervised random forest”).  
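The negative-example construction discussed for claim 9 (Lyu's windows in which a word is replaced by a random word, and the claimed replacement of a portion of the genes within known BGCs) can be sketched as follows. This is a minimal illustration assuming token-level replacement drawn uniformly from a vocabulary; it does not model the claimed "similar length" matching, and the function name and placeholder domain tokens are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def corrupt_sequence(
    tokens: Sequence[str],
    vocabulary: Sequence[str],
    fraction: float,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return a copy of `tokens` with about `fraction` of the positions replaced
    by a different token drawn uniformly at random from `vocabulary`."""
    if rng is None:
        rng = random.Random()
    corrupted = list(tokens)
    if not corrupted:
        return corrupted
    # Replace at least one position and never more than the sequence length.
    n_replace = min(len(corrupted), max(1, round(fraction * len(corrupted))))
    for i in rng.sample(range(len(corrupted)), n_replace):
        # Exclude the current token so every chosen position actually changes.
        alternatives = [v for v in vocabulary if v != corrupted[i]]
        if alternatives:
            corrupted[i] = rng.choice(alternatives)
    return corrupted


known_cluster = ["dom_A", "dom_B", "dom_C", "dom_D"]  # placeholder tokens
vocabulary = ["dom_A", "dom_B", "dom_C", "dom_D", "dom_X", "dom_Y"]
negative = corrupt_sequence(known_cluster, vocabulary, 0.5, random.Random(0))
print(negative)
```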

Claims 17-19 are directed towards a non-transitory computer-readable storage medium comprising instructions capable of performing the methods of claims 7-9.  Therefore, the rejections applied to claims 7-9 also apply to claims 17-19.  Bull discloses executing the method in this way ([¶0020] “The present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.”).

	Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Lyu, Bull, Xu, and Rajkumar, and further in view of Sharma (US20200012838A1).

	Regarding claim 10, Lyu teaches the method of claim 1.
	However, Lyu does not explicitly teach applying the RNN block to the set of vectors further comprises applying a sigmoid activation function.  

Sharma, in the same field of endeavor, teaches applying the RNN block to the set of vectors further comprises applying a sigmoid activation function. ([¶0033] "The evaluated performance of the Res-CRANN is performed on Bioimage Chromosome Classification dataset, which is publicly available online...The fully connected layers have sigmoid as their activation function"). 

	Lyu, Bull, Xu, Rajkumar, and Sharma are all directed towards using machine learning for genome classification and analysis.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Lyu, Bull, Xu, and Rajkumar with the teachings of Sharma by using a sigmoid activation function in the recurrent neural network.  One of ordinary skill in the art would recognize that the sigmoid activation function is well known, as reinforced by Sharma.  Sharma also provides motivation for the combination ([Abstract] “The Res-CRANN provides higher classification accuracy as compared to the state-of the-art methods for chromosome classification”).

Claim 20 is directed towards a non-transitory computer-readable storage medium comprising instructions capable of performing the method of claim 10.  Therefore, the rejection applied to claim 10 also applies to claim 20.  Bull discloses executing the method in this way ([¶0020] “The present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.”).

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SB/Examiner, Art Unit 2124
/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126