DETAILED ACTION
	This Office Action is in response to the communication filed on 3/25/2019.
	Claims 1-23 are being considered on the merits.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 8/01/2019 has been considered. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, initialed and dated copies of Applicant's IDS forms 1449 filed 8/01/2019 are attached to the instant Office action.

Drawings
The drawings filed on 3/25/2019 are accepted. 

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 11 recites the limitation "the sequence" in the first line.  There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the sequence” shall be interpreted to mean “the biological sequence”. 

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1, 22, and 23 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
	Step 1 Analysis: Claim 1 is directed to a method, claim 22 is directed to a system, and claim 23 is directed to non-transitory computer-readable storage media. Therefore, each of the claims falls within one of the four statutory categories.
Step 2A Prong 1 Analysis: Because the claims fall within one of the four statutory categories (Step 1), it must be determined whether the claims are directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea). In this case, the claims fall within the judicial exception of an abstract idea. Specifically, with the exception of generic computer processing elements performing generic computer processing steps, each element of the claims, under the broadest reasonable interpretation, is an abstract idea or otherwise an insignificant extra-solution activity. Obtaining biological data is merely data gathering and amounts to an insignificant extra-solution activity; generating an encoding and processing the encoding to generate a score distribution both amount to mathematical calculations, which constitute an abstract idea; and classifying the biological sequence is a mental process that can be performed within the human mind, which also constitutes an abstract idea.
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application because, as stated above, the step of obtaining biological data is merely data gathering and amounts to an insignificant extra-solution activity. Moreover, the processing step recites only the following additional elements: “using a deep neural network…the deep neural network is a convolutional neural network that comprises a plurality of depthwise separable convolutional layers that operate on the encoding of the biological sequence, and wherein the deep neural network has been configured through training to process the encoding”. The claim does not provide any detail as to how the neural network is configured through training. The step therefore merely links the abstract idea to a particular technological environment or otherwise uses a generic computer as a tool to perform the abstract idea.
Step 2B: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a trained neural network to perform the processing step, without providing any detail of the training or structure of the neural network, amounts to no more than merely linking the abstract idea to a particular technological environment or otherwise using a generic computer as a tool to perform the abstract idea. Performing a mental process on a generic computer, in a computer environment, or merely using a computer as a tool does not amount to significantly more than the judicial exception.
Claim 2 is rejected under 35 U.S.C. 101. The limitation that the type of data being obtained is specifically RNA amounts to selecting a particular data source or type of data to be manipulated, which constitutes an insignificant extra-solution activity, does not integrate the abstract idea into a practical application, and does not amount to significantly more.
Claim 3 is rejected under 35 U.S.C. 101. The limitation that the type of data being obtained is specifically DNA amounts to selecting a particular data source or type of data to be manipulated, which constitutes an insignificant extra-solution activity, does not integrate the abstract idea into a practical application, and does not amount to significantly more.
Claim 4 is rejected under 35 U.S.C. 101. The limitation that the type of label being used is taxonomic labels amounts to selecting a particular data source or type of data to be manipulated, which constitutes an insignificant extra-solution activity, does not integrate the abstract idea into a practical application, and does not amount to significantly more.
Claim 5 is rejected under 35 U.S.C. 101. The limitation that the type of label being used is species labels amounts to selecting a particular data source or type of data to be manipulated, which constitutes an insignificant extra-solution activity, does not integrate the abstract idea into a practical application, and does not amount to significantly more.
Claim 6 is rejected under 35 U.S.C. 101. The limitation that the type of label being used is operational taxonomic unit labels amounts to selecting a particular data source or type of data to be manipulated, which constitutes an insignificant extra-solution activity, does not integrate the abstract idea into a practical application, and does not amount to significantly more.
Claim 7 is rejected under 35 U.S.C. 101. The limitation that the type of label being used is gene or gene property labels amounts to selecting a particular data source or type of data to be manipulated, which constitutes an insignificant extra-solution activity, does not integrate the abstract idea into a practical application, and does not amount to significantly more.
Claim 8 is rejected under 35 U.S.C. 101. The limitation that the type of label being used relates to pathogenicity amounts to selecting a particular data source or type of data to be manipulated, which constitutes an insignificant extra-solution activity, does not integrate the abstract idea into a practical application, and does not amount to significantly more.
Claim 9 is rejected under 35 U.S.C. 101. The limitation that the type of biological sequence data being used is a protein amounts to selecting a particular data source or type of data to be manipulated, which constitutes an insignificant extra-solution activity, does not integrate the abstract idea into a practical application, and does not amount to significantly more.
Claim 10 is rejected under 35 U.S.C. 101. The limitation that the type of label being used is a set of possible protein functions amounts to selecting a particular data source or type of data to be manipulated, which constitutes an insignificant extra-solution activity, does not integrate the abstract idea into a practical application, and does not amount to significantly more.
Claim 11 is rejected under 35 U.S.C. 101. The limitation that the type of biological sequence data being used is a sequence of canonical compounds amounts to selecting a particular data source or type of data to be manipulated, is mere data gathering, and does not amount to significantly more. The limitation of “one-hot encoding” the compounds is a mathematical calculation, which constitutes an abstract idea. The limitation of “resolving each ambiguity code” is likewise a mental process, which constitutes an abstract idea. The additional limitations of claim 11 do not integrate the abstract ideas into a practical application and, as previously stated, do not amount to significantly more.
Claim 17 is rejected under 35 U.S.C. 101. The limitation that the depthwise separable convolutional layers are followed by fully-connected layers does not provide any additional detail as to the training of the neural network and amounts to merely linking the abstract idea to a particular technological environment or otherwise using a generic computer as a tool to perform the abstract idea. Therefore, claim 17 does not integrate the abstract idea into a practical application and does not amount to significantly more.
Claim 18 is rejected under 35 U.S.C. 101. Dependent claim 18 recites the additional element that the fully-connected layers are tiled. This limitation does not provide any additional detail as to the training of the neural network and amounts to merely linking the abstract idea to a particular technological environment or otherwise using a generic computer as a tool to perform the abstract idea. Therefore, claim 18 does not integrate the abstract idea into a practical application and does not amount to significantly more.
Claim 19 is rejected under 35 U.S.C. 101. Dependent claim 19 recites the additional element that the pooling layer is an average pooling layer. This limitation does not provide any additional detail as to the training of the neural network and amounts to merely linking the abstract idea to a particular technological environment or otherwise using a generic computer as a tool to perform the abstract idea. Therefore, claim 19 does not integrate the abstract idea into a practical application and does not amount to significantly more.
Claim 20 is rejected under 35 U.S.C. 101. Dependent claim 20 recites the additional element that the deep neural network comprises a softmax output layer. This limitation does not provide any additional detail as to the training and amounts to merely linking the abstract idea to a particular technological environment or otherwise using a generic computer as a tool to perform the abstract idea. Therefore, claim 20 does not integrate the abstract idea into a practical application and does not amount to significantly more.
Claim 21 is rejected under 35 U.S.C. 101. Dependent claim 21 recites the additional element that the convolutional layers, the fully-connected layers, or both have a leaky rectified-linear unit activation function. This limitation does not provide any additional detail as to the training and amounts to merely linking the abstract idea to a particular technological environment or otherwise using a generic computer as a tool to perform the abstract idea. Therefore, claim 21 does not integrate the abstract idea into a practical application and does not amount to significantly more.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 8-9, 11, 17-20, and 22-23 are rejected under 35 U.S.C. 103 as being unpatentable over Xiong et al. (WO 2018006152A1) (hereinafter, “Xiong”), in view of Verhulsdonck (“One Shot Object Detection for Tracking Purposes”) (hereinafter, “Verhul”).

Regarding Claim 1, Xiong teaches a method performed by one or more computers, the method comprising: 
obtaining data identifying a biological sequence (Xiong, para. 0004: “a first layer comprising a plurality of positions configured to obtain a biological sequence”); 
generating, from the obtained data, an encoding of the biological sequence (Xiong, para. 0010: “an encoder configured to encode the biological sequence as a vector sequence”); 
processing the encoding using a deep neural network, wherein the deep neural network is a convolutional neural network that comprises a plurality of…layers (Xiong, para. 0014: “molecular phenotype convolutional neural networks (MPCNNs) is provided, the method comprising: each of at least three layers” ) that operate on the encoding of the biological sequence (Xiong, para. 0020: “The method may further comprise an encoding operation that encodes the biological sequence as a vector sequence”), and wherein the deep neural network has been configured through training to process the encoding to generate a score distribution over a set of biological labels for the biological sequence; and (Xiong, paras. 0061 and 0065: “the input may include additional information, which may comprise, for example, environmental factors, cell labels, tissue labels, disease labels, and other relevant inputs” “many different machine learning architectures can be represented as neural networks, including linear regression, logistic regression, softmax regression”; examiner notes that the broadest reasonable interpretation of “biological labels” includes labels relating to biology such as cell, tissue, and disease; examiner further notes that softmax regression generates a probability distribution over labels). 
classifying the biological sequence using the score distribution (Xiong, para. 0047 and 0065: “For example, the input layer may obtain a biological sequence represented as a vector sequence and additional information. The last layer is the output layer, for example, the molecular phenotype.” “many different machine learning architectures can be represented as neural networks, including linear regression, logistic regression, softmax regression” Examiner notes that applicant does not specify the method or criteria of classification and therefore the broadest reasonable interpretation includes classification of biological sequences into molecular phenotypes; examiner further notes that softmax regression generates a probability distribution over labels).  
Xiong fails to explicitly disclose: 
“…depthwise separable convolutional layers…”
However, Verhul teaches: 
“…depthwise separable convolutional layers…” (Verhul, pg. 27, sec. 3.3.4.1: “A depthwise separable convolution splits a normal convolution in 2 parts.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Verhul with Xiong. Xiong teaches systems and methods for generating and training convolutional neural networks using biological sequences, including use of a CNN; Verhul teaches a CNN architecture using tiled convolutional layers, softmax, and depthwise separable convolutions in the context of real-time object detection. One of ordinary skill would have been motivated to combine the teachings of Verhul with Xiong to increase the ability to detect a wider variety of features (Verhul, pg. 10, last paragraph).
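For illustration only, the two-part split described in the quoted passage of Verhul can be sketched as follows. This is a minimal sketch under stated assumptions, not a characterization of the implementation in either reference: it uses a one-dimensional convolution over sequence positions (Verhul discusses images), computes cross-correlation as is conventional in deep learning libraries, and all function names and shapes are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # assumed layout: [channels][positions]


def depthwise_conv1d(x: Matrix, kernels: Matrix) -> Matrix:
    """Part 1: filter each input channel with its own kernel (no channel mixing)."""
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        out.append([
            sum(channel[i + j] * kernel[j] for j in range(k))
            for i in range(len(channel) - k + 1)
        ])
    return out


def pointwise_conv1d(x: Matrix, weights: Matrix) -> Matrix:
    """Part 2: mix channels at each position (a 1x1 convolution).

    weights is assumed to have shape [out_channels][in_channels].
    """
    length = len(x[0])
    return [
        [sum(w[c] * x[c][i] for c in range(len(x))) for i in range(length)]
        for w in weights
    ]


def depthwise_separable_conv1d(x: Matrix, depth_kernels: Matrix,
                               point_weights: Matrix) -> Matrix:
    """Depthwise step followed by pointwise step."""
    return pointwise_conv1d(depthwise_conv1d(x, depth_kernels), point_weights)


def parameter_counts(in_ch: int, out_ch: int, k: int) -> Tuple[int, int]:
    """Weights (ignoring biases) for a standard vs. a separable layer."""
    standard = in_ch * out_ch * k
    separable = in_ch * k + in_ch * out_ch
    return standard, separable
```

Under these assumptions, a layer with 4 input channels, 8 output channels, and a kernel width of 3 needs 96 weights as a standard convolution and 44 as a depthwise separable convolution, which is the kind of parameter saving commonly associated with this layer type.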
Regarding claim 2, Xiong and Verhul teach the computer-implemented method of claim 1 (above). Xiong further teaches: 
wherein the biological sequence is RNA (Xiong, para. 0034: “The biological sequence may be a DNA sequence, an RNA sequence, or a protein sequence”).

Regarding claim 3, Xiong and Verhul teach the computer-implemented method of claim 1 (above). Xiong further teaches: 
wherein the biological sequence is DNA (Xiong, para. 0034: “The biological sequence may be a DNA sequence, an RNA sequence, or a protein sequence”).

Regarding claim 4, Xiong and Verhul teach the computer-implemented method of claim 1 (above). Xiong further teaches: 
the set of biological labels are a set of taxonomic labels for the biological sequence (Xiong, para. 0061: “the input may include additional information, which may comprise, for example, environmental factors, cell labels, tissue labels, disease labels, and other relevant inputs”; examiner notes that the broadest reasonable interpretation of “taxonomic” means “concerned with the classification of things” wherein classification could include categories such as cell, tissue, or disease).

Regarding claim 8, Xiong and Verhul teach the computer-implemented method of claim 1 (above). Xiong further teaches: 
the biological labels comprise labels that identify a pathogenicity of the biological sequence (Xiong, para. 0061: “It will be appreciated that the input may include additional information, which may comprise, for example, environmental factors, cell labels, tissue labels, disease labels, and other relevant inputs.” Examiner notes that the broadest reasonable interpretation of “pathogenicity” means relating to causing disease, Xiong teaches classification of biological sequences over disease labels). 

Regarding claim 9, Xiong and Verhul teach the computer-implemented method of claim 1 (above). Xiong further teaches: 
the biological sequence is a protein (Xiong, para. 0034: “The biological sequence may be a DNA sequence, an RNA sequence, or a protein sequence.”)

Regarding claim 11, Xiong and Verhul teach the computer-implemented method of claim 1 (above). Xiong further teaches: 
the sequence is a sequence of canonical compounds and ambiguity codes (Xiong, para. 0004, 0036 and 0038: “a first layer comprising a plurality of positions configured to obtain a biological sequence” “MPCNNs may be constructed to account for the relationships between biological sequences” “whereas a G to T substitution may alter the molecular phenotype, a G to A substitution may not.” Examiner notes that claim 1 recites the limitation “obtain data identifying a biological sequence” such that accounting for any relationships between biological sequences and any canonical compounds and any ambiguity codes necessarily identifies such sequence), and wherein generating the encoding for the sequence comprises: 
one-hot encoding each of the canonical compounds (Xiong, para. 0062: “One method that may be applied by the encoder (107) is to encode the sequence of symbols in a sequence of numerical vectors, a vector sequence, using, for example, one-hot encoding”)
resolving each ambiguity code to a corresponding probability distribution over the canonical compounds (Xiong, para. 0064 and 0065: “the biological sequences need not be of the same length and that an MPCNN may be trained to account for other molecular phenotypes, for other biologically related variants and for other specifications of the additional information” “many different machine learning architectures can be represented as neural networks, including linear regression, logistic regression, softmax regression”. Examiner notes that the ambiguity code may be accounted for as a biologically related variant and that softmax regression generates a probability distribution). 
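For illustration only, the two encoding limitations of claim 11 can be sketched for a nucleotide sequence as shown below. This is a sketch under stated assumptions rather than the method of the application or of Xiong: the canonical compounds are taken to be the four DNA bases, the ambiguity codes are a subset of the standard IUPAC nucleotide codes, and each ambiguity code is resolved to a uniform distribution over the bases it can stand for. The uniform weighting and all names are hypothetical.

```python
from typing import Dict, List

CANONICAL = "ACGT"  # assumed canonical compounds (DNA bases)

# Subset of IUPAC nucleotide ambiguity codes and the bases each may represent.
AMBIGUITY: Dict[str, str] = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "N": "ACGT",
}


def encode_symbol(symbol: str) -> List[float]:
    """Return a length-4 vector over CANONICAL for one sequence symbol."""
    if symbol in CANONICAL:
        # One-hot encoding: all mass on the observed base.
        return [1.0 if base == symbol else 0.0 for base in CANONICAL]
    # Ambiguity code: spread mass uniformly over the permitted bases
    # (uniform weighting is an assumption made for this sketch).
    options = AMBIGUITY[symbol]
    p = 1.0 / len(options)
    return [p if base in options else 0.0 for base in CANONICAL]


def encode_sequence(sequence: str) -> List[List[float]]:
    """Encode a sequence as one vector per position."""
    return [encode_symbol(s) for s in sequence.upper()]
```

For example, under these assumptions the symbol A maps to a one-hot vector, while R (A or G) maps to a vector with probability 0.5 on each of A and G.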

Regarding claim 17, Xiong and Verhul teach the computer-implemented method of claim 1 (above). Xiong further teaches: 
…followed by a plurality of fully-connected layers (Xiong, para. 0008: “at least one of the at least three layers other than the first layer may be configured as a fully connected layer”). 
Xiong does not explicitly disclose: 
the plurality of depthwise separable convolutional layers…
However, Verhul teaches:
the plurality of depthwise separable convolutional layers… (Verhul, pg. 27, sec. 3.3.4.1 and pg. 53, table 5.5: “A depthwise separable convolution splits a normal convolution in 2 parts.”; examiner notes that Verhul table 5.5 shows a network with multiple depthwise separable convolutional layers)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Verhul with Xiong. Xiong teaches systems and methods for generating and training convolutional neural networks using biological sequences, including use of a CNN; Verhul teaches a CNN architecture using tiled convolutional layers, softmax, and depthwise separable convolutions in the context of real-time object detection. One of ordinary skill would have been motivated to combine the teachings of Verhul with Xiong to achieve a more parameter- and computationally-efficient model (Verhul, pg. 10, last paragraph).

Regarding claim 18, Xiong and Verhul teach the computer-implemented method of claim 17 (above). Xiong further teaches: 
and wherein the deep neural network comprises a pooling layer following the fully-connected layers (Xiong, paras. 0007 and 0008: “One or more of the at least three layers may be configured as pooling layers” “At least one of the at least three layers other than the first layer may be configured as a fully connected layer”) and…to generate the score distribution over the labels following the pooling layer (Xiong, para. 0065: “many different machine learning architectures can be represented as neural networks, including linear regression, logistic regression, softmax regression”; examiner notes that a softmax regression generates a probability distribution over labels).
Xiong does not explicitly disclose:
the fully-connected layers are tiled…
…a softmax output layer…
However, Verhul teaches: 
the fully-connected layers are tiled…(Verhul, pg. 9, sec. 2.1.4 and pg.20, sec. 3.2.1: “uses cells only sensitive to a small part of the image called a receptive field, but tiles them to cover the whole image” “The neural network then applies a number of convolutions to both inputs and combines the outputs of the convolutions using fully connected layers”)
…a softmax output layer… (Verhul, pg. 26, sec. 3.3.3.2: “The output of the neural network is normalized with a softmax activation function”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Verhul with Xiong. Xiong teaches systems and methods for generating and training convolutional neural networks using biological sequences, including use of a CNN; Verhul teaches a CNN architecture using tiled convolutional layers, softmax, and depthwise separable convolutions in the context of real-time object detection. One of ordinary skill would have been motivated to combine the teachings of Verhul with Xiong to achieve a more parameter- and computationally-efficient model (Verhul, pg. 10, last paragraph).

Regarding claim 19, Xiong and Verhul teach the computer-implemented method of claim 18 (above). Verhul further teaches: 
the pooling layer is an average pooling layer (Verhul, pg. 11, sec. 2.1.4.1: “A pooling layer is a constant mathematical operation that is applied similar to a convolution, popular pooling layers are max pooling and average pooling.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Verhul with Xiong. Xiong teaches systems and methods for generating and training convolutional neural networks using biological sequences, including use of a CNN; Verhul teaches a CNN architecture using tiled convolutional layers, softmax, and depthwise separable convolutions in the context of real-time object detection. One of ordinary skill would have been motivated to combine the teachings of Verhul with Xiong to achieve a more parameter- and computationally-efficient model (Verhul, pg. 27, sec. 3.3.4.1).
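For illustration only, the two pooling operations named in the quoted passage of Verhul (max pooling and average pooling) can be sketched in one dimension as follows. The window and stride parameters and the function names are hypothetical, and the sketch is not drawn from either reference.

```python
from typing import List, Optional


def average_pool1d(values: List[float], window: int,
                   stride: Optional[int] = None) -> List[float]:
    """Replace each window of values with its arithmetic mean."""
    stride = stride or window  # default: non-overlapping windows
    return [
        sum(values[i:i + window]) / window
        for i in range(0, len(values) - window + 1, stride)
    ]


def max_pool1d(values: List[float], window: int,
               stride: Optional[int] = None) -> List[float]:
    """Replace each window of values with its maximum."""
    stride = stride or window
    return [
        max(values[i:i + window])
        for i in range(0, len(values) - window + 1, stride)
    ]
```

Both operations apply a fixed function with no learned weights; they differ only in whether each window is summarized by its mean or by its largest value.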
Regarding claim 20, Xiong and Verhul teach the computer-implemented method of claim 19 (above). Xiong further teaches: 
the deep neural network comprises a softmax output layer to generate the score distribution over the labels following the fully-connected layers. (Xiong, para. 0008 and 0065: “At least one of the at least three layers other than the first layer may be configured as a fully connected layer” “many different machine learning architectures can be represented as neural networks, including linear regression, logistic regression, softmax regression”. Examiner notes that softmax regression generates a probability distribution).
Xiong does not explicitly disclose: 
…a softmax output layer…
However, Verhul teaches: 
…a softmax output layer… (Verhul, pg. 26, sec. 3.3.3.2: “The output of the neural network is normalized with a softmax activation function”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Verhul with Xiong. Xiong teaches systems and methods for generating and training convolutional neural networks using biological sequences, including use of a CNN; Verhul teaches a CNN architecture using tiled convolutional layers, softmax, and depthwise separable convolutions in the context of real-time object detection. One of ordinary skill would have been motivated to combine the teachings of Verhul with Xiong to train a network to detect different classes of objects (Verhul, pg. 26, sec. 3.3.3.2).
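For illustration only, the examiner's note that softmax generates a probability distribution over labels, and the use of that distribution to classify, can be sketched as follows. The label names, the input scores, and the choice of the highest-scoring label as the classification criterion are assumptions made for this sketch; the claims do not specify a criterion.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Normalize raw scores into non-negative values that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: Sequence[float],
             labels: Sequence[str]) -> Tuple[str, List[float]]:
    """Return the highest-scoring label and the full score distribution."""
    scores = softmax(logits)
    best = max(range(len(labels)), key=scores.__getitem__)
    return labels[best], scores


# Hypothetical labels and raw network outputs, for demonstration only.
label, distribution = classify([2.0, 1.0, 0.1], ["label_a", "label_b", "label_c"])
```

Under these assumptions, the returned distribution assigns one score per label, and the classification is the label with the largest score.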

Regarding claim 22, Xiong teaches a system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising: 
obtaining data identifying a biological sequence; (Xiong, para. 0004: “a first layer comprising a plurality of positions configured to obtain a biological sequence”)
generating, from the obtained data, an encoding of the biological sequence; (Xiong, para. 0010: “an encoder configured to encode the biological sequence as a vector sequence”)
processing the encoding using a deep neural network, wherein the deep neural network is a convolutional neural network that comprises a plurality of…layers (Xiong, para. 0014: “molecular phenotype convolutional neural networks (MPCNNs) is provided, the method comprising: each of at least three layers” ) that operate on the encoding of the biological sequence (Xiong, para. 0020: “The method may further comprise an encoding operation that encodes the biological sequence as a vector sequence”), and wherein the deep neural network has been configured through training to process the encoding to generate a score distribution over a set of biological labels for the biological sequence; and (Xiong, paras. 0061 and 0065: “the input may include additional information, which may comprise, for example, environmental factors, cell labels, tissue labels, disease labels, and other relevant inputs” “many different machine learning architectures can be represented as neural networks, including linear regression, logistic regression, softmax regression”; examiner notes that the broadest reasonable interpretation of “biological labels” includes labels relating to biology such as cell, tissue, and disease; examiner further notes that softmax regression generates a probability distribution over labels). 
classifying the biological sequence using the score distribution (Xiong, para. 0047 and 0065: “For example, the input layer may obtain a biological sequence represented as a vector sequence and additional information. The last layer is the output layer, for example, the molecular phenotype.” “many different machine learning architectures can be represented as neural networks, including linear regression, logistic regression, softmax regression” Examiner notes that applicant does not specify the method or criteria of classification and therefore the broadest reasonable interpretation includes classification of biological sequences into molecular phenotypes; examiner further notes that softmax regression generates a probability distribution over labels).  
Xiong fails to explicitly disclose: 
“…depthwise separable convolutional layers…”
However, Verhul teaches: 
“…depthwise separable convolutional layers…” (Verhul, pg. 27, sec. 3.3.4.1: “A depthwise separable convolution splits a normal convolution in 2 parts.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Verhul with Xiong. Xiong teaches systems and methods for generating and training convolutional neural networks using biological sequences, including use of a CNN; Verhul teaches a CNN architecture using tiled convolutional layers, softmax, and depthwise separable convolutions in the context of real-time object detection. One of ordinary skill would have been motivated to combine the teachings of Verhul with Xiong to achieve a more parameter- and computationally-efficient model (Verhul, pg. 10, last paragraph).

Regarding claim 23, Xiong teaches one or more non-transitory computer-readable storage media storing instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising:
obtaining data identifying a biological sequence; (Xiong, para. 0004: “a first layer comprising a plurality of positions configured to obtain a biological sequence”)
generating, from the obtained data, an encoding of the biological sequence; (Xiong, para. 0010: “an encoder configured to encode the biological sequence as a vector sequence”)
processing the encoding using a deep neural network, wherein the deep neural network is a convolutional neural network that comprises a plurality of…layers (Xiong, para. 0014: “molecular phenotype convolutional neural networks (MPCNNs) is provided, the method comprising: each of at least three layers” ) that operate on the encoding of the biological sequence (Xiong, para. 0020: “The method may further comprise an encoding operation that encodes the biological sequence as a vector sequence”), and wherein the deep neural network has been configured through training to process the encoding to generate a score distribution over a set of biological labels for the biological sequence; and (Xiong, paras. 0061 and 0065: “the input may include additional information, which may comprise, for example, environmental factors, cell labels, tissue labels, disease labels, and other relevant inputs” “many different machine learning architectures can be represented as neural networks, including linear regression, logistic regression, softmax regression”; examiner notes that the broadest reasonable interpretation of “biological labels” includes labels relating to biology such as cell, tissue, and disease; examiner further notes that softmax regression generates a probability distribution over labels). 
classifying the biological sequence using the score distribution (Xiong, para. 0047 and 0065: “For example, the input layer may obtain a biological sequence represented as a vector sequence and additional information. The last layer is the output layer, for example, the molecular phenotype.” “many different machine learning architectures can be represented as neural networks, including linear regression, logistic regression, softmax regression” Examiner notes that applicant does not specify the method or criteria of classification and therefore the broadest reasonable interpretation includes classification of biological sequences into molecular phenotypes; examiner further notes that softmax regression generates a probability distribution over labels).  
Xiong fails to explicitly disclose: 
“…depth wise separable convolutional layers…”
However, Verhul teaches: 
“…depth wise separable convolutional layers…” (Verhul, pg. 27, sec. 3.3.4.1: “A depthwise separable convolution splits a normal convolution in 2 parts.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Verhul into Xiong. Xiong teaches systems and methods for generating and training convolutional neural networks using biological sequences, including use of a CNN; Verhul teaches CNN architecture using tiled convolutional layers, softmax, and depthwise separable convolutions in the context of real-time object detection. One of ordinary skill would have been motivated to combine the teachings of Verhul into Xiong to achieve a more parameter- and computationally-efficient model (Verhul, pg. 10, last paragraph).
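The parameter-efficiency motivation can be illustrated by counting weights. The following is a minimal sketch only and is not drawn from Xiong or Verhul: the function names, the one-dimensional setting, and the layer sizes are assumptions chosen for illustration, and bias terms are ignored.

```python
def standard_conv1d_params(in_channels, out_channels, kernel_size):
    """Weight count of an ordinary 1-D convolution (bias ignored)."""
    return kernel_size * in_channels * out_channels


def depthwise_separable_conv1d_params(in_channels, out_channels, kernel_size):
    """Weight count when the convolution is split into two parts (bias ignored)."""
    depthwise = kernel_size * in_channels   # one spatial filter per input channel
    pointwise = in_channels * out_channels  # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
standard = standard_conv1d_params(in_channels=64, out_channels=128, kernel_size=9)
separable = depthwise_separable_conv1d_params(in_channels=64, out_channels=128, kernel_size=9)
print(standard, separable)  # prints: 73728 8768
```

Under these assumed sizes the split form uses roughly one eighth of the weights of the ordinary convolution, which is the kind of saving the stated motivation refers to.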

Claims 5-7 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Xiong, in view of Verhul, and further in view of Killoran et al. (“Generating and designing DNA with deep generative models”) (hereinafter, “Killoran”).
Regarding claim 5, Xiong and Verhul teach the computer-implemented method of claim 4 (above). Neither Xiong nor Verhul explicitly discloses: 
•	the set taxonomic labels comprise a set of species labels for the biological sequence
However, Killoran teaches: 
•	the set taxonomic labels comprise a set of species labels for the biological sequence (Killoran pg. 4, section 2.2.2: “Generator and predictor models can even be trained on different datasets (e.g., a large unlabeled dataset for the generator and a smaller labelled dataset for the predictor). A single predictor can also be paired with different generators, each trained to capture different types of genomic sequences, such as from functionally distinct regions or different species”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Killoran into Xiong and Verhul. Xiong teaches systems and methods for generating and training convolutional neural networks using biological sequences, including use of a CNN; Verhul teaches CNN architecture using tiled convolutional layers, softmax, and depthwise separable convolutions in the context of real-time object detection; Killoran teaches generating DNA sequences using a generative neural network. One of ordinary skill would have been motivated to combine the teachings of Killoran into Xiong and Verhul to inexpensively simulate data or to explore the space of possible data configurations (Killoran, pg. 1, section 1).

Regarding claim 6, Xiong and Verhul teach the computer-implemented method of claim 1 (above). Neither Xiong nor Verhul explicitly discloses:
the biological labels are a set of operational taxonomic units
However, Killoran teaches:
the biological labels are a set of operational taxonomic units (Killoran pg. 4, section 2.2.2: “Generator and predictor models can even be trained on different datasets (e.g., a large unlabeled dataset for the generator and a smaller labelled dataset for the predictor). A single predictor can also be paired with different generators, each trained to capture different types of genomic sequences, such as from functionally distinct regions or different species.” Examiner notes that the broadest reasonable interpretation of “operational taxonomic unit” is the basic unit used in numerical taxonomy which may refer to an individual, species, genus, or class.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Killoran into Xiong and Verhul. Xiong teaches systems and methods for generating and training convolutional neural networks using biological sequences, including use of a CNN; Verhul teaches CNN architecture using tiled convolutional layers, softmax, and depthwise separable convolutions in the context of real-time object detection; Killoran teaches generating DNA sequences using a generative neural network. One of ordinary skill would have been motivated to combine the teachings of Killoran into Xiong and Verhul to inexpensively simulate data or to explore the space of possible data configurations (Killoran, pg. 1, section 1).

Regarding claim 7, Xiong and Verhul teach the computer-implemented method of claim 1 (above). Killoran further teaches: 
•	the biological labels are a set of gene labels or a set of gene property labels. (Killoran pg. 4, section 2.2.2: “Generator and predictor models can even be trained on different datasets (e.g., a large unlabeled dataset for the generator and a smaller labelled dataset for the predictor). A single predictor can also be paired with different generators, each trained to capture different types of genomic sequences, such as from functionally distinct regions or different species.” Examiner notes that the broadest reasonable interpretation of “gene property” includes information such as the gene’s sequence, type, or its location).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Killoran into Xiong and Verhul. Xiong teaches systems and methods for generating and training convolutional neural networks using biological sequences, including use of a CNN; Verhul teaches CNN architecture using tiled convolutional layers, softmax, and depthwise separable convolutions in the context of real-time object detection; Killoran teaches generating DNA sequences using a generative neural network. One of ordinary skill would have been motivated to combine the teachings of Killoran into Xiong and Verhul to inexpensively simulate data or to explore the space of possible data configurations (Killoran, pg. 1, section 1).

Regarding claim 21, Xiong and Verhul teach the computer-implemented method of claim 17 (above). Xiong further teaches: 
…the plurality of fully-connected layers… (Xiong, paras. 0007 and 0008: “One or more of the at least three layers may be configured as pooling layers” “At least one of the at least three layers other than the first layer may be configured as a fully connected layer”).
Xiong does not explicitly disclose:
…the depthwise separable convolutional layers have a leaky rectified-linear activation function.
However, Verhul teaches:
…the depthwise separable convolutional layers… (Verhul, pg. 27, sec. 3.3.4.1: “A depthwise separable convolution splits a normal convolution in 2 parts.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Verhul into Xiong. Xiong teaches systems and methods for generating and training convolutional neural networks using biological sequences, including use of a CNN; Verhul teaches CNN architecture using tiled convolutional layers, softmax, and depthwise separable convolutions in the context of real-time object detection. One of ordinary skill would have been motivated to combine the teachings of Verhul into Xiong to achieve a more parameter- and computationally-efficient model (Verhul, pg. 10, last paragraph).
Neither Xiong nor Verhul explicitly discloses:
a leaky rectified-linear activation function.
However, Killoran does teach: 
a leaky rectified-linear unit activation function (Killoran, pg. 16, para. 3: “it was important to replace standard relu units with a ‘leaky’ relu,”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Killoran into Xiong and Verhul. Xiong teaches systems and methods for generating and training convolutional neural networks using biological sequences, including use of a CNN; Verhul teaches CNN architecture using tiled convolutional layers, softmax, and depthwise separable convolutions in the context of real-time object detection; Killoran teaches generating DNA sequences using a generative neural network. One of ordinary skill would have been motivated to combine the teachings of Killoran into Xiong and Verhul to inexpensively simulate data or to explore the space of possible data configurations (Killoran, pg. 1, section 1).
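For reference, the distinction between a standard rectified-linear unit and a "leaky" one can be sketched as follows. This is an illustrative sketch only and is not taken from Killoran: the function names and the slope value of 0.01 are assumptions.

```python
def relu(x):
    """Standard rectified-linear unit: negative inputs are clamped to zero."""
    return x if x > 0 else 0.0


def leaky_relu(x, negative_slope=0.01):
    """'Leaky' variant: negative inputs are scaled by a small slope instead of zeroed.

    The default slope of 0.01 is an assumed, commonly used value.
    """
    return x if x > 0 else negative_slope * x


# Positive inputs pass through both functions unchanged;
# only the handling of negative inputs differs.
for value in (2.0, -2.0):
    print(value, relu(value), leaky_relu(value))
```

The leaky form keeps a small nonzero output (and gradient) for negative inputs, whereas the standard form outputs zero there.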


Claims 10 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Xiong, in view of Verhul, and further in view of Wang et al. (“From Protein Sequence to Protein Function via Multi-Label Linear Discriminant Analysis”) (hereinafter, “Wang”).
Regarding claim 10, Xiong and Verhul teach the computer-implemented method of claim 9 (above). Neither Xiong nor Verhul explicitly discloses:
The set of biological labels are a set of possible protein functions for the protein 
However, Wang teaches:
The set of biological labels are a set of possible protein functions for the protein (Wang, pg. 503, section 2: Predicting protein function from sequence involves two types of data, i.e., protein sequences and the corresponding function annotations; examiner notes that the broadest reasonable interpretation of labels includes identification markers, e.g., annotations).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Wang into Xiong and Verhul. Xiong teaches systems and methods for generating and training convolutional neural networks using biological sequences, including use of a CNN; Verhul teaches CNN architecture using tiled convolutional layers, softmax, and depthwise separable convolutions in the context of real-time object detection; Wang teaches analysis of protein function given a protein sequence via multi-label linear discriminant analysis. One of ordinary skill would have been motivated to combine the teachings of Wang into Xiong and Verhul to implement more efficient and effective protein function prediction (Wang, pg. 512, para. 1).

Regarding claim 12, Xiong and Verhul teach the computer-implemented method of claim 11 (above). Xiong further teaches: 
obtaining training data for the deep neural network, the training data comprising: data representing a plurality of biological sequences and respective biological labels for each of the biological sequences (Xiong, paras. 0011 and 0061: “a MPCNN training unit and a plurality of training cases, each training case comprising a biological sequence” “the input to the MPCNN comprises a biological sequence encoded by an encoder (107) as a vector sequence. It will be appreciated that the input may include additional information, which may comprise, for example, environmental factors, cell labels, tissue labels, disease labels, and other relevant inputs”; examiner notes that the broadest reasonable interpretation of “biological labels” includes labels relating to biology such as cell and tissue labels)
…to generate score distributions that accurately reflect the biological labels for the biological sequences in the training data (Xiong, para. 0065: “many different machine learning architectures can be represented as neural networks, including linear regression, logistic regression, softmax regression”; examiner notes that softmax regression generates a probability distribution).
Xiong does not explicitly disclose:
training the deep neural network on the training data using supervised learning…
However, Wang teaches: 
training the deep neural network on the training data using supervised learning… (Wang, pg. 505, section 2.3: “Given a labeled data set {(x1,y1),…,(xn,yn)}, the goal is to predict labels for unlabeled data points”; examiner notes that the broadest reasonable interpretation of supervised learning includes using pairs of labeled data, i.e., training data, to obtain an output)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Wang into Xiong and Verhul. Xiong teaches systems and methods for generating and training convolutional neural networks using biological sequences, including use of a CNN; Verhul teaches CNN architecture using tiled convolutional layers, softmax, and depthwise separable convolutions in the context of real-time object detection; Wang teaches analysis of protein function given a protein sequence via multi-label linear discriminant analysis. One of ordinary skill would have been motivated to combine the teachings of Wang into Xiong and Verhul to implement more efficient and effective protein function prediction (Wang, pg. 512, para. 1).
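The examiner's note that softmax regression generates a probability distribution over labels can be illustrated with a short sketch. It is offered for illustration only and is not code from Xiong or Wang: the function names, the raw scores, and the three example labels are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into a probability distribution over labels."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels):
    """Return the label with the highest probability, plus the full distribution."""
    distribution = softmax(scores)
    best = max(range(len(labels)), key=lambda i: distribution[i])
    return labels[best], distribution


# Hypothetical raw scores for three hypothetical labels.
label, distribution = classify([1.0, 3.0, 2.0], ["cell", "tissue", "disease"])
print(label, distribution)
```

The returned values are all positive and sum to one, and selecting the highest-valued entry is one simple way of classifying from such a distribution.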

Claims 13-16 are rejected under 35 U.S.C. 103 as being unpatentable over Xiong, in view of Verhul, in view of Wang and further in view of Killoran.
Regarding claim 13, Xiong, Verhul, and Wang teach the computer-implemented method of claim 12 (above). Xiong further discloses:
…encoding the biological sequences in the training data for input to the deep neural network (Xiong, para. 0020: “The method may further comprise an encoding operation that encodes the biological sequence as a vector sequence”)
Xiong does not explicitly disclose: 
randomly injecting noise…
However, Killoran teaches:
randomly injecting noise… (Killoran, pg. 16, para. 3: “The second optional component…is to add random noise.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Killoran into Xiong, Verhul, and Wang. Xiong teaches systems and methods for generating and training convolutional neural networks using biological sequences, including use of a CNN; Verhul teaches CNN architecture using tiled convolutional layers, softmax, and depthwise separable convolutions in the context of real-time object detection; Wang teaches analysis of protein function given a protein sequence via multi-label linear discriminant analysis; Killoran teaches generating DNA sequences using a generative neural network. One of ordinary skill would have been motivated to combine the teachings of Killoran into Xiong, Verhul, and Wang to inexpensively simulate data or to explore the space of possible data configurations (Killoran, pg. 1, section 1).

Regarding claim 14, Xiong, Verhul, Wang and Killoran teach the computer-implemented method of claim 13 (above). Killoran further teaches: 
for each element of a given biological sequence, determining to modify the element with a fixed probability r(). (Killoran, pg. 8, para. 3: “computing the inner product of a fixed position weight matrix (analogous to a convolutional filter from computer vision) with every length-K subsequence of the data.”; examiner notes that the broadest reasonable interpretation of “element” means a particular aspect of something, such as a particular sequence.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Killoran into Xiong, Verhul, and Wang. Xiong teaches systems and methods for generating and training convolutional neural networks using biological sequences, including use of a CNN; Verhul teaches CNN architecture using tiled convolutional layers, softmax, and depthwise separable convolutions in the context of real-time object detection; Wang teaches analysis of protein function given a protein sequence via multi-label linear discriminant analysis; Killoran teaches generating DNA sequences using a generative neural network. One of ordinary skill would have been motivated to combine the teachings of Killoran into Xiong, Verhul, and Wang to inexpensively simulate data or to explore the space of possible data configurations (Killoran, pg. 1, section 1).

Regarding claim 15, Xiong, Verhul, Wang, and Killoran teach the computer-implemented method of claim 14 (above). Killoran further teaches: 
when the element is a canonical compound and in response to determining to modify the element, flipping the canonical compound to one of the other canonical compounds with equal probability. (Killoran, pg. 4, section 2.2.2, para. 3: “mapping data x to the corresponding target attributes t = P(x). The two modules are ‘plugged’ back-to-back, so that they form a concatenated transformation z → x → t… the corresponding generated sequences (four examples shown) will change” Examiner notes that in the claim, the phrase “in response to determining” does not take into consideration whether the element is canonical or not. Examiner also notes that “flipping” is interpreted to mean substituting such that the sequence will change).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Killoran into Xiong, Verhul, and Wang. Xiong teaches systems and methods for generating and training convolutional neural networks using biological sequences, including use of a CNN; Verhul teaches CNN architecture using tiled convolutional layers, softmax, and depthwise separable convolutions in the context of real-time object detection; Wang teaches analysis of protein function given a protein sequence via multi-label linear discriminant analysis; Killoran teaches generating DNA sequences using a generative neural network. One of ordinary skill would have been motivated to combine the teachings of Killoran into Xiong, Verhul, and Wang to inexpensively simulate data or to explore the space of possible data configurations (Killoran, pg. 1, section 1).

Regarding claim 16, Xiong, Verhul, Wang, and Killoran teach the computer-implemented method of claim 14 (above). Killoran further teaches: 
when the element is not a canonical compound and in response to determining to modify the element, flipping the element to one of the canonical compounds with equal probability (Killoran, pg. 4, section 2.2.2, para. 3: “mapping data x to the corresponding target attributes t = P(x). The two modules are ‘plugged’ back-to-back, so that they form a concatenated transformation z → x → t… the corresponding generated sequences (four examples shown) will change” Examiner notes that in the claim, the phrase “in response to determining” does not take into consideration whether the element is canonical or not. Examiner also notes that “flipping” is interpreted to mean substituting such that the sequence will change).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Killoran into Xiong, Verhul, and Wang. Xiong teaches systems and methods for generating and training convolutional neural networks using biological sequences, including use of a CNN; Verhul teaches CNN architecture using tiled convolutional layers, softmax, and depthwise separable convolutions in the context of real-time object detection; Wang teaches analysis of protein function given a protein sequence via multi-label linear discriminant analysis; Killoran teaches generating DNA sequences using a generative neural network. One of ordinary skill would have been motivated to combine the teachings of Killoran into Xiong, Verhul, and Wang to inexpensively simulate data or to explore the space of possible data configurations (Killoran, pg. 1, section 1).
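The noise-injection limitations recited in claims 14-16 can be sketched, for illustration only, as follows. This is a minimal sketch under stated assumptions and is not code from Killoran or from the application: the four-letter DNA alphabet used as the set of canonical compounds, the function name inject_noise, the parameter names, and the example sequence are all hypothetical.

```python
import random

CANONICAL = "ACGT"  # assumed set of canonical compounds (DNA bases)


def inject_noise(sequence, r, rng=None):
    """Modify each element of `sequence` independently with fixed probability r."""
    rng = rng or random.Random()
    noisy = []
    for element in sequence:
        if rng.random() < r:  # decide whether to modify this element
            if element in CANONICAL:
                # Canonical element: flip to one of the OTHER canonical
                # compounds, each chosen with equal probability.
                choices = [c for c in CANONICAL if c != element]
            else:
                # Non-canonical element: flip to any canonical compound,
                # each chosen with equal probability.
                choices = list(CANONICAL)
            element = rng.choice(choices)
        noisy.append(element)
    return "".join(noisy)


# "N" stands in for a hypothetical non-canonical element.
print(inject_noise("ACGTN", r=0.1, rng=random.Random(0)))
```

With r set to 0 the sequence is returned unchanged, and with r set to 1 every element is replaced, which makes the two branches easy to check separately.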


Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Colwell et al., US 2014/0136120 A1

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SALLY T. NGUYEN whose telephone number is (571)272-3406. The examiner can normally be reached M-F 9:00am - 3:00pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amir Mehrmanesh, can be reached at (571) 270-3351. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/STN/Examiner, Art Unit 4163                                                                                                                                                                                                        



/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126