DETAILED ACTION
	This action is in response to the arguments filed 06/03/2022. Currently claims 1-4, 6, 7, 10-12, 14, 15, 17-19, and 28-31 are pending. Claims 5, 8-9, 13, 16, and 20-27 are cancelled. Claims 28-31 are new.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 06/03/2022 has been entered.

Response to Arguments
Applicant's arguments filed 06/03/2022 have been fully considered but they are not persuasive. Applicant’s arguments addressed in the previous advisory action were reconsidered. Applicant asserts that the cited art does not teach the amended claims. Examiner has updated the rejection accordingly. Claims 1-4, 6, 7, 10-12, 14, 15, 17-19, and 28-31 have been rejected in view of Zhiyanov/Wan/Santos/Pinero.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-2, 4, 6, 7, 10, 12, 14, 15, 17 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhiyanov et al. US Patent number US-10565498-B1, hereinafter Zhiyanov, further in view of Santos et al. “Attentive Pooling Networks” hereinafter Santos, further still in view of Pinero et al “DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes” hereinafter Pinero, further still in view of Wan et al. “Deep learning with feature embedding for compound-protein interaction prediction” hereinafter Wan.

Regarding Claim 1
	Zhiyanov teaches a computer-implemented method for using a neural network model (Col 19 line 46-47 “Methods for Similarity Analysis Using Deep Neural Network Models”); generating, by a computer, vector representations of respective tokens (Col 12 line 60-62 “the text content of an example attribute (Title 402) may be processed into a set of zero or more text tokens”); generating, using a neural network, hidden vectors for the vector representations (Col 10 line 66-68 “The raw text of the attributes may be processed and converted into a set of intermediate [hidden] vectors by a token model layer [FIG. 4, neural network]”; the examiner notes that the intermediate vectors in 255 are between two dense layers, which corresponds to the claim term “hidden”) to generate hidden matrices (Col 11 line 8-9 “an attribute model output [hidden] vector [matrix] (AMOV) may be generated”; the examiner notes that a vector corresponds to a 1xn “matrix”); concatenating the hidden matrices and generating respective concatenated matrices (Col 11 line 16-17 “the AMOVs may be combined (e.g., by concatenation)”); correlating the concatenated matrices; and predicting a probability of an association…using the concatenated matrices (Col 11 line 16-26 “In at least some embodiments, the AMOVs may be combined (e.g., by concatenation) and provided as input to a first dense or fully-connected layer 250A of the deep neural network 202…The output of the second dense layer 250B may comprise the similarity score 270 e.g., a real number or integer indicating the probability”).
	Zhiyanov does not appear to teach, determining an association between biomedical entities in a biomedical entity pair, comprising: of biomedical entities of the biomedical entity pair; between the biomedical entities of the biomedical entity pair based at least in part on respective attention vectors generated by attentive pooling wherein the probability of the association between the biomedical entities is provided using two weighted vectors which correspond to a degree of contribution of each input; processing a new biomedical entity pair not appearing in a training set for which a prior association is not known, wherein the new biomedical entity pair includes a gene sequence as a first biomedical entity and a disease sequence as a second biomedical entity; and determining a probability binding exists between the new biomedical entity pair based on a sigmoid of a product of a vector representation of the gene sequence and a vector representation of the disease sequence.
However, Wan, when addressing issues related to association determination between a pair of biomedical entities with a neural network, teaches, for determining an association between biomedical entities in a biomedical entity pair of biomedical entities of the biomedical entity pair… between the biomedical entities of the biomedical entity pair (abstract and Figure 1 pg 19 “We propose a new scheme that combines feature embedding (a technique of representation learning) with deep learning for predicting compound-protein interactions.” The system determines interactions or associations between two biomedical entities, a compound and a protein.) processing a new biomedical entity pair not appearing in the training set for which a prior association is not known (pg 1 “Also, they [previous approaches] generally failed to predict potential interacting compounds for a given new target (i.e., with no known interacting compound in training data)” pg 2 “Inspired by recent progress in representation learning and deep learning, we propose a new framework that combines unsupervised representation learning with the current powerful deep learning techniques for structure-free drug-target interaction prediction… The comparisons to several baseline methods have demonstrated the superior performance of our approach in predicting new compound-protein interactions, especially when the interaction knowledge of compounds and proteins is unknown.” pg 8 “This result suggested that our deep learning model were much more capable of predicting new compound-protein pairs whose interaction knowledge is entirely unknown.” The system, unlike previous approaches, is able to process new biomedical entities with unknown association not appearing in a training set.) 
and determining a probability binding exists between the new biomedical entity pair based on a sigmoid of a product of a vector representation of the [inputs to the model] (pg 6 Section 2.4 ¶02 “After we calculate aH for the final hidden layer, the output layer Lout is simply a logistic regression model that takes aH as its input and compute… [equation image: media_image1.png] …where the output ŷ is the confidence score of the predicted binding between the given compound-protein pair, σ is the sigmoid function”)
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to use a sigmoid output layer to determine a binding probability between biomedical entities based on neural network processing as disclosed by Wan to the disclosed invention of Zhiyanov.
	One of ordinary skill in the arts would have been motivated to make this modification because both Wan and Zhiyanov discuss using neural networks to process biomedical entities. In order to make a classification, a regression model is used to map the output of a model to a range from 0 to 1. Wan notes that “Our deep neural network (DNN) is a sequence of fully-connected layers that take the concatenated 300-dimensional feature vector of each compound-protein pair as input and classify this pair as binding or non-binding” and works with a training set of binary classifications “Suppose that we are given a training data set…. where N stands for the total number of compound-protein pairs, yi = 1 means that compound ci and protein pi bind to each other, and yi = 0 otherwise” (Wan Section 2.4).
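For illustration only, the logistic-regression output layer relied upon from Wan Section 2.4 can be sketched as follows. This is a minimal sketch and not code from Wan or Zhiyanov: because the equation appears in the record only as an image, the parameterization (a weight vector `w_out` and a bias `b_out` applied to the final hidden activations `a_h`), the function names, and the sample values are all assumptions.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function; maps any real number into the open interval (0, 1)."""
    # Two branches keep math.exp from overflowing for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binding_confidence(a_h, w_out, b_out=0.0):
    """Hypothetical output layer: sigmoid of the product of a weight vector
    and the final hidden activations, plus a bias (assumed form)."""
    if len(a_h) != len(w_out):
        raise ValueError("a_h and w_out must have the same length")
    z = sum(w * a for w, a in zip(w_out, a_h)) + b_out
    return sigmoid(z)


# Illustrative values only; they do not come from any cited reference.
score = binding_confidence([0.5, -1.0, 2.0], [0.2, 0.4, 0.1])
```

The returned value always lies strictly between 0 and 1, which is what allows it to be read as a confidence score for a binding / non-binding classification.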
Zhiyanov/Wan does not explicitly teach, based at least in part on respective attention vectors generated by attentive pooling wherein the probability of the association between the biomedical entities is provided using two weighted vectors which correspond to a degree of contribution of each input; wherein the new biomedical entity pair includes a gene sequence as a first biomedical entity and a disease sequence as a second biomedical entity; [the input to the model are a] gene sequence and a vector representation of the disease sequence.
However, Santos, when addressing issues related to modifying a pooling layer with attentive pooling strategies, teaches, based at least in part on respective attention vectors generated by attentive pooling (Section 3 “When AP[attentive pooling] is applied to CNN, which we call AP-CNN, the network learns the similarity measure over the convolved input sequences[learns to predict]. … Next, we apply column-wise and row-wise max-poolings over G to generate the vectors gq ∈ RM and ga ∈ RL…Next, we apply the softmax function to the vectors gq and ga to create attention vectors σq and σa…Finally, the representations rq and ra are computed as the dot product between the attention vectors σq and σa and the output of the convolution (or biLSTM) over q and a” the result of the max pooling layer is modified to produce attention vectors corresponding to attentive pooling.) wherein the probability of the association [as mapped by Zhiyanov] between the [input entities] is provided using two weighted vectors which correspond to a degree of contribution of each input (Section 3 “Attentive pooling is an approach that enables the pooling layer to be aware of the current input pair… Next, we apply the softmax function to the vectors gq and ga to create attention vectors σq and σa… Finally, the representations rq and ra are compute… [equation image: media_image2.png]” the input entities are used to compute representations which are weighted according to the attention vectors. The higher the value of the attention vector the more attention is paid to the particular input corresponding to the degree of contribution. As previously discussed, the probability of the association is based on the output of the model.)
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate a neural network which determines similarity between two entities by employing an attentive pooling layer as disclosed by Santos to the disclosed invention of Zhiyanov/Wan.
	One of ordinary skill in the arts would have been motivated to make this modification as “AP can be effectively used with CNNs and biLSTM in the context of the answer selection task” (Conclusion Santos)
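For illustration only, the attentive pooling steps quoted from Santos Section 3 (max pooling over G in each direction, softmax to form attention vectors, then a dot product with the convolution or biLSTM output) can be sketched as follows. This is a minimal sketch and not Santos's implementation: the score matrix `G` is taken as a given input because its computation is not quoted in this action, the M x L orientation of `G` and the layout of `Q` and `A` (one hidden vector per column) are assumptions, and all function names and sample values are hypothetical.

```python
import math


def softmax(values):
    """Normalize scores into positive weights that sum to 1."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pooling(Q, A, G):
    """Q is c x M and A is c x L (columns are hidden vectors of each input);
    G is an M x L matrix of pairwise scores between positions of the inputs.
    Returns (r_q, r_a, sigma_q, sigma_a)."""
    M, L = len(G), len(G[0])
    # Max over the L entries of each of G's M rows: one score per position of q.
    g_q = [max(row) for row in G]
    # Max over the M entries of each of G's L columns: one score per position of a.
    g_a = [max(G[i][j] for i in range(M)) for j in range(L)]
    sigma_q = softmax(g_q)  # attention vector for the first input
    sigma_a = softmax(g_a)  # attention vector for the second input
    # Each representation is a weighted sum of that input's hidden vectors.
    r_q = [sum(Q[k][i] * sigma_q[i] for i in range(M)) for k in range(len(Q))]
    r_a = [sum(A[k][j] * sigma_a[j] for j in range(L)) for k in range(len(A))]
    return r_q, r_a, sigma_q, sigma_a


# Illustrative values only.
Q = [[1.0, 0.0], [0.0, 1.0]]            # c = 2, M = 2
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # c = 2, L = 3
G = [[0.1, 0.9, 0.3], [0.2, 0.4, 0.8]]  # M x L
r_q, r_a, sigma_q, sigma_a = attentive_pooling(Q, A, G)
```

In this sketch a position with a larger pooled score receives a larger attention weight, so it contributes more to the final representation, which is the sense in which the weights reflect a degree of contribution of each input.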
	While Zhiyanov/Wan/Santos does not explicitly teach, wherein the new biomedical entity pair includes a gene sequence as a first biomedical entity and a disease sequence as a second biomedical entity, these labels appear to be nonfunctional descriptive language and carry no patentable weight. Zhiyanov/Wan/Santos clearly teaches the functional limitations of the claim including biomedical entities and new biomedical entities. Nevertheless, it is noted that even if this limitation merits patentable weight, it would be obvious in view of Pinero’s teaching.
Pinero teaches the specific biomedical entities that can be applied to the biomedical entities discussed in Zhiyanov/Wan/Santos. In particular Pinero teaches, wherein the new biomedical entity pair includes a gene sequence as a first biomedical entity and a disease sequence as a second biomedical entity; [the input to the model are a] gene sequence and a vector representation of the disease sequence. (“Gene vocabulary. For human genes, HGNC symbols and UniProt accession numbers have been converted to NCBI Entrez Gene identifiers” “Disease vocabulary. The vocabulary used for diseases in the current release of DisGeNET is the Unified Medical Language System” “characterize the relationships between genes and diseases, we use the DisGeNET association type ontology” pg 13; examiner notes that Figure 7 shows a system for discovering gene-disease association, which measures association between a disease sequence and a gene sequence.)
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to use data modeling methods to characterize the relationships between a gene sequence and a disease sequence as disclosed by Pinero to the disclosed invention of Zhiyanov/Wan/Santos.
	One of ordinary skill in the arts would have been motivated to make this modification because there exists “comprehensive collections of human gene-disease associations and a valuable set of tools for investigating the molecular mechanisms underlying diseases of genetic origin” and because “Biomedical sciences are facing an enormous increase of data available in public sources, not only in volume, but also in nature” therefore processing this type of data is advantageous. (Abstract and Background ¶01)

Regarding Claim 2 
	Zhiyanov/Wan/Santos/Pinero teaches claim 1.
	Further Zhiyanov teaches, wherein generating vector representations of biomedical entities of the biomedical entity pairs comprises processing tokens of the biomedical entities (Col 15 line 8-16 “The raw token [biomedical entity]…is mapped to a numerical value…producing a token string vector... in a dictionary generated for token embedding in the similarity analysis” combined with Wan as discussed in claim 1, the tokens can be tokenized biomedical entities corresponding to “biomedical entity”) via an embedding lookup layer (Col 15 line 15-16 “in a dictionary [lookup entity] generated for token embedding in the similarity analysis”, Col 4 line 36-38 “The neural network model may logically comprise a hierarchy of layers in some embodiments, including a token model layer [consisting of embedded lookup]” The examiner notes that tokenization is done via a LSTM depicted in Fig. 4 that consists of layers embedding the text into an attribute vector corresponding to an “embedded lookup layer” )
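For illustration only, the dictionary-based token mapping described above (a raw token mapped to a numerical value, which then selects an embedding) can be sketched as follows. This is a minimal sketch and not Zhiyanov's implementation: the reserved `<unk>` entry, the function names `build_vocab` and `embedding_lookup`, the three-letter tokens, and the table values are all hypothetical.

```python
def build_vocab(corpus_tokens):
    """Assign each distinct token an integer index; index 0 is reserved
    for tokens that are not in the dictionary (an assumption of this sketch)."""
    vocab = {"<unk>": 0}
    for tok in corpus_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def embedding_lookup(tokens, vocab, table):
    """Map each token to its index, then return the matching row of the table."""
    return [table[vocab.get(tok, 0)] for tok in tokens]


# Illustrative values only: three tokens and a 4 x 2 embedding table.
vocab = build_vocab(["MKT", "AYI", "AKQ"])
table = [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
vectors = embedding_lookup(["AYI", "XXX"], vocab, table)
```

In a trained model the rows of the table would be learned parameters; here they are fixed numbers so the lookup step can be followed by hand.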

Regarding Claim 3
	Zhiyanov/Wan/Santos/Pinero teaches the method in claim 1.
Further Wan teaches, wherein a biomedical entity is selected from a group consisting of gene sequences, protein sequences, chemical structures, knowledge graphs, and combinations thereof. (pg 19 Figure 1 caption “Figure A1: The schematic workflow of our prediction model. To predict the binding score (i.e., the probability of interaction) of a given compound-protein pair, protein sequence and the compound InChI (or SMILES and SDF) format are provided as inputs to the modules of protein and compound embeddings” the binding prediction system takes as input two biomedical entities, a protein sequence and a SMILES compound or chemical structure. Presently the claims only require 1 of the entities in the list to be taught by prior art.)
Regarding Claim 4
Zhiyanov/Wan/Santos/Pinero teaches claim 1.
Further Zhiyanov teaches, wherein the neural network is a Long Short Term Memory (LSTM) recurrent neural network (RNN), wherein the RNN is used to project sequential inputs to dense vector representations (Col 8 line 15-17 “In some embodiments, Long Short Term Memory (LSTM) units may be used for one or more RNN layers of the deep neural network model” Col 11 line 16-21 “the AMOVs may be combined (e.g., by concatenation) and provided as input to a first dense or fully-connected layer 250A of the deep neural network 202, for which a first weight matrix 260A may be learned during training. The output of the first dense layer 250A may comprise another intermediate values vector” Neural networks generally involve taking an input and projecting it to a dense vector representation, or intermediate values vector, by applying learned weights to the input.)
Further Wan teaches, where the protein sequences are represented by amino acid sequences and drugs are represented by SMILES strings (pg 19 Figure 1 caption “Figure A1: The schematic workflow of our prediction model. To predict the binding score (i.e., the probability of interaction) of a given compound-protein pair, protein sequence and the compound InChI (or SMILES and SDF) format are provided as inputs to the modules of protein and compound embeddings” pg 5 last paragraph “each protein in our framework is regarded as a “sentence” reading from its N-terminus to C-terminus and every three nonoverlapping amino acid residues” the binding score model which compares the ‘probability of binding’ between two biomedical entities uses a protein sequence and compounds or drugs represented as SMILES strings.)
Regarding Claim 6
Zhiyanov/Wan/Santos/Pinero teaches claim 1.
Further, Santos, when addressing issues related to neural network attention, teaches, wherein the attentive pooling comprises row-wise attentive pooling and column-wise attentive pooling (Figure 2 Section 3 “we apply column-wise and row-wise max-poolings over G to generate the vectors gq ∈ RM and ga ∈ RL, respectively… Next, we apply the softmax function to the vectors gq and ga to create attention vectors σq and σa …Finally, the representations rq and ra are computed as the dot product between the attention vectors σq and σa” as shown in the figure, the composite concatenated matrix G has max pooling applied. The column-wise and row-wise pooling is seen in the figure; the max pooling vectors are used to create attention vectors.)
Regarding Claim 7
Zhiyanov/Wan/Santos/Pinero teaches claim 6.
Further Wan teaches, corresponding to the biomedical entity pairs (Fig. A1 pg 19 Depicts a compound protein pair as input to a neural network)
	Further, Santos, when addressing issues related to neural networks, teaches generating attention vectors, based on the attentive pooling (Figure 2 Section 3 “we apply column-wise and row-wise max-poolings over G to generate the vectors gq ∈ RM and ga ∈ RL, respectively” “Finally, the representations rq and ra are computed as the dot product between the attention vectors σq and σa” as shown in the figure, the composite concatenated matrix G has max pooling applied. The column-wise and row-wise pooling is seen in the figure; the max pooling vectors are used to create attention vectors.)
Regarding Claim 10
	Zhiyanov teaches A computer system for using a neural network model (Col 19 line 46-47 “Methods for Similarity Analysis Using Deep Neural Network Models”) one or more computer devices each having one or more processors and one or more tangible storage devices; and a program embodied on at least one of the one or more storage devices, the program having a plurality of program instructions for execution by the one or more processors, the program instructions comprising instructions for ( Col 11 line 42-48 “In various embodiments, respective data structures or objects may be allocated in memory at one or more computing devices to represent neurons or nodes of each of the layers of the deep neural network model. Furthermore, portions of the memory may also be utilized to store program instructions representing the logic exercised to train and execute the model”) 
The remaining limitations are rejected under Zhiyanov/Wan/Santos/Pinero for the reasons set forth in claim 1
Regarding Claim 11
	Claim 11 is rejected for the reasons set forth in claim 3 and claim 10
Regarding Claim 12
Claim 12 is rejected for the reasons set forth in claim 4 and claim 10
Regarding Claim 14
Claim 14 is rejected for the reasons set forth in claim 6 and claim 10
Regarding Claim 15
Claim 15 is rejected for the reasons set forth in claim 7 and claim 10
Regarding Claim 17
Zhiyanov teaches A computer program product for using a neural network model  (Col 19 line 46-47 “Methods for Similarity Analysis Using Deep Neural Network Models”) the computer program product comprising a non-transitory tangible storage device having program code embodied therewith, the program code executable by a processor of a computer to perform a method, the method comprising (Col 5 line 55-58 “it may be deployed to respond to similarity queries which may be submitted using one or more programmatic interfaces in various embodiments”) 
The remaining limitations are rejected under Zhiyanov/Wan/Santos/Pinero for the reasons set forth in claim 1
Regarding Claim 18
	Claim 18 is rejected for the reasons set forth in claim 3 and claim 17
Regarding Claim 19
Claim 19 is rejected for the reasons set forth in claim 4 and claim 17
Regarding Claim 28
Zhiyanov/Wan/Santos/Pinero teaches claim 3.
Further Santos teaches, wherein a CNN is used to project [input vectors] to dense representations. (Abstract “Our two-way attention mechanism is a general framework independent of the underlying representation learning, and it has been applied to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs) in our studies.” Section 1 introduction ¶04 “Thanks to the two-way attention, our model projects the paired inputs, … into a common representation space” Examiner notes that the attention framework is usable in both RNN and CNN topologies. Further it is understood that a “representation space” is equivalent to a ‘dense representation’.)
Further Wan teaches, wherein [a neural network model] is used to project drugs represented by the chemical structure (Section 2.1 “predicting interactions of compound-protein (or drug-target) pairs by a deep neural network” as demonstrated in previous rejections, the compound or drugs in Wan are represented as SMILES string which describe the chemical structure.)
Regarding Claim 29
Zhiyanov/Wan/Santos/Pinero teaches claim 1.
Further Santos teaches, wherein the vector representation of [the first input vector] and the vector representation of [the second input vector] is determined based on a weighted sum of its hidden matrix and a softmax of its attention vector. (pg 4 “ Next, we apply the softmax function to the vectors gq and ga to create attention vectors σq and σa” the softmax of the attention vectors are σq and σa. “Finally, the representations rq and ra are computed as the dot product between the attention vectors σq and σa and the output of the convolution (or biLSTM) over q and a” the claimed vector representations for both input vectors are based on a ‘dot product’ between the softmax of the attention vector and the output of the convolution. The output of a convolution layer results in a hidden matrix. One of ordinary skill in the art would recognize that a dot product is a mathematical operation that weights each element of a vector by its corresponding weight and performs a summation on the result, thus equivalent to the claimed ‘weighted sum’.)
Further Pinero teaches, [the first input vector] the gene sequence, [the second input vector] the disease sequence (“Gene vocabulary. For human genes, HGNC symbols and UniProt accession numbers have been converted to NCBI Entrez Gene identifiers” “Disease vocabulary. The vocabulary used for diseases in the current release of DisGeNET is the Unified Medical Language System” “characterize the relationships between genes and diseases, we use the DisGeNET association type ontology” pg 13; examiner notes that Figure 7 shows a system for discovering gene-disease association, which measures association between a disease sequence and a gene sequence.)
Regarding Claim 30
Zhiyanov/Wan/Santos/Pinero teaches claim 1.
Further Santos teaches, further comprising: repeating, iteratively, steps of the method using the training set; (Section 2.2 “Both networks are trained by minimizing a pairwise ranking loss function over the training set D” training a neural network according to a loss function over a training set amounts to repeating, iteratively, the steps of the network.) and optimizing parameters of the neural network to maximize the predicted probability of an association for the training dataset. (Section 2.2 “The input in each round is two pairs (q, a+) and (q, a−), where a+ is a ground truth answer for q, and a− is an incorrect answer… we define the training objective as a hinge loss… [equation image: media_image3.png]” the loss function seeks to maximise correctly predicted associations or answers while minimizing incorrect associations. In the context of biomedical entities a ground truth answer is a matching association between the two inputs of the model.)
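For illustration only, a pairwise ranking hinge loss of the kind quoted from Santos Section 2.2 can be sketched as follows. This is a minimal sketch and not Santos's code: because the equation appears in the record only as an image, the standard margin form max(0, m − s(q, a+) + s(q, a−)) is assumed, and the function name, the default margin, and the sample scores are hypothetical.

```python
def pairwise_hinge_loss(score_pos: float, score_neg: float, margin: float = 0.5) -> float:
    """Assumed standard form: the loss is zero once the correct pair outscores
    the incorrect pair by at least `margin`, and grows linearly otherwise."""
    return max(0.0, margin - score_pos + score_neg)


# Correct pair already well ahead of the incorrect pair: no loss.
loss_separated = pairwise_hinge_loss(score_pos=0.9, score_neg=0.1)
# Incorrect pair scored higher than the correct pair: positive loss.
loss_violated = pairwise_hinge_loss(score_pos=0.2, score_neg=0.6)
```

Minimizing this quantity over the training set pushes the score of each ground truth pair above the score of the paired incorrect example, which widens the margin between the two.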

Regarding Claim 31
Zhiyanov/Wan/Santos/Pinero teaches claim 1.
Further Santos teaches, maximizing a likelihood of observing the training set data for each protein by increasing a margin between interacting drugs and non-interacting drugs using a pairwise ranking loss. (Section 2.2 “Both networks are trained by minimizing a pairwise ranking loss function over the training set D” training a neural network according to a loss function over a training set amounts to repeating, iteratively, the steps of the network. Section 2.2 “The input in each round is two pairs (q, a+) and (q, a−), where a+ is a ground truth answer for q, and a− is an incorrect answer… we define the training objective as a hinge loss… [equation image: media_image3.png]” the loss function seeks to maximise correctly predicted associations or answers while minimizing incorrect associations. In the context of biomedical entities a ground truth answer is a matching association between the two inputs of the model.)
Further Wan teaches, [the first input] is a protein, [the second input] is a drug  (Section 2.1 “predicting interactions of compound-protein (or drug-target) pairs by a deep neural network” as demonstrated in previous rejections, the compound or drugs in Wan are represented as SMILES string which describe the chemical structure.)

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNATHAN R GERMICK whose telephone number is (571)272-8363. The examiner can normally be reached M-F 7:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/J.R.G./
Examiner, Art Unit 2122
/KAKALI CHAKI/            Supervisory Patent Examiner, Art Unit 2122