DETAILED ACTION
	This action is in response to the arguments filed 07/23/2021. Currently claims 1-4, 6-7, 9-12, 14-15, 17-19, 21-24 are pending. 
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 07/23/2021 has been entered.
 
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 23 and 24 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claims 23 and 24 recite the limitation "the weighted vector representation". There is insufficient antecedent basis for this limitation in the claim.



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 2, 4, 6, 7, 9, 10, 12, 14, 15, 17, 19, and 21-24 are rejected under 35 U.S.C. 103 as being unpatentable over Zhiyanov et al., US Patent No. US-10565498-B1 (hereinafter Zhiyanov), in view of Yerebakan et al., US Publication No. US-20180196873-A1 (hereinafter Yerebakan), and further in view of Santos et al., “Attentive Pooling Networks” (hereinafter Santos).

Regarding Claim 1
	Zhiyanov teaches a computer-implemented method for using a neural network model (Col 19 line 46-47 “Methods for Similarity Analysis Using Deep Neural Network Models”); generating, by a computer, vector representations of respective tokens (Col 12 line 60-62 “the text content of an example attribute (Title 402) may be processed into a set of zero or more text tokens”); generating, using a neural network, hidden vectors for the vector representations (Col 10 line 66-68 “The raw text of the attributes may be processed and converted into a set of intermediate [hidden] vectors by a token model layer [FIG. 4 neural network]”; the examiner notes that the intermediate vectors in 255 are between two dense layers, which corresponds to the claim term “hidden”) to generate hidden matrices (Col 11 line 8-9 “an attribute model output [hidden] vector [matrix] (AMOV) may be generated”; the examiner notes that a vector corresponds to a 1xn “matrix”); concatenating the hidden matrices and generating respective concatenated matrices (Col 11 line 16-17 “the AMOVs may be combined (e.g., by concatenation)”); correlating the concatenated matrices; and predicting a probability of an association…using the concatenated matrices (Col 11 line 16-26 “In at least some embodiments, the AMOVs may be combined (e.g., by concatenation) and provided as input to a first dense or fully-connected layer 250A of the deep neural network 202…The output of the second dense layer 250B may comprise the similarity score 270 e.g., a real number or integer indicating the probability”).
	Zhiyanov does not appear to teach for determining an association between biomedical entities in a biomedical entity pair… of biomedical entities of the biomedical entity pair… between the biomedical entities of the biomedical entity pair; repeating, iteratively, steps of the method using a training dataset; and optimizing parameters of the neural network to maximize the predicted probability of an association for the training data set. attention vectors generated by attentive pooling
	However, Yerebakan when addressing issues related to neural networks for similarity prediction for biomedical entities teaches for determining an association between biomedical entities in a biomedical entity pair (¶0038 “A pair of documents… The distance d is then used as the similarity value of the pair of documents”), (¶0003 “Radiologists typically search for previous relevant reports (or documents) from a radiological database”) of biomedical entities of the biomedical entity pair… between the biomedical entities of the biomedical entity pair (Fig. 3 Depicts a pair of documents describing biomedical entities) repeating, iteratively, steps of the method using a training dataset (¶0053 “The network was trained for 10 epochs with a batch size of 200.”, ¶0052 “To test the performance of the representation learning framework, a corpus containing a large number of anonymized radiology reports was obtained from hospitals”) and optimizing parameters of the neural network (¶0033 “….an objective function is optimized” in the art optimizing an objective function entails changing parameters) to maximize the predicted probability (¶0033 “Such objective function enforces a pair of documents with positive labels to have lower distance to each other [meaning high probability of similarity]”) of an association for the training dataset. (¶0053 “to determine the semantic similarity [association] between documents.” The examiner notes that similarity corresponds to the claimed “association”)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a neural network method for determining the similarity between biomedical document entities, as taught by Yerebakan, into the disclosed invention of Zhiyanov.
	One of ordinary skill in the art would have been motivated to make this modification in order to improve radiologist comparison of relevant documents from a radiological database, specifically of unstructured text documents. The automatic matching of reports is non-trivial and requires systems which can provide semantic understanding of text. (Background, Yerebakan)
Zhiyanov/Yerebakan does not explicitly teach attention vectors generated by attentive pooling.
However, Santos, when addressing issues related to modifying a pooling layer with attentive pooling strategies, teaches attention vectors generated by attentive pooling (Section 3 “When AP[attentive pooling] is applied to CNN, which we call AP-CNN, the network learns the similarity measure over the convolved input sequences[ learns to predict]. … Next, we apply column-wise and row-wise max-poolings over G to generate the vectors gq ∈ RM and ga ∈ RL…Next, we apply the softmax function to the vectors gq and ga to create attention vectors σq and σa…Finally, the representations rq and ra are computed as the dot product between the attention vectors σq and σa and the output of the convolution (or biLSTM) over q and a” The result of the max pooling layer is modified to produce attention vectors corresponding to attentive pooling.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a neural network which determines similarity between two entities by employing an attentive pooling layer, as taught by Santos, into the disclosed invention of Zhiyanov/Yerebakan.
	One of ordinary skill in the art would have been motivated to make this modification as “AP can be effectively used with CNNs and biLSTM in the context of the answer selection task” (Conclusion, Santos)
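For illustration only, the attentive pooling steps quoted from Santos Section 3 above (max-pooling over G, softmax, then a dot product with the per-sequence outputs) may be sketched as follows. This sketch is not taken from the cited reference or from the claims. The function names, the list-of-rows matrix layout, the assignment of row-wise pooling to q and column-wise pooling to a, and the sample values are all assumptions, and G is taken as a given input because the quoted passage does not state how G is computed.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pooling(Q, A, G):
    """Hypothetical helper sketching the quoted steps.

    Q: c x M matrix (list of c rows), per-position output for input q.
    A: c x L matrix (list of c rows), per-position output for input a.
    G: M x L matrix relating positions of q to positions of a (assumed given).
    Returns (sigma_q, sigma_a, r_q, r_a).
    """
    M, L = len(G), len(G[0])
    # Row-wise max-pooling over G: one score per position of q (length M).
    g_q = [max(G[m]) for m in range(M)]
    # Column-wise max-pooling over G: one score per position of a (length L).
    g_a = [max(G[m][l] for m in range(M)) for l in range(L)]
    # Softmax turns the pooled scores into attention vectors.
    sigma_q = softmax(g_q)
    sigma_a = softmax(g_a)
    # Each representation is the dot product of a row with its attention vector.
    r_q = [sum(row[m] * sigma_q[m] for m in range(M)) for row in Q]
    r_a = [sum(row[l] * sigma_a[l] for l in range(L)) for row in A]
    return sigma_q, sigma_a, r_q, r_a


# Hypothetical toy inputs: c = 2, M = 2, L = 3.
Q = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
G = [[0.1, 0.9, 0.2], [0.3, 0.4, 0.5]]
sigma_q, sigma_a, r_q, r_a = attentive_pooling(Q, A, G)
```

In this sketch the attention vectors each sum to one, so r_q and r_a are attention-weighted combinations of the columns of Q and A.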

Regarding Claim 2 
	Zhiyanov/Yerebakan/Santos teaches claim 1.
	Further Zhiyanov teaches, wherein generating vector representations of biomedical entities of the biomedical entity pairs comprises processing tokens of the biomedical entities (Col 15 line 8-16 “The raw token [biomedical entity]…is mapped to a numerical value…producing a token string vector... in a dictionary generated for token embedding in the similarity analysis” combined with Yerebakan, the tokens can be tokenized biomedical documents corresponding to “biomedical entity”) via an embedding lookup layer (Col 15 line 15-16 “in a dictionary [lookup entity] generated for token embedding in the similarity analysis”, Col 4 line 36-38 “The neural network model may logically comprise a hierarchy of layers in some embodiments, including a token model layer [consisting of embedded lookup]” The examiner notes that tokenization is done via a LSTM depicted in Fig. 4 that consists of layers embedding the text into an attribute vector corresponding to an “embedded lookup layer” )

Regarding Claim 4
Zhiyanov/Yerebakan/Santos teaches claim 1.
Further Zhiyanov teaches, wherein the neural network is a Long Short Term Memory (LSTM) recurrent neural network (RNN) (Col 8 line 15-17 “In some embodiments, Long Short Term Memory (LSTM) units may be used for one or more RNN layers of the deep neural network model”) 
	
Regarding Claim 6
Zhiyanov/Yerebakan/Santos teaches claim 1.
Further, Santos when addressing issues related to neural networks teaches wherein the attentive pooling comprises row-wise attentive pooling and column-wise attentive pooling (Figure 2 Section 3 “we apply column-wise and row-wise max-poolings over G to generate the vectors gq ∈ RM and ga ∈ RL, Respectively… Next, we apply the softmax function to the vectors gq and ga to create attention vectors σq and σa …Finally, the representations rq and ra are computed as the dot product between the attention vectors σq and σa” as shown in the figure the composite concatenated matrix G has max pooling applied. The column wise and row wise pooling is seen in the figure, the max pooling vectors are used to create attention vectors.)

Regarding Claim 7
Zhiyanov/Yerebakan/Santos teaches claim 6.
Further Yerebakan teaches, corresponding to the biomedical entity pairs (Fig. 3 Depicts a pair of documents describing biomedical entities as inputs to the matching system)
	Further, Santos when addressing issues related to neural networks teaches generating attention vectors, based on the attentive pooling (Figure 2 Section 3 “we apply column-wise and row-wise max-poolings over G to generate the vectors gq ∈ RM and ga ∈ RL, Respectively” “Finally, the representations rq and ra are computed as the dot product between the attention vectors σq and σa” as shown in the figure the composite concatenated matrix G has max pooling applied. The column wise and row wise pooling is seen in the figure, the max pooling vectors are used to create attention vectors.)

Regarding Claim 9
Zhiyanov/Yerebakan/Santos teaches claim 1.
	Further Yerebakan teaches, processing a new biomedical entity pair not appearing in the training set and for which a prior association is not known (¶0053 “Data was split randomly into training and testing sets at patient level”) and determining a probability of association between biomedical entities of the new biomedical entity pair (¶0052 “to determine the semantic similarity between documents”) 
	Further Santos teaches, based on weighted vector representations (Section 2.3 “Given the matrices Q and A, we compute the vector representations rq ∈ Rc and ra ∈ Rc by applying a column-wise max-pooling over Q and A, followed by a non-linearity. Formally, the j-th elements of the vectors rq and ra are compute as follows:… [equation image media_image1.png omitted] The last layer in QA-CNN and QA-biLSTM scores the input pair (q,a) by computing the cosine similarity between the two representations:” The rq and ra vectors correspond to weighted vector representations, and the score corresponds to the degree of association.)
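For illustration only, the scoring described in the Santos Section 2.3 passage quoted above (max-pooling, a non-linearity, then cosine similarity) may be sketched as follows. The equation image is not reproduced in this action, so the reading that the j-th element is the maximum over sequence positions of row j is an assumption, as are the choice of tanh as the non-linearity, the function names, and the list-of-rows layout.

```python
import math


def pooled_representation(X):
    """Hypothetical helper: X is a c x N matrix given as a list of c rows.

    Assumed reading of the quoted max-pooling: element j is the maximum of
    row j over all sequence positions, followed by tanh (assumed).
    """
    return [math.tanh(max(row)) for row in X]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_pair(Q, A):
    """Score an input pair by comparing the two pooled representations."""
    return cosine_similarity(pooled_representation(Q), pooled_representation(A))
```

Under these assumptions, a pair whose pooled representations point in the same direction scores near 1, and an orthogonal pair scores 0.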

Regarding Claim 10
Zhiyanov teaches a computer system for using a neural network model (Col 19 line 46-47 “Methods for Similarity Analysis Using Deep Neural Network Models”); one or more computer devices each having one or more processors and one or more tangible storage devices; and a program embodied on at least one of the one or more storage devices, the program having a plurality of program instructions for execution by the one or more processors, the program instructions comprising instructions for (Col 11 line 42-48 “In various embodiments, respective data structures or objects may be allocated in memory at one or more computing devices to represent neurons or nodes of each of the layers of the deep neural network model. Furthermore, portions of the memory may also be utilized to store program instructions representing the logic exercised to train and execute the model”); generating, by a computer, vector representations of respective tokens (Col 12 line 60-62 “the text content of an example attribute (Title 402) may be processed into a set of zero or more text tokens”); generating, using a neural network, hidden vectors for the vector representations (Col 10 line 66-68 “The raw text of the attributes may be processed and converted into a set of intermediate [hidden] vectors by a token model layer [FIG. 4 neural network]”; the examiner notes that the intermediate vectors in 255 are between two dense layers, which corresponds to the claim term “hidden”) to generate hidden matrices (Col 11 line 8-9 “an attribute model output [hidden] vector [matrix] (AMOV) may be generated”; the examiner notes that a vector corresponds to a 1xn “matrix”); concatenating the hidden matrices and generating respective concatenated matrices (Col 11 line 16-17 “the AMOVs may be combined (e.g., by concatenation)”); correlating the concatenated matrices; and predicting a probability of an association… using the concatenated matrices (Col 11 line 16-26 “In at least some embodiments, the AMOVs may be combined (e.g., by concatenation) and provided as input to a first dense or fully-connected layer 250A of the deep neural network 202…The output of the second dense layer 250B may comprise the similarity score 270 e.g., a real number or integer indicating the probability”).
	Zhiyanov does not appear to teach for determining an association between biomedical entities in a biomedical entity pair… of biomedical entities of the biomedical entity pair… between the biomedical entities of the biomedical entity pair. repeating, iteratively, steps of the method using a training dataset; and optimizing parameters of the neural network to maximize the predicted probability of an association for the training data set. attention vectors generated by attentive pooling
	However, Yerebakan when addressing issues related to neural networks teaches for determining an association between biomedical entities in a biomedical entity pair (¶0038 “A pair of documents… The distance d is then used as the similarity [association] value of the pair of documents”), (¶0003 “Radiologists typically search for previous relevant reports (or documents) from a radiological database”) of biomedical entities of the biomedical entity pair… between the biomedical entities of the biomedical entity pair (Fig. 3 Depicts a pair of documents describing biomedical entities)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a neural network method for determining the similarity between biomedical document entities, as taught by Yerebakan, into the disclosed invention of Zhiyanov.
	One of ordinary skill in the art would have been motivated to make this modification in order to improve radiologist comparison of relevant documents from a radiological database, specifically of unstructured text documents. The automatic matching of reports is non-trivial and requires systems which can provide semantic understanding of text. (Background, Yerebakan)
Zhiyanov/Yerebakan does not appear to teach attention vectors generated by attentive pooling.
However, Santos, when addressing issues related to modifying a pooling layer with attentive pooling strategies, teaches attention vectors generated by attentive pooling (Section 3 “When AP[attentive pooling] is applied to CNN, which we call AP-CNN, the network learns the similarity measure over the convolved input sequences[ learns to predict]. … Next, we apply column-wise and row-wise max-poolings over G to generate the vectors gq ∈ RM and ga ∈ RL…Next, we apply the softmax function to the vectors gq and ga to create attention vectors σq and σa…Finally, the representations rq and ra are computed as the dot product between the attention vectors σq and σa and the output of the convolution (or biLSTM) over q and a” The result of the max pooling layer is modified to produce attention vectors corresponding to attentive pooling.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a neural network which determines similarity between two entities by employing an attentive pooling layer, as taught by Santos, into the disclosed invention of Zhiyanov/Yerebakan.
	One of ordinary skill in the art would have been motivated to make this modification as “AP can be effectively used with CNNs and biLSTM in the context of the answer selection task” (Conclusion, Santos)

Regarding Claim 12
Claim 12 is rejected for the reasons set forth in claim 4 and claim 10.

Regarding Claim 14
Claim 14 is rejected for the reasons set forth in claim 6 and claim 10.

Regarding Claim 15
Claim 15 is rejected for the reasons set forth in claim 7 and claim 10.

Regarding Claim 17
Zhiyanov teaches a computer program product for using a neural network model (Col 19 line 46-47 “Methods for Similarity Analysis Using Deep Neural Network Models”); the computer program product comprising a non-transitory tangible storage device having program code embodied therewith, the program code executable by a processor of a computer to perform a method, the method comprising (Col 5 line 55-58 “it may be deployed to respond to similarity queries which may be submitted using one or more programmatic interfaces in various embodiments”); generating, by a processor, vector representations of respective tokens (Col 12 line 60-62 “the text content of an example attribute (Title 402) may be processed into a set of zero or more text tokens”); generating, by the processor, using a neural network, hidden vectors for the vector representations (Col 10 line 66-68 “The raw text of the attributes may be processed and converted into a set of intermediate [hidden] vectors by a token model layer [FIG. 4 neural network]”; the examiner notes that the intermediate vectors in 255 are between two dense layers, which corresponds to the claim term “hidden”) to generate hidden matrices (Col 11 line 8-9 “an attribute model output [hidden] vector [matrix] (AMOV) may be generated”; the examiner notes that a vector corresponds to a 1xn “matrix”); concatenating, by the processor, the hidden matrices and generating respective concatenated matrices (Col 11 line 16-17 “the AMOVs may be combined (e.g., by concatenation)”); correlating, by the processor, the concatenated matrices; and predicting a probability of an association…using the concatenated matrices (Col 11 line 16-26 “In at least some embodiments, the AMOVs may be combined (e.g., by concatenation) and provided as input to a first dense or fully-connected layer 250A of the deep neural network 202…The output of the second dense layer 250B may comprise the similarity score 270 e.g., a real number or integer indicating the probability”).
	Zhiyanov does not appear to teach for determining an association between biomedical entities in a biomedical entity pair… of biomedical entities of the biomedical entity pair… between the biomedical entities of the biomedical entity pair. repeating, iteratively, steps of the method using a training dataset; and optimizing parameters of the neural network to maximize the predicted probability of an association for the training data set. attention vectors generated by attentive pooling
	However, Yerebakan when addressing issues related to neural networks teaches for determining an association between biomedical entities in a biomedical entity pair (¶0038 “A pair of documents… The distance d is then used as the similarity value of the pair of documents”), (¶0003 “Radiologists typically search for previous relevant reports (or documents) from a radiological database”) of biomedical entities of the biomedical entity pair… between the biomedical entities of the biomedical entity pair (Fig. 3 Depicts a pair of documents describing biomedical entities)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a neural network method for determining the similarity between biomedical document entities, as taught by Yerebakan, into the disclosed invention of Zhiyanov.
	One of ordinary skill in the art would have been motivated to make this modification in order to improve radiologist comparison of relevant documents from a radiological database, specifically of unstructured text documents. The automatic matching of reports is non-trivial and requires systems which can provide semantic understanding of text. (Background, Yerebakan)
Zhiyanov/Yerebakan does not appear to teach attention vectors generated by attentive pooling.
However, Santos, when addressing issues related to modifying a pooling layer with attentive pooling strategies, teaches attention vectors generated by attentive pooling (Section 3 “When AP is applied to CNN, which we call AP-CNN, the network learns the similarity measure over the convolved input sequences. When AP is applied to biLSTM, which we call AP-biLSTM, the network learns the similarity measure over the hidden states produced by the biLSTM when processing the two input sequences… Next, we apply column-wise and row-wise max-poolings over G to generate the vectors gq ∈ RM and ga ∈ RL…Next, we apply the softmax function to the vectors gq and ga to create attention vectors σq and σa…Finally, the representations rq and ra are computed as the dot product between the attention vectors σq and σa and the output of the convolution (or biLSTM) over q and a” The result of the max pooling layer is modified to produce attention vectors corresponding to attentive pooling.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a neural network which determines similarity between two entities by employing an attentive pooling layer, as taught by Santos, into the disclosed invention of Zhiyanov/Yerebakan.
	One of ordinary skill in the art would have been motivated to make this modification as “AP can be effectively used with CNNs and biLSTM in the context of the answer selection task” (Conclusion, Santos)

Regarding Claim 19
Claim 19 is rejected for the reasons set forth in claim 4 and claim 17

Regarding Claim 21
Zhiyanov/Yerebakan/Santos teaches claim 9.
Further Santos teaches, wherein the weighted vector representations are derived by a trained neural network (Section 3.2 “Both networks are trained by minimizing a pairwise ranking loss function over the training set D” Section 5.1 ¶02 “This dataset provides a training set, a validation set, and two test sets.” A trained network produces the weighted vectors described in claim 9. Further, the trained network is used with a validation and test set, which when fed to the network as described produces weighted vector representations.) based on at least the attention vectors generated by attentive pooling. (Section 3 “When AP is applied to CNN, which we call AP-CNN, the network learns the similarity measure over the convolved input sequences. When AP is applied to biLSTM, which we call AP-biLSTM, the network learns the similarity measure over the hidden states produced by the biLSTM when processing the two input sequences… Next, we apply column-wise and row-wise max-poolings over G to generate the vectors gq ∈ RM and ga ∈ RL…Next, we apply the softmax function to the vectors gq and ga to create attention vectors σq and σa…Finally, the representations rq and ra are computed as the dot product between the attention vectors σq and σa and the output of the convolution (or biLSTM) over q and a” The result of the max pooling layer is modified to produce attention vectors corresponding to attentive pooling.)

Regarding Claim 22
Zhiyanov/Yerebakan/Santos teaches the method in claim 9.
Further Santos teaches, wherein the weighted vector representations are a degree of contribution for each input of the probability of an association determination. (Section 2.3 “Given the matrices Q and A, we compute the vector representations rq ∈ Rc and ra ∈ Rc by applying a column-wise max-pooling over Q and A, followed by a non-linearity. Formally, the j-th elements of the vectors rq and ra are compute as follows:… [equation image media_image1.png omitted] The last layer in QA-CNN and QA-biLSTM scores the input pair (q,a) by computing the cosine similarity between the two representations:” The rq and ra vectors correspond to weighted vector representations, and the score corresponds to the degree of association. The vectors represent a degree of contribution for each input in the pair of inputs, whose elements indicate similarity or association in the pair of inputs.)

Regarding Claim 23
Zhiyanov/Yerebakan/Santos teaches claim 10.
Further Yerebakan teaches, processing a new biomedical entity pair not appearing in the training set and for which a prior association is not known (¶0053 “Data was split randomly into training and testing sets at patient level”) and determining a probability of association between biomedical entities of the new biomedical entity pair (¶0052 “to determine the semantic similarity between documents”)
Further Santos teaches, based on the weighted vector representations of the new … entity pair.  (Figure 2 Section 3.2 “Both networks are trained by minimizing a pairwise ranking loss function over the training set D” Section 5.1 ¶02 “This dataset provides a training set, a validation set, and two test sets.” A trained network produces the weighted vectors described in claim 9, rq and ra shown in figure 2. Further the trained network is used with a validation and test set, which when fed to the network as described produces weighted vector representations.)

Regarding Claim 24
Claim 24 is rejected for the reasons set forth in claim 23 and claim 17

Claims 3, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhiyanov/Yerebakan/Santos, and further in view of Muhammed S. Ahmed, “SIGNET: A Neural Network architecture for Predicting Protein-Protein Interactions” (hereinafter Ahmed).

Regarding Claim 3
	Zhiyanov/Yerebakan/Santos teaches the method in claim 1.
Zhiyanov/Yerebakan/Santos does not appear to teach, wherein a biomedical entity is selected from a group consisting of gene sequences, protein sequences, chemical structures, knowledge graphs, and combinations thereof.
However, Ahmed, when addressing issues related to neural networks for determining similarity between pairs of entities, teaches, wherein a biomedical entity is selected from a group consisting of gene sequences, protein sequences, chemical structures, knowledge graphs, and combinations thereof. (Section 4 “we describe our method of preparing proteins as protein signatures and outline a novel neural network architecture, called SigNet, for PPI prediction” Section 4.1 “Majority of neural network models require fixed-length feature vectors as inputs. Since proteins are represented as variable length strings (primary amino acid sequence), we use Martin et al.’s definition of a protein signature to prepare proteins as fixed-length vectors.” Section 4.2 “SigNet is a siamese convolutional neural-network. An overview of its architecture is shown in Figure 4.2. Each protein signature vector first goes through two 1D convolutional layers” The PPI prediction method predicts interactions or associations between two entities selected from a training set. The protein signature is a representation of the protein sequence. Examiner notes that a chemical structure corresponds to a protein sequence. Presently the claims only require one of the entities in the list to be taught by prior art.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a neural network architecture that takes representations of proteins as input in order to determine associations between the proteins, as taught by Ahmed, into the disclosed invention of Zhiyanov/Yerebakan/Santos.
	One of ordinary skill in the art would have been motivated to make this modification as “architecture based on protein signatures, SigNet, was the best predictive model… SigNet outperforms them [other methods] on many test cases… can be used to create powerful models which can outperform, or at the very least augment the performance of, classical techniques for PPI predictions” (Conclusion, Ahmed)

Regarding Claim 11
	Claim 11 is rejected for the reasons set forth in claim 3 and claim 9.

Regarding Claim 18
	Claim 18 is rejected for the reasons set forth in claim 3 and claim 17.

Response to Arguments
Applicant’s arguments, see Remarks, filed 07/23/2021, with respect to claims 1-4, 8-12, 16-20 have been fully considered and are persuasive.  The rejection of claims 1-4, 8-12, 16-20 under 35 U.S.C. 101 has been withdrawn.
Applicant’s arguments with respect to claim(s) 1-4, 8-12, 16-20, 5-7, 13-15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Specifically the attentive pooling features challenged by the applicant are taught by the art Santos et al.
Further Ahmed teaches the amended limitations of claim 3.

Conclusion


THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNATHAN R GERMICK whose telephone number is (571)272-8363. The examiner can normally be reached on Monday-Friday 7:30 am – 4:00 pm (EST).
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at telephone number 5712723719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.



/J.R.G./Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122