DETAILED ACTION
This communication is in response to the application filed on 15 April 2020.  Claims 1-20 are pending and have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 15 April 2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the IDS is being considered by the examiner.

Claim Objections
Claim 1 is objected to because of the following informalities:  
In lines 15-16, “and the third independent premise independent data” should be changed to “and the third independent premise data” to establish proper antecedent basis for the claim element.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 6, 13, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The claims depend on claims 1, 8, and 15 respectively and recite the following limitations in lines 4-6 of their respective claims: 
“the fifth recurrent network is a fifth Bi-LSTM network, the sixth recurrent network is a sixth Bi-LSTM network, the seventh recurrent network is a seventh Bi-LSTM network, and the eighth recurrent network is an eighth Bi-LSTM network.”
There is insufficient antecedent basis for this limitation in the claim as claims 1, 8, and 15 make no mention of fifth, sixth, seventh, or eighth recurrent networks.

The elements detailed above with insufficient antecedent basis are introduced in claims 3, 10, and 17 of the as-written instant application.  Due to the allowable subject matter contained in claims 2-5, 9-12, and 16-19, in concert with the insufficient antecedent basis detailed above, only the elements with sufficient antecedent basis are considered for further examination in the Claim Rejections - 35 USC § 102 section below.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 6-8, 13-15, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Hashimoto et al. (US 2018/0121799; hereafter Hash).
Regarding claim 1,
Hash teaches:
a method implemented with one or more processors (see Hash ¶ 246: computer system used to implement the model typically includes one or more CPU processors), comprising:
obtaining data indicative of a premise and data indicative of a hypothesis (see Hash ¶ 39, 41, 54, 103, 111, FIG. 1: [39] model is a LSTM sentence processor; --[41] representing the words in the input sentence; --[54] a joint-embedding technique used to encode the input words [obtaining data], joint-embedding includes for each word in the input sequence constructing a “word representation” [data indicative of a premise/hypothesis]; --[Fig. 1 shows that the word representations (102 a and b) are the first blocks at the input to the model and that a respective sentence is their input, hence they are obtained]; --[103] next two NLP tasks of the model encode the semantic relationships between two input sentences [obtained], second task is a textual entailment task which requires one to determine whether a premise sentence [a premise] entails a hypothesis sentence [a hypothesis]; --[111] to classify the premise-hypothesis pair (s, s′) into one of the three classes the model computes the feature vector d2 (s, s′) to identify which is the premise (or hypothesis)), 
wherein the data indicative of the premise and the data indicative of the hypothesis form a natural language inference classification pair (see Hash ¶ 103, 106, 111: NLP [natural language] tasks of the model, first task measures the semantic relatedness between two sentences, output of the semantic relatedness layer is a score for the input sentence pair [form a pair], second task is a textual entailment task which requires one to determine [inference] whether a premise sentence entails a hypothesis sentence [the data indicative of the premise and the data indicative of the hypothesis]; --[106] semantic relatedness between s and s′ [premise/hypothesis], a feature vector is calculated; --[111] entailment classification between two sentences, classify the premise-hypothesis pair (s, s′) [form a natural language inference classification pair] into one of the three classes to identify which is the premise (or hypothesis));
processing the data indicative of the hypothesis independently using a first recurrent network to generate first independent hypothesis data (see Hash ¶ 39, 41, 103, FIG. 1, FIG. 4A:  [39] [in] FIG. 1 model 100 includes two LSTM [recurrent network] stacks (i.e., stack a and stack b) with similar architectures, model 100 [can] include more than two LSTM stacks (e.g., 3, 4, 10, and so on) [Examiner interprets that an individual stack (e.g. stack a in Fig. 1) can process a first sentence representing first data and another individual stack (e.g. stack b in Fig 1) can process a second sentence representing second data] [independently]; --[41] POS label embedding layer is implemented as a bi-directional LSTM [using a first recurrent network to] that uses a POS label classifier [that] processes word embedding vectors [processing the data] (e.g., 102a or 102b) representing the words in the input sentence [data indicative of, independently] and produces POS label embedding vectors and POS state vectors for each of the words [generate first independent hypothesis data]; --[103] textual entailment task which requires one to determine whether a premise sentence entails a hypothesis sentence [indicative of the hypothesis]; --[FIG. 4A shows a diagram depicting example Bi-LSTM used] [first recurrent network]);
processing the data indicative of the premise independently using a third recurrent network to generate third independent premise data (see Hash ¶ 39, 41, 103, FIG. 1, FIG. 4A: [39] [in] FIG. 1 model 100 includes two LSTM [recurrent network] stacks (i.e., stack a and stack b) with similar architectures, model 100 [can] include more than two LSTM stacks (e.g., 3, 4, 10, and so on) [Examiner interprets that an individual stack (e.g. stack a in Fig. 1) can process a first sentence representing first data and another individual stack (e.g. stack b in Fig 1) can process a second sentence representing second data] [independently]; --[41] POS label embedding layer is implemented as a bi-directional LSTM [using a third recurrent network to] that uses a POS label classifier [that] processes word embedding vectors [processing the data] (e.g., 102a or 102b) representing the words in the input sentence [data indicative of, independently] and produces POS label embedding vectors and POS state vectors for each of the words [generate third independent data]; --[103] textual entailment task which requires one to determine whether a premise sentence entails a hypothesis sentence [indicative of the premise]; --[FIG. 4A shows a diagram depicting example Bi-LSTM used] [third recurrent network]);
processing the data indicative of the premise dependently with the first independent hypothesis data using a second recurrent network to generate second dependent premise data (see Hash ¶ 42, 46, 48, FIG. 1, FIG. 5A: [42] chunk label embedding layer is implemented as a bi-directional LSTM [using a second recurrent network] that uses a chunk label classifier [that] processes at least the word embedding vectors [processing the data indicative of the premise], the POS label embedding vectors and the POS state vectors [first independent hypothesis data], to produce [to generate] chunk label embeddings and chunk state vectors [second dependent premise data]; --[46] “Type 2” bypass connections provide the word representations [the data indicative of the premise] directly to each layer in the model 100, “Type 3” bypass connections provide POS label embedding vectors [the first independent hypothesis data] generated at the POS label embedding layer to each of the overlaying layers; --[48] some of the components can be combined, operated in parallel or, in a different sequence than that shown in FIG. 1 without affecting the functions achieved; --[FIG. 1 in concert with ¶ 48, as cited previously, shows that the output from the POS component (104 a or b) is fed into the Chunking component (106 a or b) and that, via the Type 2 bypass connections, data indicative of either the hypothesis or premise (side a or side b respectively) can be fed directly to each layer on either side (a or b) of the model.  Similarly, via the Type 3 bypass connections, first independent hypothesis or premise data (side a or side b respectively) can be fed directly to each layer on either side (a or b) of the model.  As such, Examiner interprets that the data indicative of the premise and the first independent hypothesis data are thus able to be provided to and then processed together [dependently with] by Chunking component 106a]; --[FIG. 5A shows a diagram depicting example Bi-LSTM used] [second recurrent network]);
processing the data indicative of the hypothesis dependently with the third independent premise data using a fourth recurrent network to generate fourth dependent hypothesis data (see Hash ¶ 42, 46, 48, FIG. 1, FIG. 5A: [42] chunk label embedding layer is implemented as a bi-directional LSTM [using a fourth recurrent network] that uses a chunk label classifier [that] processes at least the word embedding vectors [processing the data indicative of the hypothesis], the POS label embedding vectors and the POS state vectors [third independent premise data], to produce [to generate] chunk label embeddings and chunk state vectors [fourth dependent hypothesis data]; --[46] “Type 2” bypass connections provide the word representations [the data indicative of the hypothesis] directly to each layer in the model 100, “Type 3” bypass connections provide POS label embedding vectors [the third independent premise data] generated at the POS label embedding layer to each of the overlaying layers; --[48] some of the components can be combined, operated in parallel or, in a different sequence than that shown in FIG. 1 without affecting the functions achieved; --[FIG. 1 in concert with ¶ 48, as cited previously, shows that the output from the POS component (104 a or b) is fed into the Chunking component (106 a or b) and that, via the Type 2 bypass connections, data indicative of either the hypothesis or premise (side a or side b respectively) can be fed directly to each layer on either side (a or b) of the model.  Similarly, via the Type 3 bypass connections, third independent hypothesis or premise data (side a or side b respectively) can be fed directly to each layer on either side (a or b) of the model.  As such, Examiner interprets that the data indicative of the hypothesis and the third independent premise data are thus able to be provided to and then processed together [dependently with] by Chunking component 106b]; --[FIG. 5A shows a diagram depicting example Bi-LSTM used] [fourth recurrent network]);
pooling the second dependent premise data and the third independent premise independent data to combine independent and dependent premise data and generate pooled premise data (see Hash ¶ 46, 48, 107, FIG. 1, FIG. 7A: [46] “Type 3” bypass connections provide POS label embedding vectors [the third independent premise data] generated at the POS label embedding layer to each of the overlaying layers, “Type 4” bypass connections provide chunk label embeddings [the second dependent premise data] generated at the chunk label embedding layer to each of the overlaying layers; --[48] some of the components can be combined, operated in parallel or, in a different sequence than that shown in FIG. 1 without affecting the functions achieved; --[107] semantic relatedness layer with a bi-directional LSTM overlying the dependency parent identification and dependency relationship label embedding layer [Dependency component 108], also includes a relatedness vector calculator [Relatedness Encoder component 110] [that] calculates an element-wise max pooling calculation [pooling] for the words in the respective sentences to produce sentence-level state vectors [generate pooled premise data]; --[FIG. 1 in concert with ¶ 48, as cited previously, shows that the output from the POS component (104b) (third independent premise data) is fed into the Relatedness Encoder component (110b) via the Type 3 bypass connection.  Similarly, the output from the Chunking component (106b) (the second dependent premise data) is fed into the Relatedness Encoder component (110b) via the Type 4 bypass connection.  As such, Examiner interprets that the third independent premise data and the second dependent premise data are thus able to be provided to and then processed together [to combine independent and dependent premise data] by Relatedness Encoder component (110b)]; --[FIG. 7A shows a diagram depicting example pooling mechanism used (706)]);
pooling the first independent hypothesis data and the fourth dependent hypothesis data to combine independent and dependent hypothesis data and generate pooled hypothesis data (see Hash ¶ 46, 48, 107, FIG. 1, FIG. 7A: [46] “Type 3” bypass connections provide POS label embedding vectors [the first independent hypothesis data] generated at the POS label embedding layer to each of the overlaying layers, “Type 4” bypass connections provide chunk label embeddings [the fourth dependent hypothesis data] generated at the chunk label embedding layer to each of the overlaying layers; --[48] some of the components can be combined, operated in parallel or, in a different sequence than that shown in FIG. 1 without affecting the functions achieved; --[107] semantic relatedness layer with a bi-directional LSTM overlying the dependency parent identification and dependency relationship label embedding layer [Dependency component 108], also includes a relatedness vector calculator [Relatedness Encoder component 110] [that] calculates an element-wise max pooling calculation [pooling] for the words in the respective sentences to produce sentence-level state vectors [generate pooled hypothesis data]; --[FIG. 1 in concert with ¶ 48, as cited previously, shows that the output from the POS component (104a) (the first independent hypothesis data) is fed into the Relatedness Encoder component (110a) via the Type 3 bypass connection.  Similarly, the output from the Chunking component (106a) (the fourth dependent hypothesis data) is fed into the Relatedness Encoder component (110a) via the Type 4 bypass connection.  As such, Examiner interprets that the first independent hypothesis data and the fourth dependent hypothesis data are thus able to be provided to and then processed together [to combine independent and dependent hypothesis data] by Relatedness Encoder component (110a)]; --[FIG. 7A shows a diagram depicting example pooling mechanism used (706)]);
and generating a pooled classification output by combining the pooled premise data and the pooled hypothesis data, wherein the pooled classification output is selected from the group consisting of entailment, neutral, and contradiction (see Hash ¶ 45, 47, 48, 103, 114, 115, FIG. 1, FIG. 8A: [45] relatedness layer [e.g. 112] provides a categorical classification of relatedness between the first and second sentences and delivers the classification to an entailment layer (e.g., 116) via entailment encoders (e.g., 114a or 114b), entailment layer outputs a categorical classification of entailment between the first and second sentences [output][generating a classification output]; --[47] “Type 1” connections provide hidden state vectors generated at a given layer only to the successive overlaying layer; “Type 6” connection outputs a categorical classification of entailment between the first and second sentences from the entailment layer; --[48] some of the components can be combined, operated in parallel or, in a different sequence than that shown in FIG. 1 without affecting the functions achieved; --[103] NLP tasks of the model [see FIG. 1] encode the semantic relationships between two input sentences, textual entailment task determine[s] whether a premise sentence entails a hypothesis sentence [premise data, hypothesis data], these are typically three classes: entailment, contradiction, and neutral [wherein the classification output is selected from the group consisting of entailment, neutral, and contradiction]; --[114] entailment layer includes an entailment vector calculator and an entailment classifier, entailment vector calculator calculates a bi-directional LSTM element-wise max pooling [pooled] calculation over the forward and backward state vectors for the words in the respective sentences to produce sentence-level state vectors representing the respective sentences, entailment vector calculator further calculates an element-wise sentence-level entailment vector that is processed by the entailment classifier to derive a categorical classification of entailment between the first and second sentences [generating a pooled classification output]; --[115] entailment vector calculator calculates element-wise products between sentence-level entailment vectors for the first and second sentences and uses vectors of element-wise products [by combining] as inputs to the entailment classifier [by combining the pooled premise data and the pooled hypothesis data]; --[FIG. 8A shows a diagram depicting example pooling mechanism used (806)]).
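
For illustration only, the element-wise max pooling and element-wise product operations for which Hash ¶ 114-115 are cited above can be sketched as follows. This is a minimal sketch under stated assumptions; it is neither Hash's implementation nor the claimed method. The function names max_pool and elementwise_product, the three-dimensional state vectors, and all numeric values are hypothetical.

```python
from typing import List

Vector = List[float]


def max_pool(states: List[Vector]) -> Vector:
    """Element-wise max over per-word state vectors, giving one sentence-level vector."""
    if not states:
        raise ValueError("at least one state vector is required")
    # zip(*states) groups the i-th component of every word's state vector.
    return [max(column) for column in zip(*states)]


def elementwise_product(a: Vector, b: Vector) -> Vector:
    """Element-wise product of two equal-length sentence-level vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x * y for x, y in zip(a, b)]


# Hypothetical per-word state vectors (one inner list per word).
premise_states = [[0.5, 2.0, -1.0], [1.0, 0.25, 4.0]]
hypothesis_states = [[3.0, -1.0, 0.0], [0.5, 0.5, 0.25]]

pooled_premise = max_pool(premise_states)        # [1.0, 2.0, 4.0]
pooled_hypothesis = max_pool(hypothesis_states)  # [3.0, 0.5, 0.25]

# Assumed classifier input: the element-wise product of the two pooled vectors.
classifier_input = elementwise_product(pooled_premise, pooled_hypothesis)
print(classifier_input)  # [3.0, 1.0, 1.0]
```

In this sketch a downstream classifier (not shown) would map classifier_input to one of the three classes: entailment, neutral, or contradiction.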

Regarding claims 8 and 15, 
non-transitory computer-readable medium claim 8, system claim 15, and method claim 1 are related as a non-transitory computer-readable medium, system, and method of using the same, with each claimed element's function corresponding to the claimed method step. Accordingly claims 8 and 15 are similarly rejected under the same rationale as applied above with respect to the method claim.


Regarding claim 8, 
Hash further teaches:
at least one non-transitory computer-readable medium comprising instructions that, in response to execution of the instructions by one or more processors, cause one or more processors to perform the following operations (see Hash ¶ 246, 250: [246] computer system used to implement the model typically includes a file storage subsystem; --[250] storage subsystem stores programming and data constructs that provide the functionality of the modules and methods described herein, these software modules are generally executed by CPU processors).

Regarding claim 15, 
Hash further teaches:
a system comprising one or more processors and memory operably coupled with the one or more processors, wherein the memory stores instructions that, in response to execution of the instructions by one or more processors, cause the one or more processors to perform the following operations (see Hash ¶ 243, 246, 251: [243] computing system comprising one or more processors and memory coupled to the processors; --[246] computer system used to implement the model typically includes a memory subsystem including memory devices; --[251] Memory subsystem includes a number of memories for storage of instructions and data during program execution).


Regarding claims 6, 13, and 20:
Hash teaches:
wherein the first recurrent network is a first bidirectional long short term memory (Bi-LSTM) network (see Hash ¶ 41, FIG. 4A: POS label embedding layer is implemented as a bi-directional LSTM; --[FIG. 4A shows a diagram depicting example Bi-LSTM used]),
the second recurrent network is a second Bi-LSTM network (see Hash ¶ 42, FIG. 5A: chunk label embedding layer is implemented as a bi-directional LSTM; --[FIG. 5A shows a diagram depicting example Bi-LSTM used]),
the third recurrent network is a third Bi-LSTM network (see Hash ¶ 41, FIG. 4A: POS label embedding layer is implemented as a bi-directional LSTM; --[FIG. 4A shows a diagram depicting example Bi-LSTM used]),
the fourth recurrent network is a fourth Bi-LSTM network (see Hash ¶ 42, FIG. 5A: chunk label embedding layer is implemented as a bi-directional LSTM; --[FIG. 5A shows a diagram depicting example Bi-LSTM used]).


Regarding claims 7 and 14:
Hash teaches: 
preprocessing the data indicative of the premise and the data indicative of the hypothesis which form the natural language inference classification pair (see Hash ¶ 59: word embedder maps the words in the sentence [the data indicative of the premise and the data indicative of the hypothesis], into the word embedding space [preprocessing] represented by a word embedding vector, word embedding processor combines results of the word embedder and the n-character-gram embedder, [the combination is a word representation, 102a and 102b for hypothesis and premise respectively] [which form the natural language inference classification pair]).

Allowable Subject Matter
The following is a statement of reasons for the indication of allowable subject matter:
The closest prior art of record, Hashimoto et al. (US 2018/0121799), is cited to disclose a method of determining entailment of a natural language inference pair, namely a premise sentence and a hypothesis sentence, and classifying the pair as either entailment, contradiction, or neutral.
The closest prior art of record, Chen et al. (“Enhanced LSTM for Natural Language Inference”), is cited to disclose neural network models for natural language inference using an enhanced sequential inference model with recursive architectures in both local inference modeling and inference composition.  Key features include hybrid neural inference networks using chained Bi-LSTM and Tree-LSTM architectures to infer relationships between hypothesis and premise sentences to classify the pair as either entailment, contradiction, or neutral.
The closest prior art of record, Wang et al. (“Bilateral Multi-Perspective Matching for Natural Language Sentences”), is cited to disclose a bilateral multi-perspective matching (BiMPM) model under a “matching-aggregation” framework that matches premise and hypothesis sentences from multiple perspectives using three tasks: paraphrase identification, natural language inference and answer sentence selection, and can classify a sentence pair as either entailment, contradiction, or neutral. 
The closest prior art of record, Rocktaeschel et al. (“REASONING ABOUT ENTAILMENT WITH NEURAL ATTENTION”), is cited to disclose recognizing textual entailment on a large, human curated and annotated corpus, improving performance with general end-to-end differentiable models using LSTM recurrent neural networks that read pairs of sequences to produce a final representation from which a simple classifier predicts entailment.
The closest prior art of record, Sha et al. (“Reading and Thinking: Re-read LSTM Unit for Textual Entailment Recognition”), is cited to disclose an LSTM architecture for recognizing textual entailment, employing “re-read mechanics” into the LSTM (rLSTM) for dual sentence modeling and taking the representation of the premise as general input when dealing with the hypothesis.
However, none of these cited references either alone or in combination thereof teaches or makes obvious the combination of limitations as recited in the dependent claims; specifically, the limitation(s) of: 
“generating data representative of dependent premise attention embedding by combining the pooled premise data, the additional representation of the hypothesis data, a difference between the pooled premise data and the additional representation of the hypothesis data, and an element wise product between the pooled premise data and the additional representation of the hypothesis data;
generating data representative of dependent hypothesis attention embedding by combining the pooled hypothesis data, the additional representation of the premise data, a difference between the pooled hypothesis data and the additional representation of the premise data, and an element wise product between the pooled hypothesis data and the additional representation of the premise data;”  as recited in claims 2, 9, and 16;
all other limitations contained in claims 2-5, 9-12, and 16-19 qualify as allowable subject matter due to these claims’ dependence on claims 2, 9, and 16.
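
For illustration only, the four-part combination recited in the limitation quoted above (a pooled vector, the additional representation of the other sentence, their difference, and their element-wise product) can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the quoted language recites "combining" without naming an operation, so concatenation is assumed here, and the function name attention_embedding and all numeric values are hypothetical.

```python
from typing import List

Vector = List[float]


def attention_embedding(pooled: Vector, other: Vector) -> Vector:
    """Combine pooled, other, (pooled - other), and (pooled * other).

    Concatenation is an assumption; the quoted claim language says only "combining".
    """
    if len(pooled) != len(other):
        raise ValueError("vectors must have the same length")
    difference = [p - o for p, o in zip(pooled, other)]
    product = [p * o for p, o in zip(pooled, other)]
    # List addition concatenates the four parts into one vector.
    return pooled + other + difference + product


# Hypothetical pooled premise data and additional representation of the hypothesis data.
pooled_premise = [1.0, 2.0]
hypothesis_repr = [0.5, 4.0]

embedding = attention_embedding(pooled_premise, hypothesis_repr)
print(embedding)  # [1.0, 2.0, 0.5, 4.0, 0.5, -2.0, 0.5, 8.0]
```

Under the same assumption, the dependent hypothesis attention embedding would be formed by calling the function with the pooled hypothesis data and the additional representation of the premise data.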

Conclusion
Any inquiry concerning this communication or earlier communications from Examiner should be directed to AARON G. ZELLER whose telephone number is (571) 272-5765.  Examiner can normally be reached Monday - Thursday 10 AM - 7:30 PM and every other Friday 10:00 AM - 6:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool.  To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach Examiner by telephone are unsuccessful, Examiner’s supervisor, Pierre-Louis Desir, can be reached at (571) 272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center.  Unpublished application information in Patent Center is available to registered users.  To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.  Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.  For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/AARON G ZELLER/
Examiner, Art Unit 2659
14 July 2022

/PIERRE LOUIS DESIR/
Supervisory Patent Examiner, Art Unit 2659