DETAILED ACTION
This communication is in response to the Application filed on 09/08/2020. Claims 1-20 are pending and have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 06/15/2021 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Objections
Claims 8 and 18 are objected to because of the following informalities: “fist utterance” should be “first utterance”. There are two instances of this issue in each of the noted claims. Appropriate correction is required.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-3, 11-13 and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Qu et al. (“Attentive History Selection for Conversational Question Answering”, 2019).
As to claims 1, 11, and 20, Qu teaches a system for dynamic topic tracking in a multi-party conversation, the system comprising: 
a memory (see page 1397, sect 4.2.3, models implemented with TensorFlow thus implying use of memory) configured to store a language model, a context history of a plurality of prior utterances, and a plurality of candidate responses at a current turn of the multi-party conversation (see page 1394, left column, sect. 3.3.1, 1st paragraph, where BERT model described, question qk, passage p, and conversation history H-k and Table 1); 
a processor (see page 1397, sect 4.2.3, models implemented with TensorFlow thus implying use of processor) configured to: 
input, to the language model, each prior utterance from the context history paired with a candidate response from the plurality of candidate responses (see page 1394, sect. 3.3.1, left column, 1st paragraph, where the BERT model receives as input question qk, passage p, and conversation history H-k, and see Figure 1, left side, showing training instance, variations, input sequence, and batches which are then input into BERT); 
encode, via the language model, pairs of the prior utterances and the candidate response into a plurality of topic vectors (see page 1394, sect. 3.3.1, left column, 1st paragraph, where BERT model encodes question qk, passage p, and conversation history H-k  into a contextualized representation); 
generate, by a self-attention layer, a plurality of self-attended topic vectors indicative of topic relevance at an utterance level based on the plurality of topic vectors (see page 1395, sect. 3.4, right column, last full paragraph, where attention weights are applied to tk which is based on the token sequence derived from the BERT see Figure 1 Encoder to token level and see Figure 2, output of BERT as contextualized representations); 
compute a relevance score for the candidate response given the context history based on max-pooling of the plurality of attended topic vectors (see page 1398, right column, bullet 3, max pooling used by HAM in order to promote useful history and demotes unhelpful ones see, page 1392, left column, last 8 lines); and 
determine whether to select the candidate response as a response at the current turn of the multi-party conversation based on the relevance score (see page 1396, sect. 3.5, entire section, where aggregated token representation is used to predict answer based on token).
As to claim 11, apparatus claims 1 and 20 and method claim 11 are related as apparatus and the method of using same, with each claimed element's function corresponding to the claimed method step. Accordingly, claim 1 is similarly rejected 

As to claims 2 and 12, Qu teaches wherein the processor is further configured to encode, via the language model, pairs of the prior utterances and the candidate response into the plurality of topic vectors by: 
generating an input sequence of tokens representing a pair of an utterance and a candidate response (see Figure 2, where QT1 and PT1-PT4 is shown); 
encoding, via a transformer layer in the language model, the input sequence of tokens into an encoded representation including a first portion representing a start token in the input sequence of tokens and a second portion representing remaining tokens in the input sequence of tokens (see Figure 1, input into BERT and yielding output of T0-T7, where T0 is interpreted to be the start token and the rest the second portion); 
attending over the second portion of the encoded representation based on the first portion as query (see page 1395, sect. 3.4, right column, last full paragraph, where attention weights are applied to tk which is based on the token sequence derived from the BERT see Figure 1 Encoder to token level and see Figure 2, output of BERT as contextualized representations); 
and concatenating the attended second portion of the encoded representation and the first portion to result in a topic vector from the plurality of topic vectors (see Figure 1, where aggregated step occurs from the history attention based on contextualized representation as a result of BERT processing).
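
For illustration only, the attending and concatenating steps recited in claims 2 and 12 can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Qu or from the application: plain dot-product attention without learned projections is assumed, and the function name topic_vector and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topic_vector(encoded):
    """Build one topic vector from an encoded token sequence.

    encoded[0] is the start-token vector (the "first portion");
    encoded[1:] are the remaining token vectors (the "second portion").
    Assumption: unparameterized dot-product attention.
    """
    start, rest = encoded[0], encoded[1:]
    # Score each remaining token against the start token used as query.
    scores = [sum(q * k for q, k in zip(start, vec)) for vec in rest]
    weights = softmax(scores)
    # Attention-weighted sum of the second portion.
    attended = [
        sum(w * vec[i] for w, vec in zip(weights, rest))
        for i in range(len(start))
    ]
    # Concatenate the attended second portion with the first portion.
    return attended + start


if __name__ == "__main__":
    toy = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(topic_vector(toy))
```

In this sketch the resulting vector has twice the hidden size, since the attended second portion and the start-token vector are joined end to end.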

As to claims 3 and 13, Qu teaches wherein the processor is further configured to compute the relevance score for the candidate response given the context history based on max-pooling of the plurality of attended topic vectors by (see below, probability of begin and end token computed): 
generating a max-pooling output from the plurality of attended topic vectors (see page 1398, left column, bullet 3, where sequence level representations obtained with max pooling); 
performing a softmax operation over a linear mapping of the max-pooling output to obtain the relevance score, wherein the relevance score indicates a relevance level between the respective prior utterance and the candidate response (see page 1396, sect. 3.5, where a softmax function is used to compute probabilities across all tokens in the sequence, which will be used to predict the answer based on the probability of the begin and end tokens); and 
computing a first entropy loss based on the relevance score and a ground truth label (see page 1396, sect. 3.5, where cross entropy loss is shown in equation 5, which is based on the ground truth begin and end tokens and the probabilities).
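
For illustration only, the three steps recited in claims 3 and 13 (max-pooling, softmax over a linear mapping, and an entropy loss against a ground truth label) can be sketched as below. This is a sketch under stated assumptions, not the method of Qu or of the application: a two-class linear layer is assumed, and the names relevance_score, weight, and bias and all numeric values are hypothetical.

```python
import math


def max_pool(vectors):
    """Element-wise maximum across a list of equal-length vectors."""
    return [max(column) for column in zip(*vectors)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relevance_score(attended, weight, bias):
    """Max-pool, apply a linear mapping, then softmax.

    Assumption: weight has two rows (not relevant, relevant), and the
    probability of the second class is returned as the relevance score.
    """
    pooled = max_pool(attended)
    logits = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weight, bias)
    ]
    return softmax(logits)[1]


def cross_entropy(score, label):
    """Binary cross-entropy between a score in (0, 1) and a 0/1 label."""
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(score + eps)
             + (1 - label) * math.log(1.0 - score + eps))


if __name__ == "__main__":
    attended = [[0.2, 0.9], [0.7, 0.1]]
    score = relevance_score(attended, [[0.0, 0.0], [1.0, 1.0]], [0.0, 0.0])
    print(score, cross_entropy(score, 1))
```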

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 4-6 and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Qu (as applied above) in view of Whang et al. (“Domain Adaptive Training BERT for Response Selection”, 2019).
As to claims 4 and 14, Qu teaches all of the limitations as in claims 3 and 13, above.
However, Qu does not specifically disclose compute a binary topic classifier based on the plurality of topic vectors, wherein the binary topic classifier indicates whether the respective prior utterance and the candidate response belongs to a same topic; and compute a second cross entropy loss based on the binary topic classifier.
Whang does disclose compute a binary topic classifier based on the plurality of topic vectors, wherein the binary topic classifier indicates whether the respective prior utterance and the candidate response belongs to a same topic (see page 2, left column, sect. 3, where the task of ranking responses is based on a binary classification problem, and see right column, 5 lines above equation 1, where contextual representations are utilized to classify whether a given dialog context and response IsNextUtterance or not); and 
compute a second cross entropy loss based on the binary topic classifier (see page 2, right column, equation 2 where cross entropy loss computed based on the classifications).


As to claims 5 and 15, Qu in view of Whang teach all of the limitations as in claims 4 and 14, above.
	Furthermore, Qu teaches where the processor is further configured to: compute a reply-to distribution based on the plurality of attended topic vectors, wherein the reply-to distribution indicates a probability that the candidate response replies to the respective prior utterance (see page 1396, equation 6, where probability to predict dialog act is computed based on sequence level representations as a result of the history attention); and 
compute a third cross-entropy loss based on the reply-to distribution (see page 1396, equation 6, where cross entropy loss is computed based on probability).

As to claims 6 and 16, Qu in view of Whang teach all of the limitations as in claim 4 and 15, above.
	Furthermore, Qu teaches wherein the processor is further configured to: compute a combined loss as a weighted sum of the first cross-entropy loss, the second cross-entropy loss and the third cross-entropy loss (see page 1396, sect 3.7, equation 7, 
jointly update a response selection module, a topic prediction module and a topic entanglement module based on the combined loss (see page 1396, sect 3.7.2, where joint learning of the answer span and dialogue act prediction task which is performed).
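
For illustration only, the combined loss recited in claims 6 and 16 can be sketched as a weighted sum of the three cross-entropy losses. This is a sketch under stated assumptions: the weights are hypothetical hyperparameters, and neither Qu nor the application is the source of the particular values shown.

```python
def combined_loss(losses, weights):
    """Weighted sum of per-task losses.

    losses:  (response selection, topic prediction, reply-to) loss values
    weights: one non-negative weight per loss (assumed hyperparameters)
    """
    if len(losses) != len(weights):
        raise ValueError("one weight is required per loss")
    return sum(w * loss for w, loss in zip(weights, losses))


if __name__ == "__main__":
    # Hypothetical loss values and weights, for demonstration only.
    print(combined_loss([0.5, 1.0, 2.0], [1.0, 0.5, 0.25]))
```

A joint update would then take one gradient step on this single scalar, so that all three modules share the same optimization signal.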

Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Qu (as applied above) in view of Gu et al. (“Pre-Trained and Attention-Based Neural Networks for Building Noetic Task-Oriented Dialogue Systems”, 2020).
As to claims 7 and 17, Qu teaches all of the limitations as in claims 1 and 11, above.
However, Qu does not specifically disclose wherein the language model is pre-trained with a pretraining dataset including a plurality of utterances, each utterance being paired with a respective positive response from the multi-party conversation and a respective negative response from outside the multi-party conversation.
Gu discloses wherein the language model is pre-trained  (see page 3, right column, “Pre-training tasks”, BERT pre-training described) with a pretraining dataset including a plurality of utterances, each utterance being paired with a respective positive response from the multi-party conversation and a respective negative response from outside the multi-party conversation (see page 3, right column, “Pre-training tasks”, where pre-training is based on MLM and NSP, and where for NSP positive responses are true responses and negative responses are randomly sampled from the Ubuntu 
Qu and Gu are in the same field of endeavor of response selection, and therefore are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the response selection as taught by Qu with the positive and negative pretraining as taught by Gu in order to improve performance of the BERT model (see Gu, page 3, right column, “Pre-training tasks”, last sentence in the first paragraph).
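
For illustration only, a pretraining dataset of the kind recited in claims 7 and 17 can be sketched as below. This is a sketch under stated assumptions and is not Gu's procedure: the positive response is assumed to be the next utterance in the same conversation, the negative response is sampled uniformly from a pool of utterances from other conversations, and the function name build_pretraining_pairs is hypothetical.

```python
import random


def build_pretraining_pairs(conversation, outside_pool, rng):
    """Pair each utterance with one positive and one negative response.

    conversation: utterances of one multi-party conversation, in order
    outside_pool: utterances drawn from other conversations
    rng:          a random.Random instance, for reproducibility
    Returns (utterance, response, label) triples; label 1 is positive.
    """
    examples = []
    for utterance, true_response in zip(conversation, conversation[1:]):
        negative = rng.choice(outside_pool)
        examples.append((utterance, true_response, 1))
        examples.append((utterance, negative, 0))
    return examples


if __name__ == "__main__":
    conv = ["how do I mount a drive?", "use the mount command", "thanks"]
    pool = ["my wifi keeps dropping", "try another kernel"]
    for row in build_pretraining_pairs(conv, pool, random.Random(0)):
        print(row)
```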

Claims 8-10 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Qu (as applied above) in view of Gu et al. (“Pre-Trained and Attention-Based Neural Networks for Building Noetic Task-Oriented Dialogue Systems”), as applied to claims 7 and 17 above, and further in view of Devlin et al. (“BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, 2019) and Whang (as applied above).
As to claims 8 and 18, Qu in view of Gu teach all of the limitations as in claims 7 and 17, above.
However, Qu in view of Gu do not specifically disclose wherein the processor is further configured to: input a first utterance and a second utterance, from the pretraining dataset, in a form of a training sequence to the language model, wherein the training sequence includes a first token that predicts whether the first utterance and the second utterance belong to a same topic; generate, by embedding, a token representation of the training sequence; generate, by an encoder layer of the language model, encoded 
Devlin does disclose wherein the processor is further configured to:
input a first utterance and a second utterance, from the pretraining dataset, in a form of a training sequence to the language model, wherein the training sequence includes a first token that predicts whether the first utterance and the second utterance belong to a same topic (see Figure 1, left side figure, pre-training, where unlabeled Sentence A and B pair are input to BERT and which includes a token Tok 1…TokM); 
generate, by embedding, a token representation of the training sequence (see page 4, left column, 2nd paragraph, where wordpiece embeddings used and further later in the paragraph describes input embedding E and see Figure 2); 
generate, by an encoder layer of the language model, encoded topic vectors of the token representation, wherein the encoded topic vector includes a first encoded topic vector corresponding to the first token and wherein the first encoded topic vector encodes a topic relationship between the first utterance and the second utterance (see Figure 3, where the BERT model on the left hand side shows the layout of the BERT model, where the Embedding is passed through to obtain T1-TN, which is based on the relationships of Tm).
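Qu and Gu and Devlin are in the same field of endeavor of response selection, and therefore are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the response selection as taught by Qu in view of Gu with the pre-training as taught by Devlin in order to reduce the need of heavily engineered task specific architectures (see Devlin page 2 left column 2nd bullet).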
However, Qu in view of Gu in view of Devlin do not specifically teach determine whether the first utterance and the second utterance are matched in topic using the encoded first token as a contextual embedding.
Whang does disclose determine whether the first utterance and the second utterance are matched in topic using the encoded first token as a contextual embedding (see page 2, right column, 5 lines above equation 1, where contextual representations utilized to classify whether a given dialog context and response IsNextUtterance or not in which the contextual representation is used from BERT); and 
update the language model using a determined topic relationship between the first utterance and the second utterance (see page 2, sect 3.1, where BERT post-training is performed based on task specific corpora).
Qu and Gu and Devlin and Whang are in the same field of endeavor of response selection, and therefore are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the response selection as taught by Qu with the updating of the model as taught by Whang in order to be able to optimize the model (see Whang page 2, right column, 3 lines prior to equation 2).

As to claim 9, Qu and Gu and Devlin and Whang teach all of the limitations as in claim 8, above.
Furthermore, Gu discloses a masking language model as well as positive and negative responses (see page 3, right column, under “Pre-training tasks”).
Qu and Gu are in the same field of endeavor of response selection, and therefore are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the response selection as taught by Qu with the positive and negative pretraining as taught by Gu in order to improve performance of the BERT model (see Gu, page 3, right column, “Pre-training tasks”, last sentence in the first paragraph).
Furthermore, Devlin discloses mask at least a portion of the pretraining dataset comprising the plurality of utterances, paired positive responses and paired negative responses (see page 4, right column, first two paragraphs, where masked LM is described and where 15% of the token positions are chosen at random for prediction); and 
train the language model using the masked pretraining dataset based on a masked language modeling loss (see page 4, right column, first two paragraphs, where cross entropy loss is computed based on masking).
Qu and Gu and Devlin are in the same field of endeavor of response selection, and therefore are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the response selection as taught by Qu in view of Gu with the pre-training as taught by Devlin in order to reduce the need of heavily engineered task specific architectures (see Devlin page 2 left column 2nd bullet).
Furthermore, Whang discloses mask at least a portion of the pretraining dataset comprising the plurality of utterances, paired positive responses and paired negative responses (see page 3, left column, continued paragraph, where masked LM is used, and see the example and sect. 4.1, which discusses positive and negative responses); and 
train the language model using the masked pretraining dataset based on a masked language modeling loss (see page 3, left column, paragraph above and including equation 3 where masked LM is used to compute loss for post training).
Qu and Gu and Devlin and Whang are in the same field of endeavor of response selection, and therefore are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the response selection as taught by Qu with the updating of the model as taught by Whang in order to be able to optimize the model (see Whang page 2, right column, 3 lines prior to equation 2).
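
For illustration only, the masking and masked language modeling loss discussed for claim 9 can be sketched as below. This is a simplified sketch under stated assumptions, not the procedure of Devlin or Whang: every selected position is replaced with a [MASK] token (Devlin additionally leaves some selected tokens unchanged or swaps in random tokens), the model predictions are supplied as a hypothetical dictionary of probabilities, and the function names are illustrative.

```python
import math
import random

MASK = "[MASK]"


def mask_tokens(tokens, rng, mask_rate=0.15):
    """Replace roughly mask_rate of the token positions with MASK.

    Returns the masked sequence and a {position: original token} map
    that serves as the prediction target.
    """
    if not tokens:
        return [], {}
    count = max(1, round(len(tokens) * mask_rate))
    positions = rng.sample(range(len(tokens)), count)
    masked = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = masked[pos]
        masked[pos] = MASK
    return masked, labels


def masked_lm_loss(predicted, labels):
    """Mean cross-entropy over the masked positions only.

    predicted: {position: {token: probability}}, assumed model output.
    labels must be non-empty.
    """
    total = -sum(math.log(predicted[pos][tok]) for pos, tok in labels.items())
    return total / len(labels)


if __name__ == "__main__":
    tokens = ["t%d" % i for i in range(20)]
    masked, labels = mask_tokens(tokens, random.Random(0))
    print(masked)
    print(labels)
```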

As to claims 10 and 19, Qu and Gu and Devlin and Whang teach all of the limitations as in claims 8 and 18, above.
	Furthermore, Devlin discloses wherein the token representation includes a first representation corresponding to a start token in the training sequence (see Figure 1, where [CLS] is the start which is input into BERT during pre-training), and the processor is further configured to: 

Qu and Gu and Devlin are in the same field of endeavor of response selection, and therefore are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the response selection as taught by Qu in view of Gu with the pre-training as taught by Devlin in order to reduce the need of heavily engineered task specific architectures (see Devlin page 2 left column 2nd bullet).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Lu et al. (NPL) is cited to disclose response selection based on a matching network. Wu et al. (NPL) is cited to disclose response selection (see Figure 1). Zhao (NPL) is cited to disclose a document grounded matching network for response selection (see Figure 1). Yang et al. (NPL) is cited to disclose response ranking with transformers. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PARAS D SHAH whose telephone number is (571)270-
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Paras D Shah/Primary Examiner, Art Unit 2659
03/16/2022