DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
In response to the Office Action mailed 1/28/2022, applicant has submitted an amendment filed 4/25/2022.
Claims 1, 3, 5-9, 11, and 13-20 have been amended.
EXAMINER'S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Authorization for this examiner’s amendment was given in an interview with Adesh Bhargava on 5/2/2022.

The application has been amended as follows: 

Amend “the one of the perturbabilities” in the 4th to last line of claim 1 to recite --one of the perturbabilities--.

Amend “the one of the perturbabilities” in the 5th to last line of claim 9 to recite --one of the perturbabilities--.

Amend “the one of the perturbabilities” in the 4th to last line of claim 18 to recite --one of the perturbabilities--.

Amend “determining, by the at least one hardware processor, based on the updated response to the perturbed variant for each semantic segment of the plurality of semantic segments, a perturbability of the natural dialogue system.” in the last 3 lines of claim 8 to recite --determining, by the at least one hardware processor, for each semantic segment of the plurality of semantic segments, based on the updated response to the perturbed variant, a perturbability of the natural dialogue system.--
	
Amend “determining, by the at least one hardware processor, based on the updated response to the perturbed variant for each semantic segment of the plurality of semantic segments, the perturbability of the natural dialogue system,” in lines 1-3 of claim 9 to recite --determining, by the at least one hardware processor, for each semantic segment of the plurality of semantic segments, based on the updated response to the perturbed variant, the perturbability of the natural dialogue system,--

Amend “determine, based on the perturbed variant for each semantic segment of the plurality of semantic segments, a perturbability of the natural dialogue system.” in the last 2 lines of claim 16 to recite --determine, for each semantic segment of the plurality of semantic segments, based on the perturbed variant, a perturbability of the natural dialogue system.--

Amend “determine, based on the perturbed variant for each semantic segment of the plurality of semantic segments, the perturbability of the natural dialogue system.” in lines 2-4 of claim 17 to recite --determine, for each semantic segment of the plurality of semantic segments, based on the perturbed variant, the perturbability of the natural dialogue system,--

Amend “determine, based on the updated response to the perturbed variant for each semantic segment of the plurality of semantic segments, the perturbability of the natural dialogue system.” in the last 2 lines of claim 17 to recite --determine, for each semantic segment of the plurality of semantic segments, based on the updated response to the perturbed variant, the perturbability of the natural dialogue system.--

Amend “determine, based on the updated response to the perturbed variant for each semantic segment of the plurality of semantic segments, the perturbability of the natural dialogue system.” in lines 2-4 of claim 18 to recite --determine, for each semantic segment of the plurality of semantic segments, based on the updated response to the perturbed variant, the perturbability of the natural dialogue system,--

Claim Interpretation
	The elements of claim 1 are not interpreted as invoking 35 U.S.C. 112(f) because they are recited as being “executed by at least one hardware processor” and are thus interpreted as program code.
In claims 17, 18, and 20, “the machine readable instructions to determine…” (line 2 of each of claims 17, 18, and 20) is interpreted as having implicit antecedent basis. In claim 16, for the instructions to cause the processor[s] to perform the claimed steps, those instructions must necessarily include, for each step, respective program code that is configured to cause the processor[s] to perform that step.
Allowable Subject Matter
Claims 1-20 are allowed.
The following is an examiner’s statement of reasons for allowance:

As per claim 16 (and similarly claims 1 and 8, which are narrower than claim 16), the prior art of record does not teach or suggest the combination of all limitations in claim 16, including (i.e., in combination with the remaining limitations in claim 16): identify a plurality of semantic segments for conversation data for a natural dialogue system; generate, for each semantic segment of the plurality of semantic segments, a perturbed variant that includes at least one perturbation; and determine, based on the perturbed variant for each semantic segment of the plurality of semantic segments, a perturbability of the natural dialogue system.
	The prior art suggests:
I. determining robustness/perturbability of a system (particularly a parser) based on how similar a parser’s output for a well-formed sentence is to the parser’s output for a “problematic sentence” (where a sentence can be interpreted as a “semantic segment”).
	Homa B. Hashemi and Rebecca Hwa. 2016. An evaluation of parser robustness for ungrammatical sentences. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Austin, Texas, pages 1765–1774, teaches “For the purpose of robustness evaluation, we take the automatically produced parse tree of a well-formed sentence as “gold-standard” and compare the parser output for the corresponding problematic sentence against it. Even if the “gold-standard” is not perfectly correct in absolute terms, it represents the norm from which parse trees of problematic sentences diverge: if a parser were robust against ungrammatical sentences, its output for these sentences should be similar to its output for the well-formed ones” (Section 3, first paragraph). This reference suggests evaluating robustness (a form of “determining perturbability”) based on the output “response” of a parser to “problematic sentences” (i.e., how similar the parser outputs for problematic sentences are to the parser outputs for well-formed sentences). This reference does not appear to specifically describe generating the problematic sentences by perturbing the well-formed sentences. See Section 4, which describes evaluating parser robustness over two datasets that contain ungrammatical sentences (writings of English-as-a-Second-Language learners and machine translation outputs), where the datasets have “corresponding correct sentences…available [or easily reconstructed]”. This indicates that the correct/well-formed sentences are derived from the ungrammatical sentences, not the other way around as claimed.
In the independent claims of this application, the perturbations are applied to “a plurality of semantic segments for conversation data for a natural dialogue system” (which appear to be the output of a parser, and not the input to a parser), and the perturbability being evaluated is perturbability of the natural dialogue system.
II. generating negative examples from positive examples.
	2003/0120481 teaches predicting negative examples from positive examples (paragraph 2).
	Heung-Seon Oh, Jong-Beom Kim, and Sung-Hyon Myaeng. 2011. Extracting targets and attributes of medical findings from radiology reports in a mixture of languages. In Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine (BCB '11). Association for Computing Machinery, New York, NY, USA, 550–552. DOI: https://doi.org/10.1145/2147805.2147897, teaches automatically generating negative examples from positive examples (e.g., when (T1, A1) and (T2, A2) are true pairs, (T1, A2) can serve as a negative example) (see Section 2.2).
III. Perturbing dialog/conversation.
2019/0377795 teaches “As another example, suppose a classifier is trained to determine if and when a conversation between a user and a chatbot should be escalated. An escalation occurs when the user is transferred to a human representative because the conversation between the user and the chatbot is failing to progress. In this case, the conversation can be perturbed to determine the importances of user turns in the conversation, instead of, e.g., sentences. Another interesting feature of conversational data is the presence of repetitions in user text. If a user repeats his request multiple times in a conversation, the chances for escalation should intuitively increase as the conversation is obviously not progressing. Thus, each repeated turn should increase in importance. However, if the visualization scheme treats each sample as independent, this information will be lost. Therefore, a visualization method that will highlight the samples most influential to the model in sequential data as well as not assuming independence of samples is desirable” (paragraph 18).  This reference appears to perturb a conversation by removing turns (paragraph 25) and does not appear to perturb each of a plurality of semantic segments.
2021/0174798 (provisional application 62/945,792 precedes the effective filing date of this application by 10 days) teaches “In addition, one goal of DST module 110 is the robustness to a small perturbation of input dialogue history, e.g., a slight change in wording of the input would not significantly alter the dialogue and any system action resulted from the dialogue. Embodiments described herein further provide a mechanism to boost prediction consistency of a few perturbed inputs in self-supervised DST training, making a DST model more robust to unseen scenarios. Specifically, a small number of input words from the original input of unlabeled dialogue data 105 are randomly masked into unknown words for multiple times to generate multiple masked training dialogues. Model prediction distributions generated from the multiple masked training dialogues and the original training dialogue are then used to generate an error metric indicating a degree of difference between prediction distributions from masked and unmasked dialogues. The error metrics are then incorporated into the loss metric 123 for updating the DST module to boost the prediction consistency. Further details relating to prediction consistency are discussed in relation to FIGS. 6-7” (paragraph 34, supported by paragraph 21 of provisional application 62/945,792, which recites “Figure 3 is a simplified block diagram for prediction consistency reinforcement 300 according to some embodiments. In some embodiments, the goal of prediction consistency is that DST models should be robust to a small perturbation of input dialogue history. In some embodiments, the system randomly masks out a small number of input words into unknown words for Ndrop times. Then, the system uses Ndrop dialogue history together with the one without dropping any word as input to the base model, and generates or obtains Ndrop + 1 model predictions.
In some embodiments, Ndrop + 1 attention and slot gate distributions are averaged (and sharpened) to be the guessed distribution. The guessed distributions can be used to train the base model to be consistent for the attention and slot gate”).  This reference appears to describe perturbing input dialog by replacing input words with unknown words, but does not appear to specifically describe determining robustness/trustworthiness/perturbability of a natural dialogue system based on the perturbed words.
IV. determining trustworthiness of a conversation system/chatbot.
2021/0097085 teaches “A first conversation system used in testing is "Eliza" which is a well-studied general conversation system created in the 1960s to model a patient's interaction with a Rogerian therapist. It uses cues from the user's input to generate a response using pre-canned (e.g., stored or predetermined) rules without deeper understanding of the text, or the context of the conversation. Since Eliza uses pattern recognition on the user's input, it can be easily manipulated via such text to become abusive (AL) and exhibit bias (B). Since the chatbot uses input text and scripted rules to create its response, it preserves the conversation style of the input, thus behaving well in terms of language complexity (CC). Finally, since it retains no context of a conversation, two users giving the same inputs will get the same response, leading to no information leakage (IL). The output of the rating method for an Eliza implementation will be an aggregated trustworthiness score (L, M or H) and an explanation of how it was calculated from raw issue scores. Since this chatbot can be configured with alternative users, the system can check the chatbot for rating sensitivity and include the result in the output” (paragraph 48).  This reference does not appear to describe determining trustworthiness of a chatbot/conversation system based on altered/perturbed/permuted/distorted semantic segments.
V. robustness against variety of user utterances and variety of user intents.
2019/0130904 teaches “This example implementation shows an alternative NLU algorithm which can not only understand a user utterance that intends predefined dialog acts, but also a user utterance which show an intent that the developer did not expect to be said by the user. It improves the robustness of the system utterance against the variety of the user utterances and the variety of user intents” (paragraph 185).

Upon further search and consideration (in response to the amendment filed 12/6/2021):
2017/0329873 teaches “The basic idea behind row adjustment derives from the observation that two queries in an attack pair must differ by a single user, the victim. Given this, row adjustment actively tests modifications of a given query to see if those modifications result in a small difference between the original query's answer and the modified query's answer. If the difference is small, then the rows that constitute the difference are adjusted so as to remove the difference between the original query and the modified query” (paragraph 57).  This reference does not appear to use the difference between an original query’s answer and a modified query’s answer to determine perturbability of a natural dialogue system.  This reference appears to be directed to identifying difference attacks and producing perturbed answers to prevent information about individuals from being inferred.
2020/0183962 teaches “wherein analyzing answers comprises: determining how a confidence of an answer to the original question compares to confidences of answers to the altered questions” (claim 5). Figure 1 depicts swapping an object of an original question with concept terms, submitting the altered questions to a QA service, and then analyzing the results and proactively identifying answer gaps/inconsistencies between concepts. Paragraph 23 describes different medications being swapped and “net results of how well a system is able to answer the same question against different medications” which “indicate the coverage of the side effects of a given medication.” This appears to be directed to determining corpus coverage, not to determining how easily a question answering system is perturbed.
2019/0073598 teaches “The method may also comprise eliminating an inappropriate (i.e., poor or inadequate) response of a cognitive engine by assessing the response in terms of self-consistency, by provoking the same or similar answer to slightly modified questions, relevancy, and coverage, by reformulating the query in different ways and comparing related partial responses” (paragraph 28) and “The self-consistency may be given if responses to slightly modified queries may result in the same response. Also, a predefined threshold may be applicable. If a comparison value of the responses stays below a predefined threshold value, the responses are determined as having self-consistency” (paragraph 29). This reference appears to use self-consistency to eliminate responses. Paragraph 25 describes where the information about corpus gaps is used to fill in the gaps using techniques familiar to those having ordinary skill in the art.
6584346 teaches comparing the physiological response to each changed sound output with previously sensed physiological responses (claims 1 and 7). This reference’s comparison is used to select muffler configurations.
8990778 teaches “For example, subsequent to a change in the candidate version 114, the dashboard service 118 may request that the shadow proxy service 112 replay the shadow requests 124 that resulted in unacceptable differences between the candidate responses 134 and authority responses 136 to the changed candidate version 114, and in some implementations, to the authority version 116 as well. Once the requests 124 have been replayed to the changed candidate version 114, either the shadow proxy service 112 or the dashboard service 118 makes a comparison between the new responses and the original responses to determine if the unacceptable differences have been resolved.” (col. 7, lines 27-38). This reference appears to be directed to comparing results of execution of modified software against results of execution of deployed/current software. The candidate version refers to a trial/test version (col. 1, lines 60-61), and the authority version refers to a version for validating the candidate version, for example, a most recent version known to have acceptable functionality and performance (col. 3, lines 47-54).

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC YEN whose telephone number is (571)272-4249. The examiner can normally be reached M-F 12:00 PM-8:30 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, RICHEMOND DORVIL can be reached on (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





EY 5/2/2022
/ERIC YEN/           Primary Examiner, Art Unit 2658