DETAILED ACTION
This is responsive to the application filed 24 February 2020.
Claims 1-20 are pending and considered below.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-5 and 7-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Tan et al. (US PGPub 2020/0227030).
Claim 1:
Tan discloses a method comprising: 
obtaining first training data (training sentences) comprising multiple first linguistic samples (sentence components); generating second training data (synthetic data) using the first training data and multiple symmetries, the symmetries (replacement of select sentence components, such as terms and paraphrases) identifying how to modify the first linguistic samples while maintaining structural invariants within the first linguistic samples (“Synthetic data is created from training sentences by replacement of select sentence components, such as terms and paraphrases”, [0025]; note that only selected components are replaced, thereby leaving the remaining components unchanged as invariants); and
training a machine learning model using at least the second training data (“A select subset of the synthetic data is utilized in the adversarial training of the classification model as a member of the model training data. Accordingly, the adversarial training not only creates synthetic data, but selects a subset of the synthetic data having an adversarial characteristic, and assigns that subset of synthetic data as training data for the classification model”, [0025]), wherein at least some of the second linguistic samples in the second training data are selected during the training based on a likelihood of being misclassified by the machine learning model (“selects an entry in the synthetic data set with the minimum log likelihood value and merges the corresponding utterance with the training set. This selection maximizes the likelihood of the worst synthetic data set, e.g. synthetic utterance. Accordingly, the adversarial training of the model is directed at combining the worst synthetic utterance with the real training data”, [0034]).
Claim 2:
Tan discloses the method of claim 1, wherein the multiple symmetries comprise: substitution symmetries in which words or phrases in the first linguistic samples are replaced with other words or phrases (“the trainer selectively replaces the parsed sub-components with the paraphrase terms, with the replacement creating synthetic data”, [0032]); permutation symmetries in which words or phrases in the first linguistic samples … (see “I cannot find a company’s website” in Fig. 4; note that the training sentence is “How do I find a web address for a company”); insertion/deletion symmetries in which words, phrases, or punctuation are added to or removed from the first linguistic samples (see “I cannot find a company’s website” in Fig. 4; note that the training sentence is “How do I find a web address for a company”); and character-level or word-level symmetries in which characters or words in the first linguistic samples are manipulated to create typographical or grammatical errors (“It is understood that a subset of the synthetic data may be redundant or syntactically incorrect”, [0032]).
Claim 3:
Tan discloses the method of claim 2, wherein: the substitution symmetries comprise at least one of: (i) replacing words or phrases in the first linguistic samples with equivalent words or phrases and (ii) substituting details in the first linguistic samples that are irrelevant to a task (“the trainer selectively replaces the parsed sub-components with the paraphrase terms, with the replacement creating synthetic data”, [0032]); the permutation symmetries comprise switching an order of words or phrases in the first linguistic samples (see “I cannot find a company’s website” in Fig. 4; note that the training sentence is “How do I find a web address for a company” and that “company” has switched order); the insertion/deletion symmetries comprise at least one of: (i) inserting or removing articles or adjuncts in the first linguistic samples, (ii) inserting or removing politeness words or phrases in the first linguistic samples, and (iii) inserting or removing punctuation in the first linguistic samples (see “I cannot find a company’s website” in Fig. 4; note that the training sentence is “How do I find a web address for a company” and the insertion of “cannot” and of the apostrophe in “company’s”); and the character-level or word-level symmetries comprise at least one of: (i) swapping characters or words in the first linguistic samples and (ii) adding or removing blank spaces in the first linguistic samples (“the trainer selectively replaces the parsed sub-components with the paraphrase terms, with the replacement creating synthetic data”, [0032]; see also “I cannot find a company’s website” in Fig. 4, noting that the training sentence “How do I find a web address for a company” is longer than this synthetic sentence, i.e., blank spaces have been removed to produce a shorter sentence).
Claim 4:
Tan discloses the method of claim 1, wherein generating the second training data comprises: applying the symmetries to the first linguistic samples to produce intermediate linguistic samples; filtering the intermediate linguistic samples to remove unnatural linguistic samples; and selecting one or more of the intermediate linguistic samples as the second linguistic samples for use in training the machine learning model (“a subset of the synthetic data may be redundant or syntactically incorrect, collectively referred to as irrelevant synthetic data. In one embodiment, low value synthetic data is removed from the synthetic data set. Examples of the low value synthetic data include, but are not limited to, a common word and/or a miss-spelled word”, [0032]), wherein the one or more selected intermediate linguistic samples (i) are relevant to a task associated with the first linguistic samples, (ii) lack new annotations relative to the first linguistic samples, and (iii) correct one or more misclassifications made by a prior version of the machine learning model (“A select subset of the synthetic data is utilized in the adversarial training of the classification model as a member of the model training data”, [0025]; note that the adversarial training data is applied to the same model as the original training data (i.e., no new output or annotation is generated for the adversarial training data) and that the adversarial training improves the model. Also note that the wherein clause describes the general benefits of adversarial training (see, for example, Applicant’s specification [0036])).
Claim 5:
Tan discloses the method of claim 1, wherein: the first linguistic samples are contained in first dialogue samples associated with a context; and training the machine learning model comprises generating second dialogue samples associated with the context, the second dialogue samples used to train the machine learning model, at least some of the second dialogue samples containing the second linguistic samples ([0025], see also [0002] and [0004] for question/answer dialog).
Claims 7-11:
Tan discloses an apparatus comprising: at least one memory configured to store first training data comprising multiple first linguistic samples; and at least one processor ([0007]) configured to perform the steps of process claims 1-5 as shown above.
Claim 12:
Tan discloses the apparatus of claim 11, wherein, to generate the second dialogue samples, the at least one processor is configured to at least one of: reorder at least some of the first linguistic samples in the first dialogue samples while maintaining the context to provide permutation symmetry; and insert one of the first dialogue samples into another of the first dialogue samples to provide interruption symmetry (see “I cannot find a company’s website” in Fig. 4; note that the training sentence is “How do I find a web address for a company” and that “company” has switched order).
Claim 13:
Tan discloses the apparatus of claim 7, wherein: the first linguistic samples are contained in first dialogue samples associated with a first context; and to train the machine learning model, the at least one processor is configured to generate second dialogue samples associated with a second context that is related to the first context and to train the machine learning model using the second dialogue samples, at least some of the second dialogue samples containing the second linguistic samples ([0025], see also [0002] and [0004] for question/answer dialog).
Claims 14-20:
Tan discloses a non-transitory computer readable medium containing instructions that when executed cause at least one processor ([0008]) to perform the steps performed by the apparatus of claims 7-13 as shown above.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Tan et al. (US PGPub 2020/0227030) in view of Ebrahimi et al. ("HotFlip: White-box adversarial examples for text classification." arXiv preprint arXiv:1712.06751 (2017)).
Claim 6:
Tan discloses the method of claim 1, the structural invariants defined at least partially by a task to be learned by the machine learning model (“Synthetic data is created from training sentences by replacement of select sentence components, such as terms and paraphrases”, [0025]; note that only selected components are replaced, thereby leaving the remaining components unchanged as invariants) but does not explicitly disclose wherein each of the structural invariants represents a linguistic object that cannot be replaced by another linguistic object while preserving how a meaning of an expression is determined.
In a similar adversarial training system including structural invariants, Ebrahimi discloses wherein each of the structural invariants represents a linguistic object that cannot be replaced by another linguistic object while preserving how a meaning of an expression is determined (“we only flip a word wi to wj only if these constraints are satisfied: … 3. We disallow replacing of stop-words, as for many of the stop-words, it is difficult to find cases where replacing them will still render the sentence grammatically correct. We also disallow changing a word to another word with the same lexeme for the same purpose”, section 5, paragraph 2).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to have combined the references to yield the predictable result of representing Tan’s structural invariants as linguistic objects that cannot be replaced by another linguistic object while preserving how a meaning of an expression is determined.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Vijayaraghavan et al. ("Generating black-box adversarial examples for text classifiers using a deep reinforced model." Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Cham, 2019) propose a reinforcement learning based approach towards generating adversarial examples in black-box settings. The adversarial examples generated are semantics-preserving perturbations to the original text.
Karpukhin et al. ("Training on synthetic noise improves robustness to natural noise in machine translation." arXiv preprint arXiv:1902.01509 (2019)) discloses making machine translation more robust to character-level variation on the source side, such as typos. Existing methods achieve greater coverage by applying subword models such as byte-pair encoding (BPE) and character-level encoders, but these methods are highly sensitive to spelling mistakes. Karpukhin shows how training on a mild amount of random synthetic noise can dramatically improve robustness to these variations without diminishing performance on clean text.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SAMUEL G NEWAY whose telephone number is (571)270-1058. The examiner can normally be reached Monday-Friday 9:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SAMUEL G NEWAY/Primary Examiner, Art Unit 2657