DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 07/01/2021. The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
The independent claims recite:
receiving a training dataset of original matrix language examples and a set of embedded languages, wherein an original matrix language example in the training dataset corresponds with a respective label;
constructing an adversarial distribution by enumerating perturbations per embedded language in successful adversaries of the training dataset;
sampling, for an original matrix language example, a subset of embedded languages from the adversarial distribution;
translating the original matrix language example into a set of translated examples corresponding to the subset of embedded languages;
sampling a perturbation according to a probability for a language unit in the original matrix language example;
generating a code-mixed adversarial example by applying one or more perturbations to the original matrix language example; and
training the multilingual model based on an input of the generated code-mixed adversarial example and the respective label corresponding to the original matrix language example. 
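For illustration only, the recited steps can be read against a minimal sketch. It is not taken from the application or from any cited reference. The toy dictionaries, the function names, and the word-by-word "translation" are all hypothetical stand-ins chosen to make the data flow concrete.

```python
import random
from collections import Counter

# Hypothetical toy dictionaries standing in for a real translation system.
TOY_DICT = {
    "es": {"good": "bueno", "movie": "película", "very": "muy"},
    "de": {"good": "gut", "movie": "Film", "very": "sehr"},
}

def build_adversarial_distribution(successful_adversaries):
    """Count perturbations per embedded language and normalize to probabilities."""
    counts = Counter(lang for adv in successful_adversaries for lang in adv)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}

def sample_languages(distribution, k, rng):
    """Sample up to k distinct embedded languages, weighted by the distribution."""
    langs = list(distribution)
    weights = [distribution[lang] for lang in langs]
    chosen = set()
    while len(chosen) < min(k, len(langs)):
        chosen.add(rng.choices(langs, weights=weights)[0])
    return sorted(chosen)

def translate(tokens, lang):
    """Word-by-word toy translation; unknown words are left unchanged."""
    return [TOY_DICT[lang].get(tok, tok) for tok in tokens]

def code_mix(tokens, translations, p, rng):
    """With probability p per word, swap in the same-position word from a sampled language."""
    mixed = []
    for i, tok in enumerate(tokens):
        if rng.random() < p:
            lang = rng.choice(sorted(translations))
            mixed.append(translations[lang][i])
        else:
            mixed.append(tok)
    return mixed

rng = random.Random(0)
# Assumption: each successful adversary is recorded as the languages of its perturbations.
successes = [["es", "es", "de"], ["es"], ["de", "es"]]
dist = build_adversarial_distribution(successes)

example, label = ["very", "good", "movie"], "positive"
langs = sample_languages(dist, k=2, rng=rng)
translations = {lang: translate(example, lang) for lang in langs}
adversarial = code_mix(example, translations, p=0.5, rng=rng)
training_pair = (adversarial, label)  # the original label is reused unchanged
```

In this sketch the training step itself is omitted; `training_pair` only shows that the perturbed example is paired with the label of the original example.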

The claim relates to a method of organizing human activity. This reads on a human:
receiving written or spoken words from another human and a specific language and/or category;
enumerating (making a list of) perturbations (e.g., misspellings) in the specified language made by the other human;
identifying languages from the enumerated list;
translating the words received from the other human into the specified language;
calculating (by pen and paper) or otherwise determining the perturbation in the words received from the other human;
writing down a code-mixed (e.g., multilingual) version of the words received from the other human; and
defining a predetermined set of rules based on the code-mixed (e.g., multilingual) version of the words received from the other human and the specified language and/or category.

This judicial exception is not integrated into a practical application. For example, claims 11 and 20 recite a memory, a communication interface, processors, and a non-transitory processor-readable storage medium. Paragraph [0064] of the as-filed specification states: “Memory 820 may be used to store software executed by computing device 800 and/or one or more data structures used during operation of computing device 800. Memory 820 may include one or more types of machine readable media. Some common forms of machine readable media may include floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, and/or any other medium from which a processor or computer is adapted to read.” Therefore, a general-purpose computer or computing device is described and is used mainly to apply the abstract idea. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer is described as a generic computing device. The claims are not patent eligible.

With respect to claims 2 and 12, the claims recite:
wherein the language unit is a word or a phrase comprising a sequence of words that have a designated meaning when recited as the sequence of words.
The claim relates to a method of organizing human activity. This reads on a human:
receiving written or spoken words from another human and a specific language and/or category:
wherein the word/phrase comprises a sequence of words with a designated meaning.
No additional limitations are present.

With respect to claims 3 and 13, the claims recite:
wherein the code-mixed adversarial example is a word-level adversarial example that is generated by:
generating a set of candidate adversaries by substituting one or more words in the original matrix language example with one or more translated words in a translated example from the set of translated examples;
computing a respective loss of the multilingual model by passing each candidate adversary from the set of candidate adversaries through the multilingual model and obtaining a respective output from the multilingual model in response to the respective candidate adversary;
determining a specific candidate adversary that maximizes the respective loss based on a beam search on the set of candidate adversaries.
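For illustration only, the word-level steps recited above (candidate generation by substitution, loss computation per candidate, and beam search for the loss-maximizing candidate) can be sketched as follows. The `toy_loss` function is a hypothetical stand-in for the loss of a multilingual model, and the `beam_width` and `max_swaps` parameters are assumptions that do not appear in the claims.

```python
def toy_loss(tokens):
    """Hypothetical stand-in for a model loss on one candidate (higher = harder)."""
    hard_words = {"bueno": 0.9, "película": 0.4, "muy": 0.1}
    return sum(hard_words.get(tok, 0.0) for tok in tokens)

def beam_search_adversary(original, translated, loss_fn, beam_width=2, max_swaps=1):
    """Search word substitutions left to right, keeping the highest-loss candidates."""
    beams = [(list(original), 0)]  # (candidate tokens, number of swaps used)
    for i in range(len(original)):
        expanded = []
        for cand, swaps in beams:
            expanded.append((cand, swaps))  # option 1: keep word i
            if swaps < max_swaps and translated[i] != original[i]:
                swapped = cand[:i] + [translated[i]] + cand[i + 1:]
                expanded.append((swapped, swaps + 1))  # option 2: substitute word i
        # Score every candidate with the loss and keep the top beam_width.
        expanded.sort(key=lambda item: loss_fn(item[0]), reverse=True)
        beams = expanded[:beam_width]
    return beams[0][0]

original = ["very", "good", "movie"]
translated = ["muy", "bueno", "película"]
best = beam_search_adversary(original, translated, toy_loss)
print(best)
```

With at most one substitution allowed, the search settles on replacing "good", because that single swap yields the largest toy loss.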
The claim relates to a method of organizing human activity. This reads on a human:
writing down a code-mixed (e.g., multilingual) version of the words received from the other human:
wherein the multilingual version of the words is generated by substituting (i.e., translating) words;
calculating (by pen and paper) a difference between the different versions of the text;
determining a final/candidate version of the multilingual text/words.
No additional limitations are present.

With respect to claims 4 and 14, the claims recite:
filtering candidate perturbations by checking whether a respective candidate perturbation exists in the translated example.
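For illustration only, the recited filtering step amounts to a membership check. The function name and the sample words below are hypothetical.

```python
def filter_candidates(candidates, translated_example):
    """Keep only candidate perturbations that occur in the translated example."""
    present = set(translated_example)
    return [word for word in candidates if word in present]

translated = ["muy", "bueno", "película"]
candidates = ["bueno", "bonito", "película"]
kept = filter_candidates(candidates, translated)
print(kept)
```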
The claim relates to a method of organizing human activity. This reads on a human:
determining whether a word is present in the translated version.
No additional limitations are present.

With respect to claims 5 and 15, the claims recite:
aligning words in the original matrix language example to translated words in a translated example from the set of translated examples;
identifying one or more phrases in the original matrix language example or the translated example based [on] the aligning;
generating a set of candidate adversaries by substituting the one or more phrases in the original matrix language example with one or more counterpart phrases in the translated example from the set of translated examples;
computing a respective loss of the multilingual model by passing each candidate adversary from the set of candidate adversaries through the multilingual model and obtaining a respective output from the multilingual model in response to the respective candidate adversary;
and determining a specific candidate adversary that maximizes the respective loss based on a beam search on the set of candidate adversaries.
The claim relates to a method of organizing human activity. This reads on a human:
comparing the original words vs. the translated words;
identifying phrases in the original or translated version of the words;
generating possible candidates for substitution of said words/phrases;
calculating (by pen and paper) a difference between the different versions of the text;
determining which text maximizes the loss.
No additional limitations are present.

With respect to claims 6 and 16, the claims recite:
applying an equivalence constraint that prevents a perturbation from being applied if the perturbation is from a same language as a previous word and disrupts a syntax of a current phrase.
The claim relates to a method of organizing human activity. This reads on a human:
applying a predetermined rule to avoid perturbations (e.g., misspellings) that may disrupt the syntax.
No additional limitations are present.

With respect to claims 7 and 17, the claims recite:
wherein the code-mixed adversarial example is repeatedly generated for a first pre-defined number of times thereby resulting in a first pre-defined number of code-mixed adversarial examples corresponding to the original matrix language example.
The claim relates to a method of organizing human activity. This reads on a human:
generating the multilingual text a predefined number of times.
No additional limitations are present.

With respect to claims 8 and 18, the claims recite:
wherein the first pre-defined number is set to be equal to a second pre-defined number associated with training epochs for the multilingual model minus one.
The claim relates to a method of organizing human activity. This reads on a human:
generating the multilingual text a predefined number of times, wherein the predefined number of times is another predefined value minus one.
No additional limitations are present.

With respect to claims 9 and 19, the claims recite:
wherein the training of the multilingual model is performed using a set of generated code-mixed adversarial examples for one training epoch.
The claim relates to a method of organizing human activity. This reads on a human:
defining a set of rules to generate the multilingual text in a specified period of time.
No additional limitations are present.

With respect to claim 10, the claim recites: 
generating an adversarial attack example based on the original matrix language example and the set of embedded languages; and
testing an output robustness of the multilingual model using the adversarial attack example.
The claim relates to a method of organizing human activity. This reads on a human:
generating a final multilingual text; and
testing by comparing the accuracy of said generated multilingual text (e.g., meaning based).
No additional limitations are present.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1-4, 7, 10-14, 17, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Dong et al. (Dong, Xin, et al. "Leveraging adversarial training in self-learning for cross-lingual text classification." Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2020.; https://dl.acm.org/doi/pdf/10.1145/3397271.3401209).
As to independent claim 1, Dong et al. teaches a method comprising:
1. A method for code-mixed adversarial training of a multilingual model (see Table 4 and 3.2 Results and Analysis: Influence of Adversarial Perturbations: “Table 4: Accuracy (in %) on MLDoc English code-switching data. 3.2 Results and Analysis: Influence of Adversarial Perturbations: … As a result, the test set documents consist of a form of code-switched language, in which many words are non-English but the word order remains unchanged. The replacement rates are listed in Table 4, along with the experimental results. We observe that the baselines have a low accuracy when faced with such codeswitching in the test set.”), the method comprising:
receiving a training dataset of original matrix language examples and a set of embedded languages (see 3.2 Results and Analysis: Influence of Adversarial Perturbations: “… We create challenge datasets that adopt the original English training data, while the test data consists of English documents in which we attempt to replace all vocabulary words with non-English translations based on the bilingual English to non-English dictionaries from MUSE. As a result, the test set documents consist of a form of code-switched language, in which many words are non-English but the word order remains unchanged.”), wherein an original matrix language example in the training dataset corresponds with a respective label (see 2. Method: Adversarial Training: “Our adversarial self-learning process proceeds as follows. First, we train the entire network 𝑓 (·; 𝜃) in 𝐾 epochs using a set of labeled data 𝐿 = {(x𝑖, 𝑦𝑖) | 𝑖 = 1, ..., 𝑛} from the source language, where 𝑛 is the number of labeled instances, x𝑖 ∈ X consists of embedding vectors [v1, v2, ..., v𝑇 ] for each instance (𝑇 is the length of one sequence), and 𝑦𝑖 ∈ Y are the corresponding ground truth labels.”);
constructing an adversarial distribution by enumerating perturbations per embedded language in successful adversaries of the training dataset (see 1. Introduction: Overview and Contributions: “Our model begins by learning just from available source language samples, drawing on a multilingual encoder with added adversarial perturbation. Without loss of generality, in the following, we assume English to be the source language. After training on English, subsequently, we use the same model to make predictions on unlabeled non-English samples and a part of those samples with high confidence prediction scores are repurposed to serve as labeled examples for a next iteration of adversarial training until the model converges.”);
sampling, for an original matrix language example, a subset of embedded languages from the adversarial distribution (see 1. Introduction: Overview and Contributions citation as in the previous limitation, above. Here, the sampling of a subset of languages (i.e., English, non-English) from the adversarial distribution (i.e., confidence) is interpreted to be associated with the samples that have high confidence scores.);
translating the original matrix language example into a set of translated examples corresponding to the subset of embedded languages (see 3.2 Results and Analysis: Influence of Adversarial Perturbations: “… We create challenge datasets that adopt the original English training data, while the test data consists of English documents in which we attempt to replace all vocabulary words with non-English translations based on the bilingual English to non-English dictionaries from MUSE.”);
sampling a perturbation according to a probability for a language unit in the original matrix language example (see 1. Introduction: Overview and Contributions citation as in one of the previous limitations, above: Here, the sampling is associated with the repurposed samples based on high confidence scores.);
generating a code-mixed adversarial example by applying one or more perturbations to the original matrix language example (see 1. Introduction: Overview and Contributions: “…At the same time, because adversarial training makes tiny perturbations that barely affect the prediction result, the perturbations on words during self-learning can be viewed as inducing a form of code-switching, which replaces some original source language words with potential nearby non-English word representations.”); and
training the multilingual model based on an input of the generated code-mixed adversarial example and the respective label corresponding to the original matrix language example (see Figure 1: “Illustration of self-learning process with adversarial training for cross-lingual classification.”).

As to independent claim 11, Dong et al. further teaches:
11. A system for code-mixed adversarial training of a multilingual model (see Table 4 and 3.2 Results and Analysis: Influence of Adversarial Perturbations citations as in claim 1, above.), the system comprising:
a memory that stores the multilingual model (see CCS concepts and 3.2. Results and Analysis: Cross-lingual Intent Classification: CCS concepts: “Computing methodologies → Natural language processing;” and 3.2. Results and Analysis: Cross-lingual Intent Classification: “To evaluate the generalization of our framework to cross-lingual intent classification, we consider a diverse set of baselines as listed in Table 3. Schuster et al. [17] propose to combine Multilingual CoVe [21] with an autoencoder objective and then use the encoder with a CRF model. We also run experiments on Multilingual BERT and observe that it does not outperform the method from Liu et al. [13],” Here, with the evaluation of framework and runs of experiments, the use of a computing device (e.g., computer) is inherent. Hence, the use of a memory, communication interface, and processors is also inherent.);
a communication interface that receives a training dataset of original matrix language examples and a set of embedded languages, wherein an original matrix language example in the training dataset corresponds with a respective label (see CCS concepts and 3.2. Results and Analysis: Cross-lingual Intent Classification citations and discussion as in previous limitation, above.); and
one or more hardware processors (see CCS concepts and 3.2. Results and Analysis: Cross-lingual Intent Classification citations and discussion as in previous limitation, above.) that:
[perform the limitations as in independent claim 1].

As to independent claim 20, Dong et al. further teaches:
20. A non-transitory processor-readable storage medium storing processor-executable instructions for code-mixed adversarial training of a multilingual model, the instructions being executed by a processor to perform operations (see CCS concepts and 3.2. Results and Analysis: Cross-lingual Intent Classification citations and discussion as in claim 11, above. Here, with the evaluation of framework and runs of experiments, the use of a computing device (e.g., computer) is inherent. Hence, the use of a memory and a non-transitory processor-readable storage medium storing processor-executable instructions is also inherent.) comprising:
[perform the limitations as in independent claim 1].
Regarding claims 2 and 12, Dong et al. further teaches:
2 and 12. The method of claim 1, wherein the language unit is a word or a phrase comprising a sequence of words that have a designated meaning when recited as the sequence of words (see 1. Introduction: Overview and Contributions: “… After training on English, subsequently, we use the same model to make predictions on unlabeled non-English samples and a part of those samples with high confidence prediction scores are repurposed to serve as labeled examples for a next iteration of adversarial training until the model converges… At the same time, because adversarial training makes tiny perturbations that barely affect the prediction result, the perturbations on words during self-learning can be viewed as inducing a form of code-switching, which replaces some original source language words with potential nearby non-English word representations.”).

Regarding claims 3 and 13, Dong et al. further teaches:
3 and 13. The method of claim 1, wherein the code-mixed adversarial example is a word-level adversarial example (see 1. Introduction: Overview and Contributions citations as in claim 2, above.) that is generated by:
generating a set of candidate adversaries by substituting one or more words in the original matrix language example with one or more translated words in a translated example from the set of translated examples (see 1. Introduction: Overview and Contributions citations as in claim 2, above. “English” and “Non-English word representations”);
computing a respective loss of the multilingual model by passing each candidate adversary from the set of candidate adversaries through the multilingual model and obtaining a respective output from the multilingual model in response to the respective candidate adversary (see 2. Method: Adversarial Training: “To perform adversarial training, the loss function becomes: $L_{adv}(x_i, y_i) = L(f(x_i + r_{adv}; \theta), y_i)$ (1) where $r_{adv} = \arg\max_{r,\, \|r\| \le \epsilon} L(f(x_i + r; \tilde{\theta}), y_i)$. Here $r$ is a perturbation on the input and $\tilde{\theta}$ is a set of parameters set to match the current parameters of the entire network, but ensuring that gradient propagation only proceeds through the adversarial example construction process. At each step of training, the worst case perturbations $r_{adv}$ are calculated against the current model $f(x_i; \tilde{\theta})$ in Equation 1, and we train the model to be robust to such perturbations by minimizing Equation 1 with respect to $\theta$… During the actual training, we optimize the loss function of the adversarial training in Equation 1 based on the adversarial perturbation defined by Equation 2 in each step.”);
determining a specific candidate adversary that maximizes the respective loss based on a beam search on the set of candidate adversaries (see 2. Method: Self-Learning: “Subsequently, in order to encourage the model to adapt specifically to the target language, the next step is to make predictions for the unlabeled instances in 𝑈 = {x𝑢 | 𝑢 = 1, ...,𝑚}. We can then incorporate unlabeled target language data with high classification confidence scores into the training set. To ensure robustness, we adopt a balanced selection mechanism, i.e., we first select a separate subset {x𝑠 | 𝑠 = 1, ..., 𝐾t} of the unlabeled data for each class, consisting of the top 𝐾t highest confidence items based on the current trained model. The union set 𝑈s of selected items is merged into the training set 𝐿 and then we retrain the model, again with adversarial perturbation. This process is repeated iteratively until some termination criterion is met.”).
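For illustration only, the norm-bounded worst-case perturbation in the adversarial training loss quoted from Dong et al. above can be worked through numerically. The sketch below assumes a toy linear model with a logistic loss and computes the perturbation as r = eps * g / ||g||_2, where g is the gradient of the loss with respect to the input. That closed form is an assumption made here for the example; Equation 2 of Dong et al. is not reproduced in this action. For this linear toy model the gradient direction coincides with the exact maximizer under an L2 bound.

```python
import math

def loss(w, x, y):
    """Logistic loss for a label y in {-1, +1} under a linear model f(x) = w . x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return math.log1p(math.exp(-margin))

def grad_x(w, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    coeff = -y / (1.0 + math.exp(margin))
    return [coeff * wi for wi in w]

def worst_case_perturbation(w, x, y, eps):
    """Assumed closed form: step of length eps along the normalized input gradient."""
    g = grad_x(w, x, y)
    norm = math.sqrt(sum(gi * gi for gi in g))
    return [eps * gi / norm for gi in g]

# Hypothetical toy values for the weights, input, label, and perturbation budget.
w, x, y, eps = [1.0, -2.0], [0.5, 0.25], 1, 0.1
r_adv = worst_case_perturbation(w, x, y, eps)
x_adv = [xi + ri for xi, ri in zip(x, r_adv)]

clean_loss = loss(w, x, y)
adv_loss = loss(w, x_adv, y)  # the quantity minimized in adversarial training
print(clean_loss, adv_loss)
```

The adversarial loss exceeds the clean loss while the perturbation stays within the eps bound. The perturbation acts on a continuous input vector rather than on a discrete set of candidate word substitutions.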
Regarding claims 4 and 14, Dong et al. further teaches:
4 and 14. The method of claim 3, further comprising:
filtering candidate perturbations by checking whether a respective candidate perturbation exists in the translated example (see 2. Method: Self-Learning: “Subsequently, in order to encourage the model to adapt specifically to the target language [i.e., associated with translation], the next step is to make predictions for the unlabeled instances in 𝑈 = {x𝑢 | 𝑢 = 1, ...,𝑚}. We can then incorporate unlabeled target language data with high classification confidence scores into the training set. To ensure robustness, we adopt a balanced selection mechanism, i.e., we first select a separate subset {x𝑠 | 𝑠 = 1, ..., 𝐾t} of the unlabeled data for each class, consisting of the top 𝐾t highest confidence items based on the current trained model. The union set 𝑈s of selected items is merged into the training set 𝐿 and then we retrain the model, again with adversarial perturbation. This process is repeated iteratively until some termination criterion is met.”).

Regarding claims 7 and 17, Dong et al. further teaches:
7 and 17. The method of claim 1, wherein the code-mixed adversarial example is repeatedly generated for a first pre-defined number of times thereby resulting in a first pre-defined number of code-mixed adversarial examples corresponding to the original matrix language example (see 2. Method: Self-Learning and 3. Experiments: Model Details: “Subsequently, in order to encourage the model to adapt specifically to the target language, the next step is to make predictions for the unlabeled instances in 𝑈 = {x𝑢 | 𝑢 = 1, ...,𝑚}. We can then incorporate unlabeled target language data with high classification confidence scores into the training set. To ensure robustness, we adopt a balanced selection mechanism, i.e., we first select a separate subset {x𝑠 | 𝑠 = 1, ..., 𝐾t} of the unlabeled data for each class, consisting of the top 𝐾t highest confidence items based on the current trained model. The union set 𝑈s of selected items is merged into the training set 𝐿 and then we retrain the model, again with adversarial perturbation. This process is repeated iteratively until some termination criterion is met.” And 3. Experiments: Model Details: “We tune the hyper-parameters [i.e., max. sequence length... # of training epochs] for our neural network architecture based on each non-English validation set. For the multilingual encoder, we invoke the Multilingual BERT model [4], which supports 104 languages. Most hyper-parameters are shown in Table 2, with the exception that lower-casing is omitted for Thai and 𝜖 is 10 in the Japanese experiment. We rely on early stopping as a termination criterion, specifically, when the performance on the validation set stops improving in 2 [i.e., predefined number of times] self-learning iterations.”).

Regarding claim 10, Dong et al. further teaches:
10. The method of claim 1, further comprising:
generating an adversarial attack example based on the original matrix language example and the set of embedded languages (see 1. Introduction: Overview and Contributions: “Our model begins by learning just from available source language samples, drawing on a multilingual encoder with added adversarial perturbation. Without loss of generality, in the following, we assume English to be the source language…The adversarial perturbation improves robustness and generalization by regularizing our model. At the same time, because adversarial training makes tiny perturbations that barely affect the prediction result, the perturbations on words during self-learning can be viewed as inducing a form of code-switching, which replaces some original source language words with potential nearby non-English word representations.”); and
testing an output robustness of the multilingual model using the adversarial attack example (see 1. Introduction: Overview and Contributions citation as in previous limitation, above.). 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Dong et al. (Dong, Xin, et al. "Leveraging adversarial training in self-learning for cross-lingual text classification." Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2020.; https://dl.acm.org/doi/pdf/10.1145/3397271.3401209) as applied to claims 1 and 11 above, and further in view of He et al. (US 20110246177 A1). 

Regarding claims 5 and 15, Dong et al. teaches all of the limitations as in claims 1 and 11, above.
Dong et al. further teaches:
5 and 15. The method of claim 1, wherein the code-mixed adversarial example is a phrase-level adversarial example (see 1. Introduction: Overview and Contributions: “The adversarial perturbation improves robustness and generalization by regularizing our model. At the same time, because adversarial training makes tiny perturbations that barely affect the prediction result, the perturbations on words during self-learning can be viewed as inducing a form of code-switching, which replaces some original source language words with potential nearby non-English word representations. [i.e., source language (i.e., English) nearby non-English words, provide inherency of code-mixed phrase level examples]”) that is generated by:
computing a respective loss of the multilingual model by passing each candidate adversary from the set of candidate adversaries through the multilingual model and obtaining a respective output from the multilingual model in response to the respective candidate adversary (see 2. Method: Adversarial Training: “To perform adversarial training, the loss function becomes: $L_{adv}(x_i, y_i) = L(f(x_i + r_{adv}; \theta), y_i)$ (1) where $r_{adv} = \arg\max_{r,\, \|r\| \le \epsilon} L(f(x_i + r; \tilde{\theta}), y_i)$. Here $r$ is a perturbation on the input and $\tilde{\theta}$ is a set of parameters set to match the current parameters of the entire network, but ensuring that gradient propagation only proceeds through the adversarial example construction process. At each step of training, the worst case perturbations $r_{adv}$ are calculated against the current model $f(x_i; \tilde{\theta})$ in Equation 1, and we train the model to be robust to such perturbations by minimizing Equation 1 with respect to $\theta$… During the actual training, we optimize the loss function of the adversarial training in Equation 1 based on the adversarial perturbation defined by Equation 2 in each step.”);
and determining a specific candidate adversary that maximizes the respective loss based on a beam search on the set of candidate adversaries (see 2. Method: Self-Learning: “Subsequently, in order to encourage the model to adapt specifically to the target language, the next step is to make predictions for the unlabeled instances in 𝑈 = {x𝑢 | 𝑢 = 1, ...,𝑚}. We can then incorporate unlabeled target language data with high classification confidence scores into the training set. To ensure robustness, we adopt a balanced selection mechanism, i.e., we first select a separate subset {x𝑠 | 𝑠 = 1, ..., 𝐾t} of the unlabeled data for each class, consisting of the top 𝐾t highest confidence items based on the current trained model. The union set 𝑈s of selected items is merged into the training set 𝐿 and then we retrain the model, again with adversarial perturbation. This process is repeated iteratively until some termination criterion is met.”).

However, Dong et al. does not explicitly teach, but He et al. does teach:
aligning words in the original matrix language example to translated words in a translated example from the set of translated examples (see ¶ [0022]: “According to another aspect of the present invention, a machine translation method is provided. The method includes receiving a bilingual aligned text and an annotated corpus, generating a bilingual aligned text based on the phrase to be translated”);
identifying one or more phrases in the original matrix language example or the translated example based [on] the aligning (see ¶ [0022] citation as in the limitation above: “phrase to be translated”);
generating a set of candidate adversaries by substituting the one or more phrases in the original matrix language example with one or more counterpart phrases in the translated example from the set of translated examples (see ¶ [0022]: “…generating syntactic annotated corpus based on the annotated corpus and the bilingual aligned text, generating a phrase alignment table based on the bilingual aligned text, generating a syntactic based noncontiguous phrase rule set based on the syntactic annotated corpus and the phrase alignment table, machine translating an input sentence into a target language based on at least one of the phrase alignment table and the syntactic based noncontiguous phrase rule set, evaluating results of the machine translation based on an evaluation model; and outputting, as a translated sentence, a result of the evaluation having a highest score among the evaluated results.”);
Dong et al. and He et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing and/or translation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Dong et al. to incorporate the teachings of He et al. of aligning words in the original matrix language example to translated words in a translated example from the set of translated examples; identifying one or more phrases in the original matrix language example or the translated example based [on] the aligning; and generating a set of candidate adversaries by substituting the one or more phrases in the original matrix language example with one or more counterpart phrases in the translated example from the set of translated examples, which provides the benefit of improving the translation result (abstract of He et al.).
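For illustration only, the three limitations discussed above (aligning words, identifying phrases from the alignment, and substituting counterpart phrases to form candidates) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from He et al.: the function names, the representation of an alignment as (source index, target index) pairs, the rule that a phrase is a run of links adjacent on both sides, and the sample sentences are all hypothetical.

```python
def phrase_pairs(src, tgt, alignment):
    """Group adjacent alignment links into (source span, target phrase) pairs."""
    pairs = []
    links = sorted(alignment)
    start = 0
    for i in range(1, len(links) + 1):
        # Close the current run when the next link is not adjacent on both sides.
        if (i == len(links)
                or links[i][0] != links[i - 1][0] + 1
                or links[i][1] != links[i - 1][1] + 1):
            s0, t0 = links[start]
            s1, t1 = links[i - 1]
            pairs.append(((s0, s1 + 1), tgt[t0:t1 + 1]))
            start = i
    return pairs

def candidate_adversaries(src, tgt, alignment):
    """Build one candidate per phrase pair by swapping in the target phrase."""
    candidates = []
    for (lo, hi), replacement in phrase_pairs(src, tgt, alignment):
        candidates.append(src[:lo] + replacement + src[hi:])
    return candidates

# Hypothetical English example, Spanish translation, and word alignment.
src = ["I", "drink", "green", "tea"]
tgt = ["yo", "bebo", "té", "verde"]
alignment = [(0, 0), (1, 1), (2, 3), (3, 2)]
candidates = candidate_adversaries(src, tgt, alignment)
```

Each candidate replaces exactly one aligned phrase, so the remaining words stay in the original language.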

Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Dong et al. (Dong, Xin, et al. "Leveraging adversarial training in self-learning for cross-lingual text classification." Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2020; https://dl.acm.org/doi/pdf/10.1145/3397271.3401209) in view of He et al. (US 20110246177 A1) as applied to claims 5 and 15 above, and further in view of Wang et al. (Wang, Xiaosen, Hao Jin, and Kun He. "Natural Language Adversarial Attack and Defense in Word Level." (2019); https://openreview.net/attachment?id=BJl_a2VYPH&name=original_pdf).

Regarding claims 6 and 16, Dong et al. in combination with He et al. teaches all of the limitations as in claims 5 and 15, above.
However, Dong et al. in combination with He et al. do not explicitly teach, but Wang et al. does teach:
6 and 16. The method of claim 5, further comprising:
applying an equivalence constraint that prevents a perturbation from being applied if the perturbation is from a same language as a previous word and disrupts a syntax of a current phrase (see Abstract and 3. The proposed text defense method: 3.1 Motivation: “Abstract: … there exists no defense method against the successful synonym substitution based attacks that aim to satisfy all the lexical, grammatical, semantic constraints and thus are hard to perceived by humans. We contribute to fill this gap and propose a novel adversarial defense method called Synonym Encoding Method (SEM), which inserts an encoder before the input layer of the model and then trains the model to eliminate adversarial perturbations. 3 THE PROPOSED TEXT DEFENSE METHOD: 3.1 MOTIVATION: Based on this insight, we propose a new method called Synonym Encoding Method to locate the neighbors of an input x.”).
Dong et al. in combination with He et al., and Wang et al., are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing and/or translation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Dong et al. in combination with He et al. to incorporate the teachings of Wang et al. of applying an equivalence constraint that prevents a perturbation from being applied if the perturbation is from a same language as a previous word and disrupts a syntax of a current phrase, which provides the benefit of a successful word substitution that satisfies grammatical constraints (abstract of Wang et al.).

Claims 8-9 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Dong et al. (Dong, Xin, et al. "Leveraging adversarial training in self-learning for cross-lingual text classification." Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2020; https://dl.acm.org/doi/pdf/10.1145/3397271.3401209) as applied to claims 7 and 17 above, and further in view of Devarakonda et al. (Devarakonda, Aditya, Maxim Naumov, and Michael Garland. "Adabatch: Adaptive batch sizes for training deep neural networks." arXiv preprint arXiv:1712.02029 (2017); https://arxiv.org/pdf/1712.02029.pdf).

Regarding claims 8 and 18, Dong et al. teaches all of the limitations as in claims 7 and 17, above.
However, Dong et al. does not explicitly teach, but Devarakonda et al. does teach:
8 and 18. The system of claim 7, wherein the first pre-defined number is set to be equal to a second pre-defined number associated with training epochs for the multilingual model minus one (see § 4.1 FIXED VS. DYNAMIC BATCH SIZES: “As we have illustrated in Section 2, learning rate decay is a widely used technique to avoid stagnation during training. While learning rate schedules may help improve test error, they rarely lead to faster training times. We begin by performing experiments to validate our claim that adaptive batch sizes can be used without significantly affecting test accuracy. For these experiments, we use SGD with momentum of 0.9, weight decay of 5 × 10^-4, and perform 100 epochs of training. We use a base learning rate of α = 0.01 and decay it every 20 epochs. For the adaptive method we decay the learning rate by 0.75 and simultaneously double the batch size at the same 20-epoch intervals. The learning rate decay of 0.75 and batch size doubling combine for an effective learning rate decay of 0.375; therefore, we use a learning rate decay of 0.375 for the fixed batch size experiments for the most direct comparison. All experiments in this section are performed on a single Tesla P100.”
Here, the adjustability of the batch size is interpreted to teach the predefined number of samples set to a second predefined number minus one.).
Dong et al. and Devarakonda et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing and/or translation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Dong et al. to incorporate the teachings of Devarakonda et al. of wherein the first pre-defined number is set to be equal to a second pre-defined number associated with training epochs for the multilingual model minus one, which provides the benefit of improving performance of the system (abstract of Devarakonda et al.).
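For illustration only, the arithmetic in the quoted passage of Devarakonda et al. (a learning rate decay of 0.75 combined with a doubled batch size giving an effective decay of 0.375) can be reproduced as follows. This is a minimal sketch that assumes doubling the batch size is treated as halving the learning rate, which is how the quoted figures combine; the names `effective_decay` and `effective_lr` are hypothetical.

```python
LR_DECAY = 0.75        # learning rate multiplier applied at each interval
BATCH_GROWTH = 2.0     # batch size multiplier applied at each interval
BASE_LR = 0.01         # base learning rate from the quoted passage
INTERVAL = 20          # epochs between adjustments

# Assumption: doubling the batch size acts like halving the learning rate.
effective_decay = LR_DECAY / BATCH_GROWTH

def effective_lr(epoch):
    """Effective learning rate at a given epoch under the combined schedule."""
    return BASE_LR * effective_decay ** (epoch // INTERVAL)

# Effective learning rate at the start of each 20-epoch interval over 100 epochs.
schedule = [effective_lr(e) for e in range(0, 100, INTERVAL)]
```

Under this assumption the combined multiplier matches the 0.375 decay that the quoted passage uses for the fixed batch size experiments.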

Regarding claims 9 and 19, Dong et al. in combination with Devarakonda et al. teaches all of the limitations as in claims 8 and 18, above.
Dong et al. further teaches:
9 and 19. The system of claim 8, wherein the training of the multilingual model is performed using a set of generated code-mixed adversarial examples for one training epoch (see 2. Method: Adversarial Training: “Our adversarial self-learning process proceeds as follows. First, we train the entire network 𝑓 (·; 𝜃) in 𝐾 epochs using a set of labeled data 𝐿 = {(x𝑖 , 𝑦𝑖) | 𝑖 = 1, ..., 𝑛} from the source language…”).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Keisha Y. Castillo-Torres, whose telephone number is (571)272-3975. The examiner can normally be reached Monday - Friday, 9:00 am - 4:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at (571)272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Keisha Y. Castillo-Torres
Examiner
Art Unit 2659



/Keisha Y. Castillo-Torres/Examiner, Art Unit 2659     

/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659