DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
Applicant is reminded of the proper language and format for an abstract of the disclosure.
The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words in length. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details.
The language should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, “The disclosure concerns,” “The disclosure defined by this invention,” “The disclosure describes,” etc.  In addition, the form and legal phraseology often used in patent claims, such as “means” and “said,” should be avoided.
The disclosure is objected to because of the following informalities:
In paragraph 0006, line 14, “multilingual NPL model” should read “multilingual NLP model”.
In paragraph 0007, lines 13-14, “multilingual NPL model” should read “multilingual NLP model”.
In paragraph 0008, line 14, “multilingual NPL model” should read “multilingual NLP model”.
In paragraph 0013, line 1, “shows an example” should read “shows an example of”.
In paragraph 0026, lines 5-6, “using the multilingual provided” appears to be missing a word between “multilingual” and “provided”, such as “model”.
In paragraph 0043, line 9, “elements of that were” should read “elements that were”.
In paragraph 0046, line 12, “voting logic unit 260” should read “voting logic unit 250”.
In paragraph 0046, line 20, “voter logic 420” should read “voting logic unit 250”.
In paragraph 0051, line 6, “to send more a different number” should read “to send a different number”.
In paragraph 0053, line 1, “label processing unit 415” should read “label processing unit 420”.
In paragraph 0053, lines 3-4, “label processing unit 415” should read “label processing unit 420”.
In paragraph 0053, line 6, “the processing unit 415” should read “the response processing unit 415”.
In paragraph 0054, lines 1-2, “FIG. 5 is a flow chart of an example process 500 for performing a semantic analysis of digital ink stroke data.” is not an accurate description of the process in Figure 5.
In paragraph 0056, lines 7-8, “be configured automatically select” should read “be configured to automatically select”.
In paragraph 0057, lines 4-5, “large teacher NLP models 245 large pretrained NLP models” should read “large teacher NLP models 245 are large pretrained NLP models”.
In paragraph 0059, line 3, “voting logic unit 260” should read “voting logic unit 250”.
In paragraph 0060, line 3, “first candidate label determined” should read “first candidate label is determined”.
In paragraph 0061, line 2, “multilingual NPL model” should read “multilingual NLP model”.
In paragraph 0071, line 3, “rather than rather than” should read “rather than”.
In paragraph 0081, line 8, and paragraph 0082, line 8, the trademarks BLUETOOTH® and WI-FI® are used without being identified as registered trademarks.
Appropriate correction is required.
Claim Objections
Claims 1, 5, 8, 12, 15 and 19 are objected to because of the following informalities:
In Claim 1, line 19, “multilingual NPL model” should read “multilingual NLP model”.
In Claim 5, line 13, “multilingual NPL model” should read “multilingual NLP model”.
In Claim 8, line 17, “multilingual NPL model” should read “multilingual NLP model”.
In Claim 12, line 12, “multilingual NPL model” should read “multilingual NLP model”.
In Claim 15, line 17, “multilingual NPL model” should read “multilingual NLP model”.
In Claim 19, line 12, “multilingual NPL model” should read “multilingual NLP model”.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 3, 10 and 17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being incomplete for omitting essential steps.
Claim 3 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being incomplete for omitting essential steps, such omission amounting to a gap between the steps.  See MPEP § 2172.01.  The omitted steps are:
Before discarding the first content item, determining that the first content item should be discarded.
After discarding the first content item, selecting another content item to use for generating training data.
The specification, in paragraph 0042, lines 1-7, discloses “If, based on the human user feedback, the crowdsourced labeling unit 260 determines that the label is not representative of the candidate English-language phrase or sentence, the multilingual training data generation pipeline 210 may discard the candidate English-language phrase or sentence and label 255. The multilingual training data generation pipeline 210 may then select a new English-language selected parallel corpora element 235 from the parallel corpora 280 and repeat the label-generation process for that selected element.”  Without the “determines that the label is not representative of the candidate English-language phrase or sentence” step, discarding the first content item would not be meaningful, since the purpose of discarding the content item is to not use content items with unrepresentative labels for generating training data.  Also, without the “select a new English-language selected parallel corpora element 235 from the parallel corpora 280 and repeat the label-generation process for that selected element” step, no training data would be produced, preventing the subsequent step of “training a first pretrained multilingual NLP model with the first training data and the second training data” from being performed.
Claim 10 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being incomplete for omitting essential steps, such omission amounting to a gap between the steps.  See MPEP § 2172.01.  The omitted steps are:
Before discarding the first content item, determining that the first content item should be discarded.
After discarding the first content item, selecting another content item to use for generating training data.
The specification, in paragraph 0042, lines 1-7, discloses “If, based on the human user feedback, the crowdsourced labeling unit 260 determines that the label is not representative of the candidate English-language phrase or sentence, the multilingual training data generation pipeline 210 may discard the candidate English-language phrase or sentence and label 255. The multilingual training data generation pipeline 210 may then select a new English-language selected parallel corpora element 235 from the parallel corpora 280 and repeat the label-generation process for that selected element.”  Without the “determines that the label is not representative of the candidate English-language phrase or sentence” step, discarding the first content item would not be meaningful, since the purpose of discarding the content item is to not use content items with unrepresentative labels for generating training data.  Also, without the “select a new English-language selected parallel corpora element 235 from the parallel corpora 280 and repeat the label-generation process for that selected element” step, no training data would be produced, preventing the subsequent step of “training a first pretrained multilingual NLP model with the first training data and the second training data” from being performed.
Claim 17 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being incomplete for omitting essential steps, such omission amounting to a gap between the steps.  See MPEP § 2172.01.  The omitted steps are:
Before discarding the first content item, determining that the first content item should be discarded.
After discarding the first content item, selecting another content item to use for generating training data.
The specification, in paragraph 0042, lines 1-7, discloses “If, based on the human user feedback, the crowdsourced labeling unit 260 determines that the label is not representative of the candidate English-language phrase or sentence, the multilingual training data generation pipeline 210 may discard the candidate English-language phrase or sentence and label 255. The multilingual training data generation pipeline 210 may then select a new English-language selected parallel corpora element 235 from the parallel corpora 280 and repeat the label-generation process for that selected element.”  Without the “determines that the label is not representative of the candidate English-language phrase or sentence” step, discarding the first content item would not be meaningful, since the purpose of discarding the content item is to not use content items with unrepresentative labels for generating training data.  Also, without the “select a new English-language selected parallel corpora element 235 from the parallel corpora 280 and repeat the label-generation process for that selected element” step, no training data would be produced, preventing the subsequent step of “training a first pretrained multilingual NLP model with the first training data and the second training data” from being performed.
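For illustration only, the two omitted steps identified for claims 3, 10 and 17 can be sketched as a simple loop. This is a minimal sketch under stated assumptions, not a characterization of the applicant's implementation: the function names `propose_label` and `is_representative` are hypothetical stand-ins for the label-generation process and the human-feedback determination described in paragraph 0042.

```python
from typing import Callable, Iterable, Optional, Tuple


def label_with_review(
    items: Iterable[str],
    propose_label: Callable[[str], str],            # hypothetical label generator
    is_representative: Callable[[str, str], bool],  # hypothetical feedback check
) -> Optional[Tuple[str, str]]:
    """Return the first (item, label) pair that passes review, or None."""
    for item in items:
        label = propose_label(item)
        # Omitted step 1: determine whether the content item should be discarded.
        if not is_representative(item, label):
            # Discard the item; omitted step 2 is selecting another content
            # item, which here is simply the next iteration of the loop.
            continue
        return item, label
    # Every item was discarded, so no training data can be produced.
    return None
```

Without the determination, the discard has no trigger; without the reselection, the loop would end after the first discard and nothing would remain for the later training step.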
Claims 11 and 13 – 14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 11 recites the limitation "the first pretrained multilingual NLP model" in line 5.  There is insufficient antecedent basis for this limitation in the claim.  Claim 11 depends from claim 8, and claim 8 recites “a pretrained multilingual NLP model”, but it is unclear if "the first pretrained multilingual NLP model" in claim 11 refers to “a pretrained multilingual NLP model” in claim 8.  Amending claim 8 to change “a pretrained multilingual NLP model” to “a first pretrained multilingual NLP model” would resolve the indefinite language of claim 11.
Claim 13 recites the limitation "the first pretrained multilingual NLP model" in line 2.  There is insufficient antecedent basis for this limitation in the claim.  Claim 13 depends from claim 8, and claim 8 recites “a pretrained multilingual NLP model”, but it is unclear if "the first pretrained multilingual NLP model" in claim 13 refers to “a pretrained multilingual NLP model” in claim 8.  Amending claim 8 to change “a pretrained multilingual NLP model” to “a first pretrained multilingual NLP model” would resolve the indefinite language of claim 13.
Claim 14 recites the limitation "the first pretrained multilingual model" in lines 2-3.  There is insufficient antecedent basis for this limitation in the claim.  Claim 14 depends from claim 8, and claim 8 recites “a pretrained multilingual NLP model”, but it is unclear if "the first pretrained multilingual model" in claim 14 refers to “a pretrained multilingual NLP model” in claim 8.  Amending claim 8 to change “a pretrained multilingual NLP model” to “a first pretrained multilingual NLP model” would resolve the indefinite language of claim 14.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 15 – 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claim does not fall within at least one of the four categories of patent eligible subject matter because the broadest reasonable interpretation of “computer-readable storage medium” can encompass non-statutory transitory forms of signal transmission, such as a propagating electrical or electromagnetic signal per se.
The specification, in paragraph 0078, lines 12-13, recites “The term "machine-readable medium" excludes signals per se”, but does not specifically state that the term “computer-readable storage medium” excludes signals per se.
Claims 16 – 20 depend from claim 15 and are also rejected under 35 U.S.C. 101 because the broadest reasonable interpretation of “computer-readable storage medium” can encompass non-statutory transitory forms of signal transmission, such as a propagating electrical or electromagnetic signal per se.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 8 and 11 – 13 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Pikuliak et al. ("Cross-Lingual Learning for Text Processing: A Survey"), hereinafter Pikuliak.
Regarding claim 8, Pikuliak discloses a method implemented in a data processing system for generating training data for a multilingual natural language processing model, the method comprising:
obtaining a corpus comprising a plurality of first content items and a plurality of second content items, wherein the first content items comprise English-language textual content, and the plurality of second content items comprise translations of the first content items in one or more non-English target languages (Section 6.1.1, lines 1-5, "Corresponding samples – pair of samples in different languages that should share the label – are generated using cross-lingual resources. All label-based transfer techniques require a process to generate corresponding samples. Two main approaches are using machine translation and parallel corpora.");
selecting a first content item from the plurality of first content items (Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; The source language sample (xS) reads on the first content item.);
generating a plurality of candidate labels for the first content item by analyzing the first content item with a plurality of first English-language natural language processing (NLP) models (Section 6.1, lines 48-50, "An unannotated LT sample has a corresponding LS sample generated (e.g. by MT) and then a preexisting LS model is used to infer its label."; Section 5.7, lines 1-9, "Pre-trained language models are a state-of-the-art NLP technique. A large amount of text data is used to train a high capacity (hundred of millions of parameters) language model. Then we can use the parameters from this language model to initialize further training with different NLP tasks. The parameters are fine-tuned with the additional target task data. This is a form of transfer learning, where we use language modeling as a source task. The most well known pre-trained language models are BERT (Devlin, Chang, Lee, & Toutanova, 2019) and ELMo (Peters et al., 2018)."; Section 6.1.3, line 16, "Amini et al. (2009) use multiple LS models"; Using a preexisting source language (LS) model to infer a label reads on generating a candidate label with an English-language natural language processing model, the BERT and ELMo pre-trained language models demonstrate a plurality of first English-language natural language processing models, and using multiple source language (LS) models reads on generating a plurality of candidate labels with a plurality of first English-language natural language processing (NLP) models.);
determining whether a majority of the candidate labels for the first content item are consistent (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label");
selecting a first candidate label from the plurality of candidate labels responsive to the majority of the candidate labels being consistent (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label"; Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; The source language label (yS) reads on the first candidate label, and multiple source language (LS) models voting on the final projected label reads on the majority of the candidate labels being consistent.);
generating first training data for fine tuning the multilingual NLP model by associating the first candidate label with the first content item (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label"; Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Section 6.1, lines 54-56, "The preexisting LS model can be trained on LS data only, or it can be improved by adding LT samples as well."; Generating the source language label (yS) from the source language sample (xS) and the source language model (MS) reads on associating the first candidate label with the first content item, and training the source language (LS) model with source language data reads on the first training data.);
generating second training data for fine tuning the multilingual NLP model by associating the first candidate label with a second content item of the plurality of second content items (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label"; Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Section 6.1, lines 7-10, "Then the labels are transferred from one language to the other – this step is also called annotation projection. The projected labels can then be used for further training."; Transferring the labels from the source language to the target language reads on associating the first candidate label with a second content item, and using the projected labels for further training reads on the second training data.);
and training a pretrained multilingual NLP model with the first training data and the second training data to fine tune the training of the NLP model with respect to English and a respective non-English target language associated with the second content item (Section 6, lines 13-14, "We identify four main categories for CLL, that we call transfer paradigms."; Section 6.1, lines 7-10, "Then the labels are transferred from one language to the other – this step is also called annotation projection. The projected labels can then be used for further training."; Transferring the labels from the source language to the target language and using the projected labels for further training reads on training a pretrained multilingual NLP model with the first training data and the second training data.).
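For illustration only, the voting and label-transfer steps mapped above for claim 8 can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Pikuliak or from the application: the names `majority_label` and `build_training_pairs` are hypothetical, each source-language model is assumed to be a callable that returns one label per text, and "majority" is assumed to mean more than half of the candidate labels.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

Labeler = Callable[[str], str]  # hypothetical: one source-language model


def majority_label(candidates: Sequence[str]) -> Optional[str]:
    """Return the label shared by more than half of the candidates, else None."""
    if not candidates:
        return None
    label, count = Counter(candidates).most_common(1)[0]
    return label if count > len(candidates) / 2 else None


def build_training_pairs(
    source_text: str,
    target_text: str,
    labelers: Sequence[Labeler],
) -> List[Tuple[str, str]]:
    """Label the source text by vote, then reuse the label for its translation."""
    candidates = [labeler(source_text) for labeler in labelers]
    label = majority_label(candidates)
    if label is None:
        return []  # no consistent majority, so no training data is generated
    # First training data: source text with the selected label.
    # Second training data: the corresponding translation with the same label.
    return [(source_text, label), (target_text, label)]
```

With three hypothetical labelers returning "pos", "pos" and "neg", the selected label is "pos" and two training pairs are produced; with an even split, none are.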
Regarding claim 11, as best understood based on the 35 U.S.C. 112(b) issues identified above, Pikuliak discloses the method as claimed in claim 8, further comprising:
generating a set of third training data by generating a respective training data for each respective content item of the plurality of second content items other than the second content item by associating the respective content item with the first candidate label (Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Section 6.1.3, lines 5-9, " LS model and parallel model can be trained with additional data translated from LT. This can improve the results during inference, since the model was already exposed to the translated data during training and does not suffer from the domain shift that much."; Repeating Algorithm 2 for multiple target language samples (xT) reads on generating training data for each respective content item of the plurality of second content items.);
and training the first pretrained multilingual NLP model with the set of third training data to further fine tune the training of the NLP model with respect to each non-English target language associated with the second content item (Section 6, lines 13-14, "We identify four main categories for CLL, that we call transfer paradigms."; Section 6.1, lines 7-10, "Then the labels are transferred from one language to the other – this step is also called annotation projection. The projected labels can then be used for further training."; Transferring the labels from the source language to the target language and using the projected labels for further training reads on training a pretrained multilingual NLP model with the third training data.).
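For illustration only, the "third training data" limitation of claim 11 can be read as reusing one selected label for every remaining translation of the same content item. The sketch below assumes a hypothetical dictionary keyed by language code and a hypothetical function name `fan_out_label`; neither appears in the application or in Pikuliak.

```python
from typing import Dict, List, Tuple


def fan_out_label(
    label: str,
    translations: Dict[str, str],  # hypothetical: language code -> translated text
    already_used: str,             # language of the second content item
) -> List[Tuple[str, str, str]]:
    """Pair the label with every translation except the one already used."""
    return [
        (lang, text, label)
        for lang, text in sorted(translations.items())
        if lang != already_used
    ]
```

Sorting by language code only makes the output order deterministic; it has no bearing on the claim language.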
Regarding claim 12, Pikuliak discloses the method as claimed in claim 8, further comprising:
automatically generating a plurality of candidate labels for each respective content item of the plurality of first content items other than the first content item (Section 6.1, lines 48-50, "An unannotated LT sample has a corresponding LS sample generated (e.g. by MT) and then a preexisting LS model is used to infer its label."; Section 6.1.3, line 16, "Amini et al. (2009) use multiple LS models"; Using a preexisting source language (LS) model to infer a label reads on generating a candidate label for each content item, and using multiple source language (LS) models reads on generating a plurality of candidate labels.);
automatically generating a plurality of second candidate labels for each respective content item of the plurality of first content items other than the first content item (Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Repeating Algorithm 2 to generate the source language label (yS) from the source language sample (xS) and the source language model (MS) for multiple source language samples reads on generating candidate labels for each content item.);
automatically selecting a second candidate label from the plurality for candidate labels for each respective content item of the plurality of first content items other than the first content item (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label"; Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Repeating Algorithm 2 to generate the source language label (yS) from the source language sample (xS) and the source language model (MS) for multiple source language samples reads on selecting a candidate label for each content item.);
generating third training data for fine tuning the multilingual NLP model by associating the second candidate label with each respective content item of the plurality of first content items other than the first content item (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label"; Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Section 6.1, lines 54-56, "The preexisting LS model can be trained on LS data only, or it can be improved by adding LT samples as well."; Repeating Algorithm 2 to generate the source language label (yS) from the source language sample (xS) and the source language model (MS) for multiple source language samples reads on associating a second candidate label with each content item, and training the source language (LS) model with source language data reads on the third training data.);
and generating fourth training data for fine tuning the multilingual NLP model by associating the second candidate label with a fourth content item of the plurality of second content items, the fourth content item being a translation of the respective content item of the plurality of first content items other than the first content item  (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label"; Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Section 6.1, lines 7-10, "Then the labels are transferred from one language to the other – this step is also called annotation projection. The projected labels can then be used for further training."; Transferring the labels from the source language to the target language reads on associating the second candidate label with a fourth content item, and using the projected labels for further training reads on the fourth training data.).
Regarding claim 13, as best understood based on the 35 U.S.C. 112(b) issues identified above, Pikuliak discloses the method as claimed in claim 12, further comprising:
training the first pretrained multilingual NLP model with the third training data and the fourth training data (Section 6, lines 13-14, "We identify four main categories for CLL, that we call transfer paradigms."; Section 6.1, lines 7-10, "Then the labels are transferred from one language to the other – this step is also called annotation projection. The projected labels can then be used for further training."; Transferring the labels from the source language to the target language and using the projected labels for further training reads on training a pretrained multilingual NLP model with the third training data and the fourth training data.).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4 – 6, 15 and 18 – 19 are rejected under 35 U.S.C. 103 as being unpatentable over Pikuliak in view of Das et al. (US Patent No. 9,779,087), hereinafter Das.
Regarding claim 1, Pikuliak discloses a data processing system comprising:
instructions that, when executed, cause the processor to perform operations comprising:
obtaining a corpus comprising a plurality of first content items and a plurality of second content items, wherein the first content items comprise English-language textual content, and the plurality of second content items comprise translations of the first content items in one or more non-English target languages (Section 6.1.1, lines 1-5, "Corresponding samples – pair of samples in different languages that should share the label – are generated using cross-lingual resources. All label-based transfer techniques require a process to generate corresponding samples. Two main approaches are using machine translation and parallel corpora.");
selecting a first content item from the plurality of first content items (Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; The source language sample (xS) reads on the first content item.);
generating a plurality of candidate labels for the first content item by analyzing the first content item with a plurality of first English-language natural language processing (NLP) models (Section 6.1, lines 48-50, "An unannotated LT sample has a corresponding LS sample generated (e.g. by MT) and then a preexisting LS model is used to infer its label."; Section 5.7, lines 1-9, "Pre-trained language models are a state-of-the-art NLP technique. A large amount of text data is used to train a high capacity (hundred of millions of parameters) language model. Then we can use the parameters from this language model to initialize further training with different NLP tasks. The parameters are fine-tuned with the additional target task data. This is a form of transfer learning, where we use language modeling as a source task. The most well known pre-trained language models are BERT (Devlin, Chang, Lee, & Toutanova, 2019) and ELMo (Peters et al., 2018)."; Section 6.1.3, line 16, "Amini et al. (2009) use multiple LS models"; Using a preexisting source language (LS) model to infer a label reads on generating a candidate label with an English-language natural language processing model, the BERT and ELMo pre-trained language models demonstrate a plurality of first English-language natural language processing models, and using multiple source language (LS) models reads on generating a plurality of candidate labels with a plurality of first English-language natural language processing (NLP) models.);
determining whether a majority of the candidate labels for the first content item are consistent (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label");
selecting a first candidate label from the plurality of candidate labels responsive to the majority of the candidate labels being consistent (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label"; Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; The source language label (yS) reads on the first candidate label, and multiple source language (LS) models voting on the final projected label reads on the majority of the candidate labels being consistent.);
generating first training data for fine tuning the multilingual NLP model by associating the first candidate label with the first content item (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label"; Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Section 6.1, lines 54-56, "The preexisting LS model can be trained on LS data only, or it can be improved by adding LT samples as well."; Generating the source language label (yS) from the source language sample (xS) and the source language model (MS) reads on associating the first candidate label with the first content item, and training the source language (LS) model with source language data reads on the first training data.);
generating second training data for fine tuning the multilingual NLP model by associating the first candidate label with a second content item of the plurality of second content items (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label"; Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Section 6.1, lines 7-10, "Then the labels are transferred from one language to the other – this step is also called annotation projection. The projected labels can then be used for further training."; Transferring the labels from the source language to the target language reads on associating the first candidate label with a second content item, and using the projected labels for further training reads on the second training data.);
and training a first pretrained multilingual NLP model with the first training data and the second training data to fine tune the training of the NLP model with respect to English and a respective non-English target language associated with the second content item (Section 6, lines 13-14, "We identify four main categories for CLL, that we call transfer paradigms."; Section 6.1, lines 7-10, "Then the labels are transferred from one language to the other – this step is also called annotation projection. The projected labels can then be used for further training."; Transferring the labels from the source language to the target language and using the projected labels for further training reads on training a pretrained multilingual NLP model with the first training data and the second training data.).
Pikuliak does not specifically disclose: a processor; and a computer-readable medium storing executable instructions.
Das teaches: a processor; and a computer-readable medium storing executable instructions (Column 2, lines 35-45, "A non-transitory, computer-readable medium is also presented. The computer-readable medium can have instructions stored thereon that, when executed by one or more processors of a computing device, cause the computing device to perform operations including obtaining (i) an aligned bi-text for a source language and a target language, and (ii) a supervised sequence model for the source language. The operations can include labeling a source side of the aligned bi-text using the supervised sequence model to obtain a labeled source side of the aligned bi-text. The operations can include projecting labels from the labeled source side to a target side of the aligned bi-text to obtain a labeled target side of the aligned bi-text.").
Das teaches a processor executing instructions stored in a computer-readable medium in order to implement a method of cross-lingual learning of language sequence models (Column 1, lines 29-34, "Obtaining labeled training data for other languages, also known as resource-poor languages (e.g., Catalan, Estonian, Norwegian, and Ukrainian), can be costly and/or time consuming. Therefore, efficient techniques for cross-lingual learning of sequence models are needed."; Column 4, lines 57-67, "The target language can be a resource-poor language. The term “resource-poor” can refer to the corresponding language having no labeled training data or approximately no labeled training data. Examples of resource-poor languages include Catalan, Estonian, Norwegian, and Ukrainian. A supervised sequence model for the source language can be obtained by the computing device 200 and utilized to label a source side of the aligned bi-text to obtain a labeled source side. The labels from the source side can then be projected to a target side of the aligned bi-text to obtain a labeled target side.").
Pikuliak and Das are considered to be analogous to the claimed invention because they are in the same field of cross-lingual learning of language models.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pikuliak to incorporate the teachings of Das to use a processor executing instructions stored in a computer-readable medium.  Doing so would allow for implementing a method of cross-lingual learning of language sequence models.
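For illustration only, the label-voting and annotation-projection steps mapped above can be sketched as follows. This is a minimal sketch and is not code from Pikuliak or Das; the names `Labeler`, `majority_label`, and `build_training_pairs` are hypothetical, and each source language (LS) model is assumed to be a simple function from text to a label.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stand-in for a source-language (LS) model: text in, label out.
Labeler = Callable[[str], str]


def majority_label(candidates: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of candidates, else None."""
    if not candidates:
        return None
    label, count = Counter(candidates).most_common(1)[0]
    return label if count > len(candidates) / 2 else None


def build_training_pairs(
    x_s: str, x_t: str, models: Sequence[Labeler]
) -> List[Tuple[str, str]]:
    """Label the source sample x_s by voting, then project onto its translation x_t."""
    candidates = [model(x_s) for model in models]  # plurality of candidate labels
    y_s = majority_label(candidates)               # voting on the final label
    if y_s is None:
        return []                                  # no consistent majority
    # First training example (source text) and second (translated text) share y_s.
    return [(x_s, y_s), (x_t, y_s)]
```

In this sketch, a sample on which the models do not reach a strict majority simply yields no training pairs; how such samples are handled is an assumption of the sketch and is not drawn from the cited references.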
Regarding claim 4, Pikuliak in view of Das discloses the data processing system as claimed in claim 1.  Pikuliak further discloses: wherein the computer-readable medium includes instructions configured to cause the processor to perform operations of:
generating a set of third training data by generating a respective training data for each respective content item of the plurality of second content items other than the second content item by associating the respective content item with the first candidate label (Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Section 6.1.3, lines 5-9, " LS model and parallel model can be trained with additional data translated from LT. This can improve the results during inference, since the model was already exposed to the translated data during training and does not suffer from the domain shift that much."; Repeating Algorithm 2 for multiple target language samples (xT) reads on generating training data for each respective content item of the plurality of second content items.);
and training the first pretrained multilingual NLP model with the set of third training data to further fine tune the training of the NLP model with respect to each non-English target language associated with the second content item (Section 6, lines 13-14, "We identify four main categories for CLL, that we call transfer paradigms."; Section 6.1, lines 7-10, "Then the labels are transferred from one language to the other – this step is also called annotation projection. The projected labels can then be used for further training."; Transferring the labels from the source language to the target language and using the projected labels for further training reads on training a pretrained multilingual NLP model with the third training data.).
Regarding claim 5, Pikuliak in view of Das discloses the data processing system as claimed in claim 1.  Pikuliak further discloses: wherein the computer-readable medium includes instructions configured to cause the processor to perform operations of:
automatically generating a plurality of candidate labels for each respective content item of the plurality of first content items other than the first content item (Section 6.1, lines 48-50, "An unannotated LT sample has a corresponding LS sample generated (e.g. by MT) and then a preexisting LS model is used to infer its label."; Section 6.1.3, line 16, "Amini et al. (2009) use multiple LS models"; Using a preexisting source language (LS) model to infer a label reads on generating a candidate label for each content item, and using multiple source language (LS) models reads on generating a plurality of candidate labels.);
automatically generating a plurality of second candidate labels for each respective content item of the plurality of first content items other than the first content item (Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Repeating Algorithm 2 to generate the source language label (yS) from the source language sample (xS) and the source language model (MS) for multiple source language samples reads on generating candidate labels for each content item.);
automatically selecting a second candidate label from the plurality for candidate labels for each respective content item of the plurality of first content items other than the first content item (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label"; Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Repeating Algorithm 2 to generate the source language label (yS) from the source language sample (xS) and the source language model (MS) for multiple source language samples reads on selecting a candidate label for each content item.);
generating third training data for fine tuning the multilingual NLP model by associating the second candidate label with each respective content item of the plurality of first content items other than the first content item (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label"; Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Section 6.1, lines 54-56, "The preexisting LS model can be trained on LS data only, or it can be improved by adding LT samples as well."; Repeating Algorithm 2 to generate the source language label (yS) from the source language sample (xS) and the source language model (MS) for multiple source language samples reads on associating a second candidate label with each content item, and training the source language (LS) model with source language data reads on the third training data.);
and generating fourth training data for fine tuning the multilingual NLP model by associating the second candidate label with a fourth content item of the plurality of second content items, the fourth content item being a translation of the respective content item of the plurality of first content items other than the first content item  (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label"; Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Section 6.1, lines 7-10, "Then the labels are transferred from one language to the other – this step is also called annotation projection. The projected labels can then be used for further training."; Transferring the labels from the source language to the target language reads on associating the second candidate label with a fourth content item, and using the projected labels for further training reads on the fourth training data.).
Regarding claim 6, Pikuliak in view of Das discloses the data processing system as claimed in claim 5.  Pikuliak further discloses: wherein the computer-readable medium includes instructions configured to cause the processor to perform operations of:
training the first pretrained multilingual NLP model with the third training data and the fourth training data (Section 6, lines 13-14, "We identify four main categories for CLL, that we call transfer paradigms."; Section 6.1, lines 7-10, "Then the labels are transferred from one language to the other – this step is also called annotation projection. The projected labels can then be used for further training."; Transferring the labels from the source language to the target language and using the projected labels for further training reads on training a pretrained multilingual NLP model with the third training data and the fourth training data.).
Regarding claim 15, Pikuliak discloses:
obtaining a corpus comprising a plurality of first content items and a plurality of second content items, wherein the first content items comprise English-language textual content, and the plurality of second content items comprise translations of the first content items in one or more non-English target languages (Section 6.1.1, lines 1-5, "Corresponding samples – pair of samples in different languages that should share the label – are generated using cross-lingual resources. All label-based transfer techniques require a process to generate corresponding samples. Two main approaches are using machine translation and parallel corpora.");
selecting a first content item from the plurality of first content items (Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; The source language sample (xS) reads on the first content item.);
generating a plurality of candidate labels for the first content item by analyzing the first content item with a plurality of first English-language natural language processing (NLP) models (Section 6.1, lines 48-50, "An unannotated LT sample has a corresponding LS sample generated (e.g. by MT) and then a preexisting LS model is used to infer its label."; Section 5.7, lines 1-9, "Pre-trained language models are a state-of-the-art NLP technique. A large amount of text data is used to train a high capacity (hundred of millions of parameters) language model. Then we can use the parameters from this language model to initialize further training with different NLP tasks. The parameters are fine-tuned with the additional target task data. This is a form of transfer learning, where we use language modeling as a source task. The most well known pre-trained language models are BERT (Devlin, Chang, Lee, & Toutanova, 2019) and ELMo (Peters et al., 2018)."; Section 6.1.3, line 16, "Amini et al. (2009) use multiple LS models"; Using a preexisting source language (LS) model to infer a label reads on generating a candidate label with an English-language natural language processing model, the BERT and ELMo pre-trained language models demonstrate a plurality of first English-language natural language processing models, and using multiple source language (LS) models reads on generating a plurality of candidate labels with a plurality of first English-language natural language processing (NLP) models.);
determining whether a majority of the candidate labels for the first content item are consistent (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label");
selecting a first candidate label from the plurality of candidate labels responsive to the majority of the candidate labels being consistent (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label"; Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; The source language label (yS) reads on the first candidate label, and multiple source language (LS) models voting on the final projected label reads on the majority of the candidate labels being consistent.);
generating first training data for fine tuning the multilingual NLP model by associating the first candidate label with the first content item (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label"; Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Section 6.1, lines 54-56, "The preexisting LS model can be trained on LS data only, or it can be improved by adding LT samples as well."; Generating the source language label (yS) from the source language sample (xS) and the source language model (MS) reads on associating the first candidate label with the first content item, and training the source language (LS) model with source language data reads on the first training data.);
generating second training data for fine tuning the multilingual NLP model by associating the first candidate label with a second content item of the plurality of second content items (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label"; Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Section 6.1, lines 7-10, "Then the labels are transferred from one language to the other – this step is also called annotation projection. The projected labels can then be used for further training."; Transferring the labels from the source language to the target language reads on associating the first candidate label with a second content item, and using the projected labels for further training reads on the second training data.);
and training a first pretrained multilingual NLP model with the first training data and the second training data to fine tune the training of the NLP model with respect to English and a respective non-English target language associated with the second content item (Section 6, lines 13-14, "We identify four main categories for CLL, that we call transfer paradigms."; Section 6.1, lines 7-10, "Then the labels are transferred from one language to the other – this step is also called annotation projection. The projected labels can then be used for further training."; Transferring the labels from the source language to the target language and using the projected labels for further training reads on training a pretrained multilingual NLP model with the first training data and the second training data.).
Pikuliak does not specifically disclose: a computer-readable storage medium on which are stored instructions that, when executed, cause a processor of a programmable device to perform functions.
Das teaches: a computer-readable storage medium on which are stored instructions that, when executed, cause a processor of a programmable device to perform functions (Column 2, lines 35-45, "A non-transitory, computer-readable medium is also presented. The computer-readable medium can have instructions stored thereon that, when executed by one or more processors of a computing device, cause the computing device to perform operations including obtaining (i) an aligned bi-text for a source language and a target language, and (ii) a supervised sequence model for the source language. The operations can include labeling a source side of the aligned bi-text using the supervised sequence model to obtain a labeled source side of the aligned bi-text. The operations can include projecting labels from the labeled source side to a target side of the aligned bi-text to obtain a labeled target side of the aligned bi-text.").
Das teaches a processor executing instructions stored in a computer-readable medium in order to implement a method of cross-lingual learning of language sequence models (Column 1, lines 29-34, "Obtaining labeled training data for other languages, also known as resource-poor languages (e.g., Catalan, Estonian, Norwegian, and Ukrainian), can be costly and/or time consuming. Therefore, efficient techniques for cross-lingual learning of sequence models are needed."; Column 4, lines 57-67, "The target language can be a resource-poor language. The term “resource-poor” can refer to the corresponding language having no labeled training data or approximately no labeled training data. Examples of resource-poor languages include Catalan, Estonian, Norwegian, and Ukrainian. A supervised sequence model for the source language can be obtained by the computing device 200 and utilized to label a source side of the aligned bi-text to obtain a labeled source side. The labels from the source side can then be projected to a target side of the aligned bi-text to obtain a labeled target side.").
Pikuliak and Das are considered to be analogous to the claimed invention because they are in the same field of cross-lingual learning of language models.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pikuliak to incorporate the teachings of Das to use a processor executing instructions stored in a computer-readable medium.  Doing so would allow for implementing a method of cross-lingual learning of language sequence models.
Regarding claim 18, Pikuliak in view of Das discloses the computer-readable storage medium as claimed in claim 15.  Pikuliak further discloses: further comprising:
generating a set of third training data by generating a respective training data for each respective content item of the plurality of second content items other than the second content item by associating the respective content item with the first candidate label (Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Section 6.1.3, lines 5-9, " LS model and parallel model can be trained with additional data translated from LT. This can improve the results during inference, since the model was already exposed to the translated data during training and does not suffer from the domain shift that much."; Repeating Algorithm 2 for multiple target language samples (xT) reads on generating training data for each respective content item of the plurality of second content items.);
and training the first pretrained multilingual NLP model with the set of third training data to further fine tune the training of the NLP model with respect to each non-English target language associated with the second content item (Section 6, lines 13-14, "We identify four main categories for CLL, that we call transfer paradigms."; Section 6.1, lines 7-10, "Then the labels are transferred from one language to the other – this step is also called annotation projection. The projected labels can then be used for further training."; Transferring the labels from the source language to the target language and using the projected labels for further training reads on training a pretrained multilingual NLP model with the third training data.).
Regarding claim 19, Pikuliak in view of Das discloses the computer-readable storage medium as claimed in claim 15.  Pikuliak further discloses: further comprising:
automatically generating a plurality of candidate labels for each respective content item of the plurality of first content items other than the first content item (Section 6.1, lines 48-50, "An unannotated LT sample has a corresponding LS sample generated (e.g. by MT) and then a preexisting LS model is used to infer its label."; Section 6.1.3, line 16, "Amini et al. (2009) use multiple LS models"; Using a preexisting source language (LS) model to infer a label reads on generating a candidate label for each content item, and using multiple source language (LS) models reads on generating a plurality of candidate labels.);
automatically generating a plurality of second candidate labels for each respective content item of the plurality of first content items other than the first content item (Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Repeating Algorithm 2 to generate the source language label (yS) from the source language sample (xS) and the source language model (MS) for multiple source language samples reads on generating candidate labels for each content item.);
automatically selecting a second candidate label from the plurality for candidate labels for each respective content item of the plurality of first content items other than the first content item (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label"; Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Repeating Algorithm 2 to generate the source language label (yS) from the source language sample (xS) and the source language model (MS) for multiple source language samples reads on selecting a candidate label for each content item.);
generating third training data for fine tuning the multilingual NLP model by associating the second candidate label with each respective content item of the plurality of first content items other than the first content item (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label"; Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Section 6.1, lines 54-56, "The preexisting LS model can be trained on LS data only, or it can be improved by adding LT samples as well."; Repeating Algorithm 2 to generate the source language label (yS) from the source language sample (xS) and the source language model (MS) for multiple source language samples reads on associating a second candidate label with each content item, and training the source language (LS) model with source language data reads on the third training data.);
and generating fourth training data for fine tuning the multilingual NLP model by associating the second candidate label with a fourth content item of the plurality of second content items, the fourth content item being a translation of the respective content item of the plurality of first content items other than the first content item  (Section 6.1.3, lines 14-15, "multiple LS models can vote on the final projected label"; Section 6.1, Algorithm 2, lines 1-7, "Algorithm 2. LS model label transfer. Require: MS (pre-trained LS model); Require: xT (LT sample); xS ← correspondence(xT); yS ← annotate (xS, MS); yT ← annotation projection (xS, yS, xT); return yT"; Section 6.1, lines 7-10, "Then the labels are transferred from one language to the other – this step is also called annotation projection. The projected labels can then be used for further training."; Transferring the labels from the source language to the target language reads on associating the second candidate label with a fourth content item, and using the projected labels for further training reads on the fourth training data.).
Claims 2 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Pikuliak in view of Das and further in view of Zhu et al. (US Patent Application Publication No. 2022/0188575), hereinafter Zhu.
Regarding claim 2, Pikuliak in view of Das discloses the data processing system as claimed in claim 1, but does not specifically disclose: wherein the computer-readable medium includes instructions configured to cause the processor to perform operations of sending a request to a crowdsourced work platform requesting a determination by one or more human analysts on whether the first candidate label is representative of the first content item; receiving a response from the crowdsourced work platform indicative of whether the first candidate label is representative of the first content item.
Zhu teaches:
sending a request to a crowdsourced work platform requesting a determination by one or more human analysts on whether the first candidate label is representative of the first content item (Paragraph 0001, lines 1-3, "The present invention relates generally to the field of crowdsourcing to select a correct answer from among multiple potential candidate answers"; Paragraph 0012, lines 1-5, "Some embodiments of the present invention may be directed to technology for selecting a correct answer (for example, a correct label for a data set to be used in machine learning algorithms) from among a plurality of candidate answers");
receiving a response from the crowdsourced work platform indicative of whether the first candidate label is representative of the first content item (Paragraph 0072, lines 1-3, "At operation S516, all remaining options are provided to the workers who will then select a correct option from them."; Paragraph 0059, lines 3-5, "In some embodiments, majority voting is used to determine best answer and to identify more difficult cases for the next phase.").
Zhu teaches using crowdsourcing to select a correct label from a data set to be used in machine learning algorithms in order to save time and increase accuracy of labeling results (Paragraph 0040, lines 1-10, "Processing proceeds to operation S280, where acceptance mod 312 outputs the accepted subset of candidate labels for further processing by a human or by some type of software algorithm. In this example, the accepted subset of candidate labels are sent through communication network 114 to client subsystem 112 which is used by a human expert. Because of the selective culling of the rejected labels, the human expert has fewer candidate labels to concern herself with when labelling. This can save time and or increase the accuracy of the results from the expert.").
Zhu is considered to be analogous to the claimed invention because it is in the same field of crowdsourcing training data evaluation.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pikuliak in view of Das to incorporate the teachings of Zhu to use crowdsourcing to select a correct label from a data set to be used in machine learning algorithms.  Doing so would allow for saving time and increasing accuracy of labeling results.
Regarding claim 16, Pikuliak in view of Das discloses the computer-readable storage medium as claimed in claim 15, but does not specifically disclose: further comprising: sending a request to a crowdsourced work platform requesting a determination by one or more human analysts whether the first candidate label is representative of the first content item; receiving a response from the crowdsourced work platform indicative of whether the first candidate label is representative of the first content item.
Zhu teaches:
sending a request to a crowdsourced work platform requesting a determination by one or more human analysts on whether the first candidate label is representative of the first content item (Paragraph 0001, lines 1-3, "The present invention relates generally to the field of crowdsourcing to select a correct answer from among multiple potential candidate answers"; Paragraph 0012, lines 1-5, "Some embodiments of the present invention may be directed to technology for selecting a correct answer (for example, a correct label for a data set to be used in machine learning algorithms) from among a plurality of candidate answers");
receiving a response from the crowdsourced work platform indicative of whether the first candidate label is representative of the first content item (Paragraph 0072, lines 1-3, "At operation S516, all remaining options are provided to the workers who will then select a correct option from them."; Paragraph 0059, lines 3-5, "In some embodiments, majority voting is used to determine best answer and to identify more difficult cases for the next phase.").
Zhu teaches using crowdsourcing to select a correct label from a data set to be used in machine learning algorithms in order to save time and increase accuracy of labeling results (Paragraph 0040, lines 1-10, "Processing proceeds to operation S280, where acceptance mod 312 outputs the accepted subset of candidate labels for further processing by a human or by some type of software algorithm. In this example, the accepted subset of candidate labels are sent through communication network 114 to client subsystem 112 which is used by a human expert. Because of the selective culling of the rejected labels, the human expert has fewer candidate labels to concern herself with when labelling. This can save time and or increase the accuracy of the results from the expert.").
Zhu is considered to be analogous to the claimed invention because it is in the same field of crowdsourcing training data evaluation.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pikuliak in view of Das to incorporate the teachings of Zhu to use crowdsourcing to select a correct label from a data set to be used in machine learning algorithms.  Doing so would allow for saving time and increasing accuracy of labeling results.
Claims 3 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Pikuliak in view of Das and Zhu, and further in view of Liu et al. (US Patent No. 11,08,369), hereinafter Liu.
Regarding claim 3, as best understood based on the 35 U.S.C. 112(b) issues identified above, Pikuliak in view of Das and Zhu discloses the data processing system as claimed in claim 2, but does not specifically disclose: wherein the computer-readable medium includes instructions configured to cause the processor to perform operations of discarding the first content item instead of generating the first training data and the second training data.
Liu teaches:
discarding the first content item instead of generating the first training data and the second training data (Column 10, lines 9-13, "The human annotators may provide scores associated with utterance tone, style, relevance, or any other property requested by a creator of a prompt or requester of a crowdsourcing task. Rules may be applied to discard utterances based on annotation scores").
Liu teaches using crowdsourcing to discard items based on annotation scores in order to improve the collection of high-quality training data (Column 2, line 55 - Column 3, line 4, "There exists a need for techniques that improve the collection of high-quality training data. Crowdsourcing can be used to collect training data. However, it is challenging to collect a large quantity of high-quality training data without substantial human involvement. For example, humans may be involved in reviewing grammar of utterances and relevance of utterances. Humans may also be involved in determining whether there is enough diversity in the utterances collected. Low-quality training data may result in poorly trained chatbots. Thus, there exists a need for techniques that improve the throughput of high-quality utterance sample collection by reducing human involvement. The techniques disclosed herein combine machine-assisted utterance collection with targeted human-in-the-loop elements, which increases the efficiency of utterance collection for the training of chatbots.").
Liu is considered to be analogous to the claimed invention because it is in the same field of crowdsourcing training data evaluation.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pikuliak in view of Das and Zhu to incorporate the teachings of Liu to use crowdsourcing to discard items based on annotation scores.  Doing so would allow for improving the collection of high-quality training data.  
Regarding claim 17, as best understood based on the 35 U.S.C. 112(b) issues identified above, Pikuliak in view of Das and Zhu discloses the computer-readable storage medium as claimed in claim 16, but does not specifically disclose: further comprising: discarding the first content item instead of generating the first training data and the second training data.
Liu teaches:
discarding the first content item instead of generating the first training data and the second training data (Column 10, lines 9-13, "The human annotators may provide scores associated with utterance tone, style, relevance, or any other property requested by a creator of a prompt or requester of a crowdsourcing task. Rules may be applied to discard utterances based on annotation scores").
Liu teaches using crowdsourcing to discard items based on annotation scores in order to improve the collection of high-quality training data (Column 2, line 55 - Column 3, line 4, "There exists a need for techniques that improve the collection of high-quality training data. Crowdsourcing can be used to collect training data. However, it is challenging to collect a large quantity of high-quality training data without substantial human involvement. For example, humans may be involved in reviewing grammar of utterances and relevance of utterances. Humans may also be involved in determining whether there is enough diversity in the utterances collected. Low-quality training data may result in poorly trained chatbots. Thus, there exists a need for techniques that improve the throughput of high-quality utterance sample collection by reducing human involvement. The techniques disclosed herein combine machine-assisted utterance collection with targeted human-in-the-loop elements, which increases the efficiency of utterance collection for the training of chatbots.").
Liu is considered to be analogous to the claimed invention because it is in the same field of crowdsourcing training data evaluation.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pikuliak in view of Das and Zhu to incorporate the teachings of Liu to use crowdsourcing to discard items based on annotation scores.  Doing so would allow for improving the collection of high-quality training data.  
Claims 7 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Pikuliak in view of Das, and further in view of Sun et al. (“MobileBERT: A Compact Task-Agnostic BERT for Resource-Limited Devices”), hereinafter Sun.
Regarding claim 7, Pikuliak in view of Das discloses the data processing system as claimed in claim 1, but does not specifically disclose: wherein the computer-readable medium includes instructions configured to cause the processor to perform operations of distilling a second pretrained multilingual NLP model from the first pretrained multilingual model; and installing the second pretrained multilingual NLP model on at least one client device.
Sun teaches:
distilling a second pretrained multilingual NLP model from the first pretrained multilingual model (Abstract, lines 7-9, "In this paper, we propose MobileBERT for compressing and accelerating the popular BERT model."; Section 3.6, lines 4-9, "In this strategy, we regard intermediate knowledge transfer as an auxiliary task for knowledge distillation. We use a single loss, which is a linear combination of knowledge transfer losses from all layers as well as the pre-training distillation loss."; The MobileBERT model reads on the second pretrained multilingual NLP model and the BERT model reads on the first pretrained multilingual model.);
and installing the second pretrained multilingual NLP model on at least one client device (Section 4.3, lines 16-20, "Besides, to verify the performance of MobileBERT on real-world mobile devices, we export the models with TensorFlow Lite APIs and measure the inference latencies on a 4-thread Pixel 4 phone with a fixed sequence length of 128."; The MobileBERT model reads on the second pretrained multilingual NLP model and the mobile devices read on the client device.).
Sun teaches using knowledge distillation to generate a compressed natural language model from a natural language model in order to generate a smaller and faster model for use on mobile devices (Section 5, lines 1-7, "We have presented MobileBERT which is a task agnostic compact variant of BERT. Empirical results on popular NLP benchmarks show that MobileBERT is comparable with BERTBASE while being much smaller and faster. MobileBERT can enable various NLP applications to be easily deployed on mobile devices.").
Sun is considered to be analogous to the claimed invention because it is in the same field of natural language models.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pikuliak in view of Das to incorporate the teachings of Sun to use knowledge distillation to generate a compressed natural language model from a natural language model.  Doing so would allow for generating a smaller and faster model for use on mobile devices.
Regarding claim 20, Pikuliak in view of Das discloses the computer-readable storage medium as claimed in claim 15, but does not specifically disclose: further comprising: distilling a second pretrained multilingual NLP model from the first pretrained multilingual model; and installing the second pretrained multilingual NLP model on at least one client device.
Sun teaches:
distilling a second pretrained multilingual NLP model from the first pretrained multilingual model (Abstract, lines 7-9, "In this paper, we propose MobileBERT for compressing and accelerating the popular BERT model."; Section 3.6, lines 4-9, "In this strategy, we regard intermediate knowledge transfer as an auxiliary task for knowledge distillation. We use a single loss, which is a linear combination of knowledge transfer losses from all layers as well as the pre-training distillation loss."; The MobileBERT model reads on the second pretrained multilingual NLP model and the BERT model reads on the first pretrained multilingual model.);
and installing the second pretrained multilingual NLP model on at least one client device (Section 4.3, lines 16-20, "Besides, to verify the performance of MobileBERT on real-world mobile devices, we export the models with TensorFlow Lite APIs and measure the inference latencies on a 4-thread Pixel 4 phone with a fixed sequence length of 128."; The MobileBERT model reads on the second pretrained multilingual NLP model and the mobile devices read on the client device.).
Sun teaches using knowledge distillation to generate a compressed natural language model from a natural language model in order to generate a smaller and faster model for use on mobile devices (Section 5, lines 1-7, "We have presented MobileBERT which is a task agnostic compact variant of BERT. Empirical results on popular NLP benchmarks show that MobileBERT is comparable with BERTBASE while being much smaller and faster. MobileBERT can enable various NLP applications to be easily deployed on mobile devices.").
Sun is considered to be analogous to the claimed invention because it is in the same field of natural language models.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pikuliak in view of Das to incorporate the teachings of Sun to use knowledge distillation to generate a compressed natural language model from a natural language model.  Doing so would allow for generating a smaller and faster model for use on mobile devices.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Pikuliak in view of Zhu.
Regarding claim 9, Pikuliak discloses the method as claimed in claim 8, but does not specifically disclose: further comprising: sending a request to a crowdsourced work platform requesting a determination by one or more human analysts on whether the first candidate label is representative of the first content item; and receiving a response from the crowdsourced work platform indicative of whether the first candidate label is representative of the first content item.
Zhu teaches:
sending a request to a crowdsourced work platform requesting a determination by one or more human analysts on whether the first candidate label is representative of the first content item (Paragraph 0001, lines 1-3, "The present invention relates generally to the field of crowdsourcing to select a correct answer from among multiple potential candidate answers"; Paragraph 0012, lines 1-5, "Some embodiments of the present invention may be directed to technology for selecting a correct answer (for example, a correct label for a data set to be used in machine learning algorithms) from among a plurality of candidate answers");
and receiving a response from the crowdsourced work platform indicative of whether the first candidate label is representative of the first content item (Paragraph 0072, lines 1-3, "At operation S516, all remaining options are provided to the workers who will then select a correct option from them."; Paragraph 0059, lines 3-5, "In some embodiments, majority voting is used to determine best answer and to identify more difficult cases for the next phase.").
Zhu teaches using crowdsourcing to select a correct label from a data set to be used in machine learning algorithms in order to save time and increase accuracy of labeling results (Paragraph 0040, lines 1-10, "Processing proceeds to operation S280, where acceptance mod 312 outputs the accepted subset of candidate labels for further processing by a human or by some type of software algorithm. In this example, the accepted subset of candidate labels are sent through communication network 114 to client subsystem 112 which is used by a human expert. Because of the selective culling of the rejected labels, the human expert has fewer candidate labels to concern herself with when labelling. This can save time and or increase the accuracy of the results from the expert.").
Zhu is considered to be analogous to the claimed invention because it is in the same field of crowdsourcing training data evaluation.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pikuliak to incorporate the teachings of Zhu to use crowdsourcing to select a correct label from a data set to be used in machine learning algorithms.  Doing so would allow for saving time and increasing accuracy of labeling results.
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Pikuliak in view of Zhu, and further in view of Liu.
Regarding claim 10, as best understood based on the 35 U.S.C. 112(b) issues identified above, Pikuliak in view of Zhu discloses the method as claimed in claim 9, but does not specifically disclose: further comprising: discarding the first content item instead of generating the first training data and the second training data.
Liu teaches:
discarding the first content item instead of generating the first training data and the second training data (Column 10, lines 9-13, "The human annotators may provide scores associated with utterance tone, style, relevance, or any other property requested by a creator of a prompt or requester of a crowdsourcing task. Rules may be applied to discard utterances based on annotation scores").
Liu teaches using crowdsourcing to discard items based on annotation scores in order to improve the collection of high-quality training data (Column 2, line 55 - Column 3, line 4, "There exists a need for techniques that improve the collection of high-quality training data. Crowdsourcing can be used to collect training data. However, it is challenging to collect a large quantity of high-quality training data without substantial human involvement. For example, humans may be involved in reviewing grammar of utterances and relevance of utterances. Humans may also be involved in determining whether there is enough diversity in the utterances collected. Low-quality training data may result in poorly trained chatbots. Thus, there exists a need for techniques that improve the throughput of high-quality utterance sample collection by reducing human involvement. The techniques disclosed herein combine machine-assisted utterance collection with targeted human-in-the-loop elements, which increases the efficiency of utterance collection for the training of chatbots.").
Liu is considered to be analogous to the claimed invention because it is in the same field of crowdsourcing training data evaluation.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pikuliak in view of Zhu to incorporate the teachings of Liu to use crowdsourcing to discard items based on annotation scores.  Doing so would allow for improving the collection of high-quality training data.
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Pikuliak in view of Sun.
Regarding claim 14, as best understood based on the 35 U.S.C. 112(b) issues identified above, Pikuliak discloses the method as claimed in claim 8, but does not specifically disclose: further comprising: distilling a second pretrained multilingual NLP model from the first pretrained multilingual model; and installing the second pretrained multilingual NLP model on at least one client device.
Sun teaches:
distilling a second pretrained multilingual NLP model from the first pretrained multilingual model (Abstract, lines 7-9, "In this paper, we propose MobileBERT for compressing and accelerating the popular BERT model."; Section 3.6, lines 4-9, "In this strategy, we regard intermediate knowledge transfer as an auxiliary task for knowledge distillation. We use a single loss, which is a linear combination of knowledge transfer losses from all layers as well as the pre-training distillation loss."; The MobileBERT model reads on the second pretrained multilingual NLP model and the BERT model reads on the first pretrained multilingual model.);
and installing the second pretrained multilingual NLP model on at least one client device (Section 4.3, lines 16-20, "Besides, to verify the performance of MobileBERT on real-world mobile devices, we export the models with TensorFlow Lite APIs and measure the inference latencies on a 4-thread Pixel 4 phone with a fixed sequence length of 128."; The MobileBERT model reads on the second pretrained multilingual NLP model and the mobile devices read on the client device.).
Sun teaches using knowledge distillation to generate a compressed natural language model from a natural language model in order to generate a smaller and faster model for use on mobile devices (Section 5, lines 1-7, "We have presented MobileBERT which is a task agnostic compact variant of BERT. Empirical results on popular NLP benchmarks show that MobileBERT is comparable with BERTBASE while being much smaller and faster. MobileBERT can enable various NLP applications to be easily deployed on mobile devices.").
Sun is considered to be analogous to the claimed invention because it is in the same field of natural language models.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pikuliak to incorporate the teachings of Sun to use knowledge distillation to generate a compressed natural language model from a natural language model.  Doing so would allow for generating a smaller and faster model for use on mobile devices.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to James Boggs whose telephone number is (571)272-2968. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached on (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JAMES BOGGS/Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657