DETAILED ACTION
Notice of AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/08/2021 has been considered by the examiner. 
Drawings
The drawings are objected to because:
In Fig. 5, the dashed line, the dotted line, and the greyscale line in plots 502A-C are not labeled with reference numbers and not explained in the specification.
In Fig. 6, the dashed line, the dotted line, and the greyscale line in plots 602A-C are not labeled with reference numbers and not explained in the specification.
In Fig. 7, the dashed line and the dotted line in plots 702A-C are not labeled with reference numbers and not explained in the specification.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Specification
Applicant is reminded of the proper language and format for an abstract of the disclosure.
The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words in length. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details.
The language should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, “The disclosure concerns,” “The disclosure defined by this invention,” “The disclosure describes,” etc.  In addition, the form and legal phraseology often used in patent claims, such as “means” and “said,” should be avoided.
The abstract of the disclosure is objected to because in line 1, the reference refers to “of the present disclosure” which is an implied phrase that should not be included.  Correction is required.  See MPEP § 608.01(b).
The disclosure is objected to because of the following informalities:
In para. 0100, line 3, “306embodies” should read “306 embodies”
In para. 0139, line 1, “902Ais” should read “902A is”
Appropriate correction is required.
Claim Objections
Claims 1, 12, and 19 are objected to because of the following informalities:
In claim 1, line 8, “token in token sequence” should be changed to “token in the token sequence”
In claim 12, line 6, “token in token sequence” should be changed to “token in the token sequence”
In claim 19, line 8, “token in token sequence” should be changed to “token in the token sequence”
Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Claim 1 recites an apparatus configured to execute computer-coded instructions to: for each data sequence of a plurality of data sequences (e.g., sentences in a plurality of sentences), each data sequence comprising a token sequence (e.g., words in a sentence), calculate a perplexity value set of perplexity values for each token (e.g., word); then generate a probabilistic ranking set for the plurality of data sequences, based on one sequence arrangement metric (e.g., algorithm for scoring/sorting/ranking the data sequences) and the perplexity value; and then generate an arrangement of the plurality of data sequences based on the probabilistic ranking set (e.g., sorting/ranking and presenting the data sequences as an output array).  Under the broadest reasonable interpretation, these limitations cover performance of the limitations in the human mind with the assistance of physical aids (e.g., pen and paper), but for the recitation of generic computer components.  That is, other than reciting “processor”, “memory”, and “computer-coded instructions”, nothing in these claim limitations precludes the steps from practically being performed in the mind.  For example, a human could read a plurality of data sequences (e.g., sentences on a piece of paper, where each sentence is made up of individual words), calculate using a language model (e.g., a paper list of potential next-words, with associated probabilities) a perplexity value (e.g., a probability function that can be calculated using pen and paper or in the human mind), then, taking each sentence on the piece of paper, use the calculated perplexities to determine mentally (or using pen and paper) another score or rank for each sentence, and then re-order the sentences on a separate piece of paper according to the score/rank determined for each sentence.
The judicial exception is not integrated into a practical application. In particular, the claim only recites generic computing components. Such generic computing components are recited at a high level of generality (i.e., as a generic processor performing a generic computer function of performing calculations and displaying/sorting information) such that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional limitations of using generic computer components amount to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using generic computer components cannot provide an inventive concept. Claim 1 is not patent eligible.
Claim 2 depends from claim 1 and does not remedy the deficiencies of claim 1 and is therefore rejected under the same grounds as claim 1 above.  Claim 2 further recites outputting the arrangement of the plurality of data sequences (e.g., a sorted list of sentences) to a client device.  None of the additional limitations recited in claim 2 amount to anything more than the same or a similar abstract idea as recited in claim 1.    Nor do any limitations in claim 2: (a) integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea or (b) amount to significantly more than the judicial exception because the additional limitations of using generic computer components amounts to no more than mere instructions to apply the exception using generic computer components.  Moreover, outputting data to a device is merely post-solution activity. Claim 2 is not patent eligible.
	Claim 3 depends from claim 1 and does not remedy the deficiencies of claim 1 and is therefore rejected under the same grounds as claim 1 above.  Claim 3 further recites identifying based on the arrangement of the plurality of data sequences, at least one invalid sequence (e.g., mentally reading a sorted list of sentences and identifying the worst-ranked sentences as being invalid).  None of the additional limitations recited in claim 3 amount to anything more than the same or a similar abstract idea as recited in claim 1.    Nor do any limitations in claim 3: (a) integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea or (b) amount to significantly more than the judicial exception because the additional limitations of using generic computer components amounts to no more than mere instructions to apply the exception using generic computer components.  Claim 3 is not patent eligible.
Claim 4 depends from claim 1 and does not remedy the deficiencies of claim 1 and is therefore rejected under the same grounds as claim 1 above.  Claim 4 further recites excluding at least one data sequence from the plurality of data sequences based on the arrangement of the plurality of data sequences (e.g., when presenting or sorting a list of sentences on a different sheet of paper, excluding the sentences that are ranked lower than a mentally-determined threshold).  None of the additional limitations recited in claim 4 amount to anything more than the same or a similar abstract idea as recited in claim 1.    Nor do any limitations in claim 4: (a) integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea or (b) amount to significantly more than the judicial exception because the additional limitations of using generic computer components amounts to no more than mere instructions to apply the exception using generic computer components.  Claim 4 is not patent eligible.
Claim 5 depends from claim 1 and does not remedy the deficiencies of claim 1 and is therefore rejected under the same grounds as claim 1 above.  Claim 5 further recites that the claimed language model is trained on a domain-specific set of language training data.  The examiner notes that in para. 0049 of the instant specification, the applicant states that the “term ‘language model’ refers to a statistical, algorithmic, and/or machine learning model trained to generate probabilities for tokens in a data sequence given the context surrounding the token based on the remaining tokens.”  Under the broadest reasonable interpretation of this definition, “language model” may be a purely statistical or algorithmic model which may similarly be performed entirely in the human mind or using a pen and paper, and this limitation could be met by a human training their mind/model using domain-specific data, such as reading books in a library on a particular topic and noting which words likely follow other words.  None of the limitations in claim 5: (a) integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea or (b) amount to significantly more than the judicial exception because the additional limitations of using generic computer components amounts to no more than mere instructions to apply the exception using generic computer components.  Claim 5 is not patent eligible.
Claim 6 depends from claim 1 and does not remedy the deficiencies of claim 1 and is therefore rejected under the same grounds as claim 1 above.  Claim 6 further recites a particular algorithm, using average sequence perplexity, for generating an arrangement of the data sequences.  Calculating perplexity-based values using statistical mean calculations can be performed entirely in the human mind or with a pen and paper. None of the additional limitations recited in claim 6 amount to anything more than the same or a similar abstract idea as recited in claim 1.    Nor do any limitations in claim 6: (a) integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea or (b) amount to significantly more than the judicial exception because the additional limitations of using generic computer components amounts to no more than mere instructions to apply the exception using generic computer components.  Claim 6 is not patent eligible.
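For illustration only, the kind of statistical mean calculation referred to above can be sketched as follows. This is a minimal sketch under stated assumptions: the exact formula recited in claim 6 is not reproduced in this action, so an arithmetic mean of per-token perplexity values is assumed, and the function names and sample values are hypothetical.

```python
def average_sequence_perplexity(token_perplexities):
    """Arithmetic mean of one sequence's per-token perplexity values (assumed metric)."""
    return sum(token_perplexities) / len(token_perplexities)


def arrange_by_average(perplexity_sets):
    """Return sequence names ordered from lowest to highest average perplexity."""
    return sorted(
        perplexity_sets,
        key=lambda name: average_sequence_perplexity(perplexity_sets[name]),
    )


# Hypothetical per-token perplexity values for three data sequences.
perplexity_sets = {
    "seq_a": [2.0, 4.0, 6.0],
    "seq_b": [1.0, 2.0, 3.0],
    "seq_c": [10.0, 20.0],
}
arrangement = arrange_by_average(perplexity_sets)
```

Each step is a single division or comparison, which is the sense in which the calculation is characterized above as one that can be carried out with pen and paper.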
Claim 7 depends from claim 1 and does not remedy the deficiencies of claim 1 and is therefore rejected under the same grounds as claim 1 above.  Claim 7 further recites a particular algorithm, using area violating thresholds, for generating an arrangement of the data sequences.  Calculating perplexity-based values using area thresholds (e.g., area under the curve) can be performed entirely in the human mind or with a pen and paper. None of the additional limitations recited in claim 7 amount to anything more than the same or a similar abstract idea as recited in claim 1.    Nor do any limitations in claim 7: (a) integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea or (b) amount to significantly more than the judicial exception because the additional limitations of using generic computer components amounts to no more than mere instructions to apply the exception using generic computer components.  Claim 7 is not patent eligible.
Claim 8 depends from claim 1 and does not remedy the deficiencies of claim 1 and is therefore rejected under the same grounds as claim 1 above.  Claim 8 further recites a particular algorithm, using bucket-based sequence perplexity, for generating an arrangement of the data sequences.  Calculating perplexity-based values using buckets (e.g., sorting sentences into buckets and then weighting the buckets) can be performed entirely in the human mind or with a pen and paper. None of the additional limitations recited in claim 8 amount to anything more than the same or a similar abstract idea as recited in claim 1.    Nor do any limitations in claim 8: (a) integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea or (b) amount to significantly more than the judicial exception because the additional limitations of using generic computer components amounts to no more than mere instructions to apply the exception using generic computer components.  Claim 8 is not patent eligible.
Claim 9 depends from claim 8 and does not remedy the deficiencies of claim 8 and is therefore rejected under the same grounds as claim 8 above.  Claim 9 further recites a particular equation for determining a probabilistic ranking set.  Performing an equation can be done entirely in the human mind or with a pen and paper, and the examiner further notes that a mathematical equation is merely another form of abstract idea. None of the additional limitations recited in claim 9 amount to anything more than the same or a similar abstract idea as recited in claim 1.    Nor do any limitations in claim 9: (a) integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea or (b) amount to significantly more than the judicial exception because the additional limitations of using generic computer components amounts to no more than mere instructions to apply the exception using generic computer components.  Claim 9 is not patent eligible.
Claim 10 depends from claim 1 and does not remedy the deficiencies of claim 1 and is therefore rejected under the same grounds as claim 1 above.  Claim 10 further recites that the language model is language agnostic and direction agnostic.  The examiner notes that in para. 0049 of the instant specification, the applicant states that the “term ‘language model’ refers to a statistical, algorithmic, and/or machine learning model trained to generate probabilities for tokens in a data sequence given the context surrounding the token based on the remaining tokens.”  Under the broadest reasonable interpretation of this definition, “language model” may be a purely statistical or algorithmic model which may similarly be performed entirely in the human mind or using a pen and paper, and this limitation could be met by a human that is multi-lingual and can understand sentences by reading backwards and forwards.  None of the additional limitations recited in claim 10 amount to anything more than the same or a similar abstract idea as recited in claim 1.    Nor do any limitations in claim 10: (a) integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea or (b) amount to significantly more than the judicial exception because the additional limitations of using generic computer components amounts to no more than mere instructions to apply the exception using generic computer components.  Claim 10 is not patent eligible.
Claim 11 depends from claim 1 and does not remedy the deficiencies of claim 1 and is therefore rejected under the same grounds as claim 1 above.  Claim 11 further recites that the language model is trained using training data collected from one or more external computing devices associated with the language domain.  The examiner notes that in para. 0049 of the instant specification, the applicant states that the “term ‘language model’ refers to a statistical, algorithmic, and/or machine learning model trained to generate probabilities for tokens in a data sequence given the context surrounding the token based on the remaining tokens.”  Under the broadest reasonable interpretation of this definition, “language model” may be a purely statistical or algorithmic model which may similarly be performed entirely in the human mind or using a pen and paper, and this limitation could be met by a human that trains his/her mind using sources from a language domain (e.g., English) that are accessed using an external source (e.g., a library).  The examiner further notes that the claimed “external computing devices” are just further examples of generic computing components. None of the additional limitations recited in claim 11 amount to anything more than the same or a similar abstract idea as recited in claim 1.  Nor do any limitations in claim 11: (a) integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea or (b) amount to significantly more than the judicial exception because the additional limitations of using generic computer components amount to no more than mere instructions to apply the exception using generic computer components.  Moreover, generating a training set is mere pre-solution activity. Claim 11 is not patent eligible.
Claim 12 claims a computer-implemented method that corresponds to the computer-coded instructions executed by the apparatus of claim 1, and is therefore rejected on the same grounds as claim 1 above.
Claim 13 depends from claim 12 and claims a computer-implemented method that corresponds to the computer-coded instructions executed by the apparatus of claim 2, and is therefore rejected on the same grounds as claim 2 above.
	Claim 14 depends from claim 12 and claims a computer-implemented method that corresponds to the computer-coded instructions executed by the apparatus of claim 3, and is therefore rejected on the same grounds as claim 3 above.
Claim 15 depends from claim 12 and claims a computer-implemented method that corresponds to the computer-coded instructions executed by the apparatus of claim 6, and is therefore rejected on the same grounds as claim 6 above.
Claim 16 depends from claim 12 and claims a computer-implemented method that corresponds to the computer-coded instructions executed by the apparatus of claim 7, and is therefore rejected on the same grounds as claim 7 above.
Claim 17 depends from claim 12 and claims a computer-implemented method that corresponds to the computer-coded instructions executed by the apparatus of claim 8, and is therefore rejected on the same grounds as claim 8 above.
Claim 18 depends from claim 17 and claims a computer-implemented method that corresponds to the computer-coded instructions executed by the apparatus of claim 9, and is therefore rejected on the same grounds as claim 9 above.
Claim 19 claims a computer program product comprising a non-transitory computer-readable storage medium having computer program code, where the limitations correspond to the computer-coded instructions executed by the apparatus of claim 1, and are therefore rejected on the same grounds as claim 1 above.
Claim 20 depends from claim 19 and claims a computer program product that corresponds to the computer-coded instructions executed by the apparatus of claim 2, and is therefore rejected on the same grounds as claim 2 above.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 11-14, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Ma et al., U.S. Patent Application Publication 2021/0056266 A1, hereinafter referenced as MA in view of Gao, Jianfeng, et al. "Toward a unified approach to statistical language modeling for Chinese." ACM Transactions on Asian Language Information Processing (TALIP) 1.1 (2002): pp. 3-33, hereinafter referenced as GAO.

Regarding claim 1, MA discloses:
An apparatus comprising at least one processor and at least one memory, the at least one memory having computer-coded instructions stored thereon, wherein the computer-coded instructions, in execution with the at least one processor, configures the apparatus to: (Figs. 1-3, sentence generation apparatus 200 implemented on smart device 3, includes processor 302 and storage 301, where storage 301 includes computer program instructions executable by processor 302 to perform the method of the Fig. 1 flow chart; paras. 0010, 0063, 0093 and 0094)
for each data sequence of a plurality of data sequences, (Fig. 1, step 102, find structurally similar sentences to the input sentence at step 101, and step 103, find semantically similar sentences to the input sentence at step 101; paras. 0013, 0014, 0024-0026) each data sequence comprising a token sequence: (Fig. 1, step 106, each sentence is represented as a sum of word vectors; para. 0049; a word vector is described as a word token; para. 0059)
calculate, utilizing a language model, a perplexity value set associated with the data sequence, (Fig. 1, step 106 F1, calculating a perplexity of a sentence based on a trained language model, e.g., an LSTM language model and a preset perplexity calculation formula (see below); paras. 0054-0056, 0059;

[Equation images media_image1.png and media_image2.png: preset perplexity calculation formula of MA, not reproduced here]

where PP = preset perplexity, M = number of words in the sentence, p(wi) = probability of i-th word in the sentence obtained from the language model; para. 0059)
generate a probabilistic ranking set for the plurality of data sequences, the probabilistic ranking set including a probabilistic ranking for each data sequence in the plurality of data sequences, (Fig. 1, step 106 F2, sentences are sorted in ascending order based on the calculated perplexity, where perplexity is a function of the i-th word probability of the next-appearing word based on the language model; paras. 0056, 0057, 0059; Fig. 1, step 106 F3, the top X sentences are retained, e.g., a perplexity threshold X, where X may be a preset positive integer, or could be dynamically determined proportionally to the number of sentences in the sorted list, and can also be changed in any manner based on user demand; para. 0059) and the probabilistic ranking set generated based on at least one sequence arrangement metric (the perplexity threshold X discussed above with respect to this claim 1; para. 0059) and the perplexity value set for each data sequence of the plurality of data sequences; and (Fig. 1, step 106 F2, sentences are sorted in ascending order based on the calculated perplexity, with respect to the perplexity threshold X; paras. 0057, 0059)
generate an arrangement of the plurality of data sequences based on the probabilistic ranking set. (Fig. 1, step 106 F2, sentences are sorted in ascending order based on the calculated perplexity, with respect to the perplexity threshold X; paras. 0057, 0059)
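
For illustration only, the steps mapped above (calculating a sentence perplexity from per-word probabilities, sorting in ascending order, and retaining the top X sentences) can be sketched as follows. This is a minimal sketch, not MA's implementation: it assumes the standard perplexity definition (the M-th root of the product of the inverse word probabilities), and the function names, the top_x parameter, and the probability values are hypothetical stand-ins for the output of a trained language model.

```python
import math


def sentence_perplexity(word_probs):
    """Perplexity of one sentence from its per-word probabilities p(w_i).

    Computed as exp of the average negative log-probability, which equals
    the M-th root of the product of 1/p(w_i) over the M words.
    """
    m = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / m)


def rank_and_retain(sentences, top_x):
    """Sort sentences by ascending perplexity and keep the first top_x."""
    ranked = sorted(sentences, key=lambda name: sentence_perplexity(sentences[name]))
    return ranked[:top_x]


# Hypothetical per-word probabilities for three candidate sentences.
candidates = {
    "fluent": [0.5, 0.5, 0.5],
    "awkward": [0.1, 0.2, 0.1],
    "middling": [0.25, 0.25, 0.25],
}
retained = rank_and_retain(candidates, top_x=2)
```

With these sample values the sentence with the lowest word probabilities has the highest perplexity and is the one not retained, consistent with the characterization that greater perplexity indicates lower fluency.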

	However, MA fails to explicitly teach:
wherein the perplexity value set comprises a perplexity value for each data token in token sequence of the data sequence; and

However, in a related field of endeavor, GAO pertains to statistical language modeling that utilizes perplexity for evaluating the model.  (GAO, p. 5, section 1.2, p. 16, section 5.2).  In particular, GAO describes word-based and cluster-based (e.g., chunks of words, such as sentences or paragraphs) techniques for relating perplexities and for generating model perplexities based on word and cluster perplexities.  (GAO, p. 16, section 5.2).  


	The combination of MA in view of GAO makes obvious:
calculate, utilizing a language model, a perplexity value set associated with the data sequence, wherein the perplexity value set comprises a perplexity value for each data token in token sequence of the data sequence; and (the preset perplexity calculation in MA performed on an entire sentence is now performed on each individual word as disclosed in GAO, where the sentence-level perplexity may be a function of word-level perplexities as set forth in GAO; MA, paras. 0054-0056, 0059 with GAO, p. 16, section 5.2; the examiner notes that MA already calculates the sentence-level perplexity in view of the individual probabilities of each word in the sentence obtained from the language model, so calculating individual perplexities using these individual probabilities would be straightforward).
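
For illustration only, the relationship noted above between a sentence-level perplexity and per-word values can be written out as follows. This is a sketch under stated assumptions: it assumes the sentence-level formula has the standard form, using the symbols PP, M, and p(w_i) as defined in the mapping of claim 1, and the per-word symbol pp_i is hypothetical notation that appears in neither reference.

```latex
\begin{align}
PP(S) &= \left( \prod_{i=1}^{M} \frac{1}{p(w_i)} \right)^{1/M}
       = \exp\!\left( -\frac{1}{M} \sum_{i=1}^{M} \ln p(w_i) \right), \\
pp_i &= \frac{1}{p(w_i)}, \qquad
PP(S) = \left( \prod_{i=1}^{M} pp_i \right)^{1/M}.
\end{align}
```

Under that assumption, the sentence-level value is the geometric mean of the M per-word values, so the per-word values are available as intermediate quantities whenever the sentence-level value is computed from the individual word probabilities.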

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of GAO with MA to relate word-level perplexities to a sentence-level, or cluster-level, perplexity.  As disclosed in GAO, one of ordinary skill may be motivated to use the techniques of GAO in order to use the word-level and cluster-level perplexities in different combinations to form different models, such as for pruning a large search space.  (GAO, p. 16, section 5.2).  Further, as disclosed in GAO, one of ordinary skill would be motivated to combine the teachings of GAO with MA to use perplexity-based techniques to optimize the training set for the language model and for reducing language model size.  (GAO, p. 32, section 7).
	The examiner notes that MA discloses that the perplexity threshold X may be dynamically modified based on changes in user demand, so one of ordinary skill would understand that such perplexity threshold X may be determined using the various word-level and cluster-level perplexity techniques disclosed in GAO.  (MA, para. 0059)

Regarding claim 2, MA in view of GAO discloses the apparatus of claim 1.  MA further discloses:
provide the arrangement of the plurality of data sequences to a client device for outputting. (newly generated sentences are applied to intelligent products that interact with a user, such as a robot, smart phone, or tablet computer; paras. 0009, 0061, 0092, 0142; multiple similar questions may be generated that coincide with an input question, e.g., during human-computer interactions; para. 0003; new sentences are sorted and sentences satisfying the perplexity threshold are retained; para. 0059)

Regarding claim 3, MA in view of GAO discloses the apparatus of claim 1.  MA further discloses: 
identify, based on the arrangement of the plurality of data sequences, at least one invalid data sequence from the plurality of data sequences. (the greater the perplexity, the lower the degree of fluency of the sentence; paras. 0055, 0059; after sorting the sentences in ascending order based on perplexity, the sentences below the perplexity threshold X, e.g., invalid sentences, are generally believed to have lower fluency and are filtered out and not retained; paras. 0059, 0060).

Regarding claim 4, MA in view of GAO discloses the apparatus of claim 1.  MA further discloses:
exclude at least one data sequence from the plurality of data sequences based on the arrangement of the plurality of data sequences. (the greater the perplexity, the lower the degree of fluency of the sentence; paras. 0055, 0059; after sorting the sentences in ascending order based on perplexity, the sentences below the perplexity threshold X are generally believed to have lower fluency and are filtered out and not retained; paras. 0059, 0060).

Regarding claim 5, MA in view of GAO discloses the apparatus of claim 1.  MA does not explicitly teach:
wherein the language model is trained on a domain-specific set of language training data.

	However, as set forth above, GAO is in a related field of endeavor.  The combination of MA in view of GAO makes obvious:
wherein the language model is trained on a domain-specific set of language training data. (GAO discloses, in the context of Chinese language data, using domain-specific training sets including general newspapers, science-tech newspapers, literature, and books; GAO, p. 17, Table 1, p. 18, section 6.1; in combination with MA, the trained language model (MA para. 0059) may be trained using a domain-specific training set as disclosed in GAO)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to apply the teachings of GAO related to using domain-specific sources to train a model in MA.  As disclosed in GAO, one of ordinary skill would be motivated to do so in order to achieve better lexicons, better segmentation, and better language models based on building a large data corpus balanced among several domains, styles, and times.  (GAO, p. 18, sections 6.1 and 6.2). Further, as disclosed in GAO, one of ordinary skill would be motivated to combine the teachings of GAO with MA to use perplexity-based techniques to optimize the training set for the language model and for reducing language model size.  (GAO, p. 32, section 7).

Regarding claim 11, MA in view of GAO discloses the apparatus of claim 1.  MA does not explicitly teach:
collect a set of training data sequences associated with a language domain, wherein the set of training data sequences is collected from one or more external computing devices associated with the language domain; and 
train the language model based on the set of training data. 

However, as set forth above, GAO is in a related field of endeavor.  The combination of MA in view of GAO makes obvious:
collect a set of training data sequences associated with a language domain, wherein the set of training data sequences is collected from one or more external computing devices associated with the language domain; and (GAO discloses, in the context of Chinese language data, using training sets including general newspapers, science-tech newspapers, literature, and books, collected from domains such as filtered web data and raw web data from Chinese websites; GAO, p. 18, section 6.1; training data is segmented into clusters or training chunks; GAO, p. 12, section 4.1.1; in combination with MA, the trained language model (MA para. 0059) may be trained using a training set as disclosed in GAO)
train the language model based on the set of training data. (GAO discloses training a model using 1.6 billion characters of training data; p. 28, section 6.5; in combination with MA, the trained language model (MA para. 0059) may be trained using a training set as disclosed in GAO)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to apply the teachings of GAO related to using domain-specific sources to train a model in MA.  As disclosed in GAO, one of ordinary skill would be motivated to do so in order to achieve better lexicons, better segmentation, and better language models based on building a large data corpus using both filtered and raw web data of mixed quality.  (GAO, p. 18, sections 6.1 and 6.2). Further, as disclosed in GAO, one of ordinary skill would be motivated to combine the teachings of GAO with MA to use perplexity-based techniques to optimize the training set for the language model and for reducing language model size.  (GAO, p. 32, section 7).

	Claim 12 claims a computer-implemented method that corresponds to the computer-coded instructions executed by the apparatus of claim 1, and is therefore rejected on the same grounds as claim 1 above.
Claim 13 depends from claim 12 and claims a computer-implemented method that corresponds to the computer-coded instructions executed by the apparatus of claim 2, and is therefore rejected on the same grounds as claim 2 above.
	Claim 14 depends from claim 12 and claims a computer-implemented method that corresponds to the computer-coded instructions executed by the apparatus of claim 3, and is therefore rejected on the same grounds as claim 3 above.

Regarding claim 19, MA discloses:
A computer program product comprising at least one non-transitory computer-readable storage medium having computer program code stored thereon, the computer program code, in execution with at least one processor, configured for: for each data sequence of a plurality of data sequences, each data sequence comprising a token sequence:
The remaining limitations in claim 19 correspond to the computer-coded instructions executed by the apparatus of claim 1, and are therefore rejected on the same grounds as claim 1 above.

Claim 20 depends from claim 19 and claims a computer program product that corresponds to the computer-coded instructions executed by the apparatus of claim 2, and is therefore rejected on the same grounds as claim 2 above.

Claims 6 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over MA in view of GAO and further in view of Bouamor et al., U.S. Patent Application Publication 2020/0302023 A1, hereinafter referenced as BOUAMOR.

Regarding claim 6, MA in view of GAO discloses the apparatus of claim 1, including the limitation wherein to generate the probabilistic ranking set for the plurality of data sequences based on at least one sequence arrangement metric and the perplexity value set for each data sequence, the apparatus is configured to (see claim 1).  However, MA does not explicitly teach:
generate an average sequence perplexity value set including an average sequence perplexity value for each data sequence of the plurality of data sequences by, for each data sequence of the plurality of data sequences:
determining the average sequence perplexity value for the data sequence, wherein the average sequence perplexity value represents a mean value based on the perplexity value for each data token in the token sequence of the data sequence; and 
generate the probabilistic ranking set based on the average sequence perplexity value set.

However, in a related field of endeavor, BOUAMOR pertains to a model for generating natural language text (e.g., sentences) from structured data using a fusion model.  (paras. 0005, 0014).  BOUAMOR discloses a technique for measuring the fluency and grammaticality of a generated sentence in terms of its perplexity, where fluency scores are normalized by the mean perplexity of the training set.  (para. 0111).  Normalizing fluency scores is used to determine if fluency is above or below a threshold, e.g., 100, where scores below the threshold are grammatically coherent.  (paras. 0111, 0113).

The combination of MA in view of GAO and BOUAMOR makes obvious:
generate an average sequence perplexity value set including an average sequence perplexity value for each data sequence of the plurality of data sequences by, for each data sequence of the plurality of data sequences: (BOUAMOR discloses generating an average perplexity for a training set and a normalized perplexity for each generated sentence; BOUAMOR, para. 0111; BOUAMOR in combination with MA and GAO: the preset perplexity calculation in MA performed on an entire sentence is now performed on each individual word as disclosed in GAO and BOUAMOR, where the word-level perplexities are averaged as taught by BOUAMOR to generate an average sentence-level perplexity for each sentence, which can be compared to a threshold as disclosed in both MA and BOUAMOR; MA, paras. 0054-0056, 0059 with GAO, p. 16, section 5.2 and BOUAMOR, para. 0111; sentences are now sorted in ascending order based on calculated average perplexity as discussed below with respect to this claim 6, with respect to the perplexity threshold X; MA, paras. 0057, 0059).
determining the average sequence perplexity value for the data sequence, wherein the average sequence perplexity value represents a mean value based on the perplexity value for each data token in the token sequence of the data sequence; and (BOUAMOR discloses generating an average perplexity for a training set and a normalized perplexity for each generated sentence; BOUAMOR, para. 0111; BOUAMOR in combination with MA and GAO: the preset perplexity calculation in MA performed on an entire sentence is now performed on each individual word as disclosed in GAO and BOUAMOR, where the word-level perplexities are averaged as taught by BOUAMOR to generate an average sentence-level perplexity for each sentence, which can be compared to a threshold as disclosed in both MA and BOUAMOR; MA, paras. 0054-0056, 0059 with GAO, p. 16, section 5.2 and BOUAMOR, para. 0111)
generate the probabilistic ranking set based on the average sequence perplexity value set. (MA discloses: Fig. 1, step 106 F2, sentences are sorted in ascending order based on the calculated perplexity, with respect to the perplexity threshold X; MA, paras. 0057, 0059; MA in combination with GAO and BOUAMOR: sentences are now sorted in ascending order based on calculated average perplexity as discussed above with respect to this claim 6, with respect to the perplexity threshold X; MA, paras. 0057, 0059).
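
For illustration only, the combined computation mapped above (a perplexity per token, a mean per sequence, and an ascending sort) can be sketched as follows. This is a minimal sketch and is not taken from MA, GAO, BOUAMOR, or the instant specification. It assumes that each sequence is supplied as a list of natural-log token probabilities from some language model and that a token's perplexity is exp(-log p); all function names are hypothetical.

```python
import math
from statistics import mean


def token_perplexities(token_log_probs):
    """Assumed definition: per-token perplexity is exp(-log p)."""
    return [math.exp(-lp) for lp in token_log_probs]


def average_sequence_perplexity(token_log_probs):
    """Mean of the per-token perplexities of one sequence."""
    return mean(token_perplexities(token_log_probs))


def rank_by_average_perplexity(sequences):
    """Rank sequences in ascending order of average perplexity.

    `sequences` maps a sequence id to its list of token log-probabilities.
    Returns a list of (sequence id, average perplexity) pairs.
    """
    scored = {
        seq_id: average_sequence_perplexity(log_probs)
        for seq_id, log_probs in sequences.items()
    }
    return sorted(scored.items(), key=lambda item: item[1])
```

A threshold comparison of the kind attributed to MA could then be applied to the second element of each returned pair.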

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of BOUAMOR, particularly the teachings of calculating average perplexity and normalizing sentences to measure fluency and grammatical correctness of generated sentences against a threshold, with MA and GAO, to determine a threshold for determining if a generated sentence in MA is fluent.  As disclosed in BOUAMOR, one of ordinary skill would be motivated to utilize the teachings of BOUAMOR to take advantage of the average perplexity for the entire training set, which may be a domain-specific training set (e.g., investment rules, BOUAMOR, para. 0110), which can be used to normalize generated sentences to determine such sentences’ fluency.  (BOUAMOR, para. 0111).  One of ordinary skill would further be motivated to utilize the teachings of BOUAMOR because the fusion model disclosed in BOUAMOR outperforms traditional approaches for generating sentences from data.  (BOUAMOR, para. 0112).
	The examiner notes that MA discloses that the perplexity threshold X may be dynamically modified based on changes in demands by the user, so one of ordinary skill would understand that such perplexity threshold X may be determined using the average perplexity for the training set as discussed with respect to BOUAMOR.  (MA, para. 0059).

Claim 15 is a computer-implemented method that corresponds to the computer-coded instructions executed by the apparatus of claim 6, and is therefore rejected on the same grounds as claim 6 above.

Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over MA in view of GAO and further in view of Andreoli et al., U.S. Patent Application Publication 2016/0124944 A1, hereinafter referenced as ANDREOLI.

Regarding claim 7, MA in view of GAO discloses the apparatus of claim 1, including the limitation wherein to generate the probabilistic ranking set for the plurality of data sequences based on at least one sequence arrangement metric and the perplexity value set for each data sequence the apparatus is configured to (as claimed in claim 1).  However, MA fails to explicitly teach:
generate an area violating threshold value set including an area violating threshold value for each data sequence of the plurality of data sequences by, for each data sequence of the plurality of data sequences:
determining the area violating threshold value for the data sequence, wherein the area violating threshold value is based on the perplexity value set for the data sequence and an unacceptable perplexity threshold; and 
generating the probabilistic ranking set based on the area violating threshold value set.

However, in a related field of endeavor, ANDREOLI pertains to a system and method for predicting the quality of a machine-translated document.  (para. 0013).  Precision of the translations was compared using an area under the curve technique on quality scores for individual sentences.  (para. 0095).

The combination of MA in view of GAO and further in view of ANDREOLI makes obvious:
generate an area violating threshold value set including an area violating threshold value for each data sequence of the plurality of data sequences by, for each data sequence of the plurality of data sequences: (ANDREOLI discloses generating an average precision value for individual sentences using an area under the curve technique; ANDREOLI, para. 0095; ANDREOLI in combination with MA and GAO: the preset perplexity calculation in MA is further used to generate an average precision value, e.g., area violating threshold value, using the area under the curve technique of ANDREOLI, where the perplexity threshold in MA is supplemented with, or replaced by, a threshold where the average precision value is compared to such threshold, e.g., unacceptable perplexity threshold, to determine if the sentence should be retained, as disclosed in MA; MA, paras. 0054-0056, 0059; sentences are now sorted in ascending order based on average precision value and according to the threshold as discussed below with respect to this claim 7; MA, paras. 0057, 0059).
determining the area violating threshold value for the data sequence, wherein the area violating threshold value is based on the perplexity value set for the data sequence and an unacceptable perplexity threshold; and (ANDREOLI discloses generating an average precision value for individual sentences using an area under the curve technique; ANDREOLI, para. 0095; ANDREOLI in combination with MA and GAO: the preset perplexity calculation in MA is further used to generate an average precision value, e.g., area violating threshold value, using the area under the curve technique of ANDREOLI, where the perplexity threshold in MA is supplemented with, or replaced by, a threshold where the average precision value is compared to such threshold, e.g., unacceptable perplexity threshold, to determine if the sentence should be retained, as disclosed in MA; MA, paras. 0054-0056, 0059).
generating the probabilistic ranking set based on the area violating threshold value set. (MA discloses: Fig. 1, step 106 F2, sentences are sorted in ascending order based on the calculated perplexity, with respect to the perplexity threshold X; MA, paras. 0057, 0059; MA in combination with GAO and ANDREOLI: sentences are now sorted in ascending order based on average precision value and according to the threshold as discussed above with respect to this claim 7; MA, paras. 0057, 0059).
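
For illustration only, one possible reading of an "area violating threshold value" is sketched below. Neither the claim language quoted above nor the cited passages of MA, GAO, or ANDREOLI give a formula, so the definition used here is an assumption: the per-token perplexities are treated as a curve sampled at unit spacing, and the value is the area of that curve lying above the unacceptable perplexity threshold. The function names are hypothetical.

```python
def area_violating_threshold(token_perplexities, unacceptable_threshold):
    """Assumed definition: summed excess of each token's perplexity over the
    threshold (rectangle rule with unit spacing); tokens at or below the
    threshold contribute nothing."""
    return sum(max(0.0, p - unacceptable_threshold) for p in token_perplexities)


def rank_by_area(sequences, unacceptable_threshold):
    """Rank sequences in ascending order of area violating the threshold.

    `sequences` maps a sequence id to its list of per-token perplexities.
    Returns a list of (sequence id, area) pairs.
    """
    scored = {
        seq_id: area_violating_threshold(ppls, unacceptable_threshold)
        for seq_id, ppls in sequences.items()
    }
    return sorted(scored.items(), key=lambda item: item[1])
```

Under this reading, a sequence with one very poor token and a sequence with several mildly poor tokens can receive the same value, which differs from what a simple per-sequence mean would report.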

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of ANDREOLI, particularly the teachings of calculating average precision of a sentence using an area under the curve technique, with MA and GAO, to determine a threshold for comparing generated sentences. As disclosed in ANDREOLI, one of ordinary skill would be motivated to utilize the teachings of ANDREOLI to take advantage of the average precision calculated using a baseline of a small collection of document-level estimations to normalize the generated sentences. (ANDREOLI, para. 0095).  One of ordinary skill would further be motivated to utilize the teachings of ANDREOLI to evaluate the quality of a plurality of sentences in context (e.g., an entire message or document) using the quality of the individual sentences, without the need for a training set that is annotated at both sentence-level and message-level/document-level.  (ANDREOLI, para. 0009).
	The examiner notes that MA discloses that the perplexity threshold X may be dynamically modified based on changes in demands by the user, so one of ordinary skill would understand that such perplexity threshold X may be supplemented by the average precision values as discussed above with respect to ANDREOLI.  (MA, para. 0059).

Claim 16 is a computer-implemented method that corresponds to the computer-coded instructions executed by the apparatus of claim 7, and is therefore rejected on the same grounds as claim 7 above.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over MA in view of GAO and further in view of Saha, Punyajoy, et al. "Hatemonitors: Language agnostic abuse detection in social media." arXiv preprint arXiv:1909.12642 (2019) pp. 1-8, hereinafter referenced as SAHA.

Regarding claim 10, MA in view of GAO discloses the apparatus of claim 1.  However, MA does not explicitly teach:
wherein the language model is language agnostic and direction agnostic.

However, in a related field of endeavor, SAHA pertains to a language agnostic system and method for detecting hate speech in social media, such as Twitter.  SAHA notes that the majority of research on hate speech and offensive language is in the English language, but there is a need to classify abusive language in other languages (p. 2, section 2).  SAHA creates a language model utilizing the multilingual Bidirectional Encoder Representations from Transformers (BERT), which is a language model that can be used with 104 languages and is bi-directional, e.g., looks at both right and left context.  (p. 3, section 4).  SAHA further discloses the LASER sentence embeddings from Facebook, which are language agnostic. (p. 3, section 4).

The combination of MA in view of GAO and SAHA makes obvious:
wherein the language model is language agnostic and direction agnostic. (SAHA discloses the multi-lingual BERT language model, which is bi-directional, e.g., looks at left and right context, and can be used with 104 languages; SAHA, p. 3, section 4; in combination with MA, the trained language model may be a fine-tuned version of multi-lingual BERT, or may be supplemented with multi-lingual BERT; MA para. 0059; the examiner notes that in the instant specification, the only disclosure about the language model being language and direction agnostic is in para. 0013, the examples in the specification are in the English language only, and directionality is described as left-to-right or right-to-left in para. 0108; therefore the broadest reasonable interpretation of “agnostic” includes not taking a preference of two or more options or positions)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of SAHA with MA and GAO, particularly the teachings in SAHA about using a language agnostic language model that is bi-directional.  As disclosed in SAHA, one would be motivated to use the same language model for multiple languages, such as to detect hate speech and offensive language, which is not limited to the English language.  (SAHA, p. 2, section 2).  As disclosed in SAHA, one would be motivated to use a bi-directional language model such as BERT to process natural language to predict words in a sentence, where bi-directionality is used for the left-to-right context and right-to-left context. (SAHA, p. 3, section 4).  

Allowable Subject Matter
Claims 8, 9, 17, and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and if amended to overcome the rejections under 35 U.S.C. 101 set forth above.
Claim 8 would be allowable because the following limitations are not obvious in view of the combination of MA and GAO:
generate a bucket-based sequence perplexity value including a bucket-based sequence perplexity values for each data sequence of the plurality of data sequences by, for each data sequence of the plurality of data sequences: 
determining an unacceptable bucket token count associated with the data sequence; and determining the bucket-based sequence perplexity values for the data sequence based at least on the unacceptable bucket token count associated with the data sequence; and 
generate the probabilistic ranking set based on the bucket-based sequence perplexity value.
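
For illustration only, the limitations of claim 8 quoted above can be sketched as follows. The quoted language does not say how buckets are formed or how the count is turned into a value, so this sketch makes two assumptions: buckets are perplexity ranges delimited by ascending edge values with the highest range treated as the unacceptable bucket, and the bucket-based value is the fraction of tokens falling in that bucket. The function names and the edge values in the test are hypothetical.

```python
from bisect import bisect_right


def unacceptable_bucket_token_count(token_perplexities, bucket_edges):
    """Count tokens whose perplexity falls in the last (assumed unacceptable) bucket.

    `bucket_edges` is an ascending list; a perplexity equal to or above the
    final edge is placed in the last bucket.
    """
    last_bucket = len(bucket_edges)
    return sum(
        1 for p in token_perplexities
        if bisect_right(bucket_edges, p) == last_bucket
    )


def bucket_based_sequence_perplexity(token_perplexities, bucket_edges):
    """Assumed definition: share of the sequence's tokens in the unacceptable bucket."""
    if not token_perplexities:
        return 0.0
    count = unacceptable_bucket_token_count(token_perplexities, bucket_edges)
    return count / len(token_perplexities)
```

Sorting sequences in ascending order of this value would give a ranking analogous to the ascending sort described for MA.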

	Claim 17 depends from claim 12 and claims a computer-implemented method that corresponds to the computer-coded instructions executed by the apparatus of claim 8, and therefore would be allowable for the reasons set forth above with respect to claim 8.
Claims 9 and 18 would be allowable because they depend on claims 8 and 17, respectively.  

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US 11392773 B1 (Gangadharaiah et al.) discloses a goal-oriented conversational training data generation system.  Perplexity scores are used to evaluate generated sentences.  (col. 6, line 29).
US 11340923 B1 (Kapoor) discloses a heuristic-based messaging generation and testing system and method.  Messages can be provided in a ranked order across a plurality of buckets.  (col. 7, lines 12-27).
US 20180365220 A1 (Chakraborty) discloses a method for ranking and summarizing natural language passages.  A recurrent neural network may be trained to utilize perplexity.  (paras. 0068, 0069, 0075, 0076).
US 20160125751 A1 (Barker) discloses an automated question-and-answer environment (QA), where the QA system is designed to receive input questions, analyze such questions, and then return applicable answers, where potential answers are given a confidence score and assigned to one of a plurality of confidence buckets using thresholds.  (paras. 0003, 0053).  Bucket thresholds may also be dynamically determined based on answer confidence scores, e.g., to adjust due to relative strength of answers, to capture relative confidence within a framework of a standard of confidence.  (para. 0061).
US 10019577 (Ladikov) discloses evaluating unknown applications executable on a computer to determine if such applications are malicious and/or to determine permissions for users on the network to use such applications (e.g., whitelisting the application).  (col. 1, lines 18-54).  Unknown applications may be assigned a criticality score, using a power series equation, to determine potential risks associated with the application. (col. 9, lines 45-65).
Toral, Antonio, et al. "Linguistically-augmented perplexity-based data selection for language models." Computer Speech & Language 32.1 (2015): 11-26.  Discloses perplexity-based data selection to train language models for the English, Spanish, Czech, and Chinese languages.
Sethy, Abhinav, et al. "An iterative relative entropy minimization-based data selection approach for n-gram model adaptation." IEEE transactions on audio, speech, and language processing 17.1 (2009): 13-23.  Discloses filtering text to remove artifacts using a perplexity histogram as part of pre-filtering.  (p. 16, section IV).
Akhtar, Md Shad, et al. "Language-agnostic model for aspect-based sentiment analysis." Proceedings of the 13th International Conference on Computational Semantics-Long Papers. 2019, pp. 1-11.  Discloses a language-agnostic model, using bi-directional transformers, for analyzing text for sentiment.
Huang, Jian, et al. "Exploring web scale language models for search query processing." Proceedings of the 19th international conference on World wide web. 2010, pp. 1-10.  Discloses using a perplexity metric with n-gram language models when evaluating search queries.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL C LEE whose telephone number is (571)272-4933. The examiner can normally be reached M-F 9:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MICHAEL C. LEE/
Examiner, Art Unit 2655

/JESSE S PULLIAS/
Primary Examiner, Art Unit 2655