DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This communication is in response to the Request for Continued Examination filed on 15 June 2021. Claims 12-21 are pending and have been examined. The Applicants’ amendment and remarks have been carefully considered, but they are moot in view of new grounds for rejection.
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 29 July 2021 has been entered.
All previous objections and rejections directed to the Applicant’s disclosure and claims not discussed in this Office Action have been withdrawn by the Examiner.


Response to Amendments and Arguments
The examiner has carefully reviewed the applicant’s arguments and remarks, but they are moot in view of new grounds for rejection. The amendment which necessitates a new reference is “wherein the predetermined rule requires that n equals a first value for any input that is less than an input-length threshold, and n equals a second value inputs for any input that is greater than the input-length threshold, and wherein the second value is greater than the first value”. The examiner notes 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 12-13 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over US 20140324434, hereinafter referred to as Vozila et al., in view of US 20120323557, hereinafter referred to as Koll et al., and further in view of WO 2014111959, hereinafter referred to as Gummadidala et al.

Regarding claim 12 (Currently Amended), Vozila et al. discloses a system having one or more computer processors, the system configured to: 

segment a data input into a sequential n-gram chunks based on a predetermined rule, wherein the data input is a speech to text transcription (“For example, computer 108 may generate a unigram language model, a bi-gram language model, a trigram language model, any other suitable n-gram language model,” Vozila et al., para [0024]. And, “In some embodiments, the evaluation may be performed by generating a unigram language model for each of the groups of training data obtained at act 504 (i.e., each of the above-described "child" language models and the "parent" language model would be a unigram model). For example, unigram language models may be generated when process 500 is used to identify the first set of metadata attributes at act 304 of process 300 (during the first training stage). In this way, the amount computational of resources used to perform process 500 may be reduced. However, in other embodiments, any other suitable type of language model (e.g., an n-gram model for any value of n greater than or equal to two, a mixture of language models, etc.) may be generated as part of the evaluation performed at act 506,” Vozila et al., para [0075]. The type of n-gram (such as uni-gram) is a predetermined rule.),

receive metadata regarding characteristics of the data input (“When a new voice utterance is obtained along with information about the context in which the new voice utterance was spoken (e.g., metadata at least partially identifying the context), the contextual information may be used to identify a corresponding cluster for the new voice utterance from among the multiple clusters,” Vozila et al., para [0016]. Also, “At act 306, a first set of one or more metadata attributes for clustering the training data instances is identified. In some embodiments, each of the attributes in the first set of metadata attributes may be identified based at least in part on the training data instances and the corresponding metadata attribute values,” Vozila et al., para [0041].); 
generate a language model based on the metadata (“Language models generated in accordance with embodiments described herein may be referred to as "metadata-dependent language models,” Vozila et al., para [0016].).

Vozila et al., though, does not disclose generating a first set of language model variations of the data input; training the language model based on at least the first set of language model variations; generating one or more alternatives for the data input using the trained language model; and transmitting an output comprising the one or more alternatives for the data input.

Koll et al. is cited to disclose generating a first set of language model variations of the data input (“As another example, alternative spoken forms of the same term may be learned and used to generate and/or modify a language model automatically. For example, if speech pronounced as "PEE AY" and "PENNSYLVANIA" are both dictated into the same text field to produce the textual recognition result "PA," the system may learn that both such examples of speech are alternative spoken of the text "PA," and generate and/or modify an applicable language model appropriately,” Koll et al., para [0047].); 

training the language model based on at least the first set of language model variations (“As another example, alternative spoken forms of the same term may be learned and used to generate and/or modify a language model automatically. For example, if speech pronounced as "PEE AY" and "PENNSYLVANIA" are both dictated into the same text field to produce the textual recognition result "PA," the system may learn that both such examples of speech are alternative spoken of the text "PA," and generate and/or modify an applicable language model appropriately,” Koll et al., para [0047].); 

generating one or more alternatives for the data input using the trained language model (“Even though using mistakes as training data may result in generation of incorrect output, it may result in generating the same wrong output consistently, thereby reducing output variability, and potentially enabling the incorrect output to be corrected using output correction techniques (e.g., spelling correction techniques),” Koll et al., para [0042]. See also Koll et al., para [0065].); and 

transmitting an output comprising the one or more alternatives for the data input (“Embodiments of the present invention have a variety of advantages. For example, embodiments of the present invention may be used to achieve higher speech recognition accuracy by observing usage patterns that are dependent upon the state of applications to which speech recognition output is provided, and by tailoring the language models that are used to generate such speech recognition output to those application states,” Koll et al., para [0066].). Koll et al. benefits Vozila et al. by providing improved techniques for enabling automatic speech recognition systems to interoperate with a wide variety of target applications in the various states of such applications easily and with high recognition accuracy (Koll et al., para [0007]). Therefore, it would be obvious for one skilled in the art to combine the teachings of Vozila et al. with those of Koll et al. to enhance the application usage of the metadata-dependent language models of Vozila et al.
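
For illustration only, the kind of learning described in the quoted passage of Koll et al., para [0047], can be sketched as grouping the spoken forms that produce the same recognized text in the same field. This is a minimal sketch under stated assumptions and is not taken from Koll et al.: the function name, the tuple layout of an observation, and the field name "state" are hypothetical.

```python
from collections import defaultdict


def learn_spoken_forms(observations):
    """Map each (field, recognized text) pair to the spoken forms seen for it.

    Each observation is a hypothetical (field, spoken_form, recognized_text) tuple.
    """
    forms = defaultdict(set)
    for field, spoken, text in observations:
        # Spoken forms that yield the same text in the same field are
        # treated as alternatives of one another.
        forms[(field, text)].add(spoken)
    return {key: sorted(values) for key, values in forms.items()}


observations = [
    ("state", "PEE AY", "PA"),
    ("state", "PENNSYLVANIA", "PA"),
]
alternatives = learn_spoken_forms(observations)
```

Here both spoken forms end up listed under the single text result "PA", which is the relationship the quoted passage describes a language model being generated or modified to reflect.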

Neither Vozila et al. nor Koll et al., though, discloses wherein the predetermined rule requires that n equals a first value for any input that is less than an input-length threshold, and n equals a second value inputs for any input that is greater than the input-length threshold, and wherein the second value is greater than the first value.

Gummadidala et al. is cited to disclose wherein the predetermined rule requires that n equals a first value for any input that is less than an input-length threshold, and n equals a second value inputs for any input that is greater than the input-length threshold, and wherein the second value is greater than the first value (“The portion of the input pattern can be an initial portion of the input pattern. Thus, the string combination can correspond to a character string at the beginning of the candidate word. The string combination can comprise or consist of a bigram or a trigram. The bigram or trigram can be identified with reference to a start point of the input pattern and one or more subsequent corner points. The bigram or trigram list can be sorted offline or at runtime based on offline and learnt user input data. The method can comprise determining a length of the input pattern. The length of the input pattern can be measured from the input pattern or can be calculated, for example based on a character string derived from the input pattern. A threshold input pattern length can be calculated based on the measured input pattern length. The shortlist of words can be selected by removing words having an input pattern length less than a lower threshold input pattern length and/or an input pattern length greater than an upper threshold input pattern length. The input pattern length can be defined by an upper length threshold and/or a lower length threshold,” Gummadidala et al., highlighted portions of pp. 2-3. The examiner notes that this teaching is equivalent to the wording of the applicant’s claim limitation. For example, if the measured input pattern length is three, then n=3 (i.e., a tri-gram) which corresponds to the applicant’s first value for an input. Thus, the measured input pattern length, n=3, is less than a lower threshold input pattern length set to 4 (i.e., what the applicant terms an input length threshold). 
Also, if another input pattern length is 5, then n=5, which corresponds to the applicant’s second value for an input. Thus, the measured input pattern length, n=5, is greater than an upper threshold input pattern length set to 4 (i.e., what the applicant terms an input length threshold). Hence, these input patterns (i.e., n-gram chunks) above or below an input-length threshold value of 4 are identified. Furthermore, the second value (n=5) is greater than the first value (n=3).). Gummadidala et al. benefits Vozila et al. by identifying candidate words based on the correlation between input and candidate patterns (Gummadidala et al., Abstract). Therefore, it would be obvious for one skilled in the art to combine the teachings of Vozila et al. with those of Gummadidala et al. to improve the identification of metadata patterns within Vozila et al.
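
For illustration only, the length-dependent rule recited in the limitation discussed above can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the applicant's disclosure or from any cited reference: the function names, the word-level tokenization, and the use of non-overlapping chunks are hypothetical, and the default threshold of 4 and the values 3 and 5 are borrowed from the numerical example above. The treatment of an input whose length exactly equals the threshold is also an assumption, since the limitation addresses only inputs less than or greater than the threshold.

```python
def choose_n(input_length, threshold=4, first_value=3, second_value=5):
    """Pick n from the input length (all defaults are illustrative assumptions)."""
    if second_value <= first_value:
        raise ValueError("second_value must be greater than first_value")
    # Inputs shorter than the threshold use the smaller n; all others use the
    # larger n. Handling of length == threshold is an assumption.
    return first_value if input_length < threshold else second_value


def segment(text, **rule):
    """Split a transcription into sequential, non-overlapping word n-gram chunks."""
    tokens = text.split()
    n = choose_n(len(tokens), **rule)
    # The final chunk may be shorter than n when the token count is not a multiple of n.
    return [tokens[i:i + n] for i in range(0, len(tokens), n)]


short_chunks = segment("call the doctor")
long_chunks = segment("please call the doctor tomorrow morning")
```

With these assumed values, the three-word input is segmented with n=3 and the six-word input with n=5, so the longer input receives the larger n.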
As to claim 17, computer-readable medium (CRM) claim 17 and system claim 12 are related as a system and a CRM for using the same, with each claimed element’s function corresponding to the system step. Accordingly, claim 17 is similarly rejected under the same rationale as applied above with respect to system claim 12. Also, Vozila et al., para [0093] teaches a processor, memory, CRM, and instructions.

Regarding claim 13 (Original), Vozila et al., as modified by Koll et al. and Gummadidala et al., discloses the system of claim 12, wherein the generating the first set of language model variations is based on at least the metadata and the predetermined rule (“Accordingly, in some embodiments, training data comprising multiple training data instances may be obtained, each of the training data instances associated with one or more metadata attribute values. The training data instances may be clustered based on their respective values for a particular set of metadata attributes. In some embodiments, the particular set of metadata attributes used to cluster the training data may be identified automatically based at least in part on an evaluation of the training data instances and associated metadata attribute values. Once the clusters have been determined, a language model may be generated for each of the clusters of training data instances,” Vozila et al., para [0021]. And, as noted above, the type of n-gram (such as uni-gram) is a predetermined rule.).  
As to claim 18, CRM claim 18 and system claim 13 are related as a system and a CRM for using the same, with each claimed element’s function corresponding to the system step. Accordingly, claim 18 is similarly rejected under the same rationale as applied above with respect to system claim 13. Also, Vozila et al., para [0093] teaches a processor, memory, CRM, and instructions.

Claims 14-15 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over US 20140324434, hereinafter referred to as Vozila et al., in view of US 20120323557, hereinafter referred to as Koll et al., further in view of WO 2014111959, hereinafter referred to as Gummadidala et al., and further in view of US 20060111907, hereinafter referred to as Mowatt et al.

Regarding claim 14 (Original), Vozila et al., as modified by Koll et al. and Gummadidala et al., discloses the system of claim 13, further configured to: 

in response to generating the first set of language model variations, generate a second set of language model variations using common user metadata (“(A) obtaining language data comprising training data and associated values for one or more metadata attributes, the language data comprising a plurality of instances of language data, an instance of language data comprising an instance of training data and one or more metadata attribute values associated with the instance of training data; (B) identifying, by processing the language data using at least one processor, a set of one or more of the metadata attributes to use for clustering the instances of training data into a plurality of clusters; (C) clustering the training data instances based on their respective values for the identified set of metadata attributes into the plurality of clusters; and (D) generating a language model for each of the plurality of clusters,” Vozila et al., para [0004]. Here, the combinations of metadata attributes generate language model variations.).
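
For illustration only, steps (C) and (D) of the quoted passage of Vozila et al., para [0004], can be sketched as grouping training instances by their metadata attribute values and building one language model per group. This is a minimal sketch under stated assumptions and is not Vozila et al.'s implementation: the function names, the attribute name "app", and the sample utterances are hypothetical, the attribute set of step (B) is simply passed in rather than identified automatically, and an unsmoothed unigram model stands in for whatever model type would actually be used.

```python
from collections import Counter, defaultdict


def cluster_by_metadata(instances, attributes):
    """Group training texts by their values for the chosen metadata attributes."""
    clusters = defaultdict(list)
    for text, metadata in instances:
        key = tuple(metadata.get(attr) for attr in attributes)
        clusters[key].append(text)
    return dict(clusters)


def unigram_model(texts):
    """Relative word frequencies over a cluster (an unsmoothed unigram model)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def models_per_cluster(instances, attributes):
    """Build one unigram model for each metadata-defined cluster."""
    clusters = cluster_by_metadata(instances, attributes)
    return {key: unigram_model(texts) for key, texts in clusters.items()}


instances = [
    ("call mom", {"app": "dialer"}),
    ("call dad", {"app": "dialer"}),
    ("send mail", {"app": "email"}),
]
models = models_per_cluster(instances, ["app"])
```

Each distinct combination of attribute values yields its own model, which is the sense in which different combinations of metadata attributes produce different language model variations.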

Vozila et al., though, does not disclose creating prepended text and appended text for a subset of the first set and the second set of language model variations.  

Mowatt et al. is cited to disclose creating prepended text and appended text for a subset of the first set and the second set of language model variations (“The method 300 further includes extracting pronunciations for each of the characters and/or character strings responsive to a predefined pronunciation dictionary for the speech recognition software application to create an alternative pronunciation dictionary of character pronunciation representations, as shown in operational block 306. For example, again consider the character "a", wherein the pronunciations for words starting in "a" are extracted from the pronunciation dictionary of the speech recognition software application being used for desktop dictation. Using this dictionary, the word "ARON" is found to have a character pronunciation representation of "ae r ax n" as shown in FIG. 4. For each of the characters and/or character strings in the predefined pronunciation dictionary, an alternative pronunciation may be created by prepending each character with its new Language Model token and by appending a long silence "sil", as shown in operational block 308. For example, consider the new Language Model token "a AsIn" and the word "ARON." Given the above relationship the pronunciation alternative would be represented by "ey AA1 ey ae z ih n ae r ax n sil", wherein "ey AA1 ey ae z ih n" is the prepended pronunciation for "a AsIn", "ae r ax n" is the pronunciation for "ARON" and "sil" is the appended long silence,” Mowatt et al., para [0028].). Mowatt et al. benefits Vozila et al. by creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary (Mowatt et al., para [0006]). Therefore, it would be obvious for one skilled in the art to combine the teachings of Vozila et al. with those of Mowatt et al. to enhance the language models of Vozila et al.
As to claim 19, CRM claim 19 and system claim 14 are related as a system and a CRM for using the same, with each claimed element’s function corresponding to the system step. Accordingly, claim 19 is similarly rejected under the same rationale as applied above with respect to system claim 14. Also, Vozila et al., para [0093] teaches a processor, memory, CRM, and instructions.
Regarding claim 15 (Original), Vozila et al., as modified by Koll et al., Gummadidala et al., and Mowatt et al., discloses the system of claim 14, wherein the training the language model is further based on the second set of language model variations, the prepended text, and the appended text (“The method 300 further includes extracting pronunciations for each of the characters and/or character strings responsive to a predefined pronunciation dictionary for the speech recognition software application to create an alternative pronunciation dictionary of character pronunciation representations, as shown in operational block 306. For example, again consider the character "a", wherein the pronunciations for words starting in "a" are extracted from the pronunciation dictionary of the speech recognition software application being used for desktop dictation. Using this dictionary, the word "ARON" is found to have a character pronunciation representation of "ae r ax n" as shown in FIG. 4. For each of the characters and/or character strings in the predefined pronunciation dictionary, an alternative pronunciation may be created by prepending each character with its new Language Model token and by appending a long silence "sil", as shown in operational block 308. For example, consider the new Language Model token "a AsIn" and the word "ARON." Given the above relationship the pronunciation alternative would be represented by "ey AA1 ey ae z ih n ae r ax n sil", wherein "ey AA1 ey ae z ih n" is the prepended pronunciation for "a AsIn", "ae r ax n" is the pronunciation for "ARON" and "sil" is the appended long silence,” Mowatt et al., para [0028]. And, Mowatt et al., para [0016] and [0030] describe language model training/learning.).
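
For illustration only, the prepend-and-append construction described in the quoted passage of Mowatt et al., para [0028], can be sketched as a simple concatenation. This is a minimal sketch, not Mowatt et al.'s implementation: the function and parameter names are hypothetical, and the pronunciation strings are copied from the quoted example.

```python
def alternative_pronunciation(token_pronunciation, word_pronunciation, silence="sil"):
    """Join the token pronunciation, the word pronunciation, and a trailing silence."""
    # The token pronunciation is prepended and the long silence is appended.
    return " ".join([token_pronunciation, word_pronunciation, silence])


# Values taken from the quoted example: the token "a AsIn" and the word "ARON".
aron = alternative_pronunciation("ey AA1 ey ae z ih n", "ae r ax n")
```

For the quoted inputs this yields "ey AA1 ey ae z ih n ae r ax n sil", matching the pronunciation alternative given in the passage.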


Allowable Subject Matter
Claims 1-10 and 21 are allowed.
Claims 16 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. None of the prior art, alone or in combination, teaches generating a plurality of template sentences for the subset of the first set and the second set of language model variations.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANNE L THOMAS-HOMESCU whose telephone number is (571)272-0899.  The examiner can normally be reached on Mon-Fri 8-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M Mehta, can be reached at (571) 272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ANNE L THOMAS-HOMESCU/Primary Examiner, Art Unit 2656