DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments and amendments in the Amendment filed April 6, 2021 (herein “Amendment”), with respect to the objections to claims 1-15 have been fully considered and are persuasive.  The objections to claims 1-15 have been withdrawn. 
Applicant's arguments in the Amendment regarding the rejection of claims 1-15 under 35 U.S.C. 103 have been fully considered but they are not persuasive. Specifically, Applicant argues on pages 6-7 of the Amendment that the portions of Hu relied upon to reject the claimed limitation of “including at least one character that is a non-letter character” are not supported by the priority provisional application of Hu and hence do not qualify as prior art. Applicant contends that, because the priority provisional of Hu does not include the text of para. 36 cited in the rejection, the priority date of the provisional cannot be given to the subject matter relied upon from para. 36. However, as discussed in detail below, the priority provisional of Hu does provide supporting disclosure to the extent the teachings of Hu in the specific portions of para. 36 were relied upon in the rejection. 
Hu was relied upon to provide teachings of text data “including at least one character that is a non-letter character,” from claim 1. In particular, Hu, para. 36 was cited and the accompanying rejection rationale was to clarify that the portion of para 36 
Accordingly, the predictive network of Hu, as it is disclosed in the provisional application, not only discloses receiving input text data such as an out-of-lexicon word “$9.95” (which includes the “$” and “.” symbols which are “non-letter characters”), but also processes this word to provide an output through the output unit. It is noted though that Hu was only relied upon for the teachings of the input text containing the non-letter character. 
Therefore, in view of the above, while Applicant’s arguments have been fully considered, they are not persuasive, and the rejection in view of Hu is maintained.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-19 are rejected under 35 U.S.C. 103 as being unpatentable over Adams et al., (US 10,339,920 B2, herein “Adams”) in view of Hu et al., (US 2020/0349923 A1, herein “Hu”).
Regarding claim 1, Adams teaches a computer-implemented method comprising (Adams col. 15, lines 50-56 and col. 16, lines 60-63, a method for predicting an expected pronunciation of a foreign text based on a language of origin in speech recognition, implemented as a computer implemented method): 
receiving text data (Adams col. 15, lines 58-60, in the method for predicting pronunciations, textual identifiers linked to content items are provided (receiving) to the ASR system); 
providing the text data as input into a trained machine-learning model (Adams col. 15, lines 60-66, the ASR system processes (thus receives as an input) the textual identifiers to determine one or more expected pronunciations of the textual identifiers, where col. 7, lines 58-65 teach that the ASR processing is performed using a trained speech recognition model); and 
receiving as output from the trained machine-learning model, output data indicative of a pronunciation of the text data (Adams col. 15, line 64 – col. 16, line 32, at block 608, the ASR system determines one or more expected pronunciation(s) of the textual identifier which is then further processed (thus receiving as output) to combine the pronunciations).
Adams does not explicitly teach including at least one character that is a non-letter character.
Hu teaches including at least one character that is a non-letter character (Hu para. [0036], prediction network able to process an input to a label representing a symbol or character in a specified natural language, where symbols include punctuation and other symbols (thus non-letter character) – as disclosed above, support for this specific portion of para. 36 of Hu is found at least in paras. 21-26 disclosing the inputs to the prediction network including such out-of-lexicon words as “$9.95”).
Therefore, taking the teachings of Adams and Hu together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the pronunciation prediction as cited to above in Adams with the speech recognition hypothesis for symbols as disclosed in Hu at least because doing so would allow for biasing a model to recognize a foreign word even if the ASR model was not trained on any words other than American English (Hu para. [0024]).
Regarding claim 2, Adams teaches further comprising: training a machine-learning model with training data to form the trained machine-learning model, wherein the training data includes text-based words and associated correct pronunciations (Adams col. 7, line 41 – col. 8, line 23, ASR processing using a trained speech recognition model, the model trained using a training corpus (training data) including a number of sample utterances with associated feature vectors (correct pronunciations) and associated text (text-based words)).
Regarding claim 3, Adams teaches wherein at least one of the text-based words includes (Adams col. 7, line 59 - col. training corpus with recorded speech and corresponding transcription).
Adams does not explicitly teach a character that is a non-letter character.
Hu teaches a character that is a non-letter character (Hu para. [0036], ASR system preconfigured to recognize punctuation and other symbols (non-letter) – as disclosed above, support for this specific portion of para 36 of Hu is found at least in paras. 21-26 disclosing the inputs to the prediction network including such out-of-lexicon words as “$9.95”).
Therefore, taking the teachings of Adams and Hu together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the pronunciation prediction as cited to above in Adams with the speech recognition hypothesis for symbols as disclosed in Hu at least because doing so would allow for biasing a model to recognize a foreign word even if the ASR model was not trained on any words other than American English (Hu para. [0024]).
Regarding claim 4, Adams teaches wherein the training data includes at least a portion of a text-based word identified as having a correct pronunciation different than phonetic pronunciation (Adams col. 8, lines 2-6, col. 14, lines 24-50 and col. 13, lines 30-56, training corpus including sample utterances for creating mathematical models corresponding to audio for particular speech units including a word or a part of a syllable (different than phonetic pronunciation), where pronunciations can be of a letter in another language (like German) where there is no correspondence to an English phoneme, and can instead be described by linguistic articulatory features).
Regarding claim 5, Adams teaches wherein training the machine-learning model includes training the machine-learning model to produce pronunciation output based on text input (Adams col. 7, line 58 – col. 8, line 6 and col. 15, lines 58-66, speech recognition model is a trained model, and uses textual identifiers linked to the content item as input to determine an expected pronunciation of the textual identifier).
Regarding claim 6, Adams teaches further comprising: receiving an utterance from a user; and further training the trained machine-learning model using the utterance (Adams fig. 1, col. 16, lines 26-32, expected pronunciations and weights of the trained model are adjusted (further training) based on user history such as a typical pronunciation of a user (receiving utterance from a user)).
Regarding claim 7, Adams teaches wherein further training the trained machine-learning model using the utterance includes (Adams col. 13, line 10 – col. 14, line 65, further adjusting of the pronunciation prediction model including mixtures of pronunciations from different languages such as English and German): 
processing the utterance to identify within the utterance at least a first portion and a second portion (Adams col. 14, lines 36-50, a user pronounces a first portion of a band name Kraftwerk in English and a second portion in German); 
conducting a search to find a match to the first portion in a database, the match having associated data (Adams fig. 1, col. 14, lines 4-46, hybrid-pronunciations of the band name Kraftwerk matching to an English pronunciation for “Kraft” (first portion) to a textual identifier Kraftwerk which has multiple expected pronunciations associated with it, and has a first syllable of Kraft, where col. 11, lines 15-28 teach that textual identifiers  come from a dictionary of words or a lexicon (database));
analyzing the associated data to identify a third portion in the associated data that has a similarity to the second portion (Adams col. 14, lines 10-53, the combination pronunciation of an English Kraft and a German Werk are matched to the band name Kraftwerk as a textual identifier (third portion/a portion of the match in the lexicon dictionary), and then the pronunciation graphs are combined when a hybrid pronunciation is determined (thus analysis of the hybrid pronunciation determining that the hybrid pronunciation with the German Werk is similar/matches to the textual identifier (third portion) Kraftwerk)); and 
training the trained machine-learning model with the third portion as training input data and the second portion as training output data (Adams col. 7, lines 58-62, col. 8, lines 2-26, col. 11, lines 47-64, col. 13, lines 52-67, col. 14, line 54 – col. 15, line 49, ASR system is trained to recognize that a hybrid pronunciation including a German Werk (second portion) is an output expected pronunciation, given an input corresponding transcription of Kraftwerk, where the training corpus used to train the speech recognition model is adapted to incorporate the tendencies of users).
Adams does not explicitly teach with a speech to text engine.
Hu teaches a speech to text engine (Hu paras. [0027]-[0028], ASR generates a transcription from input speech – support from Hu priority provisional found at least in paras. 18 and 39).
Therefore, taking the teachings of Adams and Hu together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the pronunciation prediction as cited to above in Adams with the speech transcription as disclosed in Hu at least because doing so would allow for processing of an input utterance by natural language understanding modules for further functionality, such as executing a user command (Hu paras. [0027]-[0028]).
Regarding claim 8, Adams teaches further comprising: identifying a media content item associated with the match; and initiating playback of the media content item (Adams col. 16, lines 47-51, a content item associated with the highest scoring matching textual identifier is determined and then the content item is accessed and a command such as playing music is executed by the device).
Regarding claim 9, Adams teaches wherein the training data includes training output data based on a standardized phonetic representation of a spoken language (Adams col. 13, lines 10-18 and 56-58 and col. 14, lines 4-15, expected pronunciations (training output data) based on a grapheme to phoneme conversion or pronguessing model (standard) for a particular language).
Regarding claim 10, Adams does not explicitly teach the limitations of claim 10.
Hu teaches wherein the standardized representation is an International Phonetic Alphabet (IPA), a Speech Assessment Methods Phonetic Alphabet (SAMPA), an Extended SAMPA (X-SAMPA), or a Speech Synthesis Markup Language (SSML) (Hu para. [0039], given that the claim recites the limitations in the alternate “or”, Hu teaches that the X-SAMPA phoneme set is used for mapping phonemes in its speech recognition process – support from the Hu provisional application found at least in para. 30).
Therefore, taking the teachings of Adams and Hu together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the pronunciation prediction as cited to above in Adams with the phoneme set as disclosed in Hu at least because doing so would allow for biasing a model to recognize a foreign word even if the ASR model was not trained on any words other than American English (Hu para. [0024]).
Regarding claim 11, Adams teaches wherein the training output data is formatted as a vector representation (Adams col. 7, line 66 – col. 8, line 2, training corpus including the utterances, associated correct text for transcription and associated feature vectors).
Regarding claim 12, Adams teaches wherein the trained machine-learning model comprises a neural network (Adams col. 10, lines 51-52, a neural-network is used to perform the ASR processing).
Regarding claim 13, Adams does not explicitly teach the limitations of claim 13.
Hu teaches further comprising: providing the output data to a text-to-speech system for producing speech output based on the output data (Hu para. [0028], a text-to-speech system converts a transcription output from the ASR system (thus received) to synthesized speech for audible output by another device).
Therefore, taking the teachings of Adams and Hu together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the pronunciation prediction as cited to above in Adams with text-to-speech synthesis as disclosed in Hu at least because doing so would allow for the desired operation of delivering a message from one user to a friend for them to be able to listen to the message  (Hu para. [0028]).
Regarding claim 14, Adams teaches a system comprising a memory storing instructions that, when executed by one or more processors, cause the one or more processors to: (Adams col. 16, line 60 – col. 17, line 3, claim 4, the present disclosure implemented as a computing system with at least one processor and at least one memory where the memory includes instructions that when executed by the at least one processor executes various steps, such as the steps of the disclosed method)
receive text data (Adams col. 15, lines 58-60, in the method for predicting pronunciations, textual identifiers linked to content items are provided (receiving) to the ASR system); 
provide the text data as input into a trained machine-learning model (Adams col. 15, lines 60-66, the ASR system processes (thus receives as an input) the textual identifiers to determine one or more expected pronunciations of the textual identifiers, where col. 7, lines 58-65 teach that the ASR processing is performed using a trained speech recognition model); and 
receive as output from the trained machine-learning model, output data indicative of a pronunciation of the text data (Adams col. 15, line 64 – col. 16, line 32, at block 608, the ASR system determines one or more expected pronunciation(s) of the textual identifier which is then further processed (thus receiving as output) to combine the pronunciations).
Adams does not explicitly teach including at least one character that is a non-letter character.
Hu teaches including at least one character that is a non-letter character (Hu para. [0036], prediction network able to process an input to a label representing a symbol or character in a specified natural language, where symbols include punctuation and other symbols (thus non-letter character) – as disclosed above, support for this specific portion of para. 36 of Hu is found at least in paras. 21-26 disclosing the inputs to the prediction network including such out-of-lexicon words as “$9.95”).
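Therefore, taking the teachings of Adams and Hu together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the pronunciation prediction as cited to above in Adams with the speech recognition hypothesis for symbols as disclosed in Hu at least because doing so would allow for biasing a model to recognize a foreign word even if the ASR model was not trained on any words other than American English (Hu para. [0024]).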
Regarding claim 15, Adams teaches further comprising media streaming application instructions stored in a non-transitory memory (Adams col. 16, lines 47-51 and 60-66, command associated with the utterance is executed (application instructions) against the content item such as playing music on the ASR system, where the ASR system includes a non-transitory computer readable medium comprising instructions for performing the process described in the disclosure, and where col. 3, lines 34-36 teach that the music is a streamed song) of a voice-interactive device (Adams fig. 1, col. 2, lines 62-65, user making utterances received by an ASR device 100) executable to cause operation of a media streaming application on the voice-interactive device (Adams col. 3, lines 11-18 and 34-36 and col. 4, line 65 – col. 5, line 22, device accesses the content and plays a song, where the song can be a streaming song (media streaming) and where the ASR device has a memory with computer instructions for storing executable instructions (application) to perform the disclosed operations of the ASR device).
Regarding claim 16, Adams teaches further comprising instructions that cause the one or more processors to: 
train a machine-learning model with training data to form the trained machine-learning model, wherein the training data includes text-based words and associated correct pronunciations (Adams col. 7, line 41 – col. 8, line 23, ASR processing using a trained speech recognition model, the model trained using a training corpus (training data) including a number of sample utterances with associated feature vectors (correct pronunciations) and associated text (text-based words)).
Regarding claim 17, Adams teaches wherein the training data includes at least a portion of the text-based word identified as having a correct pronunciation different than a phonetic pronunciation (Adams col. 8, lines 2-6, col. 14, lines 24-50 and col. 13, lines 30-56, training corpus including sample utterances for creating mathematical models corresponding to audio for particular speech units including a word or a part of a syllable (different than phonetic pronunciation), where pronunciations can be of a letter in another language (like German) where there is no correspondence to an English phoneme, and can instead be described by linguistic articulatory features).
Regarding claim 18, Adams teaches wherein training the machine-learning model includes training the machine-learning model to produce pronunciation output based on text input (Adams col. 7, line 58 – col. 8, line 6 and col. 15, lines 58-66, speech recognition model is a trained model, and uses textual identifiers linked to the content item as input to determine an expected pronunciation of the textual identifier).
Regarding claim 19, Adams teaches further comprising instructions that cause the one or more processors to: receive an utterance from a user; and further train the trained machine-learning model using the utterance (Adams fig. 1, col. 16, lines 26-32, expected pronunciations and weights of the trained model are adjusted (further training) based on user history such as a typical pronunciation of a user (receiving utterance from a user)).
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Adams in view of Hu, as set forth above regarding claim 1 from which claim 20 depends, further in view of IBM, “IBM-Cloud-Docs Text-to-Speech,” custom-intro.md and custom-rules.md sections, 04a91b523b tree, dated April 9, 2019, Github, https://github.com/ibm-cloud-docs/text-to-speech/tree/04a91b523b5ce0dd0729c0f76c4c29e68953c379 (herein “IBM NPL”).
Regarding claim 20, Adams does not explicitly teach the limitations of claim 20.
IBM NPL teaches wherein the text data includes a text-based word having the at least one character that is the non-letter character, wherein the non-letter character appears in a location in the text-based word normally occupied by a letter (IBM NPL custom-intro.md section, pages 2-3, “Sounds-like translation” heading disclosing that the string “Str.” (including a period for the abbreviation, which is a non-letter character) is interpreted by default to be translated to the word “street” (thus the period “.” being in the place of at least the letter “e” in the unabbreviated form “street”), and also see the custom-rules.md section, pages 2-3, under the Context sensitivity header and Trailing Periods header, teaching that the input sentence “St. Anthony lives on Henry St.” is translated to the text “Saint Anthony lives on Henry Street”, and that custom models define translations for non-default words such as “div.” to “division”, where besides the trailing period character, other non-letter characters such as %, & and @ can be defined in a custom model).
Therefore, taking the teachings of Adams and IBM NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the pronunciation prediction as cited to above 


Conclusion
Applicant's amendment necessitated any new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908.  The examiner can normally be reached on Monday-Friday, 9:30a-7p, eastern time zone.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


MICHELLE M. KOETH
Primary Examiner
Art Unit 2656