Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
The drawings were received on 5/19/2020.  These drawings are accepted.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 5-8, 12-15, 19 and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Golipour et al (US Publication No.: 20180330729).
	Claim 1, Golipour et al discloses 
accessing a dataset comprising text-based messages (Fig. 10a, label text corpus. Paragraph 249 discloses “For example, text corpus 802 can include one or more documents …, one or more emails, one or more text messages, or the like.” Fig. 9 shows an example of the text corpus.); 
generating tokens for words and punctuation marks contained in the text-based messages, each token corresponding to one word or one punctuation mark (Fig. 10a, label tokenizer. Paragraph 265 discloses “tokenizer 1020 can perform syntax and/or semantic analysis of the received text corpus 802; recognize characters including words, sequences of letters, symbols, punctuation marks, …. and generate one or more sequences of tokens based on the recognized characters. A token can be a structure representing one or more characters.” Fig. 10b shows an example of tokenization of a text corpus, wherein tokens are generated that include recognized whitespaces such as punctuation marks. Fig. 10c shows one or more tokens 1033 and corresponding patterns 1032, 1034, 1036 (paragraphs 268-269).); 
generating, using a processor implementing natural language processing (Fig. 10a, label 820 includes tokenizing the input text corpus, feature extraction and classification of the tokens associated with labels. This indicates natural language processing. Fig. 7a, label processors. Paragraph 279 discloses “classifier 1060 can classify tokens using data-driven learning networks such as machine learning techniques.”), a vector representation for each of a plurality of the tokens (Fig. 10a, label feature extractor. Paragraph 267 discloses “feature extractor 1040 can determine features associated with the tokens and perform word embedding of the tokens based on the determined features.”); 
generating, for each of a plurality of the text-based messages in the dataset (Fig. 10a, label text corpus), a sequence of tokens corresponding to the text-based message and identifying ones of the tokens that represent punctuation marks (Fig. 10c, label 1033,1035,1037 are sequences of tokens with respective corresponding patterns 1032,1034,1036. Paragraph 265 discloses tokens are generated based on the recognized characters that include punctuation marks.); and 
training an artificial neural network to predict use of the punctuation marks in sentence structures (Fig. 10a, label sequence of normalized text. Paragraph 283 discloses “pronunciation generator 840 can include one or more data-driven learning networks (e.g. deep learning RNN) and can thus be trained using the sequence of normalized text representing a normalized training text corpus. After training of pronunciation generator 840, a current sequence of normalized text that is to-be-converted to speech can be provided to pronunciation generator 840 for generating speech. In some examples, as more sequences of normalized text representing training text corpuses are provided to pronunciation generator 840, pronunciation generator 840 can learn to generate more context-sensitive pronunciations, and thus improving the accuracy of pronunciation.” The generation of the pronunciation of the current sequence of normalized text is a generation of sentence structure. This indicates the training of the deep learning RNN (as indicated above) using the sequence of normalized text (Fig. 10a, label 822) is a prediction of pronunciation of sentence structures. This includes when and where pronunciation of certain tokens, such as punctuations, are used within the sentence structures. Paragraph 262 discloses one or more non-standard words that are included in the text corpus (paragraph 249-250) are normalized for proper pronunciation. This indicates, based on paragraphs 249-250,283 and Fig. 10a, the predicted pronunciation is generated based on sequence of normalized text, which can include pronunciation of punctuation marks.), 
the training using the generated sequence of tokens (Paragraph 265 and Fig. 10a, label 1020 discloses label 1020 generates one or more sequences of tokens based on the recognized characters.) and the vector representations for the tokens, in the sequence of tokens (Fig. 10a, label feature extractor. Paragraph 268 discloses the feature extractor analyzes tokens to determine one or more patterns associated with the tokens.), that represent the punctuation marks. (Paragraph 283 discloses training of data driven learning networks is based on sequence of normalized text. Fig. 10a, label sequence of normalized text is generated based on text corpus, output from the feature vector and tokenizer. Paragraph 249-250 discloses the text corpus includes punctuation marks or non-standard words. Paragraph 262 discloses punctuation marks are normalized to a pronunciation. This indicates the sequence of normalized text includes punctuation marks.).
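For illustration only, the tokenizing and punctuation-identifying limitations of claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation and not the tokenizer 1020 of Golipour et al: the regular expression and the function names tokenize, is_punctuation, punctuation_positions and training_pairs are hypothetical, and the pairing of each word with the punctuation mark that follows it is only one assumed way to prepare examples for a network that predicts punctuation use.

```python
import re

# Assumed pattern: a word (optionally with internal apostrophes) or a single
# non-word, non-space character treated as one punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def tokenize(message):
    """Split a message so that each token is one word or one punctuation mark."""
    return TOKEN_PATTERN.findall(message)


def is_punctuation(token):
    """True when the token is a single non-word, non-space character."""
    return re.fullmatch(r"[^\w\s]", token) is not None


def punctuation_positions(tokens):
    """Indices of the tokens in the sequence that represent punctuation marks."""
    return [i for i, tok in enumerate(tokens) if is_punctuation(tok)]


def training_pairs(tokens):
    """Pair each word with the punctuation mark that follows it ("" if none)."""
    pairs = []
    for i, tok in enumerate(tokens):
        if is_punctuation(tok):
            continue
        following = tokens[i + 1] if i + 1 < len(tokens) else ""
        label = following if following and is_punctuation(following) else ""
        pairs.append((tok, label))
    return pairs


if __name__ == "__main__":
    sequence = tokenize("Hi, are you there?")
    print(sequence)
    print(punctuation_positions(sequence))
    print(training_pairs(sequence))
```

In such a sketch the (word, label) pairs would be converted to vector representations before being supplied to a recurrent network; that step is omitted here because neither the claim nor the cited paragraphs fix a particular embedding.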
	Claim 5, Golipour et al discloses the artificial neural network is long short term memory/recurrent neural network. (Paragraph 283 discloses “pronunciation generator 840 can include one or more data-driven learning networks (e.g. deep learning RNN) …”, wherein long short term memory/RNN is a type of deep learning RNN. Paragraph 271 discloses “In some examples, feature extractor 1040 can perform word embedding using data-driven learning networks such as a deep neural network (e.g., recurrent neural networks, long short-term memory (LSTM) …”.)
Claim 6, Golipour et al discloses storing in a punctuation dictionary the tokens that represent punctuation marks (Paragraph 282 discloses “classifier 1060 can store the association of the labels and corresponding tokens for performing subsequent normalization tasks.”) and the vector representations (Paragraph 268 discloses “determining the features associated with the tokens, feature extractor 1040 can generate a pattern list based on the tokens provided by the tokenizer 1020.” Such a list indicates storage, or a table, associating feature vectors with tokens.) for the tokens that represent punctuation marks (Fig. 10a, label 1020 tokenizes the text corpus that includes punctuations. The classifier labels the tokens outputted by the tokenizer, which indicates the labels and tokens include punctuation marks.).
Claim 7, Golipour et al discloses the artificial neural network (Fig. 11a, label 840) accesses the punctuation dictionary and uses the tokens that represent punctuation marks and the vector representations for the tokens that represent punctuation marks to predict the use of punctuation marks in the sentence structure (Fig. 11a, label 840 accesses label 822 to predict the pronunciation sequence, label 842. Fig. 10a, labels 1020 and 1040 are accessed to generate the sequence of normalized text, wherein as per paragraph 268, the feature extractor 1040 generates a pattern list of tokens and associated feature vectors which is used at labels 1060 and 1080 to generate label 822. Label 802 includes punctuations, which indicates the tokens and feature vectors include punctuations (paragraphs 249, 265). As shown in Fig. 11a, the sequence of normalized text is used to generate pronunciation. This indicates the training of the deep learning RNN (as indicated above) using the sequence of normalized text (Fig. 10a, label 822) is a prediction of pronunciation of sentence structures. This includes when and where pronunciation of certain tokens, such as punctuations, are used within the sentence structures. Paragraph 262 discloses one or more non-standard words that are included in the text corpus (paragraphs 249-250) are normalized for proper pronunciation. This indicates, based on paragraphs 249-250, 283 and Fig. 10a, the predicted pronunciation is generated based on the sequence of normalized text, which can include pronunciation of punctuation marks.)
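For illustration only, the punctuation dictionary recited in claims 6 and 7 can be pictured as a lookup table from each punctuation token to its vector representation. The sketch below is an assumption made for clarity, not the applicant's data structure and not the pattern list of Golipour et al: build_punctuation_dictionary and toy_vector are hypothetical names, and toy_vector is a deterministic placeholder standing in for whatever learned embedding would actually be used.

```python
import re


def is_punctuation(token):
    """True when the token is a single non-word, non-space character."""
    return re.fullmatch(r"[^\w\s]", token) is not None


def toy_vector(token, dim=4):
    """Placeholder embedding (assumption): derived from the character code,
    NOT a learned vector representation."""
    code = ord(token)
    return [((code * (k + 1)) % 10) / 10.0 for k in range(dim)]


def build_punctuation_dictionary(tokens, embed=toy_vector):
    """Store each distinct punctuation token together with its vector."""
    dictionary = {}
    for tok in tokens:
        if is_punctuation(tok) and tok not in dictionary:
            dictionary[tok] = embed(tok)
    return dictionary


if __name__ == "__main__":
    table = build_punctuation_dictionary(["Hi", ",", "there", "!", ","])
    for mark, vector in table.items():
        print(repr(mark), vector)
```

A network that accesses such a table, as in claim 7, would look up the stored vector for each punctuation token rather than recomputing it; passing a different embed function is how a real embedding could be substituted for the placeholder.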
Claim 8, Golipour et al discloses
a processor programmed to initiate executable operations (paragraph 203,303 discloses hardware, software instructions for execution by one or more processors.) comprising:
accessing a dataset comprising text-based messages (Fig. 10a, label text corpus. Paragraph 249 discloses “For example, text corpus 802 can include one or more documents …, one or more emails, one or more text messages, or the like.” Fig. 9 shows an example of the text corpus.); 
generating tokens for words and punctuation marks contained in the text-based messages, each token corresponding to one word or one punctuation mark (Fig. 10a, label tokenizer. Paragraph 265 discloses “tokenizer 1020 can perform syntax and/or semantic analysis of the received text corpus 802; recognize characters including words, sequences of letters, symbols, punctuation marks, …. and generate one or more sequences of tokens based on the recognized characters. A token can be a structure representing one or more characters.” Fig. 10b shows an example of tokenization of a text corpus, wherein tokens are generated that include recognized whitespaces such as punctuation marks. Fig. 10c shows one or more tokens 1033 and corresponding patterns 1032, 1034, 1036 (paragraphs 268-269).); 
generating, using a processor implementing natural language processing (Fig. 10a, label 820 includes tokenizing the input text corpus, feature extraction and classification of the tokens associated with labels. This indicates natural language processing. Fig. 7a, label processors. Paragraph 279 discloses “classifier 1060 can classify tokens using data-driven learning networks such as machine learning techniques.”), a vector representation for each of a plurality of the tokens (Fig. 10a, label feature extractor. Paragraph 267 discloses “feature extractor 1040 can determine features associated with the tokens and perform word embedding of the tokens based on the determined features.”); 
generating, for each of a plurality of the text-based messages in the dataset (Fig. 10a, label text corpus), a sequence of tokens corresponding to the text-based message and identifying ones of the tokens that represent punctuation marks (Fig. 10c, label 1033,1035,1037 are sequences of tokens with respective corresponding patterns 1032,1034,1036. Paragraph 265 discloses tokens are generated based on the recognized characters that include punctuation marks.); and 
training an artificial neural network to predict use of the punctuation marks in sentence structures (Fig. 10a, label sequence of normalized text. Paragraph 283 discloses “pronunciation generator 840 can include one or more data-driven learning networks (e.g. deep learning RNN) and can thus be trained using the sequence of normalized text representing a normalized training text corpus. After training of pronunciation generator 840, a current sequence of normalized text that is to-be-converted to speech can be provided to pronunciation generator 840 for generating speech. In some examples, as more sequences of normalized text representing training text corpuses are provided to pronunciation generator 840, pronunciation generator 840 can learn to generate more context-sensitive pronunciations, and thus improving the accuracy of pronunciation.” The generation of the pronunciation of the current sequence of normalized text is a generation of sentence structure. This indicates the training of the deep learning RNN (as indicated above) using the sequence of normalized text (Fig. 10a, label 822) is a prediction of pronunciation of sentence structures. This includes when and where pronunciation of certain tokens, such as punctuations, are used within the sentence structures. Paragraph 262 discloses one or more non-standard words that are included in the text corpus (paragraph 249-250) are normalized for proper pronunciation. This indicates, based on paragraphs 249-250,283 and Fig. 10a, the predicted pronunciation is generated based on sequence of normalized text, which can include pronunciation of punctuation marks.), 
the training using the generated sequence of tokens (Paragraph 265 and Fig. 10a, label 1020 discloses label 1020 generates one or more sequences of tokens based on the recognized characters.) and the vector representations for the tokens, in the sequence of tokens (Fig. 10a, label feature extractor. Paragraph 268 discloses the feature extractor analyzes tokens to determine one or more patterns associated with the tokens.), that represent the punctuation marks. (Paragraph 283 discloses training of data driven learning networks is based on sequence of normalized text. Fig. 10a, label sequence of normalized text is generated based on text corpus, output from the feature vector and tokenizer. Paragraph 249-250 discloses the text corpus includes punctuation marks or non-standard words. Paragraph 262 discloses punctuation marks are normalized to a pronunciation. This indicates the sequence of normalized text includes punctuation marks.).
	Claim 12, Golipour et al discloses the artificial neural network is long short term memory/recurrent neural network. (Paragraph 283 discloses “pronunciation generator 840 can include one or more data-driven learning networks (e.g. deep learning RNN) …”, wherein long short term memory/RNN is a type of deep learning RNN. Paragraph 271 discloses “In some examples, feature extractor 1040 can perform word embedding using data-driven learning networks such as a deep neural network (e.g., recurrent neural networks, long short-term memory (LSTM) …”.)
Claim 13, Golipour et al discloses storing in a punctuation dictionary the tokens that represent punctuation marks (Paragraph 282 discloses “classifier 1060 can store the association of the labels and corresponding tokens for performing subsequent normalization tasks.”) and the vector representations (Paragraph 268 discloses “determining the features associated with the tokens, feature extractor 1040 can generate a pattern list based on the tokens provided by the tokenizer 1020.” Such a list indicates storage, or a table, associating feature vectors with tokens.) for the tokens that represent punctuation marks (Fig. 10a, label 1020 tokenizes the text corpus that includes punctuations. The classifier labels the tokens outputted by the tokenizer, which indicates the labels and tokens include punctuation marks.).
Claim 14, Golipour et al discloses the artificial neural network (Fig. 11a, label 840) accesses the punctuation dictionary and uses the tokens that represent punctuation marks and the vector representations for the tokens that represent punctuation marks to predict the use of punctuation marks in the sentence structure (Fig. 11a, label 840 accesses label 822 to predict the pronunciation sequence, label 842. Fig. 10a, labels 1020 and 1040 are accessed to generate the sequence of normalized text, wherein as per paragraph 268, the feature extractor 1040 generates a pattern list of tokens and associated feature vectors which is used at labels 1060 and 1080 to generate label 822. Label 802 includes punctuations, which indicates the tokens and feature vectors include punctuations (paragraphs 249, 265). As shown in Fig. 11a, the sequence of normalized text is used to generate pronunciation. This indicates the training of the deep learning RNN (as indicated above) using the sequence of normalized text (Fig. 10a, label 822) is a prediction of pronunciation of sentence structures. This includes when and where pronunciation of certain tokens, such as punctuations, are used within the sentence structures. Paragraph 262 discloses one or more non-standard words that are included in the text corpus (paragraphs 249-250) are normalized for proper pronunciation. This indicates, based on paragraphs 249-250, 283 and Fig. 10a, the predicted pronunciation is generated based on the sequence of normalized text, which can include pronunciation of punctuation marks.)
Claim 15, Golipour et al discloses
one or more computer readable storage mediums having program code stored thereon, the program code stored on the one or more computer readable storage mediums collectively executable by a data processing system to initiate operations (paragraph 203,303 discloses hardware, software instructions for execution by one or more processors and non-transitory computer readable storage medium storing one or more programs for execution by one or more processors. Fig. 10a, 11a as the data processing system.) including:
accessing a dataset comprising text-based messages (Fig. 10a, label text corpus. Paragraph 249 discloses “For example, text corpus 802 can include one or more documents …, one or more emails, one or more text messages, or the like.” Fig. 9 shows an example of the text corpus.); 
generating tokens for words and punctuation marks contained in the text-based messages, each token corresponding to one word or one punctuation mark (Fig. 10a, label tokenizer. Paragraph 265 discloses “tokenizer 1020 can perform syntax and/or semantic analysis of the received text corpus 802; recognize characters including words, sequences of letters, symbols, punctuation marks, …. and generate one or more sequences of tokens based on the recognized characters. A token can be a structure representing one or more characters.” Fig. 10b shows an example of tokenization of a text corpus, wherein tokens are generated that include recognized whitespaces such as punctuation marks. Fig. 10c shows one or more tokens 1033 and corresponding patterns 1032, 1034, 1036 (paragraphs 268-269).); 
generating, using a processor implementing natural language processing (Fig. 10a, label 820 includes tokenizing the input text corpus, feature extraction and classification of the tokens associated with labels. This indicates natural language processing. Fig. 7a, label processors. Paragraph 279 discloses “classifier 1060 can classify tokens using data-driven learning networks such as machine learning techniques.”), a vector representation for each of a plurality of the tokens (Fig. 10a, label feature extractor. Paragraph 267 discloses “feature extractor 1040 can determine features associated with the tokens and perform word embedding of the tokens based on the determined features.”); 
generating, for each of a plurality of the text-based messages in the dataset (Fig. 10a, label text corpus), a sequence of tokens corresponding to the text-based message and identifying ones of the tokens that represent punctuation marks (Fig. 10c, label 1033,1035,1037 are sequences of tokens with respective corresponding patterns 1032,1034,1036. Paragraph 265 discloses tokens are generated based on the recognized characters that include punctuation marks.); and 
training an artificial neural network to predict use of the punctuation marks in sentence structures (Fig. 10a, label sequence of normalized text. Paragraph 283 discloses “pronunciation generator 840 can include one or more data-driven learning networks (e.g. deep learning RNN) and can thus be trained using the sequence of normalized text representing a normalized training text corpus. After training of pronunciation generator 840, a current sequence of normalized text that is to-be-converted to speech can be provided to pronunciation generator 840 for generating speech. In some examples, as more sequences of normalized text representing training text corpuses are provided to pronunciation generator 840, pronunciation generator 840 can learn to generate more context-sensitive pronunciations, and thus improving the accuracy of pronunciation.” The generation of the pronunciation of the current sequence of normalized text is a generation of sentence structure. This indicates the training of the deep learning RNN (as indicated above) using the sequence of normalized text (Fig. 10a, label 822) is a prediction of pronunciation of sentence structures. This includes when and where pronunciation of certain tokens, such as punctuations, are used within the sentence structures. Paragraph 262 discloses one or more non-standard words that are included in the text corpus (paragraph 249-250) are normalized for proper pronunciation. This indicates, based on paragraphs 249-250,283 and Fig. 10a, the predicted pronunciation is generated based on sequence of normalized text, which can include pronunciation of punctuation marks.), 
the training using the generated sequence of tokens (Paragraph 265 and Fig. 10a, label 1020 discloses label 1020 generates one or more sequences of tokens based on the recognized characters.) and the vector representations for the tokens, in the sequence of tokens (Fig. 10a, label feature extractor. Paragraph 268 discloses the feature extractor analyzes tokens to determine one or more patterns associated with the tokens.), that represent the punctuation marks. (Paragraph 283 discloses training of data driven learning networks is based on sequence of normalized text. Fig. 10a, label sequence of normalized text is generated based on text corpus, output from the feature vector and tokenizer. Paragraph 249-250 discloses the text corpus includes punctuation marks or non-standard words. Paragraph 262 discloses punctuation marks are normalized to a pronunciation. This indicates the sequence of normalized text includes punctuation marks.).
	Claim 19, Golipour et al discloses the artificial neural network is long short term memory/recurrent neural network. (Paragraph 283 discloses “pronunciation generator 840 can include one or more data-driven learning networks (e.g. deep learning RNN) …”, wherein long short term memory/RNN is a type of deep learning RNN. Paragraph 271 discloses “In some examples, feature extractor 1040 can perform word embedding using data-driven learning networks such as a deep neural network (e.g., recurrent neural networks, long short-term memory (LSTM) …”.)
Claim 20, Golipour et al discloses storing in a punctuation dictionary the tokens that represent punctuation marks (Paragraph 282 discloses “classifier 1060 can store the association of the labels and corresponding tokens for performing subsequent normalization tasks.”) and the vector representations (Paragraph 268 discloses “determining the features associated with the tokens, feature extractor 1040 can generate a pattern list based on the tokens provided by the tokenizer 1020.” Such a list indicates storage, or a table, associating feature vectors with tokens.) for the tokens that represent punctuation marks (Fig. 10a, label 1020 tokenizes the text corpus that includes punctuations. The classifier labels the tokens outputted by the tokenizer, which indicates the labels and tokens include punctuation marks.).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 4, 11 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Golipour et al (US Publication No.: 20180330729) in view of Zhao et al (US Publication No.: 20210089936).
	Claim 4, Golipour et al discloses the classifier as a data-driven learning network such as machine learning (paragraph 279), but fails to disclose that implementing the natural language processing comprises implementing a Bidirectional Encoder Representation from Transformers language model.
	Zhao et al discloses that implementing the natural language processing comprises implementing a Bidirectional Encoder Representation from Transformers (BERT) language model. (Paragraph 62 discloses predicting classification labels of tokens of a sentence (Fig. 6, label 601).) It would be obvious to one skilled in the art to substitute one well-known data-driven learning network, as disclosed by Golipour et al, with another well-known machine learning network such as BERT, as disclosed by Zhao et al, so as to obtain the predictable result of classifying tokens with associated labels.
Claim 11, Golipour et al discloses the classifier as a data-driven learning network such as machine learning (paragraph 279), but fails to disclose that implementing the natural language processing comprises implementing a Bidirectional Encoder Representation from Transformers language model.
	Zhao et al discloses that implementing the natural language processing comprises implementing a Bidirectional Encoder Representation from Transformers (BERT) language model. (Paragraph 62 discloses predicting classification labels of tokens of a sentence (Fig. 6, label 601).) It would be obvious to one skilled in the art to substitute one well-known data-driven learning network, as disclosed by Golipour et al, with another well-known machine learning network such as BERT, as disclosed by Zhao et al, so as to obtain the predictable result of classifying tokens with associated labels.
Claim 18, Golipour et al discloses the classifier as a data-driven learning network such as machine learning (paragraph 279), but fails to disclose that implementing the natural language processing comprises implementing a Bidirectional Encoder Representation from Transformers language model.
	Zhao et al discloses that implementing the natural language processing comprises implementing a Bidirectional Encoder Representation from Transformers (BERT) language model. (Paragraph 62 discloses predicting classification labels of tokens of a sentence (Fig. 6, label 601).) It would be obvious to one skilled in the art to substitute one well-known data-driven learning network, as disclosed by Golipour et al, with another well-known machine learning network such as BERT, as disclosed by Zhao et al, so as to obtain the predictable result of classifying tokens with associated labels.




Allowable Subject Matter
Claims 2-3, 9-10 and 16-17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LINDA WONG whose telephone number is (571)272-6044. The examiner can normally be reached 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at (571) 272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LINDA WONG/Primary Examiner, Art Unit 2655