DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This Office Action is in response to correspondence filed 28 December 2020 in reference to application 17/135,283.  Claims 1-29 are pending and have been examined.

Specification
Applicant is reminded of the proper language and format for an abstract of the disclosure.
The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words in length. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details.
The language should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, “The disclosure concerns,” “The disclosure defined by this invention,” “The disclosure describes,” etc.  In addition, the form and legal phraseology often used in patent claims, such as “means” and “said,” should be avoided.
The abstract of the disclosure is objected to because it contains legal phraseology and is not in narrative form.  Correction is required.  See MPEP § 608.01(b).

Claim Objections
Claim 10 is objected to because of the following informalities: in the third line, “annotate” should be “annotating”. Appropriate correction is required.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 10, 11, 13, and 14 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Sunkara et al. (Robust Prediction for Punctuation and Truecasing for Medical ASR).

Consider claim 10, Sunkara teaches A method (abstract) comprising: 
receiving a first text corpus comprising punctuated and capitalized text (section 3.1, 4.3, pretraining, section 3.2, first paragraph pretraining using Wikipedia text, which is known to have punctuation and capitalization), 
annotating words in said first text corpus with a set of labels, wherein said labels indicate a punctuation and a capitalization associated with each of said words in said first text corpus (section 3.1, creating sub word embeddings for tokens, section 3.3, domain task adaptions, tokens include punctuation and capitalization, section 4.1 ground truth transcription), 
at an initial training stage, training a machine learning model on a first training set (section 3.1, pretraining) comprising: 
(i) said annotated words in said first text corpus (section 3.2, first paragraph pretraining using Wikipedia text), and 
(ii) said labels (section 3.1, subword embeddings), 
receiving a second text corpus representing conversational speech (section 3.2, finetuning using domain text, section 4.1, medical domain data), 
annotating words in said second text corpus with said set of labels, wherein said labels indicate a punctuation and a capitalization associated with each of said words in said second text corpus (section 3.1, 4.3, creating sub word embeddings for tokens, section 3.3, domain task adaptions, tokens include punctuation and capitalization, section 4.1 ground truth transcription), 
at a re-training stage, re-training said machine learning model on a second training set (section 3.2 finetuning and domain adaptation) comprising: 
(iii) said annotated words in said second text corpus (section 3.1, creating sub word embeddings for tokens, section 3.3, domain task adaptions, tokens include punctuation and capitalization, section 4.1 ground truth transcription), and 
(iv) said labels (section 3.1, creating sub word embeddings for tokens, section 3.3, domain task adaptions, tokens include punctuation and capitalization, section 4.1 ground truth transcription), and 
at an inference stage, applying said trained machine learning model to a target set of words representing conversational speech, to predict a punctuation and capitalization of each word in said target set (section 4.4, testing).

Consider claim 11, Sunkara teaches the method of claim 10, wherein said labels indicating punctuation are selected from the group consisting of: comma, period, question mark, and other (section 3.2, domain adaptation, list punctuation marks), and wherein said labels indicating capitalization are selected from the group consisting of: capitalized and other (section 3.1, capitalized or not).
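
For illustration only, the per-word labeling addressed in claims 10 and 11 can be sketched as follows. This is a minimal Python sketch and is not taken from Sunkara or from the application; the function name annotate, the label strings, and the whitespace tokenization are assumptions made for the example.

```python
import re

# Hypothetical label names mirroring the groups recited in claim 11.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}


def annotate(text):
    """Return one (lowercased word, punctuation label, capitalization label) per word."""
    annotated = []
    for token in text.split():
        # Split a trailing comma, period, or question mark from the word.
        match = re.match(r"^(.*?)([,.?]?)$", token)
        word, mark = match.group(1), match.group(2)
        if not word:
            continue  # skip tokens that consist of a punctuation mark only
        punct = PUNCT_LABELS.get(mark, "OTHER")
        cap = "CAPITALIZED" if word[0].isupper() else "OTHER"
        annotated.append((word.lower(), punct, cap))
    return annotated


labels = annotate("Hello, how are you? I am fine.")
```

In this sketch each word carries the punctuation mark that follows it, and the word itself is stored in lowercase so that the capitalization survives only in the label.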

Consider claim 13, Sunkara teaches the method of claim 10, wherein said second text corpus is preprocessed, before said re-training, by performing contextualization, and wherein said contextualization comprises segmenting said text corpus into segments, each comprising at least two sentences (section 4.1, preprocessing transcriptions into segments to 50 words left or right, which would generally be over 2 sentences).

Consider claim 14, Sunkara teaches the method of claim 10, wherein said second text corpus is preprocessed, before said re-training, by performing data augmentation, and wherein said data augmentation comprises extending at least some of said segments by adding at least one of: one or more preceding sentences in said conversational speech, and one or more succeeding sentences in said conversational speech (section 4.1, data held back for fine tuning, segments include 50 words left or right, which would generally include additional sentences. Table 2 conversational corpus).
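
For illustration only, the contextualization and data augmentation limitations of claims 13 and 14 can be sketched as extending each sentence with its neighbors. This is a minimal Python sketch under stated assumptions: the function name build_segments and the context parameter are hypothetical, and the sketch windows by sentence, whereas the cited portion of Sunkara is characterized above as using 50 words to the left or right.

```python
def build_segments(sentences, context=1):
    """Build one segment per sentence, extended by neighboring sentences.

    `context` is the assumed number of preceding and succeeding
    sentences added to each segment.
    """
    segments = []
    for i in range(len(sentences)):
        start = max(0, i - context)
        end = min(len(sentences), i + context + 1)
        segments.append(" ".join(sentences[start:end]))
    return segments


segments = build_segments(["how are you", "i am fine", "thanks for calling"])
```

With context=1, a word in the middle sentence appears in all three segments, so overlapping segments yield more than one prediction for the same word.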

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 12, 17-19, and 21-28 are rejected under 35 U.S.C. 103 as being unpatentable over Sunkara et al. (Robust Prediction for Punctuation and Truecasing for Medical ASR) in view of Thomson et al. (US PAP 2020/0243094).

Consider claim 1, Sunkara teaches A system (abstract) performing steps comprising: 
receive a first text corpus comprising punctuated and capitalized text (section 3.1, 4.3, pretraining, section 3.2, first paragraph pretraining using Wikipedia text, which is known to have punctuation and capitalization), 
annotate words in said first text corpus with a set of labels, wherein said labels indicate a punctuation and a capitalization associated with each of said words in said first text corpus (section 3.1, creating sub word embeddings for tokens, section 3.3, domain task adaptions, tokens include punctuation and capitalization, section 4.1 ground truth transcription), 
at an initial training stage, train a machine learning model on a first training set (section 3.1, pretraining) comprising: 
(i) said annotated words in said first text corpus (section 3.2, first paragraph pretraining using Wikipedia text), and 
(ii) said labels (section 3.1, subword embeddings), 
receive a second text corpus representing conversational speech (section 3.2, finetuning using domain text, section 4.1, medical domain data), 
annotate words in said second text corpus with said set of labels, wherein said labels indicate a punctuation and a capitalization associated with each of said words in said second text corpus (section 3.1, 4.3, creating sub word embeddings for tokens, section 3.3, domain task adaptions, tokens include punctuation and capitalization, section 4.1 ground truth transcription), 
at a re-training stage, re-train said machine learning model on a second training set (section 3.2 finetuning and domain adaptation) comprising: 
(iii) said annotated words in said second text corpus (section 3.1, creating sub word embeddings for tokens, section 3.3, domain task adaptions, tokens include punctuation and capitalization, section 4.1 ground truth transcription), and 
(iv) said labels (section 3.1, creating sub word embeddings for tokens, section 3.3, domain task adaptions, tokens include punctuation and capitalization, section 4.1 ground truth transcription), and 
at an inference stage, apply said trained machine learning model to a target set of words representing conversational speech, to predict a punctuation and capitalization of each word in said target set (section 4.4, testing).
Sunkara does not specifically teach
at least one hardware processor; and 
a non-transitory computer-readable storage medium having stored thereon program instructions, the program instructions executable by the at least one hardware processor.
In the same field of punctuation and capitalization, Thomson teaches at least one hardware processor (para. 1713, processors); and 
a non-transitory computer-readable storage medium having stored thereon program instructions, the program instructions executable by the at least one hardware processor (para. 1714, memory storing instructions).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to use processors and memories as taught by Thomson in the system of Sunkara in order to implement computer processing steps using well known computer hardware that is widely available.

Consider claim 2, Sunkara teaches the system of claim 1, wherein said labels indicating punctuation are selected from the group consisting of: comma, period, question mark, and other (section 3.2, domain adaptation, list punctuation marks), and wherein said labels indicating capitalization are selected from the group consisting of: capitalized and other (section 3.1, capitalized or not).

Consider claim 3, Sunkara teaches the system of claim 1, but does not specifically teach wherein said first text corpus is preprocessed, before said training, by at least transforming all words in said first text corpus into lowercase.
In the same field of punctuation and capitalization, Thomson teaches wherein said first text corpus is preprocessed, before said training, by at least transforming all words in said first text corpus into lowercase (0448, case remover may remove uppercase from training data).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to remove capitalization as taught by Thomson in the system of Sunkara in order to allow performance of predictions of the system to be compared to known ground truth data later.

Consider claim 4, Sunkara teaches the system of claim 1, wherein said second text corpus is preprocessed, before said re-training, by performing contextualization, and wherein said contextualization comprises segmenting said text corpus into segments, each comprising at least two sentences (section 4.1, preprocessing transcriptions into segments to 50 words left or right, which would generally be over 2 sentences).

Consider claim 5, Sunkara teaches the system of claim 1, wherein said second text corpus is preprocessed, before said re-training, by performing data augmentation, and wherein said data augmentation comprises extending at least some of said segments by adding at least one of: one or more preceding sentences in said conversational speech, and one or more succeeding sentences in said conversational speech (section 4.1, data held back for fine tuning, segments include 50 words left or right, which would generally include additional sentences. Table 2 conversational corpus).

Claim 12 contains similar limitations as claim 3 and is therefore rejected for the same reasons.

Consider claim 17, Sunkara teaches A system (abstract) performing steps comprising: 
receive a first text corpus comprising punctuated and capitalized text (section 3.1, 4.3, pretraining, section 3.2, first paragraph pretraining using Wikipedia text, which is known to have punctuation and capitalization), 
annotate words in said first text corpus with a set of labels, wherein said labels indicate a punctuation and a capitalization associated with each of said words in said first text corpus (section 3.1, creating sub word embeddings for tokens, section 3.3, domain task adaptions, tokens include punctuation and capitalization, section 4.1 ground truth transcription), 
at an initial training stage, train a machine learning model on a first training set (section 3.1, pretraining) comprising: 
(i) said annotated words in said first text corpus (section 3.2, first paragraph pretraining using Wikipedia text), and 
(ii) said labels (section 3.1, subword embeddings), 
receive a second text corpus representing conversational speech (section 3.2, finetuning using domain text, section 4.1, medical domain data), 
annotate words in said second text corpus with said set of labels, wherein said labels indicate a punctuation and a capitalization associated with each of said words in said second text corpus (section 3.1, 4.3, creating sub word embeddings for tokens, section 3.3, domain task adaptions, tokens include punctuation and capitalization, section 4.1 ground truth transcription), 
at a re-training stage, re-train said machine learning model on a second training set (section 3.2 finetuning and domain adaptation) comprising: 
(iii) said annotated words in said second text corpus (section 3.1, creating sub word embeddings for tokens, section 3.3, domain task adaptions, tokens include punctuation and capitalization, section 4.1 ground truth transcription), and 
(iv) said labels (section 3.1, creating sub word embeddings for tokens, section 3.3, domain task adaptions, tokens include punctuation and capitalization, section 4.1 ground truth transcription), and 
at an inference stage, apply said trained machine learning model to a target set of words representing conversational speech, to predict a punctuation and capitalization of each word in said target set (section 4.4, testing).
Sunkara does not specifically teach
A computer program product comprising a non-transitory computer-readable storage medium having program instructions embodied therewith, the program instructions executable by at least one hardware processor.
In the same field of punctuation and capitalization, Thomson teaches a computer program product comprising a non-transitory computer-readable storage medium having program instructions embodied therewith, the program instructions executable by at least one hardware processor (para. 1714, memory storing instructions).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to use processors and memories as taught by Thomson in the system of Sunkara in order to implement computer processing steps using well known computer hardware that is widely available.

Claim 18 contains similar limitations as claim 3 and is therefore rejected for the same reasons.

Claim 19 contains similar limitations as claim 2 and is therefore rejected for the same reasons.

Consider claim 21, Sunkara teaches A system (abstract) 
to perform operations of a multi-task neural network, the multi-task neural network (Figure 1, BERT network) comprising: 
a capitalization prediction network that receives as input a text corpus comprising at least one sentence, and predicts a capitalization of each word in said at least one sentence, wherein the capitalization prediction network is trained based on a first loss function (figure 1, case function layer, section 3, case loss functions), 
a punctuation prediction network that receives as input said text corpus, and predicts a punctuation with respect to said text corpus, wherein the punctuation prediction network is trained based on a second loss function (figure 1, punctuation layer, section 4, punctuation loss function), and 
an output layer which outputs a joint prediction of said capitalization and said punctuation, based on a multi-task loss function that combines said first and second loss functions (section 3.1, joint learning function, is combined loss function determined by output of network), 
wherein said capitalization prediction network and said punctuation prediction network are jointly trained (section 3.1, joint learning).
Sunkara does not specifically teach
at least one hardware processor; and 
a non-transitory computer-readable storage medium having stored thereon program instructions, the program instructions executable by the at least one hardware processor.
In the same field of punctuation and capitalization, Thomson teaches at least one hardware processor (para. 1713, processors); and 
a non-transitory computer-readable storage medium having stored thereon program instructions, the program instructions executable by the at least one hardware processor (para. 1714, memory storing instructions).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to use processors and memories as taught by Thomson in the system of Sunkara in order to implement computer processing steps using well known computer hardware that is widely available.
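
For illustration only, the multi-task loss recited in claim 21, which combines a first (capitalization) loss and a second (punctuation) loss, can be sketched as a weighted sum of two cross-entropy terms. This is a minimal Python sketch under stated assumptions: the weighted-sum form, the weights, and all function names are hypothetical and are not taken from Sunkara or from the application.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target_index])


def multi_task_loss(cap_probs, cap_target, punct_probs, punct_target,
                    cap_weight=1.0, punct_weight=1.0):
    """Combine the capitalization and punctuation losses (assumed weighted sum)."""
    first_loss = cross_entropy(cap_probs, cap_target)        # capitalization head
    second_loss = cross_entropy(punct_probs, punct_target)   # punctuation head
    return cap_weight * first_loss + punct_weight * second_loss


# One word: two capitalization classes, four punctuation classes.
loss = multi_task_loss([0.9, 0.1], 0, [0.2, 0.7, 0.05, 0.05], 1)
```

Because both terms feed a single scalar, minimizing it updates the shared network for both tasks at once, which is one way to realize joint training.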

Consider claim 22, Sunkara teaches the system of claim 21, wherein said program instructions are further executable to apply, at an inference stage, said multi-task neural network to a target set of words representing conversational speech, to predict a punctuation and capitalization of each word in said target set (section 4.4, testing, predicting case and punctuation).

Consider claim 23, Sunkara teaches the system of claim 21, wherein said joint training comprises training said capitalization prediction network and said punctuation prediction network jointly, at an initial training stage (section 3.1, pretraining), on a first training set comprising: 
(i) a first text corpus comprising punctuated and capitalized text (section 3.2, first paragraph pretraining using Wikipedia text); and 
(ii) labels indicating a punctuation and a capitalization associated with each of said words in said first text corpus (section 3.1, creating sub word embeddings for tokens, section 3.3, domain task adaptions, tokens include punctuation and capitalization, section 4.1 ground truth transcription).

Consider claim 24, Sunkara teaches the system of claim 23, wherein said joint training further comprises training said capitalization prediction network and said punctuation prediction network jointly, at a re-training stage (section 3.2, finetuning and domain adaptation), on a second training set comprising: 
(iii) a second text corpus representing conversational speech (section 3.1, creating sub word embeddings for tokens, section 3.3, domain task adaptions, tokens include punctuation and capitalization, section 4.1 ground truth transcription); and 
(iv) labels indicating a punctuation and a capitalization associated with each of said words in said second text corpus (section 3.1, creating sub word embeddings for tokens, section 3.3, domain task adaptions, tokens include punctuation and capitalization, section 4.1 ground truth transcription).

Consider claim 25, Sunkara teaches the system of claim 24, wherein said labels indicating punctuation are selected from the group consisting of: comma, period, question mark, and other (section 3.2, domain adaptation, list punctuation marks), and wherein said labels indicating capitalization are selected from the group consisting of: capitalized and other (section 3.1, capitalized or not).

Consider claim 26, Sunkara teaches the system of claim 24, but does not specifically teach wherein said first text corpus is preprocessed, before said training, by at least transforming all words in said first text corpus into lowercase.
In the same field of punctuation and capitalization, Thomson teaches wherein said first text corpus is preprocessed, before said training, by at least transforming all words in said first text corpus into lowercase (0448, case remover may remove uppercase from training data).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to remove capitalization as taught by Thomson in the system of Sunkara in order to allow performance of predictions of the system to be compared to known ground truth data later.  

Consider claim 27, Sunkara teaches the system of claim 24, wherein said second text corpus is preprocessed, before said re-training, by performing contextualization, and wherein said contextualization comprises segmenting said text corpus into segments, each comprising at least two sentences (section 4.1, preprocessing transcriptions into segments to 50 words left or right, which would generally be over 2 sentences).

Consider claim 28, Sunkara teaches the system of claim 24, wherein said second text corpus is preprocessed, before said re-training, by performing data augmentation, and wherein said data augmentation comprises extending at least some of said segments by adding at least one of: one or more preceding sentences in said conversational speech, and one or more succeeding sentences in said conversational speech (section 4.1, data held back for fine tuning, segments include 50 words left or right, which would generally include additional sentences. Table 2 conversational corpus).

Claims 7, 20, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Sunkara and Thomson as applied to claims 1 and 17 above, and further in view of Xue et al. (US PAP 2019/0057306).

Consider claim 7, Sunkara and Thomson teach the system of claim 1, but do not specifically teach wherein said second text corpus is preprocessed, before said re-training, by including end-of-sentence (EOS) embeddings.
In the same field of text processing with neural networks, Xue teaches wherein said second text corpus is preprocessed, before said re-training, by including end-of-sentence (EOS) embeddings (0062, text is preprocessed by adding EOS tokens).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to use EOS tokens as taught by Xue in the system of Sunkara and Thomson in order to allow for the system to properly identify the boundaries of sentences (Xue 0062).
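
For illustration only, preprocessing text by adding end-of-sentence tokens, as Xue is cited for above, can be sketched as follows. This is a minimal Python sketch; the marker string "<eos>" and the function name add_eos are assumptions made for the example and are not taken from Xue or from the application.

```python
EOS = "<eos>"  # hypothetical end-of-sentence marker


def add_eos(sentences, eos=EOS):
    """Flatten sentences into one token list, closing each sentence with `eos`."""
    tokens = []
    for sentence in sentences:
        tokens.extend(sentence.split())
        tokens.append(eos)
    return tokens


tokens = add_eos(["how are you", "i am fine"])
```

The marker gives a downstream model an explicit sentence boundary even after punctuation has been stripped from the text.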

Consider claim 20, Sunkara teaches the computer program product of claim 17, wherein said second text corpus is preprocessed, before said re-training, by performing contextualization, wherein said contextualization comprises segmenting said text corpus into segments, each comprising at least two sentences (section 4.1, preprocessing transcriptions into segments to 50 words left or right, which would generally be over 2 sentences), and by performing data augmentation, wherein said data augmentation comprises extending at least some of said segments by adding at least one of: one or more preceding sentences in said conversational speech, and one or more succeeding sentences in said conversational speech (section 4.1, data held back for fine tuning, segments include 50 words left or right, which would generally include additional sentences. Table 2 conversational corpus).
Sunkara and Thomson do not specifically teach wherein said second text corpus is preprocessed, before said re-training, by including end-of-sentence (EOS) embeddings.
In the same field of text processing with neural networks, Xue teaches wherein said second text corpus is preprocessed, before said re-training, by including end-of-sentence (EOS) embeddings (0062, text is preprocessed by adding EOS tokens).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to use EOS tokens as taught by Xue in the system of Sunkara and Thomson in order to allow for the system to properly identify the boundaries of sentences (Xue 0062).

Consider claim 29, Sunkara and Thomson teach the system of claim 24, but do not specifically teach wherein said second text corpus is preprocessed, before said re-training, by including end-of-sentence (EOS) embeddings.
In the same field of text processing with neural networks, Xue teaches wherein said second text corpus is preprocessed, before said re-training, by including end-of-sentence (EOS) embeddings (0062, text is preprocessed by adding EOS tokens).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to use EOS tokens as taught by Xue in the system of Sunkara and Thomson in order to allow for the system to properly identify the boundaries of sentences (Xue 0062).

Claims 8 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Sunkara and Thomson as applied to claim 1 above, and further in view of Kaur et al. (US PAP 2021/0256220).

Consider claim 8, Sunkara and Thomson teach the system of claim 1, wherein said second text corpus and said target set of words each comprises transcribed text representing a conversation between at least two participants (Sunkara table 4, training using medical conversational data), but do not specifically teach wherein said at least two participants are an agent at a call center and a customer.
In the same field of text processing, Kaur teaches wherein said at least two participants are an agent at a call center and a customer (0045, corpus includes customer and agent utterances).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to use conversation data between an agent and a customer as taught by Kaur in the system of Sunkara and Thomson in order to tailor the trained model to agent and customer conversation scenarios.

Consider claim 9, Sunkara teaches the system of claim 8, wherein said transcribing comprises at least one analysis selected from the group consisting of: textual detection, speech recognition, and speech-to-text detection (abstract, model applied to ASR output).

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Sunkara in view of Xue et al. (US PAP 2019/0057306).

Consider claim 16, Sunkara teaches the method of claim 10, but does not specifically teach wherein said second text corpus is preprocessed, before said re-training, by including end-of-sentence (EOS) embeddings.
In the same field of text processing with neural networks, Xue teaches wherein said second text corpus is preprocessed, before said re-training, by including end-of-sentence (EOS) embeddings (0062, text is preprocessed by adding EOS tokens).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to use EOS tokens as taught by Xue in the method of Sunkara in order to allow for the method to properly identify the boundaries of sentences (Xue 0062).

Allowable Subject Matter
Claims 6 and 15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.  The following is a statement of reasons for the indication of allowable subject matter:  

Consider claim 6, Sunkara teaches the system of claim 1, wherein said predicting comprises a confidence score associated with each of said predicted punctuation and predicted capitalization (table 1, f-scores; section 3.1, output probabilities). However, the prior art does not teach or fairly suggest the limitation of “wherein, when a word in said target set is included in two or more of said segments and receives two or more of said predictions with respect to said punctuation or capitalization, said confidence scores associated with said two or more predictions are averaged to produce a final confidence score of said predicting” when combined with each and every other limitation of the claim and the base claim. Therefore, claim 6 contains allowable subject matter.

Claim 15 contains similar limitations as claim 6 and therefore contains allowable subject matter as well.
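
For illustration only, the averaging limitation quoted above for claim 6 can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function name average_confidences and the representation of each prediction as a (word position, confidence) pair are hypothetical and are not taken from the application.

```python
from collections import defaultdict


def average_confidences(segment_predictions):
    """Average the confidence scores a word receives across overlapping segments.

    `segment_predictions` holds one list per segment of
    (word position in the target set, confidence score) pairs.
    """
    collected = defaultdict(list)
    for segment in segment_predictions:
        for position, confidence in segment:
            collected[position].append(confidence)
    return {pos: sum(scores) / len(scores) for pos, scores in collected.items()}


# The word at position 1 falls in both segments and receives two predictions.
final = average_confidences([[(0, 0.9), (1, 0.5)], [(1, 0.75), (2, 0.5)]])
```

Here the word at position 1 receives scores of 0.5 and 0.75, which average to a final confidence score of 0.625, while words covered by a single segment keep their original score.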

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Coden et al. (2002/0099744) teaches recovering capitalization and punctuation as well. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DOUGLAS C GODBOLD whose telephone number is (571)270-1451. The examiner can normally be reached 6:30am-5pm Monday-Thursday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached on (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

DOUGLAS GODBOLD
Examiner
Art Unit 2655



/DOUGLAS GODBOLD/Primary Examiner, Art Unit 2655