Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 3-8, and 10-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by McCarthy (7092888).

As per claim 1, McCarthy (7092888) teaches a method, comprising:
receiving, by a first device, a set of audio data files corresponding to a set of calls (as input speech, converted into waveforms, input into a recognizer – Fig. 4, speech input, both into recognizer A and waveform database; being used in a call routing system – abstract);
determining, by the first device, a plurality of text-audio pairs within the set of calls (as receiving transcripts from the database – Fig. 4, subblock 8, which outputs updates to an acoustic model trainer (speech) – Fig. 4, subblock 10, and a grammar builder with topic information (text) – Fig. 4, subblocks 8, 11);
training, by the first device and using a machine learning process, a text-to-speech model based on the plurality of text-audio pairs;
receiving, by the first device, a query from a second device (as receiving an incoming call/query – abstract);
inputting, by the first device, a text-based reply to the query to the text-to-speech model to obtain information for an audio output corresponding to the text-based reply (as text translated in Fig. 4, subblock 1);
and providing, by the first device and to the second device, the information for the audio output, wherein the audio output is used for providing an audible response (as updating the recognizer A with the topic/recognized updated models – Fig. 4, subblock 16, and providing the updates to Fig. 4, subblock 1).


As per claim 3, McCarthy (7092888) teaches the method of claim 1, further comprising: determining a response to the query, wherein the response to the query includes the text-based reply (as text translated in Fig. 4, subblock 1).

As per claim 4, McCarthy (7092888) teaches the method of claim 1, where training the text-to-speech model based on the plurality of text-audio pairs within the set of calls comprises: training the text-to-speech model using the plurality of text-audio pairs corresponding to industry-specific vocabulary (as the topic information from the call center/agent (Fig. 4, subblock 210) is sent to a separate section that provides the unsupervised training and information – Fig. 4, subblock 16; into another device that receives the input speech, analyzes the topic, and routes the call – Fig. 4, subblock 1; as well as an agent in a call center – abstract, with particular agents for each particular subindustry – col. 13, lines 40-50, with agents and their IDs, and col. 19, lines 29-36, 58-67, wherein the agent is chosen according to the analyzed information, which includes the topic fields).

As per claim 5, McCarthy (7092888) teaches the method of claim 1, further comprising:
receiving a set of transcripts corresponding to the set of audio data files; and
determining the plurality of text-audio pairs within the set of calls, wherein a text-audio pair, of the plurality of text-audio pairs, comprises: a digital representation of a segment of a call of the set of calls, and a corresponding excerpt of text from the set of transcripts (as storing and analyzing the input audio/speech and the transcript derived from the recognition of the utterance – col. 19, lines 30-50).
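
For illustration only, the text-audio pair recited in claim 5 can be pictured as a record that couples a segment of call audio with the matching transcript excerpt. The sketch below is not taken from McCarthy or from the application; the names TextAudioPair and pair_segments, and the assumption that each transcript excerpt carries start and end times in seconds, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class TextAudioPair:
    """Hypothetical record: one call segment plus its transcript excerpt."""
    call_id: str
    samples: Tuple[int, ...]  # digital representation of a segment of the call
    text: str                 # corresponding excerpt of text from the transcript


def pair_segments(
    call_id: str,
    waveform: Sequence[int],
    sample_rate: int,
    transcript: Sequence[Tuple[float, float, str]],
) -> List[TextAudioPair]:
    """Cut the waveform at each (start_s, end_s, text) entry of the transcript.

    Assumption: the transcript is already time-aligned to the audio.
    """
    pairs = []
    for start_s, end_s, text in transcript:
        lo = int(start_s * sample_rate)
        hi = int(end_s * sample_rate)
        pairs.append(TextAudioPair(call_id, tuple(waveform[lo:hi]), text))
    return pairs
```

Under these assumptions, each returned record holds both halves of one pair, so a training step could consume the list directly.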

As per claim 6, McCarthy (7092888) teaches the method of claim 1, further comprising:
determining context of a text-audio pair, of the plurality of text-audio pairs, based on one


As per claim 7, McCarthy (7092888) teaches the method of claim 1, further comprising:
identifying a text-audio pair, of the plurality of text-audio pairs, between a segment of a waveform of audio data and an excerpt of a corresponding transcript (as the speech is converted into digitized waveforms – Fig. 4, speech into waveforms; as receiving transcripts from the database – Fig. 4, subblock 8, which outputs updates to an acoustic model trainer (speech) – Fig. 4, subblock 10, and a grammar builder with topic information (text) – Fig. 4, subblocks 8, 11).

Claims 8 and 10-14 are device claims that perform the method steps of claims 1 and 3-7 above; as such, claims 8 and 10-14 are similar in scope and content to claims 1 and 3-7 and are therefore rejected under a similar rationale as presented against claims 1 and 3-7 above. Additionally, McCarthy teaches a processor/memory (col. 1, lines 20-30). Further to claim 14, McCarthy discusses account information and access (DETX (84)): “The present invention has application to other speech recognition tasks as well as those described above. Examples include: the capture of account information, both numeric and alpha-numeric, the capture of yes/no responses, phrase grammars where the caller must match a pre-programmed response such as at a prompt "please say one of the following: billing, orders, cancel service, or technical assistance."”


Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 2 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over McCarthy (7092888) in view of Meyer (20130231935).

As per claims 2 and 9, McCarthy (7092888) teaches the method of claim 1 as noted above, but does not explicitly teach selecting a text-to-speech library from a plurality of libraries, the text-to-speech library providing the audio output based on one or more of: dialect, rate of speech, tone, or language; however, Meyer (20130231935) teaches a call center (para 0004) in an IVR dialog (para 0004) with speech synthesis (para 0005) altering the pitch, amplitude and/or duration (para 0040), with the speech modification stored in libraries (para 0009). Therefore, it would have been obvious to one of ordinary skill in the art of call center IVR systems to modify McCarthy (7092888) with alterable speech synthesis libraries, as taught by Meyer (20130231935) above, because it would advantageously match the intonation of the output speech with the detected context (Meyer (20130231935), para 0040). Further to claim 9, the combination of McCarthy (7092888) in view of Meyer (20130231935) teaches the concept of tying the speech synthesis output to the location/dialect of the user (McCarthy – DETX (31), “Other fields in the inbound call area 105 may include a geographical region field”, and in view of Meyer's intention to provide output audio suitable for the user – para 0004, 0009, 0040; the combination teaches the concept of providing output audio matching the geographic location of the user).
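
For illustration only, the limitation of selecting a text-to-speech library from a plurality of libraries based on dialect, rate of speech, tone, or language can be pictured as a lookup over tagged libraries. The sketch below is not drawn from McCarthy, Meyer, or the application; the names TtsLibrary and select_library, the attribute keys, and the match-count scoring rule are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class TtsLibrary:
    """Hypothetical library descriptor tagged with voice attributes."""
    name: str
    attributes: Dict[str, str]  # e.g. {"language": "en", "dialect": "GB"}


def select_library(libraries: Sequence[TtsLibrary], wanted: Dict[str, str]) -> TtsLibrary:
    """Return the library matching the most requested attributes.

    Assumption: ties go to the earliest library in the sequence.
    """
    if not libraries:
        raise ValueError("no text-to-speech libraries available")

    def score(lib: TtsLibrary) -> int:
        return sum(1 for key, value in wanted.items() if lib.attributes.get(key) == value)

    return max(libraries, key=score)
```

Under these assumptions, a caller record carrying a geographical region field could be translated into a requested dialect before the lookup.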

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  Please see related art listed on the PTO-892 form.
Bezar (20150170638) teaches analysis of call center information in an IVR environment – para 0012, 0021.
Meyer (20110202344) teaches text transcription and synthesis in IVR dialog systems – para 0004-0009.


Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Opsasnick, telephone number (571)272-7623, who is available Monday-Friday, 9am-5pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mr. Richemond Dorvil, can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/Michael N Opsasnick/
Primary Examiner, Art Unit 2658
08/10/2021