DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's Amendment with a Request for Continued Examination (RCE) submission filed on January 7, 2022, has been entered.

Response to Arguments
Applicant’s arguments and amendments in the Amendment with RCE filed 1/7/2022 (herein “Amendment”), with respect to the rejection(s) of claim(s) 1-19 under 35 U.S.C. 103, have been fully considered and are persuasive in part.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of Goel et al., US 10,347,244 B2.
To fully address all of Applicant’s arguments, the following additional response is made. Applicant sets forth on pages 6-7 of the Amendment that the combination of Bangalore and Liu does not render the amended claims obvious. Specifically, Applicant [...] See In re Keller, 642 F.2d 413, 425 (CCPA 1981).  Nor does an obviousness determination based on teachings from multiple references require an actual, physical substitution of elements.  In re Mouttet, 686 F.3d 1322, 1332 (Fed. Cir. 2012) (citing In re Etter, 756 F.2d 852, 859 (Fed. Cir. 1985)).  Rather, the test for obviousness is what the combined teachings of the references would have suggested to a person of ordinary skill in the art.  Mouttet, 686 F.3d at 1333; see also Manual of Patent Examining Procedure (MPEP) § 2145(IV) (9th ed. rev. 10.2019, June 2020).
Applicant next sets forth on pages 7-8 of the Amendment that the claim limitation “performing automatic speech recognition based on ... the accent classification” is not disclosed by the combination of Bangalore and Liu, since this would require Bangalore to modify its speech recognition to be based on the accent classification provided by Liu, whereas Bangalore teaches that the speech recognition produces the accent classification rather than being based on it. However, no such modification to Bangalore [...]

[Image: figure reproduced from Bangalore]


As shown, and as disclosed in accompanying paras. [0032]-[0033] of Bangalore, the full speech-to-text process (automatic speech recognition) that outputs the “Enriched Target Text” includes the processing of both 326 and 332. It is not automatic speech recognition 326 or enriched statistical machine translation 332 that performs the prosodic labeling; rather, target language automatic prosody labeler 330 performs the prosodic labeling and provides it to enriched statistical machine translation 332, which completes the automatic speech recognition process and outputs the enriched target text. Accordingly, Bangalore teaches that the automatic speech recognition process, including the enriched machine translation processing that outputs the enriched target text, is based on the prosody labeling (accent).
	Moreover, although Liu is relied upon for its teaching of classifying the accent with an accent classifier to yield an accent classification, Bangalore’s disclosure [...] e.g. “*NY3”). Such strongly suggestive teachings by Bangalore only serve to strengthen the case for obviousness in combining Bangalore’s teachings with Liu’s explicit teaching of an accent classifier.
Applicant next sets forth on page 8 that the newly amended limitation “performing natural language understanding on the speech recognition output using the accent classification” (emphasis added to indicate the amended portion) would distinguish over an interpretation that Bangalore uses accent classification in its natural language processing simply because the natural language understanding is performed after (and with the pre-processing benefit of) the prosody labeling. In this way, Applicant seeks to narrow the previously claimed “based on” to the more specific “using.” Applicant’s amendments and arguments here are persuasive, and for this limitation, newly cited Goel et al., US 10,347,244 B2, is relied upon. It is noted, though, that Bangalore does disclose in para. [0043] that a prosody module “could be inserted between the ASR 602 and the SLU 604,” the SLU being a “spoken language understanding module,” thus suggesting that pitch accent labels would at least be available to natural language understanding processing for use therein.


Incorrect Claim Status Identifier
	In the Amendment, claim 6 was identified with the status identifier “Original” rather than “Currently Amended,” despite presenting a substantive amendment to the claim. For examination purposes, claim 6 has been examined as currently amended. Future claim sets should identify claim 6 with the status identifier “Previously Presented.”

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
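3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.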

Claims 1, 3-7, 9-13, 15-18, and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Bangalore et al., (US 2010/0082326 A1, herein “Bangalore”) in view of Liu et al., (US 2008/0147404 A1, herein “Liu”), further in view of Goel et al., (US 10,347,244 B2, herein “Goel”).
Regarding claim 1, Bangalore teaches a computer-implemented method comprising (Bangalore fig. 3B, paras. [0008] and [0047], embodiments of the invention practiced in computing environments where tasks are performed by computers, and the invention includes a method for enriching spoken language translation with prosodic information in a statistical speech translation framework): 
receiving speech input, the speech including an accent characterizing the speech input (Bangalore paras. [0033] and [0036], speech 324 is accepted (receiving input) into an automatic speech recognition module where it is labeled for prosody, including “pitch accent labels,” thus including an accent, where para. [0030] discloses a prosody label that labels incoming speech as having a level 3 prominent New York accent, thus characterizing the speech);
performing automatic speech recognition based on the speech input and the accent to yield an automatic speech recognition output (Bangalore fig. 3B, paras. [0033] and [0036], prosody labeling information is based on the speech, and automatic speech recognition 326, including the processing by the enriched statistical machine translation that outputs enriched target text at 322, is enriched by the prosody labeling (thus based on the accent), with the enriched target text being output (automatic speech recognition output));
performing natural language understanding on the speech recognition output determining an intent of the speech recognition output (Bangalore paras. [0040]-[0042], the enriched spoken language translation is implemented in a natural spoken language dialog system that identifies the intents of human users expressed in natural language and takes action accordingly to satisfy their requests, by first analyzing the input speech in ASR to produce a textual transcription, then using a spoken language understanding (SLU) module to derive a meaning (intent) from the input and output the meaning of the speech input to the dialog management module);
using natural language generation to generate an output based on the speech recognition output and the intent (Bangalore fig. 6, para. [0041], the dialog management module receives the meaning of the speech input from the SLU module and determines a response for the user, then the spoken language generation module generates a transcription (output) of one or more words in response to the action determined by the dialog manager module); and 
rendering an output using text to speech based on the natural language generation (Bangalore paras. [0041]-[0042], the synthesizing module receives the transcription from the spoken language generation module (which are transcribed words thus text) and generates audible speech as output, which the user then hears).
Bangalore teaches that accents, as part of prosody, are labeled in the input speech via a target language prosody labeler generating categorical pitch accent labels, that a classifier can be used at least for automatic prominence detection to include a prominence label with the pitch accent, and that a resultant rich annotation of the speech including accent marks is produced; Bangalore thus at least suggests that accent is determined using classification. However, Bangalore does not explicitly teach classification of the accent. Therefore, Bangalore does not explicitly teach “classifying the accent characterizing the speech input with an accent classifier to yield an accent classification, wherein the accent classification is defined according to at least one of a dialect, a geographic region, and an education level” or “the accent classification.”
Further, while Bangalore does at least provide the enriched target text with the prosody labeling to other components in the language data processing chain, Bangalore does not explicitly teach that the spoken (natural) language understanding is performed “using the accent classification.”
Liu teaches classifying the accent characterizing the speech input with an accent classifier to yield an accent classification (Liu para. [0078], fig. 3, feature-reduced prosodic information originating from input speech 12 is input to an accent identifier (accent classifier), which identifies the accent and outputs an identified accent 316 (accent classification)), wherein the accent classification is defined according to at least one of a dialect, a geographic region, and an education level (Liu paras. [0078] and [0082], the accent identifier identifies the accent of the input speech, where the identified accents include Japanese, Malay, Korean, and Vietnamese, which are all classifications according to the geographic region from which the respective language originates (Japanese is from Japan, Malay is from Malaysia, Vietnamese is from Vietnam, etc.)).
Liu further teaches “the accent classification” (Liu para. [0078], identified accent 316, which is input to further language processing modules for further processing of the input speech).
Goel teaches performing natural language understanding using the accent (Goel fig. 1, col. 11, lines 41-47, semantic engine 104 derives the possible meanings of the words in an end user request (thus performing natural language understanding); fig. 1 illustrates, and col. 17, lines 47-65, teaches, that the accent from the input speech is transferred to semantic engine 104, which extracts meaning from the words and also uses all of the information stored in databases 114A and 114B to determine the meaning of the input speech signal; Goel col. 5, lines 34-43, teaches that the data stored in the databases includes the accent of the end user; and Goel col. 15, lines 22-28, teaches that the dialog engine uses the accent to match against databases 114A and 114B to give a better-suited response by understanding (language understanding) that the user wants to connect to an agent who has a similar accent).
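Therefore, taking the teachings of Bangalore and Liu together as a whole, it would have been obvious to one of ordinary skill in the art to have modified the spoken language dialog system and operations of Bangalore to include identifying a particular accent from input speech as disclosed in Liu, at least because doing so would avoid the significant drop in speech recognition performance that occurs when a speech recognition system trained only on a training accent is used by a speaker whose accent differs from the training accent (Liu paras. [0015]-[0016] and [0018]-[0020]).
Further, taking the teachings of Bangalore and Goel together as a whole, it would have been obvious to one of ordinary skill in the art to have modified the spoken language dialog system and operations of Bangalore to include using accent data determined from speech in the semantic processing of input speech as disclosed in Goel, at least because doing so would provide a better user experience to the end user (Goel col. 5, lines 43-46).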

Regarding claims 3, 9, and 15, Bangalore teaches wherein the determining of the intent (Bangalore fig. 6, paras. [0033], [0036], and [0039]-[0041], the spoken language understanding module (SLU) uses a natural language understanding model to analyze the group of words included in the transcribed output and derive a meaning for the input (intent) that allows the system to determine how to respond).
While Bangalore teaches determining accent from prosody, Bangalore does not explicitly teach the accent classification.
Further, while Bangalore teaches determining the intent, and teaches that a prosody enriched speech-to-speech module could be inserted between the ASR and the SLU to generate pitch accent labels (see para. [0043]), Bangalore does not explicitly teach that the intent determination “uses the accent classification.”
Liu further teaches “the accent classification,” (Liu para. [0078], identified accent 316 which is input to further language processing modules for further processing of the input speech).
Goel teaches “uses the accent” (Goel col. 15, lines 13-30, additional metadata including the accent of the end user (from their input speech) is used to search and match against a database to match a sound of the end user with another end user in the database in order to give a better response to the end user’s queries, such as knowing what response accent should be provided when the user wants (their intent) to connect with an agent who has a similar accent; Goel also teaches in col. 17, lines 51-58, that the entities, including the accent, are transferred to the semantic engine, which assigns an action to the speech sequence according to the meaning (also the user intent)).
Therefore, taking the teachings of Bangalore and Liu together as a whole, it would have been obvious to one of ordinary skill in the art to have modified the spoken language dialog system and operations of Bangalore to include identifying a particular accent from input speech as disclosed in Liu, at least because doing so would avoid the significant drop in speech recognition performance that occurs when a speech recognition system trained only on a training accent is used by a speaker whose accent differs from the training accent (Liu paras. [0015]-[0016] and [0018]-[0020]).
Further, taking the teachings of Bangalore and Goel together as a whole, it would have been obvious to one of ordinary skill in the art to have modified the spoken language dialog system and operations of Bangalore to include using accent data determined from speech in the processing of input speech as disclosed in Goel at least because doing so would provide a better user experience to the end user (Goel col. 5, lines 43-46).
Regarding claims 4, 10 and 16, Bangalore teaches wherein the natural language generation uses the accent (Bangalore fig. 6, para. [0033], the enriched target language output facilitates the text to speech synthesis where the enriched target text is fed to a prosody enriched text-to-speech synthesis module that takes the input words and combines pitch accent and other prosody information to generate (natural language generation) output words or phrases which are then converted to output speech).
While Bangalore teaches determining accent from prosody, Bangalore does not explicitly teach the accent classification.
Liu further teaches “the accent classification,” (Liu para. [0078], identified accent 316 which is input to further language processing modules for further processing of the input speech).
Therefore, taking the teachings of Bangalore and Liu together as a whole, it would have been obvious to one of ordinary skill in the art to have modified the spoken language dialog system and operations of Bangalore to include identifying a particular accent from input speech as disclosed in Liu, at least because doing so would avoid the significant drop in speech recognition performance that occurs when a speech recognition system trained only on a training accent is used by a speaker whose accent differs from the training accent (Liu paras. [0015]-[0016] and [0018]-[0020]).
Regarding claims 5, 11 and 17, Bangalore teaches wherein the rendering of the output (Bangalore para. [0033], the prosody enriched output words are converted to output speech).
While Bangalore teaches determining accent from prosody, Bangalore does not explicitly teach the accent classification.
Further, while Bangalore teaches converting the prosody enriched output words to output speech, and teaches that a prosody enriched speech-to-speech module could be inserted at any point in the cycle shown in fig. 6, including synthesizing module 610, Bangalore does not explicitly teach that the rendering of the output “uses the accent classification.”
Liu further teaches “the accent classification,” (Liu para. [0078], identified accent 316 which is input to further language processing modules for further processing of the input speech).
Goel teaches uses the accent (Goel col. 18, lines 16-20, the natural language generator maps the accent related metadata (thus uses it) into a logical sentence to form (render) the spoken response of the dialogue system, and also col. 15, lines 13-30 which teaches that the dialogue engine provides a matching response accent in the response to the end user queries).
Therefore, taking the teachings of Bangalore and Liu together as a whole, it would have been obvious to one of ordinary skill in the art to have modified the spoken language dialog system and operations of Bangalore to include identifying a particular accent from input speech as disclosed in Liu, at least because doing so would avoid the significant drop in speech recognition performance that occurs when a speech recognition system trained only on a training accent is used by a speaker whose accent differs from the training accent (Liu paras. [0015]-[0016] and [0018]-[0020]).
Further, taking the teachings of Bangalore and Goel together as a whole, it would have been obvious to one of ordinary skill in the art to have modified the spoken language dialog system and operations of Bangalore to include using accent data determined from speech in the processing of input speech as disclosed in Goel at least because doing so would provide a better user experience to the end user (Goel col. 5, lines 43-46).
Regarding claims 6, 12 and 18, Bangalore teaches wherein the performing natural language understanding on the speech recognition output, the determining an intent, the using natural language generation (Bangalore fig. 6, paras. [0033], [0036], and [0039]-[0041], the spoken language understanding module (SLU), which uses a natural language understanding model, the spoken language generation module, and the synthesizing module all perform their respective operations (detailed further in the rejection rationales for claims 3-5 above) temporally/sequentially downstream, upon an input that upstream is the annotated speech transcription enriched by the pitch accent labels, and thus their operations are all based on the accent).
Bangalore further teaches the natural language generation uses the accent (Bangalore fig. 6, para. [0033], the enriched target language output facilitates the text to speech synthesis where the enriched target text is fed to a prosody enriched text-to-speech synthesis module that takes the input words and combines pitch accent and other prosody information to generate (natural language generation) output words or phrases which are then converted to output speech).
While Bangalore teaches determining accent from prosody, Bangalore does not explicitly teach the accent classification.
Further, while Bangalore teaches determining the intent, and teaches that a prosody enriched speech-to-speech module could be inserted between the ASR and the SLU to generate pitch accent labels (see para. [0043]), Bangalore does not explicitly teach that the intent determination uses the accent, interpreting “each uses” in the claims as applying to each of the enumerated operations.

Liu further teaches “the accent classification,” (Liu para. [0078], identified accent 316 which is input to further language processing modules for further processing of the input speech).
Goel teaches “uses the accent” (Goel col. 15, lines 13-30, additional metadata including the accent of the end user (from their input speech) is used to search and match against a database to match a sound of the end user with another end user in the database in order to give a better response to the end user’s queries, such as knowing what response accent should be provided when the user wants (their intent) to connect with an agent who has a similar accent; Goel also teaches in col. 17, lines 51-58, that the entities, including the accent, are transferred to the semantic engine, which assigns an action to the speech sequence according to the meaning (also the user intent)).
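Goel further teaches “uses the accent” (Goel col. 18, lines 16-20, the natural language generator maps the accent related metadata (thus uses it) into a logical sentence to form (render) the spoken response of the dialogue system; see also col. 15, lines 13-30, which teaches that the dialogue engine provides a matching response accent in the response to the end user queries).
Therefore, taking the teachings of Bangalore and Liu together as a whole, it would have been obvious to one of ordinary skill in the art to have modified the spoken language dialog system and operations of Bangalore to include identifying a particular accent from input speech as disclosed in Liu, at least because doing so would avoid the significant drop in speech recognition performance that occurs when a speech recognition system trained only on a training accent is used by a speaker whose accent differs from the training accent (Liu paras. [0015]-[0016] and [0018]-[0020]).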
Further, taking the teachings of Bangalore and Goel together as a whole, it would have been obvious to one of ordinary skill in the art to have modified the spoken language dialog system and operations of Bangalore to include using accent data determined from speech in the processing of input speech as disclosed in Goel at least because doing so would provide a better user experience to the end user (Goel col. 5, lines 43-46).
Regarding claim 7, Bangalore teaches a computer program product residing on a non-transitory computer readable storage medium having a plurality of instructions stored thereon which, when executed across one or more processors, causes at least a portion of the one or more processors to perform operations comprising (Bangalore paras. [0045]-[0046], embodiments of the present invention include computer-readable media, including non-transitory computer media such as RAM, ROM, EEPROM, CD-ROM, etc., carrying computer-executable instructions that cause a general purpose computer to implement the functions described in the steps of the methods disclosed in Bangalore):
receiving speech input, the speech including an accent characterizing the speech input (Bangalore paras. [0033] and [0036], speech 324 is accepted (receiving input) into an automatic speech recognition module where it is labeled for prosody, including “pitch accent labels,” thus including an accent, where para. [0030] discloses a prosody label that labels incoming speech as having a level 3 prominent New York accent, thus characterizing the speech);
performing automatic speech recognition based on the speech input and the accent to yield an automatic speech recognition output (Bangalore fig. 3B, paras. [0033] and [0036], prosody labeling information is based on the speech, and automatic speech recognition 326, including the processing by the enriched statistical machine translation that outputs enriched target text at 322, is enriched by the prosody labeling (thus based on the accent), with the enriched target text being output (automatic speech recognition output));
performing natural language understanding on the speech recognition output determining an intent of the speech recognition output (Bangalore paras. [0040]-[0042], the enriched spoken language translation is implemented in a natural spoken language dialog system that identifies the intents of human users expressed in natural language and takes action accordingly to satisfy their requests, by first analyzing the input speech in ASR to produce a textual transcription, then using a spoken language understanding (SLU) module to derive a meaning (intent) from the input and output the meaning of the speech input to the dialog management module);
using natural language generation to generate an output based on the speech recognition output and the intent (Bangalore fig. 6, para. [0041], the dialog management module receives the meaning of the speech input from the SLU module and determines a response for the user, then the spoken language generation module generates a transcription (output) of one or more words in response to the action determined by the dialog manager module); and
rendering an output using text to speech based on the natural language generation (Bangalore paras. [0041]-[0042], the synthesizing module receives the transcription from the spoken language generation module (which are transcribed words thus text) and generates audible speech as output, which the user then hears).
Bangalore teaches that accents, as part of prosody, are labeled in the input speech via a target language prosody labeler generating categorical pitch accent labels, that a classifier can be used at least for automatic prominence detection to include a prominence label with the pitch accent, and that a resultant rich annotation of the speech including accent marks is produced; Bangalore thus at least suggests that accent is determined using classification. However, Bangalore does not explicitly teach classification of the accent. Therefore, Bangalore does not explicitly teach “classifying the accent characterizing the speech input with an accent classifier to yield an accent classification, wherein the accent classification is defined according to at least one of a dialect, a geographic region, and an education level” or “the accent classification.”
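Further, while Bangalore does at least provide the enriched target text with the prosody labeling to other components in the language data processing chain, Bangalore does not explicitly teach that the spoken (natural) language understanding is performed “using the accent classification.”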
Liu teaches classifying the accent characterizing the speech input with an accent classifier to yield an accent classification (Liu para. [0078], fig. 3, feature-reduced prosodic information originating from input speech 12 is input to an accent identifier (accent classifier), which identifies the accent and outputs an identified accent 316 (accent classification)), wherein the accent classification is defined according to at least one of a dialect, a geographic region, and an education level (Liu paras. [0078] and [0082], the accent identifier identifies the accent of the input speech, where the identified accents include Japanese, Malay, Korean, and Vietnamese, which are all classifications according to the geographic region from which the respective language originates (Japanese is from Japan, Malay is from Malaysia, Vietnamese is from Vietnam, etc.)).
Liu further teaches “the accent classification,” (Liu para. [0078], identified accent 316 which is input to further language processing modules for further processing of the input speech).
Goel teaches performing natural language understanding using the accent (Goel fig. 1, col. 11, lines 41-47, semantic engine 104 derives the possible meanings of the words in an end user request (thus performing natural language understanding); fig. 1 illustrates, and col. 17, lines 47-65, teaches, that the accent from the input speech is transferred to semantic engine 104, which extracts meaning from the words and also uses all of the information stored in databases 114A and 114B to determine the meaning of the input speech signal; Goel col. 5, lines 34-43, teaches that the data stored in the databases includes the accent of the end user; and Goel col. 15, lines 22-28, teaches that the dialog engine uses the accent to match against databases 114A and 114B to give a better-suited response by understanding (language understanding) that the user wants to connect to an agent who has a similar accent).
Therefore, taking the teachings of Bangalore and Liu together as a whole, it would have been obvious to one of ordinary skill in the art to have modified the spoken language dialog system and operations of Bangalore to include identifying a particular accent from input speech as disclosed in Liu, at least because doing so would avoid the significant drop in speech recognition performance that occurs when a speech recognition system trained only on a training accent is used by a speaker whose accent differs from the training accent (Liu paras. [0015]-[0016] and [0018]-[0020]).
Further, taking the teachings of Bangalore and Goel together as a whole, it would have been obvious to one of ordinary skill in the art to have modified the spoken language dialog system and operations of Bangalore to include using accent data determined from speech in the semantic processing of input speech as disclosed in Goel at least because doing so would provide a better user experience to the end user (Goel col. 5, lines 43-46).
Regarding claim 13, Bangalore teaches a computing system including one or more processors and one or more memories configured to perform operations comprising (Bangalore fig. 3B, paras. [0008] and [0047], embodiments of the invention practiced in computing environments where tasks are performed by computers (computer system), including multi-processor systems with program modules in storage devices (memories), and the invention includes a method for enriching spoken language translation with prosodic information in a statistical speech translation framework):
receiving speech input, the speech including an accent characterizing the speech input (Bangalore paras. [0033] and [0036], speech 324 is accepted (receiving input) into an automatic speech recognition module where it is labeled for prosody, including “pitch accent labels,” thus including an accent, where para. [0030] discloses a prosody label that labels incoming speech as having a level 3 prominent New York accent, thus characterizing the speech);
performing automatic speech recognition based on the speech input and the accent to yield an automatic speech recognition output (Bangalore fig. 3B, paras. [0033] and [0036], prosody labeling information is based on the speech, and automatic speech recognition 326, including the processing by the enriched statistical machine translation that outputs enriched target text at 322, is enriched by the prosody labeling (thus based on the accent), with the enriched target text being output (automatic speech recognition output));
performing natural language understanding on the speech recognition output determining an intent of the speech recognition output (Bangalore paras. [0040]-[0042], the enriched spoken language translation is implemented in a natural spoken language dialog system that identifies the intents of human users expressed in natural language and takes action accordingly to satisfy their requests, by first analyzing the input speech in ASR to produce a textual transcription, then using a spoken language understanding (SLU) module to derive a meaning (intent) from the input and output the meaning of the speech input to the dialog management module);
using natural language generation to generate an output based on the speech recognition output and the intent (Bangalore fig. 6, para. [0041], the dialog management module receives the meaning of the speech input from the SLU module and determines a response for the user, then the spoken language generation module generates a transcription (output) of one or more words in response to the action determined by the dialog manager module); and 
rendering an output using text to speech based on the natural language generation (Bangalore paras. [0041]-[0042], the synthesizing module receives the transcription from the spoken language generation module (which are transcribed words thus text) and generates audible speech as output, which the user then hears).
Bangalore teaches that accents, as part of prosody, are labeled in the input speech via a target language prosody labeler generating categorical pitch accent labels, that a classifier can be used at least for automatic prominence detection to include a prominence label with the pitch accent, and that a resultant rich annotation of the speech including accent marks is produced; Bangalore therefore at least suggests that accent is determined using classification. However, Bangalore does not explicitly teach the classification of the accent. Therefore, Bangalore does not explicitly teach “classifying the accent characterizing the speech input with an accent classifier to yield an accent classification, wherein the accent classification is defined according to at least one of a dialect, a geographic region, and an education level” or “the accent classification.”

Liu teaches classifying the accent characterizing the speech input with an accent classifier to yield an accent classification (Liu para. [0078], fig. 3, feature-reduced prosodic information originating from input speech 12 is input to an accent identifier (accent classifier), which identifies the accent and outputs an identified accent 316 (accent classification)), wherein the accent classification is defined according to at least one of a dialect, a geographic region, and an education level (Liu paras. [0078] and [0082], the accent identifier identifies the accent of the input speech, where the identified accents include Japanese, Malay, Korean, and Vietnamese, which are all classifications according to the geographic region from which each language originates (Japanese is from Japan, Malay is from Malaysia, Vietnamese is from Vietnam, etc.)).
Liu further teaches “the accent classification,” (Liu para. [0078], identified accent 316 which is input to further language processing modules for further processing of the input speech).
Goel teaches performing natural language understanding using the accent (Goel fig. 1, col. 11, lines 41-47, semantic engine 104 derives the possible meanings of the words in an end user request (thus performs natural language understanding), where fig. 1 illustrates, and col. 17, lines 47-65, teach, that the semantic engine 104 has transferred to it the accent from the input speech, and the semantic engine extracts meaning from the words and also uses all of the information stored in the databases 114A and 114B to determine the meaning of the input speech signal, where Goel col. 5, lines 34-43, teaches that the data stored in the databases includes the accent of the end user, and Goel col. 15, lines 22-28, teaches that the accent is used by the dialog engine to match against the databases 114A and 114B to give a better-suited response by understanding (language understanding) that the user wants to connect to an agent who has a similar accent).
Therefore, taking the teachings of Bangalore and Liu together as a whole, it would have been obvious to one of ordinary skill in the art to have modified the spoken language dialog system and operations of Bangalore to include identifying a particular accent from input speech as disclosed in Liu at least because doing so would avoid a significant drop in the performance of a speech recognition system caused by the speech recognizer being trained only for a training accent when a user may have an accent different from the training accent (Liu paras. [0015]-[0016] and [0018]-[0020]).
Further, taking the teachings of Bangalore and Goel together as a whole, it would have been obvious to one of ordinary skill in the art to have modified the spoken language dialog system and operations of Bangalore to include using accent data determined from speech in the semantic processing of input speech as disclosed in Goel at least because doing so would provide a better user experience to the end user (Goel col. 5, lines 43-46).
Regarding claim 20, Bangalore teaches wherein the natural language understanding uses the accent to determine a response to the speech input (Bangalore fig. 6, para. [0033], the enriched target language output facilitates the text to speech synthesis where the enriched target text is fed to a prosody enriched text-to-speech synthesis module that takes the input words and combines pitch accent and other prosody information to generate output words or phrases (determine a response) which are then converted to output speech).
While Bangalore teaches determining accent from prosody, Bangalore does not explicitly teach the accent classification.
Liu further teaches “the accent classification,” (Liu para. [0078], identified accent 316 which is input to further language processing modules for further processing of the input speech).
Therefore, taking the teachings of Bangalore and Liu together as a whole, it would have been obvious to one of ordinary skill in the art to have modified the spoken language dialog system and operations of Bangalore to include identifying a particular accent from input speech as disclosed in Liu at least because doing so would avoid a significant drop in the performance of a speech recognition system caused by the speech recognizer being trained only for a training accent when a user may have an accent different from the training accent (Liu paras. [0015]-[0016] and [0018]-[0020]).
Regarding claim 21, Bangalore teaches wherein using the accent to determine the response to the speech input includes determining the response according to a regional origin of a speaker of the speech input (Bangalore fig. 6, para. [0033], the enriched target language output facilitates the text to speech synthesis where the enriched target text is fed to a prosody enriched text-to-speech synthesis module that takes the input words and combines pitch accent and other prosody information to generate output words or phrases (determine a response) which are then converted to output speech, where para. [0030] teaches the pitch accent level indicates a regional accent to influence the translated output speech such that the output speech has the same New York accent as the input speech).
While Bangalore teaches determining accent from prosody, Bangalore does not explicitly teach the accent classification.
Liu further teaches “the accent classification,” (Liu para. [0078], identified accent 316 which is input to further language processing modules for further processing of the input speech).
Therefore, taking the teachings of Bangalore and Liu together as a whole, it would have been obvious to one of ordinary skill in the art to have modified the spoken language dialog system and operations of Bangalore to include identifying a particular accent from input speech as disclosed in Liu at least because doing so would avoid a significant drop in the performance of a speech recognition system caused by the speech recognizer being trained only for a training accent when a user may have an accent different from the training accent (Liu paras. [0015]-[0016] and [0018]-[0020]).
Regarding claim 22, Bangalore teaches wherein the determining of the response according to the regional origin of the speaker of the speech input includes determining the response according to a preference of users from said region (Bangalore fig. 6, para. [0033], the enriched target language output facilitates the text to speech synthesis where the enriched target text is fed to a prosody enriched text-to-speech synthesis module that takes the input words and combines pitch accent and other prosody information to generate output words or phrases (determine a response) which are then converted to output speech, where para. [0030] teaches the pitch accent level indicates a regional accent to influence the translated output speech such that the output speech has the same New York accent as the input speech, and where the system allows a user to establish preferences governing when, where and how much of the accent to apply during translation).
Claim 23 is rejected under 35 U.S.C. 103 as being unpatentable over Bangalore in view of Liu in view of Goel, as set forth above regarding claim 20, from which claim 23 depends, further in view of Klein (US 10,789,948 B1, herein “Klein”).
Regarding claim 23, Bangalore does not explicitly teach the limitations of claim 23.
Klein teaches wherein the natural language understanding further uses a geographic location where the speech input was spoken (Klein col. 4, lines 9-34, natural language processing identifies a location associated with a weather inquiry user intent, where the weather inquiry location is that of the environment of the device on which the request is made (thus where the speech input was made)).
Therefore, taking the teachings of Bangalore and Klein together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the natural language processing of Bangalore to include considering the location where a user makes a weather inquiry to a device as disclosed in Klein at least because doing so would allow for outputting desired content to a user through automatic coordinated operation of computer devices in a user environment (see Klein col. 2, lines 19-55).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908. The examiner can normally be reached Monday-Friday, 09:30-18:30 EDT/EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MICHELLE M. KOETH
Primary Examiner
Art Unit 2656



/MICHELLE M KOETH/Primary Examiner, Art Unit 2656