Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments, see Remarks, filed 02/17/2021, with respect to the rejection(s) of claim(s) 1-4, 8, and 11-14 under 35 U.S.C. 102 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Phipps and Lindahl.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 8, and 11-14 are rejected under 35 U.S.C. 103 as being unpatentable over Phipps et al. [EP Publication 3200185] in view of Lindahl et al. [US PG Pub 20110066438].

	With respect to Claim 1, Phipps discloses:
A method for generating synthesized speech of a voice assistant having a contextually-adjusted audio output using a voice-enabled device (Figure 1, Virtual Assistant, 1002, In one embodiment, virtual assistant 1002 receives user input 2704 via any suitable input modality, including for example touchscreen input, keyboard input, spoken input, and/or any combination thereof. In one embodiment, assistant 1002 also receives context information 1000, which may include event context 2706 and/or any of several other types of context as described in more detail herein, [0070]), the method comprising: identifying media content characteristics associated with media content (Examples of context information that can be obtained from personal databases 1058 include, without limitation:…names of songs, genres, playlists, and other data associated with the user's music library that the user might refer to; [0089]); identifying base characteristics of audio output (Context can thus be used to constrain the solutions during various phases of processing, including for example and without limitation:…Dialog Generation - generating assistant responses as part of a conversation with the user about their task, for example, to paraphrase the user's intent with the response "OK, I'll call Rebecca on her mobile..." The level of verbosity and informal tone are choices that can be guided by contextual information, [0020]); generating contextually-adjusted characteristics of audio output based at least in part on the base characteristics and the media content characteristics (Upon processing user input 2704 and context information 1000 according to the techniques described herein, virtual assistant 1002 generates output 2708 for presentation to the user, [0071]); and using the contextually-adjusted audio output characteristics to generate the synthesized speech (Output 2708 can be generated according to any suitable output modality, which may be informed by context 1000 as well as other factors, if appropriate. Examples of output modalities include visual output as presented on a screen, auditory output (which may include spoken output and/or beeps and other sounds), haptic output (such as vibration), and/or any combination thereof, [0071]).
Phipps, however, fails to disclose identifying media content currently being played by a media playback system; identifying media content characteristics associated with the media content currently being played.
	Lindahl does teach identifying media content currently being played by a media playback system (The method 114 may include receiving a media item at step 116; Fig. 6, item 116; Lindahl [0069]); identifying media content characteristics associated with the media content currently being played (the electronic device 10 or the host device 68 may analyze the media item. Such analysis of the media item may include analysis of primary audio material, metadata associated with the primary audio material or media item, or both. Analysis of the primary audio material may be achieved through various techniques, such as spectral analysis, cepstral analysis, or any other suitable analytic techniques; Fig. 6, item 122; Lindahl [0070]).
	Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Phipps to include identifying media content currently being played by a media playback system; identifying media content characteristics associated with the media content currently being played, as taught by Lindahl, in order to provide voice feedback describing a media content during playback of the media content [0006, Lindahl].

	With respect to Claim 2, Phipps discloses: 
	The method of claim 1, wherein the contextually-adjusted characteristics of audio output are further based on user-specific adjustments to the base characteristics of audio output (Figure 1, Application Preferences and Usage History, 1072, In one embodiment, information describing the user's preferences and settings for various applications, as well as his or her usage history 1072, are used as context for interpreting and/or operationalizing the user's intent or other functions of virtual assistant 1002. Examples of such preferences and history 1072 include, without limitation: shortcuts, favorites, bookmarks, friends lists, or any other collections of user data about people, companies, addresses, phone numbers, places, web sites, email messages, or any other references; recent calls made on the device; recent text message conversations, including the parties to the conversations; recent requests for maps or directions; recent web searches and URLs; stocks listed in a stock application; recent songs or video or other media played; the names of alarms set on alerting applications; the names of applications or other digital objects on the device; the user's preferred language or the language in use at the user's location, [0096]).  

	With respect to Claim 3, Phipps discloses:
	The method of claim 1, wherein using the contextually-adjusted audio output comprises receiving voice content (Figure 2, Elicit and Interpret Speech Input, 100 and Context, 1000; In step 500 a dialog response is generated. In step 700, the response is sent to the client device for output thereon. Client software on the device renders it on the screen (or other output device) of the client device, [0133]) and generating the synthesized speech to convey the voice content to the user according to the contextually-adjusted audio output (The input to the embodiments described herein also includes the context of the user interaction history, including dialog and request history.  As described in the related U.S. Utility Applications cross-referenced above, many different types of output data/information may be generated by virtual assistant 1002. These may include, but are not limited to, one or more of the following (or combinations thereof): …Speech output, [0065-0066]).  

	With respect to Claim 4, Phipps discloses: 
The method of claim 1, wherein identifying the media content characteristics comprises: analyzing audio of the media content to determine musical characteristics of the media content (Examples of context information that can be obtained from personal databases 1058 include, without limitation:…names of songs, genres, playlists, and other data associated with the user's music library that the user might refer to, [0089]); and analyzing media content metadata to determine metadata-based characteristics (Examples of context information that can be obtained from personal databases 1058 include, without limitation:… people, places, categories, tags, labels, or other symbolic names on photos or videos or other media in the user's media library, [0089]).

	With respect to Claim 8, Phipps discloses:
The method of claim 1, wherein the user-specific adjustments are based on the user's listening history (Figure 1, Application Preferences and Usage History, 1072, Any of the sources described in connection with Fig. 1 can provide context 1000 to the speech elicitation and interpretation method depicted in Fig. 3. For example :…Vocabulary from personal databases 1058 and application preferences and usage history 1072 can be used as context 1000. For example, the titles of media and names of artists can be used to tune language models 1029, [0150]).
  
	With respect to Claim 11, Phipps discloses:
The method performed by the processing device of claim 11 is similar to the method of claim 1 and is therefore rejected under a similar rationale. Phipps also teaches the additional hardware elements recited in claim 11:
A voice assistant system (Figure 27, 1002, Virtual Assistant);
At least one processing device (Figure 27, 2790, Output Processor, 2780, Dialog Flow Processor);
At least one computer readable storage device (Figure 27, 2754, Long Term Personal Memory, 2752, Short Term Personal Memory).

	With respect to Claim 12, Phipps discloses:
	The voice assistant system of claim 11, further comprising a voice-enabled device (Figure 27, 1002, Virtual Assistant) configured for interaction with a user via voice (Figure 27, 2704, User Input, In one embodiment, virtual assistant 1002 receives user input 2704 via any suitable input modality, including for example touchscreen input, keyboard input, spoken input, and/or any combination thereof, [0070]), wherein the voice-enabled device comprises the at least one processing device (Figure 27, 2790, Output Processor, 2780, Dialog Flow Processor) and the at least one computer readable storage device (Figure 27, 2754, Long Term Personal Memory, 2752, Short Term Personal Memory).  

	With respect to Claim 13, Phipps discloses:
	The voice assistant system of claim 11, further comprising a media delivery system comprising at least one server computing device (Figure 29, 60, computing device) comprising the at least one processing device (Figure 29, 63, Processor(s)) and the at least one computer readable storage device (Figure 29, 1208, Storage Device, 1210, Memory).

With respect to Claim 14, Phipps discloses: 
The voice assistant system of claim 11, wherein the base characteristics of audio output are user-specific characteristics of audio output generated based at least in part on a listening history of a user and brand characteristics of audio output (Figure 1, Application Preferences and Usage History, 1072, In one embodiment, information describing the user's preferences and settings for various applications, as well as his or her usage history 1072, are used as context for interpreting and/or operationalizing the user's intent or other functions of virtual assistant 1002. Examples of such preferences and history 1072 include, without limitation: shortcuts, favorites, bookmarks, friends lists, or any other collections of user data about people, companies, addresses, phone numbers, places, web sites, email messages, or any other references; recent calls made on the device; recent text message conversations, including the parties to the conversations; recent requests for maps or directions; recent web searches and URLs; stocks listed in a stock application; recent songs or video or other media played; the names of alarms set on alerting applications; the names of applications or other digital objects on the device; the user's preferred language or the language in use at the user's location, [0096]).
Claims 5-7, 9-10, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Phipps et al. [EP Publication 3200185] in view of Lindahl et al. [US PG Pub 20110066438] and further in view of Martinez et al. [U.S. Patent Publication 2010/0049702].

With respect to Claim 5:
Phipps discloses the contextually-adjusted audio output and the method of claim 4. 
Phipps fails to specifically disclose: The method of claim 4, wherein generating a contextually-adjusted audio output is based at least in part upon the musical characteristics of the media content.  
	Within the same field of contextualizing messages, Martinez discloses: The method of claim 4, wherein generating a contextually-adjusted audio output is based at least in part upon the musical characteristics of the media content (In one embodiment, an enhanced message can contain an advertisement with enhanced content criteria relating to the advertisement. Thus, an advertisement may supplement basic ad content with media tailored for a specific user. For example, an advertisement for a sports car may be associated with a context specifying the user’s favorite musical artist and songs with a fast tempo or explicit references to speed, or the year 1975 when the user last owned a sports car, [0152]).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date, to combine the voice assistant with contextually adjusted audio output, from Phipps, with the teachings of Martinez where the output is generated based at least in part upon the musical characteristics into the contextually adjusted commands in order to further “deliver useful services and information to end users, and provide commercial opportunities to advertisers and retailers,” (0003, Martinez).

	With respect to Claim 6:
	Phipps discloses: the contextually-adjusted audio output. 
	Phipps fails to specifically disclose: The method of claim 5, wherein generating the contextually-adjusted audio output comprises generating mood-related attributes that are compatible with the musical characteristics of the media content.  
	Within the same field of contextualizing messages, Martinez discloses: The method of claim 5, wherein generating the contextually-adjusted audio output comprises generating mood-related attributes that are compatible with the musical characteristics of the media content (The media retrieval module could attempt to determine the sender's current mood by scanning text in their recent emails and text messages. The media retrieval module could then select a song from the sender's favorite songs (e.g. in the user's profile, or most frequent historical play's) that has associations suggesting it is responsive to that mood…In another example, suppose a message sender wishes to send a message incorporating text matching of lyrics of songs in order to express an emotional connection between the sender and the recipient. For example, a sender knows that the recipient loves butterflies. The message sender could create a message for immediate delivery with content criteria specifying a romantic song with lyrics containing the word "butterfly." The media retrieval module could then search, for example, songs whose lyrics includes "butterfly" or "butterflies" and having metadata indicating a romantic or sentimental song and then rank them personally for this specific user based upon their user profile and past consumption data [0161, 0163]).  
	It would have been obvious to one of ordinary skill in the art, before the effective filing date, to combine the voice assistant with contextually adjusted audio output, from Phipps, with the teachings of Martinez where the output generated comprises mood-related attributes that are compatible with the musical characteristics of the media content in order to further “deliver useful services and information to end users, and provide commercial opportunities to advertisers and retailers,” (0003, Martinez).

	With respect to Claim 7:
	Phipps discloses: the contextually-adjusted audio output. 
	Phipps fails to specifically disclose: The method of claim 5, wherein generating the contextually-adjusted audio output comprises generating mood-related attributes that are compatible with metadata-based characteristics of the media content.  
Within the same field of contextualizing messages, Martinez discloses: The method of claim 5, wherein generating the contextually-adjusted audio output comprises generating mood-related attributes that are compatible with metadata-based characteristics of the media content (The media retrieval module could attempt to determine the sender's current mood by scanning text in their recent emails and text messages. The media retrieval module could then select a song from the sender's favorite songs (e.g. in the user's profile, or most frequent historical play's) that has associations suggesting it is responsive to that mood…In another example, suppose a message sender wishes to send a message incorporating text matching of lyrics of songs in order to express an emotional connection between the sender and the recipient. For example, a sender knows that the recipient loves butterflies. The message sender could create a message for immediate delivery with content criteria specifying a romantic song with lyrics containing the word "butterfly." The media retrieval module could then search, for example, songs whose lyrics includes "butterfly" or "butterflies" and having metadata indicating a romantic or sentimental song and then rank them personally for this specific user based upon their user profile and past consumption data [0161, 0163]).  
It would have been obvious to one of ordinary skill in the art, before the effective filing date, to combine the voice assistant with contextually adjusted audio output, from Phipps, with the teachings of Martinez where the output generated comprises mood-related attributes that are compatible with the metadata-based characteristics of the media content in order to further “deliver useful services and information to end users, and provide commercial opportunities to advertisers and retailers,” (0003, Martinez).

	With respect to Claim 9:
	Phipps discloses: The method of claim 1, wherein using the contextually-adjusted audio output to generate synthesize speech further comprises: selecting words to be spoken by the voice assistant using a natural language generator based upon language adjustments associated with the contextually-adjusted audio output characteristics (Context can thus be used to constrain the solutions during various phases of processing, including for example and without limitation:… Natural Language Processing (NLP) - parsing text and associating the words with syntactic and semantic roles, for example, determining that the user input is about making a phone call to a person referred to by the pronoun "her", and finding a specific data representation for this person. For example, the context of a text messaging application can help constrain the interpretation of "her" to mean "the person with whom I am conversing in text."…Dialog Generation - generating assistant responses as part of a conversation with the user about their task, for example, to paraphrase the user's intent with the response "OK, I'll call Rebecca on her mobile ... " The level of verbosity and informal tone are choices that can be guided by contextual information, [0020]); and determining a pronunciation (Examples of context information that can be obtained from personal databases 1058 include, without limitation:… the user's own names, preferred pronunciations, addresses, phone numbers, and the like, [0114])…for speaking the words based upon speech adjustments associated with the contextually-adjusted audio output characteristics.
	Phipps fails to specifically disclose: The method of claim 1, wherein using the contextually-adjusted audio output to generate synthesize speech further comprises: …determining…an emotion for speaking the words based upon speech adjustments associated with the contextually-adjusted audio output characteristics.  
	Within the same field of contextualizing messages, Martinez discloses: The method of claim 1, wherein using the contextually-adjusted audio output to generate synthesize speech further comprises: …determining…an emotion for speaking the words based upon speech adjustments associated with the contextually-adjusted audio output characteristics (A sender may wish to emote using context enhanced messaging. The message sender could create a message for immediate delivery that specifies content criteria that selects a song reflecting the sender's current mood. The media retrieval module could attempt to determine the sender's current mood by scanning text in their recent emails and text messages. The media retrieval module could then select a song from the sender's favorite songs (e.g. in the user's profile, or most frequent historical play's) that has associations suggesting it is responsive to that mood. Such a context could define a push or a pull operation. The sender may wish to express his mood to his fiancé with a media enhanced message the system sends on his behalf as described above, or, alternatively, the sender's fiancé may wish to poll his mood. For example, the sender's fiancé could send herself a message that contains content criteria that specifies her fiancé’s mood. For example, if he misses her, the system could respond with a song that expresses that emotion in both of them [0161-162]).  
It would have been obvious to one of ordinary skill in the art, before the effective filing date, to combine the voice assistant with contextually adjusted audio output where the output generated comprised: selecting words to be spoken by the voice assistant using a natural language generator based upon language adjustments associated with the contextually-adjusted audio output characteristics; and determining a pronunciation, from Phipps, with the teachings of Martinez which also included determining an emotion for speaking the words in order to further “deliver useful services and information to end users, and provide commercial opportunities to advertisers and retailers,” (0003, Martinez).

With respect to Claim 10:
Phipps discloses: the contextually-adjusted audio output and the method of claim 1. 
Phipps fails to specifically disclose: The method of claim 1, further comprising generating a mood associated with the contextually-adjusted audio output, the mood comprising: the contextually-adjusted audio output; one or more audio cues; and one or more visual representations.  
Within the same field of contextualizing messages, Martinez discloses: The method of claim 1, further comprising generating a mood associated with the contextually-adjusted audio output, the mood comprising: the contextually-adjusted audio output; one or more audio cues; and one or more visual representations (In one embodiment, a context-enhanced message comprises four elements: a recipient, a message body, delivery criteria, and content criteria…The message body may include an audio file containing, for example, a voice message. The message body may include an image file containing, for example, a picture of the sender, or a video message from the user…Delivery criteria are the conditions under which the message is to be delivered to the recipients...Such criteria may also utilize "What" or topical criteria, such as, for example, when the recipient's mood as judged, for example, by the content of recent messages sent by the recipient, appears to be sad. Content criteria describe the media files that are to be included with the message…Such criteria may include social criteria, for example, different media files are included in the message depending on the recipient's favorite music. Such criteria may include topical criteria, for example, different media files are included in the message depending on the recipient's mood, [0101-0104]).  
It would have been obvious to one of ordinary skill in the art, before the effective filing date, to combine the voice assistant with contextually adjusted audio output, from Phipps, with the teachings of Martinez where the output comprises generating a mood comprising: the contextually-adjusted audio output; one or more audio cues; and one or more visual representations in order to further “deliver useful services and information to end users, and provide commercial opportunities to advertisers and retailers,” (0003, Martinez).

With respect to Claim 16:
The method performed by the processing device of claim 16 is similar to the method of claim 9 and is therefore rejected under a similar rationale. Phipps also teaches the additional hardware elements recited in claim 16:
Audio output adjuster: (Figure 27, 2708, Output to User, Upon processing user input 2704 and context information 1000 according to the techniques described herein, virtual assistant 1002 generates output 2708 for presentation to the user, [0071])
At least one processing device: (Figure 27, 2790, Output Processor, 2780, Dialog Flow Processor).

	Claim 15 contains subject matter respectively similar to claims 4-7, and thus, is rejected under similar rationale.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO-892.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
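A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.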
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Rodrigo A Chavez whose telephone number is (571)270-0139.  The examiner can normally be reached on Monday - Friday 9-6 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached on (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/RODRIGO A CHAVEZ/ Examiner, Art Unit 2658
/RICHEMOND DORVIL/ Supervisory Patent Examiner, Art Unit 2658