DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on January 29, 2021 has been entered.

In response to Applicant’s claims filed on January 29, 2021, claims 51-69 are now pending for examination in the application.

This Office action is in response to the amendment filed 01/29/2021. This action also highlights what was discussed in the 01/14/2021 interview. In this action claim(s) 51-69 is/are rejected under 35 U.S.C. 103 as being unpatentable over Shepherd et al. (US Patent No. 10056078) in view of Van Os et al. (US Pub. No. 20150382047).  The references have been added to address querying a database to identify a performer included in a cast of the video content item referenced in the search query, wherein the database indicates that a voice profile of the identified performer is available.

Remarks
Applicant-initiated interview held on 01/14/2021
Maxim Rapoport had a telephonic interview with the Examiner regarding the rejections to the claims. The general thrust of the Applicant’s argument was that the proposed Lee/Stevans combination fails to disclose, teach, or suggest each and every limitation of independent Claim 1. The Examiner suggested that Applicant amend the claims to differentiate independent Claim 1 from the proposed Lee/Stevans combination. The Examiner agreed that the proposed amendments would advance the case, but indicated that an additional search and further consideration would be needed.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 51-52, 54-55, 61-62, 64-65 is/are rejected under 35 U.S.C. 103 as being unpatentable over Shepherd et al. (US Patent No. 10056078) in view of Van Os et al. (US Pub. No. 20150382047).

Regarding claim 51, Shepherd et al. teaches a method for responding to a search query with a contextually relevant voice output, the method comprising:



determining an answer to the search query (“answering queries spoken by the user 10,” See Column 3 Lines 4-8); 
determining that the search query includes a reference to a video content item (“determine the user's intent is to search when the text contains structured, specific information regarding a desired result. A search may refer to a user indication to obtain content on a particular item, such as a movie, song, television show, etc,” See Column 3 Lines 36-44); 
generating audio output using the voice profile of the performer identified as being included in the cast of the video content item referenced in the search query, the audio output including the reply to the user input (“speech synthesis using one or more different methods. In one method of synthesis called unit selection, described further below, the TTS module (1010/1110) matches the symbolic linguistic representation against a database of recorded speech, such as a database of a voice corpus,” See Column 22 Lines 38-53).  Shepherd et al. does not disclose identifying a performer included in a cast of the video content item.
However, Van Os et al. teaches querying a database to identify a performer included in a cast of the video content item referenced in the search query, wherein the database indicates that a voice profile of the identified performer is available (“context-based information can be provided automatically, such as identifying a playing song or soundtrack (e.g., "This song is Performance Piece"), identifying cast members of a currently playing episode (e.g., "Actress 
in response to (a) identifying the performer included in the cast of the video content item referenced in the search query, and (b) accessing the indication that the voice profile of the identified performer is available (“facilitate voice-enabled functions, such as voice recognition, voice replication, digital recording, and telephony functions,” See Paragraph 66).
Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to modify Shepherd et al. (output of content based on speech-based searching and browsing requests) with Van Os et al. (controlling television user interactions).  The modification would have improved the answering of a user’s question by providing a contextually relevant voice response.  See Van Os et al. Paragraphs 6-9.  In addition, both references are analogous art directed to the same field of endeavor: querying with speech output.
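For illustration only, the sequence of limitations addressed above for claim 51 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Shepherd et al., Van Os et al., or the application: the in-memory table `CAST_DB`, the function names, and the placeholder titles and performers are all hypothetical, and the "audio output" is reduced to a record naming the voice profile that would be used.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical database: video title -> list of (performer, voice_profile_available).
CAST_DB: Dict[str, List[Tuple[str, bool]]] = {
    "show q": [("Performer A", False), ("Performer B", True)],
}


@dataclass
class VoiceOutput:
    text: str   # the answer to be spoken
    voice: str  # voice profile used, or "default" when none applies


def find_referenced_title(query: str) -> Optional[str]:
    """Determine whether the query references a known video content item."""
    lowered = query.lower()
    for title in CAST_DB:
        if title in lowered:
            return title
    return None


def find_performer_with_profile(title: str) -> Optional[str]:
    """Query the database for a cast member whose voice profile is available."""
    for performer, available in CAST_DB.get(title, []):
        if available:
            return performer
    return None


def respond(query: str, answer: str) -> VoiceOutput:
    """Pick the performer's voice only when both conditions (a) and (b) hold."""
    title = find_referenced_title(query)
    performer = find_performer_with_profile(title) if title else None
    return VoiceOutput(text=answer, voice=performer or "default")
```

In this sketch a query that mentions "Show Q" is answered in the voice of the first cast member flagged as having a profile, and any other query falls back to a default voice.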

The Shepherd et al. reference as modified by Van Os et al. teaches all the limitations of claim 51.  With respect to claim 52, Shepherd et al. teaches the method of claim 51, wherein generating the audio output using the voice profile of the performer comprises:

synthesizing, using the voice profile of the performer, audio matching a plurality of words included in the answer (“speech synthesis,” See Column Lines 40-47); and



The Shepherd et al. reference as modified by Van Os et al. teaches all the limitations of claim 52.  With respect to claim 54, Shepherd et al. teaches the method of claim 52, further comprising:

retrieving an audio recording of the performer's voice, 

wherein the audio recording includes at least a portion of the answer, and 

wherein generating the audio response including the synthesized audio comprises generating audio output including the synthesized audio and the audio recording (“speech synthesis using one or more different methods. In one method of synthesis called unit selection, described further below, the TTS module (1010/1110) matches the symbolic linguistic representation against a database of recorded speech, such as a database of a voice corpus,” See Column 22 Lines 38-53).

The Shepherd et al. reference as modified by Van Os et al. teaches all the limitations of claim 54.  With respect to claim 55, Shepherd et al. teaches the method of claim 54, further comprising:

wherein synthesizing audio matching a plurality of words included in the answer comprises synthesizing audio corresponding to the subset of the plurality of words included in the answer that is not included in the audio recording (“The TTS module (1010/1110) may perform speech synthesis using one or more different methods. In one method of synthesis called unit selection, described further below, the TTS module (1010/1110) matches the symbolic linguistic representation against a database of recorded speech, such as a database of a voice corpus. The TTS module (1010/1110) matches the symbolic linguistic representation against spoken audio units in the database. Matching units are selected and concatenated together to form a speech output. Each unit includes an audio waveform corresponding with a phonetic unit, such as a short .wav file of the specific sound, along with a description of the various acoustic features associated with the .wav file (such as its pitch, energy, etc.), as well as other information, such as where the phonetic unit appears in a word, sentence, or phrase, the neighboring phonetic units, etc. Using all the information in the unit database, the TTS module (1010/1110) may match units (for example in a unit database) to the input text to create a natural sounding waveform. The unit database may include multiple examples of phonetic units to provide the system 100 with many different options for concatenating units into speech. One benefit of unit selection is that, depending on the size of the database, a natural sounding speech output may be generated. As described above, the larger the unit database of the voice corpus, the more likely the system will be able to construct natural sounding speech,” See Column Lines 38-64).
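The unit selection passage quoted above (matching phonetic units against a unit database and concatenating the selected units) can be illustrated with a short sketch. It is a simplified, greedy approximation offered under stated assumptions and is not the method of Shepherd et al.: the `Unit` record, the pitch-only cost, and the function names are hypothetical, and practical systems typically use richer acoustic features and a global search rather than a per-unit greedy choice.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Unit:
    phone: str            # phonetic unit label
    samples: List[float]  # stand-in for the stored audio waveform
    pitch: float          # one acoustic feature kept for the cost function


def select_units(phones: List[str], db: Dict[str, List[Unit]],
                 target_pitch: float) -> List[Unit]:
    """Greedily pick, for each phone, the candidate with the lowest cost.

    Cost = target cost (distance from the desired pitch)
         + join cost (distance from the previously chosen unit's pitch).
    """
    chosen: List[Unit] = []
    for phone in phones:
        prev_pitch = chosen[-1].pitch if chosen else target_pitch
        best = min(
            db[phone],
            key=lambda u: abs(u.pitch - target_pitch) + abs(u.pitch - prev_pitch),
        )
        chosen.append(best)
    return chosen


def concatenate(units: List[Unit]) -> List[float]:
    """Join the selected units' waveforms into one output waveform."""
    out: List[float] = []
    for unit in units:
        out.extend(unit.samples)
    return out
```

A larger database gives the selection step more candidates per phone, which is consistent with the quoted observation that a larger voice corpus makes natural sounding output more likely.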

With respect to claim 61, Shepherd et al. teaches a system for responding to a search query with a contextually relevant voice output, the system comprising:

control circuitry (speech-controlled device, See Fig. 1) configured to: 

receive a search query (“answering queries spoken by the user 10,” See Column 3 Lines 4-8); 
determine an answer to the search query (“answering queries spoken by the user 10,” See Column 3 Lines 4-8); 
determine that the search query includes a reference to a video content item (“determine the user's intent is to search when the text contains structured, specific information regarding a desired result. A search may refer to a user indication to obtain content on a particular item, such as a movie, song, television show, etc,” See Column 3 Lines 36-44); 
generating audio output using the voice profile of the performer identified as being included in the cast of the video content item referenced in the search query, the audio output including the reply to the user input (“speech synthesis using one or more different methods. In one method of synthesis called unit selection, described further below, the TTS module 
However, Van Os et al. teaches querying a database to identify a performer included in a cast of the video content item referenced in the search query, wherein the database indicates that a voice profile of the identified performer is available (“context-based information can be provided automatically, such as identifying a playing song or soundtrack (e.g., "This song is Performance Piece"), identifying cast members of a currently playing episode (e.g., "Actress Janet Quinn plays Genevieve"), identifying similar media (e.g., "Show Q is similar to this"), or providing results of any of the other queries discussed herein,” See Paragraph 209); and 
in response to (a) identifying the performer included in the cast of the video content item referenced in the search query, and (b) accessing the indication that the voice profile of the identified performer is available (“facilitate voice-enabled functions, such as voice recognition, voice replication, digital recording, and telephony functions,” See Paragraph 66).
Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to modify Shepherd et al. (output of content based on speech-based searching and browsing requests) with Van Os et al. (controlling television user interactions).  The modification would have improved the answering of a user’s question by providing a contextually relevant voice response.  See Van Os et al. Paragraphs 6-9.  In addition, both references are analogous art directed to the same field of endeavor: querying with speech output.

With respect to claim 62, it is rejected on grounds corresponding to above rejected claim 52, because claim 62 is substantially equivalent to claim 52.

With respect to claim 64, it is rejected on grounds corresponding to above rejected claim 54, because claim 64 is substantially equivalent to claim 54.

With respect to claim 65, it is rejected on grounds corresponding to above rejected claim 55, because claim 65 is substantially equivalent to claim 55.

Claim(s) 53, 56, 63, and 66 is/are rejected under 35 U.S.C. 103 as being unpatentable over Shepherd et al. (US Patent No. 10056078) in view of Van Os et al. (US Pub. No. 20150382047) in further view of Lyren (US Pub. No. 20140359439).

	The Shepherd et al. reference as modified by Van Os et al. teaches all the limitations of claim 52.  With respect to claim 53, Shepherd et al. as modified by Van Os et al. does not disclose determining, based on the voice profile of the personality, a characteristic of the personality's voice.

determining, based on the voice profile of the personality, a characteristic of the personality's voice (See Lyren Paragraph 60 “determine his personal preferences and style of verbal and nonverbal communication preferences. These aspects of the celebrity are combined to replicate a personality of the celebrity, and this personality is implemented into an intelligent personal assistant. This intelligent personal assistant is then sold to or provided to users (such as consumers, purchasers, or third parties) such that the intelligent personal assistant executes on personal electronic devices of the users to assist the users in executing tasks. This intelligent personal assistant has a voice, an appearance (such as being an anthropomorphic agent), and mannerisms that emulate the voice, appearance, and mannerisms of the celebrity”); 
retrieving, from a database, a plurality of audio templates, wherein each audio template of the plurality of audio templates corresponds to a respective word of the plurality of words (See Lyren Paragraphs 118-121 “verbally communicated (such as word choices, enunciation, writing style, tone, pauses, emphasis, and loudness)”); 
modifying, using the control circuitry, each audio template of the plurality of audio templates based on the characteristic of the personality's voice (See Lyren Paragraph 157 “Personality settings of the user agent during a first sequence of simulations result in a ninety percent (90%) accuracy score for the user. During the simulation, adjustments are made to a personality of the user agent in order to improve performance of the user (such as improving the speed and the 
generating, using the control circuitry, audio corresponding to each modified audio template (See Lyren Paragraph 62 “The preferences of the user agent are changed or adjusted to match the preferences of the user. Verbal and nonverbal communication preferences with different emotions of the user agent match the verbal and nonverbal communication preferences with different emotions of the user. This adjusting can occur in real-time while the user interacts with the user agent. Over time, a personality of the user agent more closely matches a personality of the user since verbal and nonverbal communication preferences of the user agent are continuously, continually, or periodically changed to match the verbal and nonverbal communication preferences of the user”). 
Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to modify Shepherd et al. (output of content based on speech-based searching and browsing requests) and Van Os et al. (controlling television user interactions) with Lyren (user agent with personality).  The modification would have improved the answering of a user’s question by providing a contextually relevant voice response.  See Lyren Paragraphs 21-23.  In addition, the references are analogous art directed to the same field of endeavor: dialog systems.


	The Shepherd et al. reference as modified by Van Os et al. teaches all the limitations of claim 51.  With respect to claim 56, Shepherd et al. as modified by Van Os et al. does not disclose retrieving an audio recording corresponding to the voice profile, the audio recording including at least a portion of the answer.
However, Lyren teaches the method of claim 51, wherein generating the audio output using the voice profile of the personality comprises: 
retrieving an audio recording corresponding to the voice profile, the audio recording including at least a portion of the answer (See Lyren Paragraph 121 “audio showing how the deceased person verbally communicated”); and 
generating an audio response including the audio recording (See Paragraph 120 “personality of the deceased person can be inferred, predicted, extracted, extrapolated, determined, generated, and/or obtained from the information”). 
Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to modify Shepherd et al. (output of content based on speech-based searching and browsing requests) and Van Os et al. (controlling television user interactions) with Lyren (user agent with personality).  The modification would have improved the answering of a user’s question by providing a contextually relevant voice response.  See Lyren 

With respect to claim 63, it is rejected on grounds corresponding to above rejected claim 53, because claim 63 is substantially equivalent to claim 53.

With respect to claim 66, it is rejected on grounds corresponding to above rejected claim 56, because claim 66 is substantially equivalent to claim 56.

Claim(s) 57-60 and 67-70 is/are rejected under 35 U.S.C. 103 as being unpatentable over Shepherd et al. (US Patent No. 10056078) and Van Os et al. (US Pub. No. 20150382047) in further view of Stevans et al. (US Pub. No. 20180108343).

The Shepherd et al. reference as modified by Van Os et al. teaches all the limitations of claim 51.  With respect to claim 57, Shepherd et al. as modified by Van Os et al. does not disclose retrieving, from a database, a character associated with the media content reference.
However, Stevans et al. teaches the method of claim 51, wherein determining, based on the media content reference, a personality associated with the media content reference comprises: 
retrieving, from a database, a character associated with the media content reference (See Paragraph 45 “Dialog system 601 receives wake-up phrase and request audio and produces 
identifying a performer providing a voice of the character (See Paragraph 43 “For an assistant plugin for The Simpsons.TM. it might be logical to use the voice of Nancy Cartwright, the voice actor who plays the cartoon character, Bart Simpson, as long as the copyright holder would agree to license the rights”). 
Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to modify Shepherd et al. (output of content based on speech-based searching and browsing requests) and Van Os et al. (controlling television user interactions) with Stevans et al. (virtual assistant).  The modification would have improved the answering of a user’s question by providing a contextually relevant voice response.  See Stevans et al. Paragraphs 2-10.  In addition, the references are analogous art directed to the same field of endeavor: dialog systems.

The Shepherd et al. reference as modified by Van Os et al. and Stevans et al. teaches all the limitations of claim 57.  With respect to claim 58, Stevans et al. teaches the method of claim 57, wherein identifying the voice profile of the personality comprises retrieving, from the database, 

The Shepherd et al. reference as modified by Van Os et al. teaches all the limitations of claim 51.  With respect to claim 59, Shepherd et al. as modified by Van Os et al. does not disclose identifying a phrase associated with the media content reference, wherein generating audio output using the voice profile of the personality comprises generating audio output including the phrase associated with the media content reference.
However, Stevans et al. teaches the method of claim 51, further comprising: 
identifying a phrase associated with the media content reference, wherein generating audio output using the voice profile of the personality comprises generating audio output including the phrase associated with the media content reference (See Paragraph 65 “According to some embodiments, different wake-up phrases enable different sets of assistants, one of which is selected by the first word or phrase immediately following the wake-up phrase”).
Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to modify Shepherd et al. (output of content based on speech-based searching and browsing requests) and Van Os et al. (controlling television user interactions) with Stevans et al. (virtual assistant).  The modification would have improved the answering of a user’s question by providing a contextually relevant voice response.  See Stevans 

The Shepherd et al. reference as modified by Van Os et al. and Stevans et al. teaches all the limitations of claim 59.  With respect to claim 60, Stevans et al. teaches the method of claim 59, wherein identifying the phrase associated with the media content reference comprises: 
identifying a plurality of words included in the search query (See Paragraph 65 “For example, "Hi, Princess" is a wake-up phrase that enables a number of princess assistants but not any pharmacist or chef assistants. The immediately following phrase selects which, such that "Hi, Princess Aurora" would invoke a sleeping beauty and "Hi, Princess Leia" would invoke an agent of the Rebel Alliance”); 
identifying, based on the plurality of words, the media content reference (See Paragraph 65 “For example, "Hi, Princess" is a wake-up phrase that enables a number of princess assistants but not any pharmacist or chef assistants. The immediately following phrase selects which, such that "Hi, Princess Aurora" would invoke a sleeping beauty and "Hi, Princess Leia" would invoke an agent of the Rebel Alliance”); and 
retrieving, from a database, the phrase associated with the media content reference (See Paragraph 65 “For example, "Hi, Princess" is a wake-up phrase that enables a number of princess assistants but not any pharmacist or chef assistants. The immediately following phrase selects 

With respect to claim 67, it is rejected on grounds corresponding to above rejected claim 57, because claim 67 is substantially equivalent to claim 57.

With respect to claim 68, it is rejected on grounds corresponding to above rejected claim 58, because claim 68 is substantially equivalent to claim 58.

With respect to claim 69, it is rejected on grounds corresponding to above rejected claim 59, because claim 69 is substantially equivalent to claim 59.

With respect to claim 70, it is rejected on grounds corresponding to above rejected claim 60, because claim 70 is substantially equivalent to claim 60.

Response to Arguments
Applicant’s arguments with respect to claims 51-70 have been considered but are moot because the arguments do not apply to any of the references being used in the current rejection.

In response to Applicant’s comments, “Without conceding propriety of the rejection and in a genuine effort to advance prosecution of the instant application, Applicant has amended 
in response to (a) identifying the performer included in the cast of the video content item referenced in the search query, and (b) accessing the indication that the voice profile of the identified performer is available.”  Examiner has added Shepherd et al. and Van Os et al. to address the amendments to the claims.

Relevant Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US PG-PATENT No. 953879 is directed to systems for automated real-time vocal sports commentary with dynamically generated narrative content.  See Column 3 Lines 39-60: content generation engine 120 can be any combination of hardware and/or software (executing in hardware) configured to define narrative content based at least in part on information received from the statistics provider 105 and/or the content generation database 130. The content generation engine 120 can be, for example, a server device executing one or more software modules configured to combine information (e.g., statistics and/or other data received from the statistics provider 105 and/or the content generation database 130) with a narrative content template (e.g., a narrative content template selected from the content generation database 130) to define a narrative content portion (e.g., an article, report, summary, preview, bullet point, short-form text, etc.). The received information can be, for example, statistics information and/or data associated with a given event, occurrence, fact, person, place and/or thing (e.g., sports statistics information, weather history or forecast information, etc.). In addition to the narrative content template, the content generation engine 120 can also receive, from the content database 130, one or more phrases and/or phrase variations associated with the selected narrative content template. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS E ALLEN whose telephone number is (571)270-3562.  The examiner can normally be reached on Monday through Thursday 830-630.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain Alam can be reached on (571) 272-3978.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/NICHOLAS E ALLEN/Examiner, Art Unit 2154