DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
The drawings are objected to because there are duplicate reference numerals for the same elements of media device 201 in Figure 2.  Specifically, media device 201 includes a user input interface that has two reference numerals 210 and 220, a display that has two reference numerals 212 and 222, and a speaker that has two reference numerals 214 and 218.  It appears that reference numerals 210, 212, and 214 should be reserved for media device 200 at the left hand side of Figure 2, and reference numerals 220, 222, and 218 should be the only reference numerals assigned to the corresponding elements of media device 201 on the right hand side of Figure 2.    
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office Action to avoid abandonment of the application.  Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended.  The figure or figure number of an amended drawing should not be labeled as “amended.”  If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency.  Additional replacement sheets may be necessary to show the renumbering of the remaining figures.  Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d).  If the changes are not accepted by the examiner, Applicants will be notified and informed of any required corrective action in the next Office Action.  The objection to the drawings will not be held in abeyance.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 
The following title is suggested: Translating a Media Asset with Vocal Characteristics of a Speaker.
The disclosure is objected to because of the following informalities:
In ¶[0000], Applicants have provided a preliminary amendment that should update a continuity status of Application Serial No. 16/152,017 as “now U.S. Patent No. 11,195,507 issued on 07 December 2021”.  Additionally, it does not appear conventional to add a paragraph ¶[0000], as paragraph numbering generally begins with ¶[0001].  
In ¶[0008], “people in Boston have a different from” should be “people from Boston have different non-linguistic characteristics from”.
In ¶[0021], a reference numeral for scenario 100 is not illustrated in Figure 1.
In ¶[0023], “is 104 is” should be “104 is”.
In ¶[0040], “communication path 304” should be “communication network 304”.
In ¶[0046], “As the ‘Oscars’ are broadcast in English . . .” does not appear to be a complete grammatical sentence, but could be rewritten to delete “and”.
In ¶[0047], “media device 202” should be “media device 302”.  See Figure 3.
In ¶[0047], “at in advance” should be “in advance”.
In ¶[0050], “these characteristics combined” should be “these characteristics are combined”.
In ¶[0050], “helps us identify” should be “helps identify”.
In ¶[0064], “is verb” should be “is a verb” and “is object” should be “is an object”.
In ¶[0064], “quiero dedicar” should include quotation marks as “‘quiero dedicar’”.
In ¶[0064], “of the form spoken words” should be “in the form of spoken words”.
In ¶[0069], there is no reference numeral for process 800 in Figure 8.
In ¶[0073], “D6” should be “C6”.  
In ¶[0073], “a plurality of linguistic characteristics” appears that it should be “a plurality of non-linguistic characteristics” because only non-linguistic characteristics are being described here.
In ¶[0084], “speeches” should be “and speeches”.
In ¶[0086], “+fingerprint” should be “fingerprint”.
In ¶[0089], “generates, the retrieved video with the synthesized speech for display are performed” should be “generates the retrieved video with the synthesized speech for display”. 
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 28 to 48 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement.  The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventors, at the time the application was filed, had possession of the claimed invention.
Independent claims 28, 38, and 48 set forth a limitation of “receiving a request to translate the media asset”, which does not actually appear to be described by the originally-filed Specification.  Applicants’ Specification, ¶[0003] - ¶[0004], states that a system receives media that is requested by a user and that a user requests to watch a broadcast.  Similarly, ¶[0046] of the Specification states that a user requests to access a media asset.  However, there is nothing that supports a claim limitation of “receiving a request to translate the media asset”.  A request to access a media asset is not the same as a request to translate a media asset, and a limitation drawn to the latter appears to be new matter.  Applicants should specifically point out where this limitation is supported in the originally-filed Specification, or should cancel this limitation from the claim language.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 28, 38, and 48 are rejected under 35 U.S.C. 103 as being unpatentable over Proidl et al. (WO 2006/129247) in view of Lin et al. (U.S. Patent Publication 2020/0058288).
(Note: Lin et al. has an effective priority date of 16 August 2018 based on an application filed in Taiwan, which is before Applicants’ effective filing date of 04 October 2018.)
Concerning independent claims 28, 38, and 48, Proidl et al. discloses a method, device, and computer readable medium for performing automatic dubbing of a multimedia signal, comprising:
“accessing a media asset, the media asset featuring a speaker” – a multimedia signal comprises information relating to video and speech; a multimedia signal is received from a receiver (“accessing a media asset”) (Abstract); a movie is dubbed whereby the voices of the actors in the dubbed version are similar to or the same as in the original version, e.g., George Clooney’s voice in English will be similar to George Clooney’s voice in German (Page 5, Lines 16 to 20: Figure 1); a user 106 may be watching a movie in real time; alternatively, a user might be interested in dubbing a movie and watch it at a later time (Page 7, Lines 3 to 5); a receiver 208 receives a multimedia signal 201 (Page 7, Lines 7 to 10: Figure 2); here, multimedia includes speech of a speaker, i.e., George Clooney is an actor who is “a speaker”; 
“receiving a request to translate the media asset” – a user 106 is watching a movie on a television set 104 from a DVD player 101, and wants to see the movie dubbed in another language; by appropriate selection, e.g., on a remote controller, user 106 makes a selection of playing the movie as dubbed (Page 5, Lines 10 to 16: Figure 1); a selection to play a dubbed version of a movie is “a request to translate the media asset”;
“extracting a voice sample [from the another media asset] featuring the speaker” – from speech 103, characteristic voice parameters are extracted from an actor’s voice using a voice analyzer (Page 5, Lines 25 to 26: Figure 1); a processor 206 extracts speech and textual information from the multimedia signal (Page 7, Lines 11 to 13); speech and textual information is extracted 402 which results in speech and textual information (Page 8, Lines 13 to 15: Figure 4); here, speech of an actor extracted from a multimedia signal is equivalent to “extracting a voice sample . . . featuring the speaker”; that is, speech of an actor serves as a ‘sample’ of an actor’s voice used to control a speech synthesizer when reproducing the speech according to voice parameters;
“calculating vocal characteristics based on the voice sample [from the another media asset] featuring the speaker” – voice parameters (“vocal characteristics”) are then used to control parameters for controlling the speech synthesizer when reproducing the speech created, in this case to control the German speech so that the actors appear to be speaking in German; reproduced speech is inserted into a new multimedia signal 108 (Page 5, Line 30 to Page 6, Line 3: Figure 1); a voice analyzer 203 processes voice parameters from the speech (Page 7, Lines 13 to 14: Figure 2); speech comprised in an A/V signal 301 is analyzed 304 and based thereon one or more voice parameters are obtained (Page 7, Lines 27 to 29: Figure 3); speech is analyzed 403 resulting in at least one voice characteristic parameter; these parameters include pitch, melody, duration, phoneme reproduction speed, loudness, and timbre (“calculating vocal characteristics based on the voice sample . . . featuring the speaker”)  (Page 8, Lines 15 to 17: Figure 4);
“generating a translation of the media asset using the calculated vocal characteristic” – a movie is dubbed whereby the voices of the actors in the dubbed version are similar to or the same as in the original version, e.g., George Clooney’s voice in English will be similar to George Clooney’s voice in German (Page 5, Lines 16 to 20: Figure 1); processor 206 uses voice parameters for controlling speech synthesizer 204 in a way that the output speech 207 preserves the original voice of the actor, although the language of the speech has changed (Page 7, Lines 16 to 19: Figure 2); voice characteristic parameters are used for reproducing 405 the new speech so that the voice of the new speech is similar to the voice of the original speech, although the speech is of a different language; in that way, actor will appear to be able to speak different languages fluently, although he/she is not capable of doing so; reproduced new speech is inserted 406 together with video information into the new multimedia signals and played to the user (Page 8, Lines 2o 25: Figure 4); dubbing speech into a different language is equivalent to “generating a translation”. 

Concerning independent claims 28, 38, and 48, Proidl et al. discloses all of the limitations of these independent claims with the exception of “searching for another media asset featuring the speaker”, and then extracting and calculating vocal characteristics “from the another media asset” featuring the speaker.  Generally, Proidl et al. only provides a current media asset of an actor who is the speaker to extract and calculate vocal characteristics, but does not search for another media asset to perform this extraction and calculation.  Applicants’ Specification, ¶[0084], describes this embodiment as including a control circuitry that may search for voice samples of an actor over the Internet, e.g., voice samples may be extracted from movies, interviews, and speeches that feature ‘Tom Hanks’.  This appears to be the only support for this embodiment in the Specification.  Now one skilled in the art could understand that it might be sufficient to use only a current media asset to determine a voice profile for an actor, but that by using a plurality of media assets for that actor that were recorded in the past, one might obtain a better model of a voice profile for an actor because there is then more data on which to premise the model.
Concerning independent claims 28, 38, and 48, Lin et al. teaches a method, system, and computer-readable recording medium that obtains real human voice signals, and transforms original synthetic voice signals into timbre-specific human voice signals with a timbre transformation model.  The timbre transformation model is trained with real human voice signals collected from a specific person, and then the processing apparatus plays the transformed human voice.  (Abstract)  Specifically, Lin et al. teaches an embodiment of obtaining the real human voice signal 1511 (which may be extracted from a speech, a conversation, a concert, etc.) from captured network packets, data uploaded by the user, or data stored in an external or internal storage media (e.g., a flash drive, a disc, and an external hard drive) through the voice input apparatus 110.  The user can input a favorite singer’s voice through the user interface, and the voice input apparatus 110 searches on the Internet and obtains a speech or a song of the said singer (“searching for another media asset featuring the speaker”).  Alternatively, a user interface presents the photo or name of some radio hosts for the elder's selection, and the voice input apparatus 110 records said radio host's voice from the online radio on the Internet.  (¶[0039]: Figure 2: Step S210)  Then processing apparatus 170 obtains acoustic features from the real human voice signal 1511.  (¶[0040]: Figure 2: Step S220)  Processing apparatus 170 may train the timbre transformation model with the acoustic features of the real human voice and the acoustic features of the synthetic human voice. Processing apparatus 170 may take the acoustic features of the real human voice and the acoustic features of the synthetic human voice as training samples, and takes the synthetic human voice signal 1512 as a source sound and the real human voice signal 1511 as a target sound for training models.  (¶[0043]: Figure 2: Step S260)
Subsequently, processing apparatus 170 sends the original synthetic human voice signal to the trained timbre transformation model to transform the original synthetic human voice signal to a synthetic human voice signal 1512 of a specific timbre, and may play said synthetic human voice signal 1512 processed with the timbre transformation to the speaker 130.  (¶[0047]: Figure 2: Steps S290 to S295)  Lin et al., then, teaches the limitations of “searching for another media asset featuring the speaker” and extracting and calculating vocal characteristics for “the another media asset”.  An objective is to save a voice of a specific person telling a story and playing back the saved voice to provide a friendly operation interface for the user to select the voice timbre of a specific person that a user intends to listen to.  (¶[0004])  It would have been obvious to one having ordinary skill in the art to search for another media asset that features a same speaker as taught by Lin et al. to extract and calculate vocal characteristics of an actor to generate a translation of the media asset in Proidl et al. for a purpose of providing a friendly operation interface for a user to select a voice timbre of a specific person that a user intends to listen to.

Allowable Subject Matter
Claims 29 to 37 and 39 to 47 appear to be allowable if rewritten to overcome the rejection under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), 1st ¶, set forth in this Office Action and to include all of the limitations of the base claim and any intervening claims.
Generally, the prior art of record does not appear to disclose or reasonably suggest an entire combination that includes the features of independent claims 28 and 38, and additionally comprises the limitations of determining an emotional state expressed by a first plurality of spoken words based on a set of characteristics, identifying an identifier of a speaker based on metadata associated with the media asset, converting a first plurality of spoken words into a second plurality of spoken words, and generating a translation by converting audio with a second set of characteristics associated with a determined emotional state based on a second plurality of spoken words and characteristics of the speaker that produces the first plurality of words as set forth by dependent claims 29 and 39.  However, allowability is subject to reconsideration upon amendments to overcome the rejection for new matter under 35 U.S.C. §112(a).

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicants’ disclosure.
Kumar et al. is Applicants’ parent patent.
Rivers, Lindblom et al., and Doggett disclose related prior art.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 272-7608.  The examiner can normally be reached Monday-Thursday 8:30 AM-6:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached on (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users.  To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.  Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.  For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MARTIN LERNER/Primary Examiner
Art Unit 2657
November 22, 2022