DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 06/02/2021 has been entered.
This communication is in response to the Amendments and Arguments filed on 06/02/2021.
Claims 1-20 are pending and have been examined.
All previous objections/rejections not mentioned in this Office Action have been withdrawn by the examiner. 
Notice of Pre-AIA or AIA Status
The present application is being examined under the pre-AIA first to invent provisions.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 06/04/2021 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Response to Arguments
Applicant's arguments filed 06/02/2021 have been fully considered but they are not persuasive. Applicant did not make any further comments with respect to the .
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-4, 6, and 9-12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Liu (U.S. PG Pub No. 2020/0265829), hereinafter Liu, in view of Brown et al (US PG Pub No. 2011/0064388), hereinafter Brown, in view of Agapi et al. (U.S. PG Pub No. 2006/0287860), hereinafter Agapi, in view of Dubinsky (U.S. Provisional App. No. 62/814419, as found through US PG Pub No. 2020/0211565), hereinafter Dubinsky, and further in view of Rossano et al. (US PG Pub No. 2016/0021334), hereinafter Rossano.

Regarding claim 1, Liu teaches
An apparatus, comprising ([0005:1-6] a computer system):
at least one computer memory that is not a transitory signal and that comprises instructions executable by at least one processor to ([0005:1-6] a computer-readable storage media, i.e. memory, containing program instructions for execution by a coupled processor):
access an artificial intelligence model trained to mimic the voice…([0089] an artificial intelligence model is trained to synthesize an audio of a recommended voice, i.e. mimic the voice);
access…text associated with a piece of audio visual (AV) content ([0014:5-7], [0017:17-22], [0018:1-5] the voiced content, such as a video, i.e. piece of audio visual content, may have the audio data transcribed into text for further processing by a language synthesizing program, i.e. access…text associated); and
use the artificial intelligence model and the…text to insert audio mimicking the voice…into the piece of AV content, the audio comprising an audible representation of the…text ([0017], [0019], [0089] the artificial intelligence model is used by the language synthesis program to synthesize text from transcribed audio data, i.e. use the artificial intelligence model and the text, into an artificially generated voice, i.e. the audio comprising an audible representation of the text, where the artificially synthesized voice replaces one or more speakers in the original voiced content, i.e. insert audio…into the piece of AV content), wherein the instructions are further executable to:
While Liu provides the use of a recommended voice to synthesize text and replace one or more speakers in the original voiced content, Liu does not specifically teach

present on at least one display at least one user interface (UI) comprising:
a first option selectable to enable insertion of the audio mimicking the voice of the child into the piece of AV content;
a first sub-option selectable to insert audio in the voice of a child on the fly as the piece of AV content is streamed;
a second sub-option selectable to insert audio in the voice of a child into the piece of AV content before the piece of AV content is presented;
a second option selectable to match physical attributes of a child to a given AV content character;
a third option selectable to initiate a configuration process for training the artificial intelligence model to the voice of a child; and
 at least one fourth option selectable to select a respective child's voice in which to present audio for a given character within the piece of AV content. 
Brown, however, teaches present on at least one display at least one user interface (UI) comprising ([0096] the user interface, i.e. user interface, is provided on a computer screen, i.e. presenting on at least one display):
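a first option selectable to enable insertion of the audio mimicking the voice … into the piece of AV content ([0089-90], [0096] the user can select, i.e. a first option selectable, assets to change in the animation, i.e. AV content, through an interface, where changes can include sound files, where the user can create new customized audio files where voice is modified to match particular voice characteristics, i.e. audio mimicking the voice, and the new audio content can be used to partially populate the audio content of the animation, i.e. insertion of the audio);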
a second option selectable to match physical attributes of a child to a given AV content character ([0064-5] a menu allows a user to enter customizing information, i.e. second option selectable, such as choosing a character’s features, i.e. given AV content character, to resemble a real person, i.e. match physical attributes, such as having a child play the role of a lead character);
a third option selectable to initiate a configuration process for training the artificial intelligence model to the voice … ([0082], [0090] the voice changing algorithm, i.e. artificial intelligence model, can be given a sample of the user’s speech, i.e. the voice, that the algorithm can use to modify audio content to match the voice of the user, i.e. initiate a configuration process for training, and where the user can add an asset, such as audio, by way of a user interface, i.e. third option selectable); and
Where Liu previously teaches that the training to mimic a voice is training a neural network using input speech (see [0089]).
at least one fourth option selectable to select a respective … voice in which to present audio for a given character within the piece of AV content ([0080-1], [0089-90], [0096] the user can select, i.e. fourth option selectable, assets to change in the animation, i.e. AV content, through an interface, where changes can include sound files, where the user can create new customized audio files where voice is modified to match particular voice characteristics, i.e. select a respective…voice in which to present audio, and the new audio content can be used to partially populate the audio content of the animation).
Liu and Brown are analogous art because they are from a similar field of endeavor in enabling customization of audiovisual content for users. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the use of a recommended voice to synthesize text and replace one or more speakers in the original voiced content teachings of Liu with the use of an interface that allows the user to customize the audio voice file as taught by Brown. The motivation to do so would have been to achieve a predictable result of enabling user generated customized audio content (Brown [0089]).
While Liu in view of Brown provides an interface that allows the user to customize the audio voice file, and that customization can be performed on the fly during streaming or composited before being sent to a local device, Liu in view of Brown does not specifically teach the user making a choice as to when the customization is performed, and thus does not teach
a first sub-option selectable to insert audio in the voice of a child on the fly as the piece of AV content is streamed;
a second sub-option selectable to insert audio in the voice of a child into the piece of AV content before the piece of AV content is presented;
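Agapi, however, teaches a first sub-option selectable to insert audio in the voice … on the fly as the piece of AV content is streamed ([0031] a printer properties interface permitting TTS settings to be adjusted, such as modifying the gender and pitch, i.e. in the voice, can also permit a user to select an output type, i.e. a first sub-option selectable to insert audio, such as outputting generated speech to a speaker, i.e. on the fly as the piece of AV content is streamed);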
a second sub-option selectable to insert audio in the voice … into the piece of AV content before the piece of AV content is presented ([0031] a printer properties interface permitting TTS settings to be adjusted, such as modifying the gender and pitch, i.e. in the voice, can also permit a user to select an output type, i.e. a second sub-option selectable to insert audio, such as outputting generated speech to a file, i.e. into the piece of AV content before the piece of AV content is presented).
Liu, Brown, and Agapi are analogous art because they are from a similar field of endeavor in enabling customization of audio content for users. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the interface that allows the user to customize the audio voice file, and that customization can be performed on the fly during streaming or composited before being sent to a local device teachings of Liu, as modified by Brown, with an interface option to allow a user to choose direct output to a speaker or output to a file as taught by Agapi. The motivation to do so would have been to achieve a predictable result of enabling the user to have precise control over TTS output properties, including file format (Agapi [0031]).
While Liu in view of Brown and Agapi provides the ability to mimic the voice characteristics of specific individuals or actors, Liu does not specifically teach that the voice of a child is one that can be mimicked, and thus does not teach
voice of a child.
Dubinsky, however, teaches voice of a child ([0010:7-14], [0017:11-15], as also found in the provisional app at (p.2, l.18-23), (p.3, l.11-14) the system recognizes characteristics of the original speakers, including children, and creates voice dubbings that match the original speaker’s voice as closely as possible).
Liu, Brown, Agapi, and Dubinsky are analogous art because they are from a similar field of endeavor in realistic speech synthesis for voiced content applications. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the speech synthesis using age as an attribute teachings of Liu, as modified by Brown and Agapi, with the specific recognition and synthesis of a child’s voice as taught by Dubinsky. The motivation to do so would have been to achieve a predictable result of ensuring the voice dubbing is closely matched to the original speaker’s voice (Dubinsky (p. 3, l.11-14)).
While Liu in view of Brown, Agapi, and Dubinsky provides a transcribed text for synthesis, Liu in view of Brown, Agapi, and Dubinsky does not specifically teach that the text can take the form of closed captioning, and thus does not teach
closed captioning (CC) text.
Rossano, however, teaches closed captioning (CC) text ([0060:1-3], [0062] the video may have closed subtitles or closed captions associated with the video for further processing).
Liu, Brown, Agapi, Dubinsky, and Rossano are analogous art because they are from a similar field of endeavor in realistic speech synthesis for voiced content applications. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the transcribed text for 

	Regarding claim 2, Liu in view of Brown, Agapi, Dubinsky, and Rossano teaches claim 1, and Brown further teaches
the piece of AV content is an AV cartoon ([0013] the system is able to generate cartoonized representations of a subject and incorporate the cartoonized representations into an animated video, such as an animated story, i.e. AV content is an AV cartoon).  
And, where the rationale to combine is the same as previously presented.

Regarding claim 3, Liu in view of Brown, Agapi, Dubinsky, and Rossano teaches claim 1, and Liu further teaches
wherein the artificial intelligence model comprises a deep neural network (DNN) trained to mimic the voice of the child, the DNN trained based on recorded speech of the child and text corresponding to the recorded speech ([0089], [0101:8-14], [0102] an artificial intelligence model uses a deep neural network, trained with voice samples that are recordings of real speech, i.e. trained based on recorded speech, in combination with phonetic transcriptions of the voiced content which are generated from the raw text of the content, i.e. text corresponding to the recorded speech).
Dubinsky has already taught the voice to be mimicked is that of a child, as shown in claim 1.

Regarding claim 4, Liu in view of Brown, Agapi, Dubinsky, and Rossano teaches claim 1, and Rossano further teaches
receive the AV content from a content provider ([0039-40] video data can be streamed to a local apparatus, i.e. receive the AV content, from a site such as YouTube, i.e. content provider); and
insert, locally at the apparatus, the audio into the piece of AV content ([0039-40], [0095] the mixer, which can be on a local apparatus, merges the dubbed segments with the original video, i.e. insert…the audio into the piece of AV content).  
And, where the rationale to combine is the same as previously presented.

Regarding claim 6, Liu in view of Brown, Agapi, Dubinsky, and Rossano teaches claim 4, and Liu further teaches
receive the AV content from the content provider with no audio segments of the AV content being left vacant ([0017] processors parse an original version of voiced content, i.e. AV content…with no audio segments of the AV content being left vacant, that were maintained by a content provider); and
transmit, to another device, the piece of AV content with the audio replacing at least a first audio segment of the AV content received from the content provider ([0017], [0050] the originally included voice of the voice content, i.e. first audio segment of the AV content, may be replaced with an artificially synthesized voice, i.e. piece of AV content with the audio replacing, where the synthesized voice content may be delivered to a client device by, i.e. transmit, to another device).  

Regarding claim 9, Liu in view of Brown, Agapi, Dubinsky, and Rossano teaches claim 1, and Rossano further teaches
stream the AV content from another device ([0040] the video, i.e. AV content, is received in streaming mode from a provider such as YouTube, i.e. another device); and
insert the audio into the piece of AV content as the piece of AV content is streamed and presented ([0040] the data is dubbed, i.e. insert the audio into the piece of AV content, in real time during streaming mode, i.e. as the piece of AV content is streamed and presented).  
And, where the rationale to combine is the same as previously presented.

Regarding claim 10, Liu in view of Brown, Agapi, Dubinsky, and Rossano teaches claim 9, and Liu further teaches
wherein the apparatus inserts the audio into the piece of AV content as the piece of AV content is streamed and presented by one or more of ([0040] the data is dubbed, i.e. insert the audio into the piece of AV content, in real time during streaming mode, i.e. as the piece of AV content is streamed and presented):
inserting the audio into at least one vacant audio segment of the AV content, replacing at least one filled audio segment of the AV content ([0017] the originally included voice of the voice content, i.e. at least one filled audio segment of the AV content, may be replaced with an artificially synthesized voice, i.e. replacing).

Regarding claim 11, Liu in view of Brown, Agapi, Dubinsky, and Rossano teaches claim 1, and Brown further teaches
wherein the apparatus is embodied in a consumer electronics device of an end user ([0138-142] the customized animation video system, i.e. apparatus, is initiated in an internet-connected computing device, i.e. embodied in, such as a smartphone, handheld connected device, game console, or personal computer, i.e. consumer electronics device of an end user), and wherein the instructions are executable to:
receive from at least one camera associated with the consumer electronics device an image of the child ([0065], [0068], [0138-0142] the representation of an individual, such as a child, i.e. image of the child, can be downloaded directly from a camera to the video system, i.e. receive from at least one camera, which is accessible to the user through an internet-connected computing device, such as a smartphone or personal computer, i.e. associated with the consumer electronics device); and 
alter a video representation of a character in the AV content according to the image of the child ([0061:1-3], [0065], [0068] character customization can be based on an image of an individual, such as a child, i.e. image of the child, where the features of the character may be made to resemble the child except for specific .
And, where the rationale to combine is the same as previously presented.

Regarding claim 12, Liu in view of Brown, Agapi, Dubinsky, and Rossano teaches claim 1, and Liu further teaches
comprising the at least one processor ([0005:1-6] the computer system comprising a processor).  

Claim(s) 5, 7, and 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Liu, in view of Brown, in view of Agapi, in view of Dubinsky, in view of Rossano, and further in view of Lai (US PG Pub No. 2020/0234689), hereinafter Lai.

Regarding claim 5, Liu in view of Brown, Agapi, Dubinsky, and Rossano teaches claim 4, and Liu further teaches
wherein the apparatus is embodied in a server ([0024] the system may be embodied in a server computer), wherein the instructions are executable to:
While Liu in view of Brown, Agapi, Dubinsky, and Rossano provides the receipt of video content from a content provider, Liu in view of Brown, Agapi, Dubinsky, and Rossano does not specifically teach that there is any portion of the AV content that is vacant, and thus does not teach
receive the AV content from the content provider with at least one audio segment of the AV content being left vacant; and
transmit, to another device, the piece of AV content with the audio inserted into the at least one vacant audio segment.  
Lai, however, teaches receive the AV content from the content provider with at least one audio segment of the AV content being left vacant ([0024:1-3], [0026], [0045], [0055] the source file database, which can be on a server, i.e. content provider, stores to-be-replaced source video and audio files to be shared with a terminal device, i.e. receive AV content, where the audio file has saved audio segment information including start and stop time, and an indication of whether each segment represents a target role to be replaced, i.e. at least one audio segment of the AV content being left vacant); and
transmit, to another device, the piece of AV content with the audio inserted into the at least one vacant audio segment ([0023], [0024], [0026], [0032:1-2], [0122] the audio file processing device obtains the source video file, and replaces the data in the to-be-replaced audio segments to obtain a second audio file, i.e. piece of AV content with the audio inserted into the at least one vacant audio segment, which can be sent to a client on a terminal device for user viewing, i.e. transmit, to another device).
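Liu, Brown, Agapi, Dubinsky, Rossano, and Lai are analogous art because they are from a similar field of endeavor in realistic speech synthesis for audiovisual applications. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the receipt of video content from a content provider of Liu, as modified by Brown, Agapi, Dubinsky, and Rossano, with the source video file content being marked with specific segments as to-be-replaced as taught by Lai. The motivation to do so would have been to achieve a predictable result of replacing the voice of a specific character at the request of a user (Lai [0031]).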

Regarding claim 7, Liu in view of Brown, Agapi, Dubinsky, and Rossano teaches claim 4, and Brown further teaches
wherein the apparatus is embodied in a consumer electronics device of an end user ([0138-142] the customized animation video system, i.e. apparatus, is initiated in an internet-connected computing device, i.e. embodied in, such as a smartphone, handheld connected device, game console, or personal computer, i.e. consumer electronics device of an end user), and wherein the instructions are executable to:
While Liu in view of Brown, Agapi, Dubinsky, and Rossano provides the receipt of video content from a content provider, Liu in view of Brown, Agapi, Dubinsky, and Rossano does not specifically teach that there is any portion of the AV content that is vacant, and thus does not teach
receive the AV content from the content provider with at least one audio segment of the AV content being left vacant. 
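Lai, however, teaches receive the AV content from the content provider with at least one audio segment of the AV content being left vacant ([0024:1-3], [0026], [0045], [0055] the source file database, which can be on a server, i.e. content provider, stores to-be-replaced source video and audio files to be shared with a terminal device, i.e. receive AV content, where the audio file has saved audio segment information including start and stop time, and an indication of whether each segment represents a target role to be replaced, i.e. at least one audio segment of the AV content being left vacant).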
Liu, Brown, Agapi, Dubinsky, Rossano, and Lai are analogous art because they are from a similar field of endeavor in realistic speech synthesis for audiovisual applications. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the receipt of video content from a content provider of Liu, as modified by Brown, Agapi, Dubinsky, and Rossano, with the source video file content being marked with specific segments as to-be-replaced as taught by Lai. The motivation to do so would have been to achieve a predictable result of replacing the voice of a specific character at the request of a user (Lai [0031]).

Regarding claim 8, Liu in view of Brown, Agapi, Dubinsky, Rossano, and Lai teaches claim 7, and Rossano further teaches
remaster the AV content locally at the apparatus prior to presentation of the AV content locally at the apparatus ([0039], [0058], [0060] the method may be carried out using a local configuration, such as a desktop computer or tablet, i.e. locally at the apparatus, where a mixing unit merges the new created audio track into the original movie, i.e. remaster the AV content, where video playback can be delayed to allow time to dub the video before viewing),
subsequently begin presenting the remastered AV content locally at the apparatus ([0060] video playback can be delayed to allow time to dub the video before viewing).  
	And Lai further teaches the AV content being remastered with the audio being inserted into the at least one vacant audio segment ([0024:1-3], [0045], [0048], [0055] the source file database stores to-be-replaced source video and audio files where the audio file has saved audio segment information including start and stop time, and an indication of whether each segment represents a target role to be replaced, i.e. at least one audio segment of the AV content being left vacant, where the segments to-be-replaced are replaced with to-be-dubbed segments, i.e. the AV content being remastered).
And, where the rationale to combine is the same as previously presented.

Claim(s) 13-15 and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Liu, in view of Brown, in view of Agapi, and further in view of Dubinsky.

Regarding claim 13, Liu teaches
A method, comprising ([0094] a method for synthesizing voice content):
accessing a speech synthesizer trained to mimic the voice…, the speech synthesizer comprising an artificial neural network trained to the…voice based on recorded speech…and first text corresponding to words indicated in the recorded speech ([0089], [0101:8-14], [0102] a synthesis engine, i.e. accessing a speech synthesizer, is trained to mimic the speech of a recommended voice, i.e. trained to mimic the voice, and uses a neural network, i.e. comprising an artificial neural network, trained with voice samples that are recordings of real speech, i.e. trained to the…voice based on recorded speech, in combination with phonetic transcriptions of the voiced content which are generated from the raw text of the content, i.e. first text corresponding to words indicated in the recorded speech);
accessing second text associated with audio visual (AV) content ([0014:5-7], [0017:17-22], [0018:1-5] the voiced content, such as a video, i.e. audio visual content, may have the audio data transcribed into text for further processing by a language synthesizing program, i.e. accessing second text associated);
using the speech synthesizer and the second text to insert audio mimicking the voice…into the AV content ([0017], [0019], [0089] the artificial intelligence model is used by the language synthesis program, i.e. speech synthesizer to synthesize text from transcribed audio data, i.e. second text, into an artificially generated voice, where the artificially synthesized voice replaces one or more speakers in the original voiced content, i.e. insert audio mimicking the voice into the piece of AV content).
While Liu provides the use of a recommended voice to synthesize text and replace one or more speakers in the original voiced content, Liu does not specifically teach the user making an active choice to replace a particular, and thus does not teach
1 168-948AM1 CASE NO. SYP332491US01 PATENT Serial No.: 16/432,660 Filed: June 5, 2019 Page 6
presenting on at least one display at least one user interface (UI) comprising:
a first option selectable to enable insertion of the audio mimicking the voice of the child into the AV content;
a first sub-option selectable to insert audio in the voice of a child on the fly as the AV content is streamed;
a second sub-option selectable to insert audio in the voice of a child into the AV content before the piece of AV content is presented.
Brown, however, teaches presenting on at least one display at least one user interface (UI) comprising ([0096] the user interface, i.e. user interface, is provided on a computer screen, i.e. presenting on at least one display):
a first option selectable to enable insertion of the audio mimicking the voice … into the AV content ([0089-90], [0096] the user can select, i.e. a first option selectable, assets to change in the animation, i.e. AV content, through an interface, where changes can include sound files, where the user can create new customized audio files where voice is modified to match particular voice characteristics, i.e. audio mimicking the voice, and the new audio content can be used to partially populate the audio content of the animation, i.e. insertion of the audio).
Liu and Brown are analogous art because they are from a similar field of endeavor in enabling customization of audiovisual content for users. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the use of a recommended voice to synthesize text and replace one or more speakers in the original voiced content teachings of Liu with the use of an interface that allows the user to customize the audio voice file as taught by Brown. The motivation to do so would have been to achieve a predictable result of enabling user generated customized audio content (Brown [0089]).
While Liu in view of Brown provides an interface that allows the user to customize the audio voice file, and that customization can be performed on the fly during streaming or composited before being sent to a local device, Liu in view of Brown does not specifically teach the user making a choice as to when the customization is performed, and thus does not teach
a first sub-option selectable to insert audio in the voice of a child on the fly as the AV content is streamed;
a second sub-option selectable to insert audio in the voice of a child into the AV content before the piece of AV content is presented
Agapi, however, teaches a first sub-option selectable to insert audio in the voice … on the fly as the AV content is streamed ([0031] a printer properties interface permitting TTS settings to be adjusted, such as modifying the gender and pitch, i.e. in the voice, can also permit a user to select an output type, i.e. a first sub-option selectable to insert audio, such as outputting generated speech to a speaker, i.e. on the fly as the AV content is streamed);
a second sub-option selectable to insert audio in the voice … into the AV content before the piece of AV content is presented ([0031] a printer properties interface permitting TTS settings to be adjusted, such as modifying the gender and pitch, i.e. in the voice, can also permit a user to select an output type, i.e. a second sub-option selectable to insert audio, such as outputting generated speech to a file, i.e. into the AV content before the piece of AV content is presented).

While Liu in view of Brown and Agapi provides the ability to mimic the voice characteristics of specific individuals or actors, Liu does not specifically teach that the voice of a child is one that can be mimicked, and thus does not teach
voice of a child.
Dubinsky, however, teaches voice of a child ([0010:7-14], [0017:11-15], as also found in the provisional app at (p.2, l.18-23), (p.3, l.11-14) the system recognizes characteristics of the original speakers, including children, and creates voice dubbings that match the original speaker’s voice as closely as possible).
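Liu, Brown, Agapi, and Dubinsky are analogous art because they are from a similar field of endeavor in realistic speech synthesis for voiced content applications. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the speech synthesis using age as an attribute teachings of Liu, as modified by Brown and Agapi, with the specific recognition and synthesis of a child’s voice as taught by Dubinsky. The motivation to do so would have been to achieve a predictable result of ensuring the voice dubbing is closely matched to the original speaker’s voice (Dubinsky (p. 3, l.11-14)).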

Regarding claim 14, Liu in view of Brown, Agapi, and Dubinsky teaches claim 13, and Liu further teaches
wherein the inserted audio comprises an audible representation of at least a portion of the second text ([0018:1-5], [0019] the text transcription of the audio data, i.e. at least a portion of the second text, is processed to synthesize an artificial voice, i.e. audible representation, to replace the voice of the original voiced content, i.e. inserted audio).

Regarding claim 15, Liu in view of Brown, Agapi, and Dubinsky teaches claim 13, and Brown further teaches
the AV content is animated AV content ([0013] the system is able to generate cartoonized representations of a subject and incorporate the cartoonized representations into an animated video, i.e. AV content is animated).  
And, where the rationale to combine is the same as previously presented.
Regarding claim 17, Liu in view of Brown, Agapi, and Dubinsky teaches claim 13, and Liu further teaches
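wherein the inserted audio replaces at least one existing audio segment of the AV content ([0017], [0050] the originally included voice of the voice content, i.e. at least one existing audio segment of the AV content, may be replaced with an artificially synthesized voice).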

Claim(s) 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Liu, in view of Brown, in view of Agapi, in view of Dubinsky, and further in view of Lai.

Regarding claim 16, Liu in view of Brown, Agapi, and Dubinsky teaches claim 13.
While Liu in view of Brown, Agapi, and Dubinsky provides the receipt of video content from a content provider, Liu in view of Brown, Agapi, and Dubinsky does not specifically teach that there is any portion of the AV content that is vacant, and thus does not teach
wherein the inserted audio fills at least one vacant audio segment of the AV content.
Lai, however, teaches wherein the inserted audio fills at least one vacant audio segment of the AV content ([0024:1-3], [0045], [0048], [0055] the source file database stores to-be-replaced source video and audio files where the audio file has saved audio segment information including start and stop time, and an indication of whether each segment represents a target role to be replaced, i.e. at least one audio segment of the AV content being left vacant, where the segments to-be-replaced are replaced with to-be-dubbed segments, i.e. inserted audio).  
Liu, Brown, Agapi, Dubinsky, and Lai are analogous art because they are from a similar field of endeavor in realistic speech synthesis for audiovisual applications. Thus, .

Claims 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Liu, in view of Brown, and further in view of Dubinsky.

Regarding claim 18, Liu teaches 
An apparatus, comprising ([0005:1-2] a computer system):
at least one computer readable storage medium that is not a transitory signal, the at least one computer readable storage medium comprising instructions executable by at least one processor to ([0005:1-6] a computer-readable storage media containing program instructions for execution by a coupled processor):
use at least one machine learning model to produce a representation of a … voice as speaking audio corresponding to at least a portion of the script of audio video (AV) content ([0014:5-7], [0089] an artificial intelligence model uses a deep neural network trained, i.e. at least one machine learning model, to synthesize an audio of a recommended voice, i.e. produce a representation of a voice as speaking audio, reciting the transcript of text of voiced content including video, i.e. corresponding , the machine learning model being trained using both at least one recording of words spoken by the --speaker-- and text corresponding to the words ([0089], [0101:8-14], [0102] the deep neural network is trained, i.e. machine learning model being trained, with voice samples that are recordings of real speech, i.e. at least one recording of words spoken by the speaker, in combination with phonetic transcriptions of the voiced content which are generated from the raw text of the content, i.e. text corresponding to the words).
While Liu provides the use of a recommended voice to synthesize text and replace one or more speakers in the original voiced content, Liu does not specifically teach the user making an active choice to replace a particular, and thus does not teach
presenting on at least one display at least one user interface (UI) comprising ([0096] the user interface, i.e. user interface, is provided on a computer screen, i.e. presenting on at least one display):
a first option selectable to match physical attributes of a child to a given AV content character ([0064-5] a menu allows a user to enter customizing information, i.e. first option selectable, such as choosing a character’s features, i.e. given AV content character, to resemble a real person, i.e. match physical attributes, such as having a child play the role of a lead character);
a second option selectable to initiate a configuration process for training the machine learning model to the voice … ([0082], [0090] the voice changing algorithm, i.e. machine learning model, can be given a sample of the user’s speech, i.e. the voice, that the algorithm can use to modify audio content to match the voice of the ; and
Where Liu previously teaches that the training to mimic a voice is training a neural network using input speech (see [0089]).
at least one third option selectable to select a respective … voice in which to present audio for a given character within the AV content ([0080-1], [0089-90], [0096] the user can select, i.e. a third option selectable, assets to change in the animation, i.e. AV content, through an interface, where changes can include sound files, where the user can create new customized audio files where voice is modified to match particular voice characteristics, i.e. select a respective…voice in which to present audio, and the new audio content can be used to partially populate the audio content of the animation, i.e. insertion of the audio, and where the user can provide the voice talent for some of the animation, where predetermined characters are referred to as voice talent, i.e. given character). 
Liu and Brown are analogous art because they are from a similar field of endeavor in enabling customization of audiovisual content for users. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the use of a recommended voice to synthesize text and replace one or more speakers in the original voiced content teachings of Liu with the use of an interface that allows the user to customize the audio voice file as taught by Brown. The motivation to do so would have been to achieve a predictable result of enabling user generated customized audio content (Brown [0089]).

voice of a child.
Dubinsky, however, teaches voice of a child ([0010:7-14], [0017:11-15], as also found in the provisional app at (p.2, l.18-23), (p.3, l.11-14) the system recognizes characteristics of the original speakers, including children, and creates voice dubbings that match the original speaker’s voice as closely as possible).
Liu, Brown, and Dubinsky are analogous art because they are from a similar field of endeavor in realistic speech synthesis for voiced content applications. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the speech synthesis using age as an attribute teachings of Liu, as modified by Brown, with the specific recognition and synthesis of a child’s voice as taught by Dubinsky. The motivation to do so would have been to achieve a predictable result of ensuring the voice dubbing is closely matched to the original speaker’s voice (Dubinsky (p. 3, l.11-14)).

Regarding claim 19, Liu in view of Brown and Dubinsky teaches claim 18, and Brown further teaches
the AV content is a cartoon ([0013] the system is able to generate cartoonized representations of a subject and incorporate the cartoonized representations into an animated video, such as an animated story, i.e. AV content is a cartoon).  
.

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Liu, in view of Brown, in view of Dubinsky, and further in view of McCoy et al. (US PG Pub No. 2015/0199978), hereinafter McCoy.

Regarding claim 20, Liu in view of Brown and Dubinsky teaches claim 19.
While Liu in view of Brown and Dubinsky provides the matching of facial expressions to the emotions of an animated character, Liu in view of Brown and Dubinsky does not specifically teach matching the character speech to the lip movement of an animated character, and thus does not teach
match the representation of the child's voice to lip movement of at least one character visually depicted in the cartoon.
McCoy, however, teaches match the representation of the child's voice to lip movement of at least one character visually depicted in the cartoon ([0025:1-10], [0026] movement of a character in the visual content, i.e. at least one character visually depicted in the cartoon, such as lip positions, i.e. lip movement, can be correlated to the dubbed vocal sounds for the character, i.e. match the representation of the voice).
Additionally, Dubinsky has already taught the voice to be mimicked is that of a child, also shown in claim 18.
Liu, Brown, Dubinsky, and McCoy are analogous art because they are from a similar field of endeavor in processing of multimedia content. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICOLE A K SCHMIEDER, whose telephone number is (571) 270-1474.  The examiner can normally be reached 8:00 - 5:00, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at (571) 272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-

/NICOLE A K SCHMIEDER/Examiner, Art Unit 2659

/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659