DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-5, 7-10, 12-13, and 15-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Serlectic et al. (US 2018/0374461).
Claim 1
Serlectic teaches a computer program product for artificially generating media streams, the computer program product embodied in a non-transitory computer-readable medium and including instructions for causing at least one processor to execute a method comprising: 
receiving textual information (204 of fig. 2; [0005], determining lyric information of the audio selection.); 
receiving rhythm information ([0005], determining timing information of the audio selection; [0006] The method may also include requesting, via the digital communication network, tone information of the audio selection from a tone database, and receiving, via the digital communication network, the tone information of the audio selection from the tone database based on the request. The tone information may include at least one of a genre, a tempo, a mood, an artist, or a style corresponding to the audio selection. See also [0062], tempo); 
receiving voice characteristics ([0045] In some embodiments, a method of generating a musical work may additionally include receiving a selection of a singer corresponding to at least one voice characteristic… It is contemplated that, in some embodiments, the at least one voice characteristic may be included in the formatted data sent to the voice synthesizer);
determining that a first portion of the textual information corresponds to a first portion of the rhythm information and that a second portion of the textual information corresponds to a second portion of the rhythm information, the first portion of the textual information differs from the second portion of the textual information and the first portion of the rhythm information differs from the second portion of the rhythm information ([0038], These embellishments or reductions may be performed in order to align the textual phrases in the lyrical input with the musical phrases by aligning their boundaries, and also to provide the musical material necessary for the alignment of the syllables of individual words to notes resulting in a natural musical expression of the input text. [0039] Subsequent to the analysis of the musical input, at 212, the lyrical input and the musical input may be correlated with one another based on the analyses of both the lyrical input and the musical input 206 and 210. Specifically, in some embodiments, the notes of the selected and analyzed musical work are intelligently and automatically assigned to one or more phonemes in the input text, as described in more detail below. In some embodiments, the resulting data correlating the lyrical input to the musical input may then be formatted into a synthesizer input at 214 for input into a voice synthesizer. The formatted synthesizer input, in the form of text syllable-melodic note pairs, may then be sent to a voice synthesizer at 216 to create a vocal rendering of the lyrical input for use in an original musical work that incorporates characteristics of the lyrical input and the musical input.); and 
generating an audio stream based on the textual information, the rhythm information and the voice characteristics, the generated audio stream includes at least a first portion and a second portion, the first portion of the generated audio stream includes a vocal expression of the first portion of the textual information in accordance with the first portion of the rhythm information and in a voice corresponding to the voice characteristics, the second portion of the generated audio stream includes a vocal expression of the second portion of the textual information in accordance with the second portion of the rhythm information and in the voice corresponding to the voice characteristics ([0039] the generated musical work may be received in the form of an audio file including a vocal rendering of the lyrical input entered by the user correlating with the music/melody of the musical input; [0040] In one embodiment, the vocal renderer may provide the user with a choice of a variety of voices, a variety of voice synthesizers (including but not limited to HMM-based, diphone or unit-selection based), or a choice of human languages. Some examples of the choices of singing voices are gender (e.g., male/female), age (e.g., young/old), nationality or accent (e.g., American accent/British accent), or other distinguishing vocal characteristics (e.g., sober/drunk, yelling/whispering, seductive, anxious, robotic, etc.).).
Claim 2
Serlectic teaches the computer program product of claim 1, wherein the method further comprises: receiving a personalized profile associated with a user; and using the personalized profile to select the voice characteristics ([0045] In some embodiments, a method of generating a musical work may additionally include receiving a selection of a singer corresponding to at least one voice characteristic. In some embodiments, the at least one voice characteristic may be indicative of a particular real-life or fictional singer with a particular recognizable style. For example, a particular musician may have a recognizable voice due to a specific twang, falsetto, vocal range, vibrato style, etc. When the system receives a selection of the particular singer, the at least one voice characteristic may be incorporated into the performance of the musical work.). 
Claim 3
Serlectic teaches the computer program product of claim 1, wherein the method further comprises: receiving a source audio data (208 of Fig. 2; [0031] Also, an external device may be connected to pressure pad 113 and/or client devices 101-105 to provide an external source of sound samples, waveforms, signals, or other musical inputs that can be reproduced by external control.); and analyzing the source audio data to determine the voice characteristics based on voice characteristics of a speaker in the source audio data (210 of Fig. 2; [0040], In some embodiments, the choice of voice synthesizer may be made automatically by the system based on analysis of the lyrical input and/or the musical input for specific words or musical styles indicating mood, tone, or genre.).
Claim 4
Serlectic teaches the computer program product of claim 1, wherein the method further comprises: receiving a source audio data (602 of Fig. 6); and analyzing the source audio data to generate the textual information based on speech in the source audio data (606 of Fig. 6; [0069] At 606, the system may determine lyric information of the audio selection, i.e., the words used or sung in the audio selection…In some embodiments, the system may identify the lyric information using voice recognition, such as by converting the spoken or sung words in the audio selection into text.).  
Claim 5
Serlectic teaches the computer program product of claim 4, wherein the generated textual information is a transcription of at least part of the speech in the source audio data (606 of Fig. 6; [0069] At 606, the system may determine lyric information of the audio selection, i.e., the words used or sung in the audio selection…In some embodiments, the system may identify the lyric information using voice recognition, such as by converting the spoken or sung words in the audio selection into text.).  

Claim 7
Serlectic teaches the computer program product of claim 1, wherein the method further comprises: receiving melody information ([0019] For example, the lyric video system may receive a user's selection of a musical work or melody that is pre-recorded or recorded and provided by the user. See also [0037]); and generating the audio stream based on the textual information, the melody information and the voice characteristics ([0038] During analysis and processing, each musical work or clip may optionally be embellished or reduced, either adding a number of notes to the phrase in a musical way (embellish), or removing them (reduce), while still maintaining the idea and recognition of the original melody in the musical input. [0039] the generated musical work may be received in the form of an audio file including a vocal rendering of the lyrical input entered by the user correlating with the music/melody of the musical input; [0040] In one embodiment, the vocal renderer may provide the user with a choice of a variety of voices, a variety of voice synthesizers (including but not limited to HMM-based, diphone or unit-selection based), or a choice of human languages. Some examples of the choices of singing voices are gender (e.g., male/female), age (e.g., young/old), nationality or accent (e.g., American accent/British accent), or other distinguishing vocal characteristics (e.g., sober/drunk, yelling/whispering, seductive, anxious, robotic, etc.).).
Claim 8
Serlectic teaches the computer program product of claim 7, wherein the method further comprises: receiving a source audio data (208 of Fig. 2); analyzing the source audio data to identify a melody in the source audio data; and determining the melody information based on the identified melody in the source audio data ([0039] the generated musical work may be received in the form of an audio file including a vocal rendering of the lyrical input entered by the user correlating with the music/melody of the musical input; See also [0065], In any of 502 or 506, a MusicXML or other suitable data format may be generated from the digital sheet music or from the song audio track. Based on any of 502, 504, and 506, the system may generate a melody MIDI at 508. In some embodiments, the melody MIDI may include timing and pitches of the lead vocal in the audio selection based on timing information included in the audio selection either in the MusicXML format or otherwise.).
Claim 9
Serlectic teaches the computer program product of claim 1, wherein the method further comprises receiving musical information, and wherein the generated audio stream includes musical tones based on the musical information in conjunction with the vocal expressions ([0040], Some examples of the choices of singing voices are gender (e.g., male/female), age (e.g., young/old), nationality or accent (e.g., American accent/British accent), or other distinguishing vocal characteristics (e.g., sober/drunk, yelling/whispering, seductive, anxious, robotic, etc.). In some embodiments, these choices of voices may be implemented through one or more speech synthesizers each using one or more vocal models, pitches, cadences, and other variables that may result in perceptively different sung attributes. In some embodiments, the choice of voice synthesizer may be made automatically by the system based on analysis of the lyrical input and/or the musical input for specific words or musical styles indicating mood, tone, or genre.).  
Claim 10
Serlectic teaches the computer program product of claim 9, wherein the method further comprises: receiving a source audio data (208 of Fig. 2); and analyzing the source audio data to determine the musical tones based on music in the source audio data ([0071] In some embodiments, the system may apply machine learning techniques or other automatic analysis to determine timing information, lyric information and analysis, and tone information without the need to receive information from third party sources. For example, in such an embodiment, the system may receive an audio selection or input, automatically derive lyrics, timing information, lyric analysis, and tone information using reference databases and machine learning techniques). 
Claim 12
Serlectic teaches the computer program product of claim 1, wherein the method further comprises: receiving a source textual information (204 of Fig. 2); and modifying at least one aspect of the source textual information to generate the textual information ([0033], Analysis of the lyrical input can include a variety of data processing techniques and procedures. For example, in some embodiments, the lyrical input is parsed into the speech elements of the text with a speech parser. For instance, in some embodiments, the speech parser may identify important words (e.g., love, anger, crazy), demarcate phrase boundaries (e.g., “I miss you.” “I love you.” “Let's meet.” “That was an awesome concert.”) and/or identify slang terms (e.g., chill, hang). Words considered as important can vary by region or language, and can be updated over time to coincide with the contemporary culture. Similarly, slang terms can vary geographically and temporally such that the media generation system is updatable and customizable. Punctuation or other symbols used in the lyrical input can also be identified and attributed to certain moods or tones that can influence the analytical parsing of the text. For example, an exclamation point could indicate happiness or urgency, while a “sad-face” emoticon could indicate sadness or sorrow. In some embodiments, the words or lyrics conveyed in the lyrical input can also be processed into its component pieces by breaking words down into syllables, and further by breaking the syllables into a series of phonemes. In some embodiments, the phonemes are used to create audio playback of the words or lyrics in the lyrical input. Additional techniques used to analyze the lyrical input are described in greater detail below.).  
Claim 13
Serlectic teaches the computer program product of claim 12, wherein the at least one aspect is language register ([0033], Words considered as important can vary by region or language, and can be updated over time to coincide with the contemporary culture. Similarly, slang terms can vary geographically and temporally such that the media generation system is updatable and customizable.).
Claim 15
Serlectic teaches the computer program product of claim 1, wherein the method further comprises: receiving source voice characteristics (for example, musical input 208 of Fig. 2; [0045] In some embodiments, a method of generating a musical work may additionally include receiving a selection of a singer corresponding to at least one voice characteristic… It is contemplated that, in some embodiments, the at least one voice characteristic may be included in the formatted data sent to the voice synthesizer); and modifying at least one aspect of the source voice characteristics to generate the voice characteristics ([0040] In some embodiments, the choice of voice synthesizer may be made automatically by the system based on analysis of the lyrical input and/or the musical input for specific words or musical styles indicating mood, tone, or genre. [0061] A performance of the timing data may be created at a stage where the system mimics a human technician by slightly adjusting pitch and timing information to match the original intent of the timing source, i.e. a song or other audio recording.).
Claim 16
Serlectic teaches the computer program product of claim 15, wherein the at least one aspect is related to gender ([0061], A performance of the timing data may be created at a stage where the system mimics a human technician by slightly adjusting pitch and timing information to match the original intent of the timing source, i.e. a song or other audio recording. The system may then determine an appropriate voice model based on inputs associated with the timing data. The inputs may be a music artist name, title of the work, gender of the speaker, musical key, etc.).  
Claim 17
Serlectic teaches the computer program product of claim 1, wherein the method further comprises: receiving source rhythm information ([0036], The audio recording may then be analyzed for pitch, tempo, etc., to utilize the audio recording as the musical input.); and modifying at least one aspect of the source rhythm information to generate the rhythm information ([0038], For example, in some embodiments, analysis and processing of the musical work includes “reducing” or “embellishing” the musical work. In some embodiments, the selected musical work may be parsed for features such as structurally important notes, rhythmic signatures, and phrase boundaries. In embodiments that utilize a text or speech parser as described above, the results of the text or speech parsing may be factored into the analysis of the musical work as well. During analysis and processing, each musical work or clip may optionally be embellished or reduced, either adding a number of notes to the phrase in a musical way (embellish), or removing them (reduce), while still maintaining the idea and recognition of the original melody in the musical input. [0060] For example, in some embodiments, the system may only add notes in the same musical key as the original musical work, or notes that maintain the tempo or other features of the original work so as to aide in keeping the musical work recognizable. It should be understood that although melodic reduction and embellishment have been described in the context of slight phrase disparity between the musical and text inputs, use of melodic reduction and embellishment in larger or smaller phrase disparity is also contemplated.).
Claim 18
Serlectic teaches the computer program product of claim 17, wherein the at least one aspect is related to musical genre ([0040], In some embodiments, the choice of voice synthesizer may be made automatically by the system based on analysis of the lyrical input and/or the musical input for specific words or musical styles indicating mood, tone, or genre. [0062], At 308, the system may receive song data, such as the artist, genre, tempo, song title, key, tone, etc. At 312, the system may determine a vocalist gender, style, or ideal voice model based on the received song data.).
Claims 19 and 20
	These claims recite substantially the same limitations as those provided in claim 1 above, and therefore they are rejected for the same reasons.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6, 11, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Serlectic et al. (US 2018/0374461) in view of Freitag et al. (US 2021/0019373).
Claim 6
Although Serlectic teaches that the vocal renderer may offer a choice of human languages ([0040], In one embodiment, the vocal renderer may provide the user with a choice of a variety of voices, a variety of voice synthesizers (including but not limited to HMM-based, diphone or unit-selection based), or a choice of human languages.) and teaches the computer program product of claim 4, wherein the speech in the source audio data is in a first language ([0047], The system methodology for recognizing such instances in which words or syllables should receive textual emphasis may be based on language or be culturally specific.), Serlectic does not specifically detail wherein the generated textual information includes a translation of at least part of the speech to a second language.
Freitag teaches wherein the generated textual information includes a translation of at least part of the speech to a second language ([0026] In many implementations, APE model 108 is trained to correct errors introduced when translating a source language into a target language.).
	Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate translation of language as taught by Freitag in the speech-to-text conversion system of Serlectic, because doing so would have provided a way to correct errors introduced when translating a source language into a target language ([0026] of Freitag).


Claim 11
Although Serlectic teaches that the vocal renderer may offer a choice of human languages ([0040], In one embodiment, the vocal renderer may provide the user with a choice of a variety of voices, a variety of voice synthesizers (including but not limited to HMM-based, diphone or unit-selection based), or a choice of human languages.) and teaches the computer program product of claim 4, wherein the speech in the source audio data is in a first language ([0047], The system methodology for recognizing such instances in which words or syllables should receive textual emphasis may be based on language or be culturally specific.), Serlectic does not specifically detail wherein the method further comprises: receiving a source textual information; and translating the source textual information to generate the textual information.
Freitag teaches receiving a source textual information; and translating the source textual information to generate the textual information ([0026] In many implementations, APE model 108 is trained to correct errors introduced when translating a source language into a target language.).
	Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate translation of language as taught by Freitag in the speech-to-text conversion system of Serlectic, because doing so would have provided a way to correct errors introduced when translating a source language into a target language ([0026] of Freitag).

Claim 14
Serlectic discloses the computer program product of claim 12, but does not specifically detail wherein the at least one aspect is related to gender. 
Freitag teaches wherein the at least one aspect is related to gender ([0029] APE model 206 can be trained to process natural language text to generate edited text correcting word translation error(s), gender error(s), etc.).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate translation modifications as taught by Freitag in the speech-to-text conversion system of Serlectic, because doing so would have provided a way to correct errors introduced when translating a source language into a target language ([0026] of Freitag).
	
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THOMAS H MAUNG whose telephone number is (571)270-5690. The examiner can normally be reached Monday-Friday, 9am-6pm, EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivian Chin, can be reached at 1-(571) 272-7848. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/THOMAS H MAUNG/Primary Examiner, Art Unit 2654