DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 07/27/2022 has been entered.
 
Response to Arguments
Applicant's arguments filed 04/11/2022 have been fully considered but they are not persuasive. Regarding arguments on page 13 of the Remarks, Examiner notes that while the pitch at any time instant may be compared, this is done over a time period, as shown in para [0103] and Fig. 3A.
Regarding arguments on pages 14-15 of the Remarks, Examiner notes that the first three limitations of claims 9 and 18 are listed in the alternative, using the “or” operator. The newly added limitations are only applicable if the audio information is comprised of both the pitch and energy. However, if only the rhythm is selected as the audio information, the newly added limitations are not given weight.
Applicant’s arguments with respect to claims 1-5, 8-14, and 17-19 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Objections
Claim 10 is objected to because of the following informalities: lines 9-10 read “in the lyrics” whereas the similar limitation in Claim 1 reads “in the lyric”. Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 8-11, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Foster (US 2010/0300264), in view of Trovato et al. (US 2002/0163533 A1), hereinafter referred to as Trovato.

Regarding claim 1, Foster teaches:
A method for processing an audio, comprising: 
obtaining first audio information of a song in response to that the song is selected, wherein the first audio information represents audio features for reflecting music characteristics of the song (para [0038-39], where audio output associated with the selected target musical data is produced); 
acquiring vocal audio data (para [0074], where vocal inputs are received); 
obtaining second audio information based on the vocal audio data (para [0071], where the pitch is extracted from the digital data stream); 
obtaining two or more first singing parameters within a contrast time period, by comparing the second audio information with the first audio information within the contrast time period, wherein the first singing parameters represent a matching degree of each segment of the first song audio information and the second audio information corresponding to each sentence (para [0072], where the sample and correct pitch values are compared to determine a pitch error, which is used to generate performance evaluation data, and para [0103], where a degree of matching is determined at multiple time points);
displaying a first effect animation corresponding to the first singing parameters on a singing live broadcast interface at a singing moment corresponding to ending stamps based on a corresponding relationship between the first singing parameters and the first effect animation (para [0080], where successfully performing musical content triggers animations);
obtaining a second singing parameter based on the two or more first singing parameters (para [0103], where a degree of matching is determined at multiple time points); and
displaying a second effect animation corresponding to the second singing parameter on the singing live broadcast interface at the singing moment (para [0080], where successfully performing musical content triggers animations).  
Foster does not teach:
obtaining first audio information and a lyric file of a song in response to that the song is selected, wherein the first audio information represents audio features for reflecting music characteristics of the song, the lyric file comprises lyric information of two or more sentences in the lyric, and the lyric information comprises a starting timestamp and an ending timestamp of each sentence;
obtaining a time period based on the starting timestamp and the ending timestamp of each sentence; 
Trovato teaches:
obtaining first audio information and a lyric file of a song in response to that the song is selected (para [0073-74], where the lyrics from a text file are played along with the music for the selected song), wherein the first audio information represents audio features for reflecting music characteristics of the song (para [0072], where the music is retrieved), the lyric file comprises lyric information of two or more sentences in the lyric (para [0077], where the lyrics are segmented into sentences or paragraphs), and the lyric information comprises a starting timestamp and an ending timestamp of each sentence (Fig. 9A, 9B, para [0066-69], [0075], where the timestamps indicate a starting and ending time for each sentence or paragraph);
obtaining a time period based on the starting timestamp and the ending timestamp of each sentence (Fig. 9A, 9B, para [0066-69], [0075], where the timestamps indicate a starting and ending time for each sentence or paragraph, corresponding to a time period); 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Foster by using the timestamps of Trovato (Trovato Fig. 9A, 9B) for the lyrics of Foster (Foster para [0077-79]), in order to correlate the visual information with corresponding voice and non-voice segments (Trovato para [0009]).
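
For illustration only, the combination discussed above can be sketched as follows. This is a minimal sketch under stated assumptions, not code taken from Foster, Trovato, or the application: the names LyricSentence, contrast_periods, sentence_scores, and overall_score, the one-semitone tolerance, and the representation of pitch as (time, MIDI pitch) samples are all hypothetical. Each sentence's starting and ending timestamps define one time period, a matching degree is computed within each period, and the per-sentence values are then combined into one overall value.

```python
from dataclasses import dataclass


@dataclass
class LyricSentence:
    """One sentence of a lyric file (hypothetical structure)."""
    text: str
    start: float  # starting timestamp, in seconds
    end: float    # ending timestamp, in seconds


def contrast_periods(sentences):
    """Derive one (start, end) time period per sentence from its timestamps."""
    return [(s.start, s.end) for s in sentences]


def sentence_scores(sentences, reference, vocal, tolerance=1.0):
    """Return one matching degree (0.0 to 1.0) per sentence.

    reference and vocal are lists of (time_sec, midi_pitch) samples.
    Assumption: both use the same sample times, so samples are paired
    by exact timestamp; a real system would need to align them.
    """
    scores = []
    for s in sentences:
        # Reference samples that fall inside this sentence's time period.
        ref = {t: p for t, p in reference if s.start <= t < s.end}
        # Count vocal samples within `tolerance` semitones of the reference.
        hits = sum(
            1 for t, p in vocal
            if t in ref and abs(p - ref[t]) <= tolerance
        )
        scores.append(hits / len(ref) if ref else 0.0)
    return scores


def overall_score(scores):
    """Combine the per-sentence values into a single value (simple mean)."""
    return sum(scores) / len(scores) if scores else 0.0
```

In this sketch the per-sentence values stand in for the "first singing parameters" and their mean stands in for the "second singing parameter"; the mean is only one possible way of combining them.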

Regarding claim 2, Foster in view of Trovato teaches:
The method according to claim 1, further comprising: 
displaying a song selection interface in response to a singing request being received, wherein songs are displayed on the song selection interface (Foster para [0038], where the first and second target musical data are displayed in a practice mode, interpreted as the singing request), 
wherein said obtaining the first audio information comprises: 
obtaining the first audio information in response to the song being selected from the song selection interface (Foster para [0038-39], where audio output associated with the selected target musical data is produced).  

Regarding claim 8, Foster in view of Trovato teaches:
The method according to claim 1, wherein said acquiring the vocal audio data comprises: 
acquiring the vocal audio data based on an acquisition period (Foster para [0077-79], where the user sings during the vocal cue, corresponding to the timing of the lyrics displayed); or 
acquiring the vocal audio data within the time period (Foster para [0077-79], where the user sings during the vocal cue, corresponding to the timing of the lyrics displayed).  

Regarding claim 9, Foster in view of Trovato teaches:
The method according to claim 1, wherein the audio information comprises at least one of following audio features: 
audio pitch for reflecting pitch characteristics of the song (Foster para [0071], where pitch is used); 
audio rhythm for reflecting rhythm characteristics of the song (Foster para [0074], where the application is a rhythm-action game); or 
audio energy for reflecting energy characteristics of the song (Foster para [0078], where cues glow brightly for louder tones),
wherein in response to the audio information comprising the audio pitch and the audio energy, said obtaining two or more first singing parameters comprises:
in a case of comparing pitch characteristics in the vocal audio data with pitch characteristics included in the first audio information, obtaining the first singing parameters based on ratio of pitches in the vocal audio data and the first audio information (this limitation is not given weight where the selected audio feature is the audio rhythm, because the audio features are recited in the alternative), and
in a case of comparing audio energy in the vocal audio data with audio energy included in the first audio information, obtaining the first singing parameters based on a change rate of loudness (this limitation is not given weight where the selected audio feature is the audio rhythm, because the audio features are recited in the alternative).
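
For illustration only, the two conditional limitations recited above can be sketched as follows. This is a hedged sketch, not the applicant's implementation and not a teaching of any cited reference: the function names, the use of a base-2 logarithm to turn the pitch ratio into a score, and the use of per-frame RMS values as the loudness measure are all assumptions made for the example.

```python
import math


def pitch_ratio_score(vocal_hz, reference_hz):
    """Score based on the ratio of the sung pitch to the reference pitch.

    Returns 1.0 when the pitches are identical and falls linearly
    (in octaves) to 0.0 when they are one octave or more apart.
    """
    if vocal_hz <= 0 or reference_hz <= 0:
        return 0.0  # no usable pitch detected
    octaves_apart = abs(math.log2(vocal_hz / reference_hz))
    return max(0.0, 1.0 - octaves_apart)


def loudness_change_rates(rms_values, frame_seconds):
    """Change rate of loudness between consecutive frames, per second."""
    return [
        (later - earlier) / frame_seconds
        for earlier, later in zip(rms_values, rms_values[1:])
    ]
```

A score could then be derived by comparing the change rates of the vocal audio data with those of the first audio information; how that comparison is made is left open here.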

Regarding claim 10, Foster teaches:
An electronic device for processing audio, comprising: 
a processor (Fig. 1B element 120, para [0070], where a processor is used); and 
a memory for storing an instruction executable for the processor (Fig. 1B element 145, para [0070], where RAM is used); 
wherein the processor is configured to execute the instruction to implement the following: 
obtaining first audio information of a song in response to the song being selected, wherein the first audio information represents audio features for reflecting music characteristics of the song (para [0038-39], where audio output associated with the selected target musical data is produced); 
acquiring vocal audio data (para [0074], where vocal inputs are received); 
obtaining second audio information based on the vocal audio data (para [0071], where the pitch is extracted from the digital data stream); 
obtaining two or more first singing parameters within a contrast time period by comparing the second audio information with the first audio information within the contrast time period, wherein the first singing parameters represent a matching degree of each segment of the first song audio information and the second audio information corresponding to each sentence (para [0072], where the sample and correct pitch values are compared to determine a pitch error, which is used to generate performance evaluation data, and para [0103], where a degree of matching is determined at multiple time points);
displaying a first effect animation corresponding to the first singing parameters on a singing live broadcast interface at a singing moment corresponding to ending timestamps based on a corresponding relationship between the first singing parameters and the first effect animation (para [0080], where successfully performing musical content triggers animations);
obtaining a second singing parameter based on the two or more first singing parameters (para [0103], where a degree of matching is determined at multiple time points); and
displaying a second effect animation corresponding to the second singing parameter on the singing live broadcast interface at the singing moment (para [0080], where successfully performing musical content triggers animations).  
Foster does not teach:
obtaining first audio information and a lyric file of a song in response to the song being selected, wherein the first audio information represents audio features for reflecting music characteristics of the song, the lyric file comprises lyric information of two or more sentences in the lyrics, and the lyric information comprises a starting timestamp and an ending timestamp of each sentence; 
obtaining a time period based on the starting timestamp and the ending timestamp of each sentence; 
Trovato teaches:
obtaining first audio information and a lyric file of a song in response to the song being selected (para [0073-74], where the lyrics from a text file are played along with the music for the selected song), wherein the first audio information represents audio features for reflecting music characteristics of the song (para [0072], where the music is retrieved), the lyric file comprises lyric information of two or more sentences in the lyrics (para [0077], where the lyrics are segmented into sentences or paragraphs), and the lyric information comprises a starting timestamp and an ending timestamp of each sentence (Fig. 9A, 9B, para [0066-69], [0075], where the timestamps indicate a starting and ending time for each sentence or paragraph); 
obtaining a time period based on the starting timestamp and the ending timestamp of each sentence (Fig. 9A, 9B, para [0066-69], [0075], where the timestamps indicate a starting and ending time for each sentence or paragraph, corresponding to a time period); 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Foster by using the timestamps of Trovato (Trovato Fig. 9A, 9B) for the lyrics of Foster (Foster para [0077-79]), in order to correlate the visual information with corresponding voice and non-voice segments (Trovato para [0009]).

Regarding claim 11, Foster in view of Trovato teaches:
The apparatus according to claim 10, wherein the processor is configured to execute the instruction to implement the following: 
displaying a song selection interface in response to a singing request being received, wherein songs are displayed on the song selection interface (Foster para [0038], where the first and second target musical data are displayed in a practice mode, interpreted as the singing request), 
wherein the processor is configured to execute the instruction to obtain the first audio information by: 
obtaining the first audio information in response to the song being selected from the song selection interface (Foster para [0038-39], where audio output associated with the selected target musical data is produced).  

Regarding claim 17, Foster in view of Trovato teaches:
The apparatus according to claim 10, wherein the processor is configured to execute the instruction to acquire the vocal audio data by: 
acquiring the vocal audio data based on an acquisition period (Foster para [0077-79], where the user sings during the vocal cue, corresponding to the timing of the lyrics displayed); or 
acquiring the vocal audio data within the time period (Foster para [0077-79], where the user sings during the vocal cue, corresponding to the timing of the lyrics displayed).  

Regarding claim 18, Foster in view of Trovato teaches:
The apparatus according to claim 10, wherein the audio information comprises at least one of following audio features: 
audio pitch for reflecting pitch characteristics of the song (Foster para [0071], where pitch is used); 
audio rhythm for reflecting rhythm characteristics of the song (Foster para [0074], where the application is a rhythm-action game); or 
audio energy for reflecting energy characteristics of the song (Foster para [0078], where cues glow brightly for louder tones),
wherein in response to the audio information comprising the audio pitch and the audio energy, said obtaining two or more first singing parameters comprises:
in a case of comparing pitch characteristics in the vocal audio data with pitch characteristics included in the first audio information, obtaining the first singing parameters based on ratio of pitches in the vocal audio data and the first audio information (this limitation is not given weight where the selected audio feature is the audio rhythm, because the audio features are recited in the alternative), and
in a case of comparing audio energy in the vocal audio data with audio energy included in the first audio information, obtaining the first singing parameters based on a change rate of loudness (this limitation is not given weight where the selected audio feature is the audio rhythm, because the audio features are recited in the alternative).

Regarding claim 19, Foster in view of Trovato teaches:
A non-transitory storage medium, comprising an instruction, wherein the instruction is executed by a processor to implement the method according to claim 1 (Foster para [0068], where a computer-readable storage medium is used).

Claims 3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Foster, in view of Trovato, and further in view of Inoue et al. (US 2015/0262589 A1), hereinafter referred to as Inoue.

Regarding claim 3, Foster in view of Trovato teaches:
The method according to claim 1, wherein said acquiring the vocal audio data comprises:
Foster in view of Trovato does not teach:
acquiring environmental audio data; and 
determining the vocal audio data by cancelling echo in the environmental audio data in response to a device for implementing the method being detected to be in a loudspeaker mode, wherein the cancelling echo comprises cancelling environmental noise caused by live broadcast voices and comprised in the environmental audio data.  
Inoue teaches:
acquiring environmental audio data (para [0086], where audio from the speaker is detected at the microphone); and 
determining the vocal audio data by cancelling echo in the environmental audio data in response to a device for implementing the method being detected to be in a loudspeaker mode, wherein the cancelling echo comprises cancelling environmental noise caused by live broadcast voices and comprised in the environmental audio data (para [0086], where the echo signal is subtracted from the microphone signal, where the echo comes from a loudspeaker, where the live broadcast voices may be audio from other singers, as in Foster para [0085]).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Foster in view of Trovato to use the echo cancellation of Inoue (Inoue para [0086]) on the vocal input of Foster in view of Trovato (Foster para [0074]) in order to reduce the influence of the creeping of accompaniment audio from the speaker (Inoue para [0087]).
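
For illustration only, subtracting an echo signal from a microphone signal can be sketched as follows. This is a minimal sketch and not the echo canceller of Inoue: it assumes the loudspeaker playback samples are known and that the echo is a copy of the playback with a single fixed gain and delay, whereas practical echo cancellers typically estimate the echo path adaptively. The function name and all parameters are hypothetical.

```python
def cancel_echo(mic, playback, echo_gain, delay_samples, loudspeaker_mode=True):
    """Return the microphone signal with the estimated echo removed.

    mic            -- samples captured by the microphone (voice plus echo)
    playback       -- samples sent to the loudspeaker
    echo_gain      -- assumed attenuation of the playback at the microphone
    delay_samples  -- assumed delay between playback and its echo
    """
    if not loudspeaker_mode:
        # No loudspeaker output reaches the microphone; nothing to cancel.
        return list(mic)
    cleaned = []
    for n, sample in enumerate(mic):
        k = n - delay_samples
        # Estimated echo: the delayed, scaled playback sample, if one exists.
        echo = echo_gain * playback[k] if 0 <= k < len(playback) else 0.0
        cleaned.append(sample - echo)
    return cleaned
```

With a constant voice level of 0.5 and playback that leaks into the microphone at half amplitude one sample later, the function recovers the voice samples in this idealized case.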

Regarding claim 12, Foster in view of Trovato teaches:
The apparatus according to claim 10, wherein the processor is configured to execute the instruction to acquire the vocal audio data by:
Foster in view of Trovato does not teach:
acquiring environmental audio data; and 
determining the vocal audio data by cancelling echo in the environmental audio data in response to the apparatus being detected to be in a loudspeaker mode, wherein the cancelling echo comprises cancelling environmental noise caused by live broadcast voices and comprised in the environmental audio data.  
Inoue teaches:
acquiring environmental audio data (para [0086], where audio from the speaker is detected at the microphone); and 
determining the vocal audio data by cancelling echo in the environmental audio data in response to the apparatus being detected to be in a loudspeaker mode, wherein the cancelling echo comprises cancelling environmental noise caused by live broadcast voices and comprised in the environmental audio data (para [0086], where the echo signal is subtracted from the microphone signal, where the echo comes from a loudspeaker, where the live broadcast voices may be audio from other singers, as in Foster para [0085]).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Foster in view of Trovato to use the echo cancellation of Inoue (Inoue para [0086]) on the vocal input of Foster in view of Trovato (Foster para [0074]) in order to reduce the influence of the creeping of accompaniment audio from the speaker (Inoue para [0087]).

Claims 4-5 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Foster, in view of Trovato, and further in view of Fallgatter (US 2005/0235812 A1).

Regarding claim 4, Foster in view of Trovato teaches:
The method according to claim 1, wherein said obtaining the first audio information comprises:
wherein said determining the vocal audio information comprises: 
converting the vocal audio data into a second musical instrument digital interface data (Foster para [0099], where a MIDI value for the input is determined), and  
wherein said determining the first singing parameters comprises: 
determining singing parameters of the second musical instrument digital interface data and the first musical instrument digital interface data as the first singing parameters by comparing the second musical instrument digital interface data with the first musical instrument digital interface data (Foster para [0099], where the MIDI values for the input and the song are compared to determine the pitch distance).  
Foster in view of Trovato does not teach:
obtaining a musical instrument digital interface file of the song, wherein the musical instrument digital interface file carries a first musical instrument digital interface data representing the first audio information;
Fallgatter teaches:
obtaining a musical instrument digital interface file of the song, wherein the musical instrument digital interface file carries a first musical instrument digital interface data representing the first audio information (para [0045], where the music file is converted to a MIDI format);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Foster in view of Trovato by using the MIDI conversion of Fallgatter (Fallgatter para [0045]) for the songs of Foster in view of Trovato (Foster para [0038-39]), as the converted file may contain useful information such as data relating to the notes or time stamps (Fallgatter para [0045]).
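
For illustration only, comparing a sung pitch with a reference MIDI note can be sketched as follows. The conversion uses the standard equal-temperament relationship in which A4 (440 Hz) is MIDI note 69 and each semitone is one note number. The function names are hypothetical, and the sketch is not asserted to be the computation performed in Foster or Fallgatter.

```python
import math


def hz_to_midi(frequency_hz):
    """Convert a frequency in Hz to a (possibly fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(frequency_hz / 440.0)


def pitch_distance(vocal_hz, reference_midi):
    """Distance in semitones between the sung pitch and the reference note."""
    return abs(hz_to_midi(vocal_hz) - reference_midi)
```

A smaller distance indicates a closer match between the vocal input and the reference note for the song.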

Regarding claim 5, Foster in view of Trovato and Fallgatter teaches:
The method according to claim 4, wherein said acquiring the musical instrument digital interface file comprises: 
acquiring audio data of the song (Fallgatter para [0044], where the music file is received); and 
converting the audio data into the musical instrument digital interface data, and generating the musical instrument digital interface file based on the musical instrument digital interface data (Fallgatter para [0045], where the music file is converted to a MIDI format).  

Regarding claim 13, Foster in view of Trovato teaches:
The apparatus according to claim 10, wherein the processor is configured to execute the instruction to acquire the first audio information by:
wherein the processor is configured to execute the instruction to determine the vocal audio information by: 
converting the vocal audio data into a second musical instrument digital interface data (Foster para [0099], where a MIDI value for the input is determined), and 
wherein the processor is configured to execute the instruction to determine the singing completeness by: 
determining a matching degree of the second musical instrument digital interface data and the first musical instrument digital interface data as the singing completeness by comparing the second musical instrument digital interface data with the first musical instrument digital interface data (Foster para [0099], where the MIDI values for the input and the song are compared to determine the pitch distance).  
Foster in view of Trovato does not teach:
obtaining a musical instrument digital interface file of the song, wherein the musical instrument digital interface file carries a first musical instrument digital interface data representing the first audio information,
Fallgatter teaches:
obtaining a musical instrument digital interface file of the song, wherein the musical instrument digital interface file carries a first musical instrument digital interface data representing the first audio information (para [0045], where the music file is converted to a MIDI format),
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Foster in view of Trovato by using the MIDI conversion of Fallgatter (Fallgatter para [0045]) for the songs of Foster in view of Trovato (Foster para [0038-39]), as the converted file may contain useful information such as data relating to the notes or time stamps (Fallgatter para [0045]).

Regarding claim 14, Foster in view of Trovato and Fallgatter teaches:
The apparatus according to claim 13, wherein the processor is configured to execute the instruction to acquire the musical instrument digital interface file by: 
acquiring audio data of the song (Fallgatter para [0044], where the music file is received); and 
converting the audio data into the musical instrument digital interface data, and generating the musical instrument digital interface file based on the musical instrument digital interface data (Fallgatter para [0045], where the music file is converted to a MIDI format).  

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US 2014/0149861 A1, para [0063], teaches that lyrics are associated with time stamps and are pre-segmented based on complete sentences.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRYAN S BLANKENAGEL whose telephone number is (571)270-0685. The examiner can normally be reached 8:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BRYAN S BLANKENAGEL/
Primary Examiner, Art Unit 2658