DETAILED ACTION
	This office action is in response to the communication filed on September 24, 2021. Claims 1-5 and 7-20 are currently pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments with respect to amended independent claim(s) 1, 14, and 20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Objections
Claim 20 is objected to because of the following informalities:
In Claim 20 line 9, the phrase “receiving a audio segment” should be “receiving an audio segment”. Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


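The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
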
Claims 1-5 and 7-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
In Claims 1, 14, and 20, the phrase “third lyrics information” is indefinite because it is not clear whether the third lyrics information is different from the first lyrics information and the second lyrics information. It is also unclear why the lyrics information associated with a track determined from the retrieved set of tracks, which are associated with the second lyrics information, is referred to as third lyrics information.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 7-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Roberts (US Pub 2015/0302086) in view of Wang (US Pub 2011/0276333) and in further view of Sharp (US Pub 2018/0366097).

With respect to claim 1, Roberts discloses an electronic device (Roberts: Paragraph 90; Figure 10), comprising:
circuitry (Roberts: Paragraph 90; Figure 10) configured to:
determine identification information associated with a first performer-of-interest at a live event (Roberts: Paragraph 10 – during live performance of audio pieces identify an audio piece during performance; Paragraphs 14, 15, and 46 – identify one or more characteristics of the audio piece, identify performer of the live version using live fingerprint containing characteristics, identify performer based on venue, current date and time, geolocation etc., retrieve a list of audio pieces corresponding to the performer; Paragraphs 25-33 and 40 – obtain identifier of an audio piece, recording a live segment of the audio piece, generate live fingerprint, query using live fingerprint to obtain metadata of the audio piece, obtain identification information; Figure 5);
retrieve a set of audio tracks from a plurality of audio tracks based on the determined identification information, wherein the set of audio tracks are associated with the first performer-of-interest (Roberts: Paragraphs 14, 15, and 46 – identify one or more characteristics of the audio piece, identify performer of the live version using live fingerprint containing characteristics, identify performer based on venue, current date and time, geolocation etc.; Paragraphs 47 and 48 – retrieve a list of audio pieces for the performer, identify reference versions of audio pieces corresponding to the performer and reference fingerprints, compare live fingerprint with reference fingerprints for matches, identify audio being performed live based on match, reference audio pieces can be retrieved before start of performance; Figure 5);
receive a first audio segment associated with the first performer-of-interest from an audio capturing device at the live event (Roberts: Paragraphs 14, 15, and 46 – identify one or more characteristics of the audio piece, identify performer of the live version using live fingerprint containing characteristics, identify performer based on venue, current date and time, geolocation etc.; Paragraphs 47 and 48 – retrieve a list of audio pieces for the performer, identify reference versions of audio pieces corresponding to the performer and reference fingerprints, compare live fingerprint with reference fingerprints for matches, identify audio being performed live based on match, reference audio pieces can be retrieved before start of performance; Figure 5);
compare the first text information with second text information, wherein the second text information is associated with a first audio portion of each of the retrieved set of audio tracks (Roberts: Paragraphs 47 and 48 – retrieve a list of audio pieces for the performer, identify reference versions of audio pieces corresponding to the performer and reference fingerprints, compare live fingerprint with reference fingerprints for matches, identify audio being performed live based on match, reference audio pieces can be retrieved before start of performance; Figure 5; here Roberts discloses comparing first information with second information, where second information is associated with a first audio portion of each of the retrieved reference audio tracks, but Roberts does not explicitly disclose comparing first text information with second text information, however, the Sharp reference discloses the features, as discussed below);
determine a first audio track from the retrieved set of audio tracks based on the comparison between the first text information and the second text information (Roberts: Paragraphs 47 and 48 – retrieve a list of audio pieces for the identified performer, identify reference versions of audio pieces corresponding to the performer and reference fingerprints, compare live fingerprint with reference fingerprints for matches, identify audio being performed live based on match, reference audio pieces can be retrieved before start of performance; Figure 5; here Roberts discloses determining a first audio track from the retrieved set of audio tracks based on a comparison between first information and second information, but Roberts does not explicitly disclose a comparison between first text information and second text information; however, the Sharp reference discloses these features, as discussed below);
Roberts discloses identifying and comparing lyrics information in audio segments, however, Roberts does not explicitly disclose:
identify a start position of the determined first audio track based on the received first audio segment;
identify third lyrics information associated with the determined first audio track;
control a display screen to display the third lyrics information of the determined first audio track based on the identified start position.
The Wang reference discloses identifying a start position of the determined audio track based on the received audio segment (Wang: Paragraphs 5 and 6 – receiving a media sample of a media stream associated with a timestamp corresponding to a sampling time, determining a time offset indicating a time position in the media stream corresponding to the sampling time, calculating a real-time offset using real-time timestamp, the timestamp of the media sample, and the time offset, identifying a time offset indicating a time position in a song; Paragraphs 17-19 and 91 – determine timestamp corresponding to a sampling time of media sample captured, sampling time can be the beginning, identify the playing music, media rendering source include live performance as a source of audio, identify beginning of audio sample),
identifying lyrics information associated with the determined audio track (Wang: Paragraphs 5 and 6 – identifying a time offset indicating a time position in a song, receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 17-19 and 91 – determine timestamp corresponding to a sampling time of media sample captured, display lyrics synchronized to ambiently playing music using a mobile music information retrieval device, identify the playing music, retrieve and display corresponding lyrics synchronized to current time point); and
controlling a display screen to display the lyrics information of the determined audio track based on the identified start position (Wang: Paragraphs 5 and 6 – receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 17-19 – display lyrics synchronized to ambiently playing music using a mobile music information retrieval device, identify the playing music, retrieve and display corresponding lyrics synchronized to current time point).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Roberts and Wang, to have combined Roberts and Wang. The motivation to combine Roberts and Wang would be to display lyric information synchronized to ambiently playing music by retrieving media information and performing lyric synchronization (Wang: Paragraphs 2 and 18).
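For illustration only, the real-time offset described in the cited Wang paragraphs can be sketched as follows. The sketch assumes that the real-time offset is the identified time offset plus the time elapsed since the sample was captured; that arithmetic, the function names, and the timed-lyrics list are hypothetical and are not taken from Wang or from the claims.

```python
from bisect import bisect_right


def real_time_offset(time_offset_s, sample_timestamp_s, now_s):
    """Estimate the current position in the song, in seconds.

    Assumption: position = identified time offset of the sample
    + time elapsed between the sample timestamp and the current time.
    """
    return time_offset_s + (now_s - sample_timestamp_s)


def current_lyric_line(timed_lyrics, position_s):
    """Return the lyric line whose start time is the latest one at or
    before position_s, or None if no line has started yet.

    timed_lyrics is a hypothetical list of (start_seconds, text) pairs
    sorted by start_seconds.
    """
    starts = [start for start, _ in timed_lyrics]
    index = bisect_right(starts, position_s) - 1
    return timed_lyrics[index][1] if index >= 0 else None


# Hypothetical data: the sample was matched 42.0 s into the song,
# captured at t = 100.0 s, and the display is refreshed at t = 103.5 s.
lyrics = [(0.0, "first line"), (40.0, "second line"), (45.0, "third line")]
position = real_time_offset(42.0, 100.0, 103.5)
line_to_display = current_lyric_line(lyrics, position)
```

Under these assumptions, the displayed line advances as the current time advances, without a new audio sample being required for each refresh.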
Roberts discloses comparing a first fingerprint information with a second fingerprint information and Wang discloses receiving textual information corresponding to lyrics information associated with songs and synchronizing lyrics with an audio sample, however, Roberts and Wang do not explicitly disclose: 
convert the received first audio segment to first text information;
compare the first text information with second text information;
the first text information corresponds to a portion of first lyrics information associated with the received first audio segment, and
the second text information corresponds to a portion of second lyrics information associated with the first audio portion of each of the retrieved set of audio tracks;
The Sharp reference discloses converting a received first audio segment to first text information (Sharp: Paragraph 2 – automatic generation of lyrics of songs using speech recognition; Paragraphs 9 and 36 – receiving audio input of a song, transcribing a plurality of words from vocal content using speech recognition, generating lyric by converting each word to text; Paragraph 51 – audio input can correspond to a single track or multiple tracks);
comparing the first text information with second text information (Sharp: Paragraphs 37 and 48 – verifying lyrics with popular lyric databases, searching lyric archives and matching lyrics with the generated lyrics on a word by word comparison, comparing the lyrics or portion of the lyrics with one or more pre-existing versions);
the first text information corresponding to a portion of first lyrics information associated with the received first audio segment (Sharp: Paragraphs 9 and 36 – receiving audio input of a song, transcribing a plurality of words from vocal content using speech recognition, generating lyric by converting each word to text; Paragraphs 37 and 48 – comparing the lyrics or portion of the lyrics with one or more pre-existing versions), and
the second text information corresponding to a portion of second lyrics information associated with a first audio portion of each of a retrieved set of audio tracks (Sharp: Paragraphs 37 and 48 – verifying lyrics with popular lyric databases, searching lyric archives for one or more pre-existing versions of the lyrics and matching the lyrics with the generated lyrics on a word by word comparison, comparing the lyrics or portion of the lyrics with one or more pre-existing versions; Paragraphs 54, 55, and 59 – retrieve pre-existing lyrics for versions of songs from database, compare lyric with pre-existing lyrics).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Roberts, Wang, and Sharp, to have combined Roberts, Wang, and Sharp. The motivation to combine Roberts, Wang, and Sharp would be to automatically generate lyrics of songs using speech recognition (Sharp: Paragraph 2).
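For illustration only, a word-by-word comparison of transcribed text against the lyrics of each retrieved track can be sketched as follows. The sliding-window scoring, the function names, and the sample lyrics are hypothetical assumptions chosen for clarity; they are not the method of Sharp, Roberts, or the claims.

```python
def word_match_ratio(transcribed, reference):
    """Fraction of transcribed words that match the reference lyrics
    word by word at the best-aligned position (assumed scoring rule)."""
    words = transcribed.lower().split()
    ref_words = reference.lower().split()
    if not words:
        return 0.0
    best = 0
    # Slide the transcribed words across the reference lyrics.
    for start in range(max(1, len(ref_words) - len(words) + 1)):
        window = ref_words[start:start + len(words)]
        hits = sum(1 for a, b in zip(words, window) if a == b)
        best = max(best, hits)
    return best / len(words)


def best_matching_track(transcribed, candidates):
    """Return the title whose lyrics best match the transcribed text.

    candidates is a hypothetical non-empty dict of title -> lyrics text.
    """
    return max(candidates,
               key=lambda title: word_match_ratio(transcribed, candidates[title]))


# Hypothetical retrieved set of tracks for one performer.
tracks = {
    "Song A": "hello darkness my old friend",
    "Song B": "here comes the sun",
}
match = best_matching_track("darkness my old", tracks)
```

In this sketch the transcribed text plays the role of the first text information and each candidate's lyrics play the role of the second text information.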

With respect to claim 2, Roberts in view of Wang and in further view of Sharp discloses the electronic device according to claim 1, wherein the circuitry is further configured to:
receive a plurality of first audio segments associated with the determined first audio track from the audio capturing device at the live event (Roberts: Paragraphs 40, 47 and 48 – retrieve a list of audio pieces for the identified performer, identify reference versions of audio pieces corresponding to the performer and reference fingerprints, compare live fingerprint with reference fingerprints for matches, identify audio being performed live based on match, reference audio pieces can be retrieved before start of performance, identify lyrics; Wang: Paragraphs 5 and 6 – receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 31 and 33 – feature extraction on sample to create fingerprint, metadata of media file includes lyrics; Paragraph 78 – rendering multiple media streams; Sharp: Paragraphs 9 and 36 – generating lyric by converting each word to text; Paragraphs 37 and 48 – comparing the lyrics or portion of the lyrics with one or more pre-existing versions, extracting lyrics);
extract third text information from the third lyrics information for each of the plurality of first audio segments (Wang: Paragraphs 5 and 6 – receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 31 and 33 – feature extraction on sample to create fingerprint, metadata of media file includes lyrics; Paragraph 78 – rendering multiple media streams; Sharp: Paragraphs 9 and 36 – generating lyric by converting each word to text; Paragraphs 37 and 48 – comparing the lyrics or portion of the lyrics with one or more pre-existing versions, extracting lyrics); and
control the display screen to display the extracted third text information based on playback of each of the plurality of first audio segments (Wang: Paragraphs 5 and 6 – receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 31 and 33 – feature extraction on sample to create fingerprint, metadata of media file includes lyrics; Paragraph 78 – rendering multiple media streams; Paragraphs 86, 95, and 99 – display on the client device time annotated music; Sharp: Paragraphs 9 and 36 – generating lyric by converting each word to text; Paragraphs 37 and 48 – comparing the lyrics or portion of the lyrics with one or more pre-existing versions, extracting lyrics).

With respect to claim 3, Roberts in view of Wang and in further view of Sharp discloses the electronic device according to claim 1, wherein the identification information associated with the first performer-of-interest of the set of audio tracks is determined based on at least one of a geo-location of the live event, date-time information of the live event, or a user input associated with the first performer-of-interest (Roberts: Paragraph 10 – during live performance of audio pieces identify an audio piece during performance; Paragraphs 14, 15, and 46 – identify one or more characteristics of the audio piece, identify performer of the live version using live fingerprint containing characteristics, identify performer based on venue, current date and time, geolocation etc.; Figure 5).

With respect to claim 4, Roberts in view of Wang and in further view of Sharp discloses the electronic device according to claim 1, wherein each of a first audio characteristic of the received first audio segment and a second audio characteristic of the first audio portion of each of the retrieved set of audio tracks is a combination of a plurality of audio parameters (Roberts: Paragraph 10 – during live performance of audio pieces identify an audio piece during performance; Paragraphs 14, 15, and 46 – identify one or more characteristics of the audio piece, identify performer of the live version using live fingerprint containing characteristics, identify performer based on venue, current date and time, geolocation etc.; Figure 5).

With respect to claim 7, Roberts in view of Wang and in further view of Sharp discloses the electronic device according to claim 1, wherein the circuitry is further configured to:
receive a second audio segment associated with a second performer-of-interest from the audio capturing device at the live event (Roberts: Paragraph 30 – determining one or more live fingerprints of segments of audio pieces being performed, which is a second segment; Wang: Paragraphs 5 and 6 – receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 17-19 – determine timestamp corresponding to a sampling time of media sample captured, sampling time can be the beginning, display lyrics synchronized to ambiently playing music using a mobile music information retrieval device, identify the playing music, retrieve and display corresponding lyrics synchronized to current time point, media rendering source include live performance as a source of audio; Paragraph 78 – rendering multiple media streams, which is multiple audio segment);
determine a second audio track from the retrieved set of audio tracks based on a comparison between a first audio characteristic of the received second audio segment and a second audio characteristic of a second audio portion of each of the retrieved first set of audio tracks (Roberts: Paragraph 30 – determining one or more live fingerprints of segments of audio pieces being performed, which is a second segment; Roberts: Paragraphs 15, 49, 52, and 56 – plurality of audio parameters including tempo, vocal timbre, vocal strength, vibrato, instrument tuning, ambient noise, reverberation, distortions, instrumentation, intonation, pitch, tone etc.; Figure 5; Wang: Paragraphs 5 and 6 – receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 17-19 – determine timestamp corresponding to a sampling time of media sample captured, sampling time can be the beginning, display lyrics synchronized to ambiently playing music using a mobile music information retrieval device, identify the playing music, retrieve and display corresponding lyrics synchronized to current time point, media rendering source include live performance as a source of audio; Paragraph 78 – rendering multiple media streams, which is multiple audio segments; Figures 3 and 5); and
control the display screen to display fourth lyrics information of the determined second audio track, wherein the first audio track is different from the second audio track (Roberts: Paragraph 30 – determining one or more live fingerprints of segments of audio pieces being performed, which is a second segment; Figure 5; Wang: Paragraphs 5 and 6 – receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 17-19 – determine timestamp corresponding to a sampling time of media sample captured, sampling time can be the beginning, display lyrics synchronized to ambiently playing music using a mobile music information retrieval device, identify the playing music, retrieve and display corresponding lyrics synchronized to current time point, media rendering source include live performance as a source of audio; Paragraph 78 – rendering multiple media streams, which is multiple audio segment; Sharp: Paragraphs 9 and 36 – generating lyric by converting each word to text; Paragraphs 37 and 48 – comparing the lyrics or portion of the lyrics with one or more pre-existing versions, extracting lyrics).

With respect to claim 8, Roberts in view of Wang and in further view of Sharp discloses the electronic device according to claim 7, wherein the first performer-of-interest and the second performer-of-interest are same (Roberts: Paragraphs 2, 14, and 15 – performer can be a soloist or a musical group or a theater troupe with plurality of different performers, identify performer of live version, identify performer based on concert program).

With respect to claim 9, Roberts in view of Wang and in further view of Sharp discloses the electronic device according to claim 8, wherein the first performer-of-interest and the second performer-of-interest are different (Roberts: Paragraphs 2, 14, and 15 – performer can be a soloist or a musical group or a theater troupe with plurality of different performers, identify performer of live version, identify performer based on concert program).
With respect to claim 10, Roberts in view of Wang and in further view of Sharp discloses the electronic device according to claim 1, wherein the circuitry is further configured to:
generate notification information associated with the start position of the determined first audio track (Wang: Paragraph 17 – client device sends media sample to position identification module to determine a time offset indicating a time position in the media stream corresponding to the sampling time of the media sample; Paragraphs 43 and 75 – return identity of the media sample and time offset; Paragraphs 86, 95, and 99 – display on the client device time annotated music; Figures 3 and 5); and
control the display screen to display of the generated notification information (Wang: Paragraph 17 – client device sends media sample to position identification module to determine a time offset indicating a time position in the media stream corresponding to the sampling time of the media sample; Paragraphs 43 and 75 – return identity of the media sample and time offset; Paragraphs 86, 95, and 99 – display on the client device time annotated music; Figures 3 and 5).

With respect to claim 11, Roberts in view of Wang and in further view of Sharp discloses the electronic device according to claim 1, wherein the circuitry is further configured to:
determine offset information between a first audio characteristic of the received first audio segment and a second audio characteristic of the determined first audio track, wherein the offset information indicates a deviation between at least one audio parameter of each of the first audio characteristic and the second audio characteristic (Roberts: Paragraphs 40, 47 and 48 – retrieve a list of audio pieces for the identified performer, identify reference versions of audio pieces corresponding to the performer and reference fingerprints, compare live fingerprint with reference fingerprints for matches, identify audio being performed live based on match, reference audio pieces can be retrieved before start of performance, identify lyrics; Paragraph 30 – determining one or more live fingerprints of segments of audio pieces being performed; Figure 5; Wang: Paragraphs 5 and 6 – receiving a media sample of a media stream associated with a timestamp corresponding to a sampling time, determining a time offset indicating a time position in the media stream corresponding to the sampling time, calculating a real-time offset using real-time timestamp, the timestamp of the media sample, and the time offset, identifying a time offset indicating a time position in a song, receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize, rendering a second media stream at a position corresponding to real-time offset to be in synchrony to the media stream; Paragraphs 17-19 – determine timestamp corresponding to a sampling time of media sample captured, sampling time can be the beginning, display lyrics synchronized to ambiently playing music using a mobile music information retrieval device, identify the playing music, retrieve and display corresponding lyrics synchronized to current time point, media rendering source include live performance as a source of audio; Paragraph 78 – rendering multiple media streams; Paragraphs 86, 95, and 99 – display on the client device time annotated music; Sharp: Paragraphs 9 and 36 – generating lyric by converting each word to text; Paragraphs 37 and 48 – comparing the lyrics or portion of the lyrics with one or more pre-existing versions, extracting lyrics);
generate a second audio track based on the determined first audio track and the offset information (Roberts: Paragraphs 40, 47 and 48 – retrieve a list of audio pieces for the identified performer, identify reference versions of audio pieces corresponding to the performer and reference fingerprints, compare live fingerprint with reference fingerprints for matches, identify audio being performed live based on match, reference audio pieces can be retrieved before start of performance, identify lyrics; Paragraph 30 – determining one or more live fingerprints of segments of audio pieces being performed; Figure 5; Wang: Paragraphs 5 and 6 – receiving a media sample of a media stream associated with a timestamp corresponding to a sampling time, determining a time offset indicating a time position in the media stream corresponding to the sampling time, calculating a real-time offset using real-time timestamp, the timestamp of the media sample, and the time offset, identifying a time offset indicating a time position in a song, receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize, rendering a second media stream at a position corresponding to real-time offset to be in synchrony to the media stream; Paragraphs 17-19 – determine timestamp corresponding to a sampling time of media sample captured, sampling time can be the beginning, display lyrics synchronized to ambiently playing music using a mobile music information retrieval device, identify the playing music, retrieve and display corresponding lyrics synchronized to current time point, media rendering source include live performance as a source of audio; Paragraph 78 – rendering multiple media streams; Paragraphs 86, 95, and 99 – display on the client device time annotated music; Figures 3 and 5); and
update the set of audio tracks based on the generated second audio track (Roberts: Paragraphs 40, 47 and 48 – retrieve a list of audio pieces for the identified performer, identify reference versions of audio pieces corresponding to the performer and reference fingerprints, compare live fingerprint with reference fingerprints for matches, identify audio being performed live based on match, reference audio pieces can be retrieved before start of performance, identify lyrics; Paragraph 30 – determining one or more live fingerprints of segments of audio pieces being performed; Figure 5; Wang: Paragraphs 5 and 6 – receiving a media sample of a media stream associated with a timestamp corresponding to a sampling time, determining a time offset indicating a time position in the media stream corresponding to the sampling time, calculating a real-time offset using real-time timestamp, the timestamp of the media sample, and the time offset, identifying a time offset indicating a time position in a song, receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize, rendering a second media stream at a position corresponding to real-time offset to be in synchrony to the media stream; Paragraphs 17-19 – determine timestamp corresponding to a sampling time of media sample captured, sampling time can be the beginning, display lyrics synchronized to ambiently playing music using a mobile music information retrieval device, identify the playing music, retrieve and display corresponding lyrics synchronized to current time point, media rendering source include live performance as a source of audio; Paragraph 78 – rendering multiple media streams; Paragraphs 86, 95, and 99 – display on the client device time annotated music; Figures 3 and 5).

With respect to claim 12, Roberts in view of Wang and in further view of Sharp discloses the electronic device according to claim 11, wherein the circuitry is further configured to display the determined offset information on the display screen (Wang: Paragraph 17 – client device sends media sample to position identification module to determine a time offset indicating a time position in the media stream corresponding to the sampling time of the media sample; Paragraphs 43 and 75 – return identity of the media sample and time offset; Paragraphs 86, 95, and 99 – display on the client device time annotated music; Figures 3 and 5).

With respect to claim 13, Roberts in view of Wang and in further view of Sharp discloses the electronic device according to claim 1, wherein the first audio segment comprises a plurality of audio portions, the plurality of audio portions is associated with a plurality of audio sources at the live event (Roberts: Paragraph 10 – during live performance of audio pieces identify an audio piece during performance; Paragraphs 14, 15, and 46 – retrieve a list of audio pieces corresponding to the performer; Paragraphs 25-33 and 40 – recording a live segment of the audio piece, determining one or more live fingerprints of segments of audio pieces being performed; Wang: Paragraphs 5 and 6 – receiving a media sample of a media stream; Paragraphs 17-19 – media rendering source include live performance as a source of audio), and the circuitry is further configured to:
receive a user input to select at least one of the plurality of audio sources (Roberts: Paragraph 10 – during live performance of audio pieces identify an audio piece during performance; Paragraphs 14, 15, and 46 – identify one or more characteristics of the audio piece, identify performer, retrieve a list of audio pieces corresponding to the performer; Paragraphs 25-33 and 40 – obtain identifier of an audio piece, recording a live segment of the audio piece, generate live fingerprint, query using live fingerprint to obtain metadata of the audio piece, obtain identification information; Paragraph 30 – determining one or more live fingerprints of segments of audio pieces being performed; Figure 5; Wang: Paragraphs 5 and 6 – receiving a media sample of a media stream associated with a timestamp corresponding to a sampling time, receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 17-19 – retrieve and display corresponding lyrics synchronized to current time point, media rendering source include live performance as a source of audio);
extract a set of audio portions from the plurality of audio portions based on the received user input, wherein the set of audio portions are associated with the at least one of the plurality of audio sources (Wang: Paragraphs 5 and 6 – receiving a media sample of a media stream associated with a timestamp corresponding to a sampling time, receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 17-19 – retrieve and display corresponding lyrics synchronized to current time point, media rendering source include live performance as a source of audio);
control the display screen to display a plurality of audio notes for the extracted set of audio portions (Wang: Paragraphs 5 and 6 – receiving a media sample of a media stream associated with a timestamp corresponding to a sampling time, receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 17-19 – retrieve and display corresponding lyrics synchronized to current time point, media rendering source include live performance as a source of audio); and
output the extracted set of audio portions through a speaker, wherein the speaker is associated with the electronic device (Wang: Paragraphs 5 and 6 – receiving a media sample of a media stream associated with a timestamp corresponding to a sampling time, receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 17-19 – retrieve and display corresponding lyrics synchronized to current time point, media rendering source include live performance as a source of audio).

With respect to claim 14, Roberts discloses a method, comprising:
in an electronic device (Roberts: Paragraph 90; Figure 10):
determining identification information associated with a performer-of-interest at a live event (Roberts: Paragraph 10 – during live performance of audio pieces identify an audio piece during performance; Paragraphs 14, 15, and 46 – identify one or more characteristics of the audio piece, identify performer of the live version using live fingerprint containing characteristics, identify performer based on venue, current date and time, geolocation etc., retrieve a list of audio pieces corresponding to the performer; Paragraphs 25-33 and 40 – obtain identifier of an audio piece, recording a live segment of the audio piece, generate live fingerprint, query using live fingerprint to obtain metadata of the audio piece, obtain identification information; Figure 5);
retrieving a set of audio tracks from a plurality of audio tracks based on the determined identification information, wherein the first set of audio tracks are associated with the performer-of-interest (Roberts: Paragraphs 14, 15, and 46 – identify one or more characteristics of the audio piece, identify performer of the live version using live fingerprint containing characteristics, identify performer based on venue, current date and time, geolocation etc.; Paragraphs 47 and 48 – retrieve a list of audio pieces for the performer, identify reference versions of audio pieces corresponding to the performer and reference fingerprints, compare live fingerprint with reference fingerprints for matches, identify audio being performed live based on match, reference audio pieces can be retrieved before start of performance; Figure 5);
receiving an audio segment associated with the performer-of-interest from an audio capturing device at the live event (Roberts: Paragraphs 14, 15, and 46 – identify one or more characteristics of the audio piece, identify performer of the live version using live fingerprint containing characteristics, identify performer based on venue, current date and time, geolocation etc.; Paragraphs 47 and 48 – retrieve a list of audio pieces for the performer, identify reference versions of audio pieces corresponding to the performer and reference fingerprints, compare live fingerprint with reference fingerprints for matches, identify audio being performed live based on match, reference audio pieces can be retrieved before start of performance; Figure 5);
comparing the first text information with second text information, wherein the second text information is associated with an audio portion of each of the retrieved set of audio tracks (Roberts: Paragraphs 47 and 48 – retrieve a list of audio pieces for the performer, identify reference versions of audio pieces corresponding to the performer and reference fingerprints, compare live fingerprint with reference fingerprints for matches, identify audio being performed live based on match, reference audio pieces can be retrieved before start of performance; Figure 5; here Roberts discloses comparing first information with second information, where second information is associated with a first audio portion of each of the retrieved reference audio tracks, but Roberts does not explicitly disclose comparing first text information with second text information, however, the Sharp reference discloses the features, as discussed below);
determining a first audio track from the retrieved set of audio tracks based on the comparison between the first text information and the second text information (Roberts: Paragraphs 47 and 48 – retrieve a list of audio pieces for the identified performer, identify reference versions of audio pieces corresponding to the performer and reference fingerprints, compare live fingerprint with reference fingerprints for matches, identify audio being performed live based on match, reference audio pieces can be retrieved before start of performance; Figure 5; here Roberts discloses determining a first audio track from the retrieved set of audio tracks based on comparison between first information and second information, but Roberts does not explicitly disclose comparison between first text information with second text information, however, the Sharp reference discloses the features, as discussed below);
Roberts discloses identifying and comparing lyrics information in audio segments; however, Roberts does not explicitly disclose:
identifying a start position of the determined audio track based on the received audio segment,
identifying third lyrics information associated with the determined audio track; and
controlling a display screen to display the third lyrics information of the determined audio track based on the identified start position.
The Wang reference discloses identifying a start position of the determined audio track based on the received audio segment (Wang: Paragraphs 5 and 6 – receiving a media sample of a media stream associated with a timestamp corresponding to a sampling time, determining a time offset indicating a time position in the media stream corresponding to the sampling time, calculating a real-time offset using a real-time timestamp, the timestamp of the media sample, and the time offset, identifying a time offset indicating a time position in a song; Paragraphs 17-19 and 91 – determine timestamp corresponding to a sampling time of media sample captured, sampling time can be the beginning, identify the playing music, media rendering sources include live performance as a source of audio, identify beginning of audio sample),
identifying lyrics information associated with the determined audio track (Wang: Paragraphs 5 and 6 – identifying a time offset indicating a time position in a song, receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 17-19 and 91 – determine timestamp corresponding to a sampling time of media sample captured, display lyrics synchronized to ambiently playing music using a mobile music information retrieval device, identify the playing music, retrieve and display corresponding lyrics synchronized to current time point); and
controlling a display screen to display the lyrics information of the determined audio track based on the identified start position (Wang: Paragraphs 5 and 6 – receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 17-19 – display lyrics synchronized to ambiently playing music using a mobile music information retrieval device, identify the playing music, retrieve and display corresponding lyrics synchronized to current time point).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Roberts and Wang, to have combined Roberts and Wang. The motivation to combine Roberts and Wang would be to display lyric information for ambiently playing music by retrieving media information and performing lyric synchronization (Wang: Paragraphs 2 and 18).
Roberts discloses comparing first fingerprint information with second fingerprint information, and Wang discloses receiving textual information corresponding to lyrics information associated with songs and synchronizing lyrics with an audio sample; however, Roberts and Wang do not explicitly disclose:
converting the received first audio segment to first text information;
comparing the first text information with second text information;
the first text information corresponds to a portion of first lyrics information associated with the received first audio segment, and
the second text information corresponds to a portion of second lyrics information associated with the first audio portion of each of the retrieved set of audio tracks;
The Sharp reference discloses converting a received first audio segment to first text information (Sharp: Paragraph 2 – automatic generation of lyrics of songs using speech recognition; Paragraphs 9 and 36 – receiving audio input of a song, transcribing a plurality of words from vocal content using speech recognition, generating lyric by converting each word to text; Paragraph 51 – audio input can correspond to a single track or multiple tracks);
comparing the first text information with second text information (Sharp: Paragraphs 37 and 48 – verifying lyrics with popular lyric databases, searching lyric archives and matching lyrics with the generated lyrics on a word by word comparison, comparing the lyrics or portion of the lyrics with one or more pre-existing versions);
the first text information corresponding to a portion of first lyrics information associated with the received first audio segment (Sharp: Paragraphs 9 and 36 – receiving audio input of a song, transcribing a plurality of words from vocal content using speech recognition, generating lyric by converting each word to text; Paragraphs 37 and 48 – comparing the lyrics or portion of the lyrics with one or more pre-existing versions), and
the second text information corresponding to a portion of second lyrics information associated with a first audio portion of each of a retrieved set of audio tracks (Sharp: Paragraphs 37 and 48 – verifying lyrics with popular lyric databases, searching lyric archives for one or more pre-existing versions of the lyrics and matching the lyrics with the generated lyrics on a word by word comparison, comparing the lyrics or portion of the lyrics with one or more pre-existing versions; Paragraphs 54, 55, and 59 – retrieve pre-existing lyrics for versions of songs from database, compare lyric with pre-existing lyrics).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Roberts, Wang, and Sharp, to have combined Roberts, Wang, and Sharp. The motivation to combine Roberts, Wang, and Sharp would be to automatically generate lyrics of songs using speech recognition (Sharp: Paragraph 2).

With respect to claim 15, Roberts in view of Wang and in further view of Sharp discloses the method according to claim 14, wherein the identification information associated with the performer-of-interest of the set of audio tracks is determined based on at least one of a geo-location of the live event, date-time information of the live event, or a user input associated with the performer-of-interest (Roberts: Paragraph 10 – during live performance of audio pieces identify an audio piece during performance; Paragraphs 14, 15, and 46 – identify one or more characteristics of the audio piece, identify performer of the live version using live fingerprint containing characteristics, identify performer based on venue, current date and time, geolocation etc.; Figure 5).

With respect to claim 16, Roberts in view of Wang and in further view of Sharp discloses the method according to claim 14, further comprising:
receiving a plurality of audio segments associated with the determined first audio track from the audio capturing device at the live event (Roberts: Paragraph 10 – during live performance of audio pieces identify an audio piece during performance; Paragraphs 14, 15, and 46 – identify one or more characteristics of the audio piece, identify performer, retrieve a list of audio pieces corresponding to the performer; Paragraphs 25-33 and 40 – obtain identifier of an audio piece, recording a live segment of the audio piece, generate live fingerprint, query using live fingerprint to obtain metadata of the audio piece, obtain identification information; Paragraph 30 – determining one or more live fingerprints of segments of audio pieces being performed; Figure 5; Wang: Paragraphs 5 and 6 – receiving a media sample of a media stream associated with a timestamp corresponding to a sampling time, receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 17-19 – retrieve and display corresponding lyrics synchronized to current time point, media rendering source include live performance as a source of audio);
extracting third text information from the third lyrics information for each of the plurality of audio segments (Wang: Paragraphs 5 and 6 – receiving a media sample of a media stream associated with a timestamp corresponding to a sampling time, receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 17-19 – retrieve and display corresponding lyrics synchronized to current time point, media rendering source include live performance as a source of audio; Sharp: Paragraphs 9 and 36 – generating lyric by converting each word to text; Paragraphs 37 and 48 – comparing the lyrics or portion of the lyrics with one or more pre-existing versions, extracting lyrics); and
controlling the display screen to display the extracted third text information based on playback of each of the plurality of audio segments (Wang: Paragraphs 5 and 6 – receiving a media sample of a media stream associated with a timestamp corresponding to a sampling time, receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 17-19 – retrieve and display corresponding lyrics synchronized to current time point, media rendering source include live performance as a source of audio).

With respect to claim 17, Roberts in view of Wang and in further view of Sharp discloses the method according to claim 14, further comprising:
determining offset information between a first audio characteristic of the received audio segment and a second audio characteristic of the determined first audio track, wherein the offset information indicates a deviation between at least one audio parameter of each of the first audio characteristic and the second audio characteristic (Roberts: Paragraph 30 – determining one or more live fingerprints of segments of audio pieces being performed, which is a second segment; Figure 5; Wang: Paragraphs 5 and 6 – receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 17-19 – determine timestamp corresponding to a sampling time of media sample captured, sampling time can be the beginning, display lyrics synchronized to ambiently playing music using a mobile music information retrieval device, identify the playing music, retrieve and display corresponding lyrics synchronized to current time point, media rendering sources include live performance as a source of audio; Paragraph 78 – rendering multiple media streams, i.e., multiple audio segments; Figures 3 and 5);
generating a second audio track based on the determined first audio track and the offset information (Roberts: Paragraph 30 – determining one or more live fingerprints of segments of audio pieces being performed, which is a second segment; Figure 5; Wang: Paragraphs 5 and 6 – receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 17-19 – determine timestamp corresponding to a sampling time of media sample captured, sampling time can be the beginning, display lyrics synchronized to ambiently playing music using a mobile music information retrieval device, identify the playing music, retrieve and display corresponding lyrics synchronized to current time point, media rendering sources include live performance as a source of audio; Paragraph 78 – rendering multiple media streams, i.e., multiple audio segments; Figures 3 and 5); and
updating the set of audio tracks based on the generated second audio track (Roberts: Paragraph 30 – determining one or more live fingerprints of segments of audio pieces being performed, which is a second segment; Figure 5; Wang: Paragraphs 5 and 6 – receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 17-19 – determine timestamp corresponding to a sampling time of media sample captured, sampling time can be the beginning, display lyrics synchronized to ambiently playing music using a mobile music information retrieval device, identify the playing music, retrieve and display corresponding lyrics synchronized to current time point, media rendering sources include live performance as a source of audio; Paragraph 78 – rendering multiple media streams, i.e., multiple audio segments; Figures 3 and 5).

With respect to claim 18, Roberts in view of Wang and in further view of Sharp discloses the method according to claim 14, wherein each of a first audio characteristic of the received audio segment and a second audio characteristic of the audio portion of each of the retrieved set of audio tracks is a combination of a plurality of audio parameters (Roberts: Paragraph 10 – during live performance of audio pieces identify an audio piece during performance; Paragraphs 14, 15, and 46 – identify one or more characteristics of the audio piece, identify performer of the live version using live fingerprint containing characteristics, identify performer based on venue, current date and time, geolocation etc.; Figure 5).

With respect to claim 20, Roberts discloses a non-transitory computer-readable medium having stored thereon, computer-executable instructions which, when executed by an electronic device, cause the electronic device to execute operations (Roberts: Paragraph 90; Figure 10), the operations comprising:
determining identification information associated with a performer-of-interest at a live event (Roberts: Paragraph 10 – during live performance of audio pieces identify an audio piece during performance; Paragraphs 14, 15, and 46 – identify one or more characteristics of the audio piece, identify performer of the live version using live fingerprint containing characteristics, identify performer based on venue, current date and time, geolocation etc., retrieve a list of audio pieces corresponding to the performer; Paragraphs 25-33 and 40 – obtain identifier of an audio piece, recording a live segment of the audio piece, generate live fingerprint, query using live fingerprint to obtain metadata of the audio piece, obtain identification information; Figure 5);
retrieving a set of audio tracks from a plurality of audio tracks based on the determined identification information, wherein the set of audio tracks are associated with the performer-of-interest (Roberts: Paragraphs 14, 15, and 46 – identify one or more characteristics of the audio piece, identify performer of the live version using live fingerprint containing characteristics, identify performer based on venue, current date and time, geolocation etc.; Paragraphs 47 and 48 – retrieve a list of audio pieces for the performer, identify reference versions of audio pieces corresponding to the performer and reference fingerprints, compare live fingerprint with reference fingerprints for matches, identify audio being performed live based on match, reference audio pieces can be retrieved before start of performance; Figure 5);
receiving an audio segment associated with the performer-of-interest from an audio capturing device at the live event (Roberts: Paragraphs 14, 15, and 46 – identify one or more characteristics of the audio piece, identify performer of the live version using live fingerprint containing characteristics, identify performer based on venue, current date and time, geolocation etc.; Paragraphs 47 and 48 – retrieve a list of audio pieces for the performer, identify reference versions of audio pieces corresponding to the performer and reference fingerprints, compare live fingerprint with reference fingerprints for matches, identify audio being performed live based on match, reference audio pieces can be retrieved before start of performance; Figure 5);
comparing the first text information with second text information, wherein the second text information is associated with an audio portion of each of the retrieved set of audio tracks (Roberts: Paragraphs 47 and 48 – retrieve a list of audio pieces for the performer, identify reference versions of audio pieces corresponding to the performer and reference fingerprints, compare live fingerprint with reference fingerprints for matches, identify audio being performed live based on match, reference audio pieces can be retrieved before start of performance; Figure 5; here Roberts discloses comparing first information with second information, where second information is associated with a first audio portion of each of the retrieved reference audio tracks, but Roberts does not explicitly disclose comparing first text information with second text information, however, the Sharp reference discloses the features, as discussed below);
determining an audio track from the retrieved set of audio tracks based on the comparison between the first text information and the second text information (Roberts: Paragraphs 47 and 48 – retrieve a list of audio pieces for the identified performer, identify reference versions of audio pieces corresponding to the performer and reference fingerprints, compare live fingerprint with reference fingerprints for matches, identify audio being performed live based on match, reference audio pieces can be retrieved before start of performance; Figure 5; here Roberts discloses determining a first audio track from the retrieved set of audio tracks based on comparison between first information and second information, but Roberts does not explicitly disclose comparison between first text information with second text information, however, the Sharp reference discloses the features, as discussed below);
Roberts discloses identifying and comparing lyrics information in audio segments; however, Roberts does not explicitly disclose:
identifying a start position of the determined audio track based on the received audio segment,
identifying third lyrics information associated with the determined audio track; and
controlling a display screen to display the third lyrics information of the determined audio track based on the identified start position.
The Wang reference discloses identifying a start position of the determined audio track based on the received audio segment (Wang: Paragraphs 5 and 6 – receiving a media sample of a media stream associated with a timestamp corresponding to a sampling time, determining a time offset indicating a time position in the media stream corresponding to the sampling time, calculating a real-time offset using a real-time timestamp, the timestamp of the media sample, and the time offset, identifying a time offset indicating a time position in a song; Paragraphs 17-19 and 91 – determine timestamp corresponding to a sampling time of media sample captured, sampling time can be the beginning, identify the playing music, media rendering sources include live performance as a source of audio, identify beginning of audio sample),
identifying lyrics information associated with the determined audio track (Wang: Paragraphs 5 and 6 – identifying a time offset indicating a time position in a song, receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 17-19 and 91 – determine timestamp corresponding to a sampling time of media sample captured, display lyrics synchronized to ambiently playing music using a mobile music information retrieval device, identify the playing music, retrieve and display corresponding lyrics synchronized to current time point); and
controlling a display screen to display the lyrics information of the determined audio track based on the identified start position (Wang: Paragraphs 5 and 6 – receiving textual lyrics of the song, rendering the lyrics at a position corresponding to real-time offset to synchronize; Paragraphs 17-19 – display lyrics synchronized to ambiently playing music using a mobile music information retrieval device, identify the playing music, retrieve and display corresponding lyrics synchronized to current time point).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Roberts and Wang, to have combined Roberts and Wang. The motivation to combine Roberts and Wang would be to display lyric information for ambiently playing music by retrieving media information and performing lyric synchronization (Wang: Paragraphs 2 and 18).
Roberts discloses comparing first fingerprint information with second fingerprint information, and Wang discloses receiving textual information corresponding to lyrics information associated with songs and synchronizing lyrics with an audio sample; however, Roberts and Wang do not explicitly disclose:
converting the received first audio segment to first text information;
comparing the first text information with second text information;
the first text information corresponds to a portion of first lyrics information associated with the received first audio segment, and
the second text information corresponds to a portion of second lyrics information associated with the first audio portion of each of the retrieved set of audio tracks;
The Sharp reference discloses converting a received first audio segment to first text information (Sharp: Paragraph 2 – automatic generation of lyrics of songs using speech recognition; Paragraphs 9 and 36 – receiving audio input of a song, transcribing a plurality of words from vocal content using speech recognition, generating lyric by converting each word to text; Paragraph 51 – audio input can correspond to a single track or multiple tracks);
comparing the first text information with second text information (Sharp: Paragraphs 37 and 48 – verifying lyrics with popular lyric databases, searching lyric archives and matching lyrics with the generated lyrics on a word by word comparison, comparing the lyrics or portion of the lyrics with one or more pre-existing versions);
the first text information corresponding to a portion of first lyrics information associated with the received first audio segment (Sharp: Paragraphs 9 and 36 – receiving audio input of a song, transcribing a plurality of words from vocal content using speech recognition, generating lyric by converting each word to text; Paragraphs 37 and 48 – comparing the lyrics or portion of the lyrics with one or more pre-existing versions), and
the second text information corresponding to a portion of second lyrics information associated with a first audio portion of each of a retrieved set of audio tracks (Sharp: Paragraphs 37 and 48 – verifying lyrics with popular lyric databases, searching lyric archives for one or more pre-existing versions of the lyrics and matching the lyrics with the generated lyrics on a word by word comparison, comparing the lyrics or portion of the lyrics with one or more pre-existing versions; Paragraphs 54, 55, and 59 – retrieve pre-existing lyrics for versions of songs from database, compare lyric with pre-existing lyrics).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Roberts, Wang, and Sharp, to have combined Roberts, Wang, and Sharp. The motivation to combine Roberts, Wang, and Sharp would be to automatically generate lyrics of songs using speech recognition (Sharp: Paragraph 2).

Claims 5 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Roberts (US Pub 2015/0302086) in view of Wang (US Pub 2011/0276333) in view of Sharp (US Pub 2018/0366097) and in further view of Mishra (US Pub 2018/0144746).

With respect to claim 5, Roberts in view of Wang and in further view of Sharp discloses the electronic device according to claim 4, wherein the plurality of audio parameters comprises a loudness parameter, a pitch parameter, a tone parameter, a rate-of-speech parameter, a voice quality parameter, a phonetic parameter, an intonation parameter, an intensity of overtones, a voice modulation parameter, a pronunciation parameter, a prosody parameter, a timbre parameter, and at least one psychoacoustic parameter (Roberts: Paragraphs 15, 49, 52, and 56 – audio parameters including tempo, vocal timbre, vocal strength, vibrato, instrument tuning, ambient noise, reverberation, distortions, instrumentation, intonation, pitch, tone etc.; here Roberts discloses a plurality of audio parameters, but Roberts, Wang, and Sharp do not explicitly disclose a rate-of-speech parameter, a phonetic parameter, an intensity of overtones, a pronunciation parameter, a prosody parameter, and one or more psychoacoustic parameters, however, the Mishra reference discloses the features, as discussed below).
Roberts discloses a plurality of audio parameters; however, Roberts, Wang, and Sharp do not explicitly disclose:
audio parameters comprising a rate-of-speech parameter, a phonetic parameter, an intensity of overtones, a pronunciation parameter, a prosody parameter, and at least one psychoacoustic parameter.
The Mishra reference discloses audio parameters comprising a rate-of-speech parameter, a phonetic parameter, an intensity of overtones, a pronunciation parameter, a prosody parameter, and at least one psychoacoustic parameter (Mishra: Paragraphs 15, 18, 19, 42, 46, 54-56, 60, 77, and 102 – audio features include timbre including tone color, tone quality, or other psychoacoustic sound quality, rate of speech, volume, voice quality, prosody including properties of spoken syllables, tone, intonation, stress, rhythm, cadence etc., vocal register and vocal resonance including intensity of voice, pitch, speech loudness or rate differentiation between music and speech, vocal identification, identifying emotions from speech and voice, language analysis).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Roberts, Wang, Sharp, and Mishra, to have combined Roberts, Wang, Sharp, and Mishra. The motivation to combine Roberts, Wang, Sharp, and Mishra would be to perform audio analysis of content using audio features (Mishra: Paragraph 19).

With respect to claim 19, Roberts in view of Wang and in further view of Sharp discloses the method according to claim 18, wherein the plurality of audio parameters comprises a loudness parameter, a pitch parameter, a tone parameter, a rate-of-speech parameter, a voice quality parameter, a phonetic parameter, an intonation parameter, an intensity of overtones, a voice modulation parameter, a pronunciation parameter, a prosody parameter, a timbre parameter, and at least one psychoacoustic parameter (Roberts: Paragraphs 15, 49, 52, and 56 – audio parameters including tempo, vocal timbre, vocal strength, vibrato, instrument tuning, ambient noise, reverberation, distortions, instrumentation, intonation, pitch, tone, etc.; here Roberts discloses a plurality of audio parameters, but Roberts, Wang, and Sharp do not explicitly disclose a rate-of-speech parameter, a phonetic parameter, an intensity of overtones, a pronunciation parameter, a prosody parameter, and one or more psychoacoustic parameters; however, the Mishra reference discloses these features, as discussed below).
Roberts discloses a plurality of audio parameters; however, Roberts, Wang, and Sharp do not explicitly disclose:
audio parameters comprising a rate-of-speech parameter, a phonetic parameter, an intensity of overtones, a pronunciation parameter, a prosody parameter, and at least one psychoacoustic parameter.
The Mishra reference discloses audio parameters comprising a rate-of-speech parameter, a phonetic parameter, an intensity of overtones, a pronunciation parameter, a prosody parameter, and at least one psychoacoustic parameter (Mishra: Paragraphs 15, 18, 19, 42, 46, 54-56, 60, 77, 102 – audio features include timbre including tone color, tone quality, or other psychoacoustic sound quality, rate of speech, volume, voice quality, prosody including properties of spoken syllables, tone, intonation, stress, rhythm, cadence, etc., vocal register and vocal resonance including intensity of voice, pitch, speech loudness or rate differentiation between music and speech, vocal identification, identifying emotions from speech and voice, language analysis).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Roberts, Wang, Sharp, and Mishra, to have combined Roberts, Wang, Sharp, and Mishra. The motivation to combine Roberts, Wang, Sharp, and Mishra would be to perform audio analysis of content using audio features (Mishra: Paragraph 19).


Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to REZWANUL MAHMOOD whose telephone number is (571)272-5625. The examiner can normally be reached M-F 8:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ashish Thomas, can be reached at 571-272-0631. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/R.M/ Examiner, Art Unit 2164
January 18, 2022

/ASHISH THOMAS/ Supervisory Patent Examiner, Art Unit 2164