DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on August 27, 2020 is being considered by the examiner.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The broadest reasonable interpretation of “computer-readable storage medium” includes both transitory storage, such as propagating signals, and non-transitory storage. The claim does not fall within any of the four categories of patent-eligible subject matter because it claims a computer-readable storage medium without expressly excluding transitory propagating signals. While the specification describes non-limiting examples of computer-readable storage media, it fails to provide a definition of the computer-readable storage medium such that, when read in light of the specification, claim 11 excludes said signals. Further, the plain language of claim 11 fails to include limitations to exclude said signals. Therefore, under the broadest reasonable interpretation, claim 11 encompasses non-statutory subject matter and is rejected.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 2, 6, 10, and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Alakoye (U.S. Pat. App. Pub. No. 2019/0294630, hereinafter Alakoye) in view of Duan (CN 105120304 A, hereinafter Duan).

Regarding claim 1, Alakoye discloses A live broadcast room display method, comprising ("media search and presentation service" presented through "a client device 301"; Alakoye, ¶ [0025]): acquiring a speech signal within a set duration of at least one live broadcast room under a target classification label ("The service may identify any number of audio content sources 304 {at least one live broadcast}, and it may monitor audio streams from the identified sources 305 {under a target classification label}" and "For each of the audio streams, when monitoring the stream the system will … [capture snippets of] the audio stream" where the audio stream comprises speech {acquiring a speech signal...} and the "snippet[s] of the audio stream" are of a limited duration "such as 1 second, 5 seconds, 30 seconds, 1 minute, 3 minutes, 5 minutes, or another time period {within a set duration of at least one live broadcast}"; Alakoye, ¶ [0027]); inputting the speech signal within a set duration of the at least one live broadcast room into a speech detection model to obtain a speech signal that satisfies a set type condition ("For each of the audio streams, when monitoring the stream the system will use a speech-to-text converter to capture a sequence of speech-to-text segments 306 of the audio stream."; Alakoye, ¶ [0027]); and arranging and displaying the at least one live broadcast room in a display interface corresponding to the target classification label according to the display identifier ("The system may order the results (i.e., the list of sources 403, 503) on the display using any suitable ordering scheme. For example, as a default the system may present the list of sources such that the source that most recently included content relevant to the search request is listed first, the source that next most recently included content relevant to the search request is listed second, and so on."; Alakoye, ¶ [0029]).
Although Alakoye inputs the monitored speech into a speech-to-text converter, Alakoye fails to expressly recite inputting the speech signal within a set duration of the at least one live broadcast room into a speech detection model to obtain a speech signal that satisfies a set type condition, and adding a display identifier to a live broadcast room corresponding to the speech signal of the set type condition.
Duan teaches an information display method including song detection (Duan, Abstract). Regarding claim 1, Duan teaches inputting the speech signal within a set duration of the at least one live broadcast room into a speech detection model to obtain a speech signal that satisfies a set type condition ("the server collects at least one of live voice {a speech signal}… [and] obtaining each of live speech information every predetermined time {within a set duration of the at least one live broadcast room}" where the live voice is input into "the corresponding relation server recognition technology {a speech detection model...}" which detects "by song names of songs corresponding to each live voice, and storing the name {to obtain a speech signal that satisfies a set type condition}" where the set type condition is singing one or more songs; Duan, ¶¶ [0099]-[0102]); adding a display identifier to a live broadcast room corresponding to the speech signal of the set type condition (According to an exemplary embodiment, "server according to song identification technology to identify the live speech A corresponding to the song name is ‘small apple’, live voice B corresponding to the song name is ‘likes’, of live voice C corresponding to the song name is ‘small apple,’ of live voice D corresponding to the song name is ‘father anywhere’." where "displayed in the name is a music play interface of the client end obtaining the song, the entrance of the corresponding is displayed on the display position corresponding to each song name, as shown in FIG. 4B."; Duan, ¶¶ [0124], [0161]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the digital media search and presentation system of Alakoye to incorporate the teachings of Duan to include inputting the speech signal within a set duration of the at least one live broadcast room into a speech detection model to obtain a speech signal that satisfies a set type condition; [and] adding a display identifier to a live broadcast room corresponding to the speech signal of the set type condition. The identification system described in Duan “improves access rate” and “reduces the waste of server resources,” as recognized by Duan. (Duan, ¶ [0067]).

Regarding claim 2, the rejection of claim 1 is incorporated. Alakoye discloses all of the elements of the current invention as stated above. However, Alakoye fails to expressly recite wherein the set type condition comprises a singing condition.
The relevance of Duan is described above with relation to claim 1. Regarding claim 2, Duan teaches wherein the set type condition comprises a singing condition ("the server collects at least one of live voice {a speech signal}… [and] obtaining each of live speech information every predetermined time {within a set duration of the at least one live broadcast room}" where the live voice is input into "the corresponding relation server recognition technology {a speech detection model...}" which detects "by song names of songs corresponding to each live voice, and storing the name {to obtain a speech signal that satisfies a set type condition}" where the set type condition is singing one or more songs; Duan, ¶¶ [0099]-[0102]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the digital media search and presentation system of Alakoye to incorporate the teachings of Duan to include wherein the set type condition comprises a singing condition. The identification system described in Duan “improves access rate” and “reduces the waste of server resources,” as recognized by Duan. (Duan, ¶ [0067]).

Regarding claim 6, the rejection of claim 2 is incorporated. Alakoye discloses all of the elements of the current invention as stated above. Alakoye further discloses topping the target live broadcast room in the display interface corresponding to the target classification label ("The system may order the results (i.e., the list of sources 403, 503) on the display using any suitable ordering scheme. For example, as a default the system may present the list of sources such that the source that most recently included content relevant to the search request is listed first, the source that next most recently included content relevant to the search request is listed second, and so on."; Alakoye, ¶ [0029]). However, Alakoye fails to expressly recite wherein the arranging and displaying the at least one live broadcast room in the display interface corresponding to the target classification label according to the display identifier comprises: acquiring a target live broadcast room with the display identifier added.
The relevance of Duan is described above with relation to claim 1. Regarding claim 6, Duan teaches wherein the arranging and displaying the at least one live broadcast room in the display interface corresponding to the target classification label according to the display identifier comprises: acquiring a target live broadcast room with the display identifier added (According to an exemplary embodiment, "server according to song identification technology to identify the live speech A corresponding to the song name is ‘small apple’, live voice B corresponding to the song name is ‘likes’, of live voice C corresponding to the song name is ‘small apple,’ of live voice D corresponding to the song name is ‘father anywhere’," where adding the identifier to a target live broadcast room is acquiring the target live broadcast room with the display identifier added; Duan, ¶¶ [0124], [0161]); and topping the target live broadcast room in the display interface corresponding to the target classification label ("Optionally, in the preset time, the server transmits the presenter of each singing a song name ordering from high to low according to times, singing a song name list, live pop song name list in the song name list name of the front n, such as n=200 or n=100."; Duan, ¶ [0179]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the digital media search and presentation system of Alakoye to incorporate the teachings of Duan to include wherein the arranging and displaying the at least one live broadcast room in the display interface corresponding to the target classification label according to the display identifier comprises: acquiring a target live broadcast room with the display identifier added. The identification system described in Duan “improves access rate” and “reduces the waste of server resources,” as recognized by Duan. (Duan, ¶ [0067]).

Regarding claim 10, the rejection of claim 1 is incorporated. Alakoye further discloses A computer device, comprising (The service 101 implemented in a server and interacting with a “client device 301”; Alakoye, ¶¶ [0024]-[0025], [0039]); at least one processor; and a memory, which is configured to store at least one program; wherein when executed by the at least one processor, the at least one program enables the at least one processor (“The digital media search and presentation service 101 will include a processor, and it will include or be communicatively connected to a memory containing programming instructions that are configured to cause the service's processor to perform some or all of the functions described in this document.”; Alakoye, ¶ [0020]) to implement the live broadcast room display method of claim 1 (The rejection of claim 1, recited above, is incorporated by reference herein).

Regarding claim 11, the rejection of claim 1 is incorporated. Alakoye further discloses A computer-readable storage medium; which is configured to store a computer program; wherein when executed by a processor, the at least one program enables the at least one processor ( “The digital media search and presentation service 101 will include a processor, and it will include or be communicatively connected to a memory containing programming instructions that are configured to cause the service's processor to perform some or all of the functions described in this document.”; Alakoye, ¶ [0020]) to implement the live broadcast room display method of claim 1 (The rejection of claim 1, recited above, is incorporated by reference herein).

Claims 3 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over Alakoye and Duan as applied to claim 2 above, and further in view of non-patent literature to Schlüter (J. Schlüter and R. Sonnleitner, “Unsupervised feature learning for speech and music detection in radio broadcasts,” in Proceedings of the 15th International Conference on Digital Audio Effects (DAFx), York, UK, Sept. 2012, hereinafter Schlüter).

Regarding claim 3, the rejection of claim 2 is incorporated. Alakoye and Duan disclose all of the elements of the current invention as stated above. However, Alakoye and Duan fail to expressly recite wherein the speech detection model is obtained by training a set deep learning model using singing type speech signal samples and non-singing type speech signal samples.
Schlüter teaches a “speech and a music detector based on an mcRBM” (Schlüter, pg. 1, para. 4). Regarding claim 3, Schlüter teaches wherein the speech detection model is obtained by training a set deep learning model using singing type speech signal samples and non-singing type speech signal samples (The system includes a deep neural network trained using a "dataset consist[ing] of 42 hours of radio broadcasts finely segmented (with a resolution of 200 ms) into speech/nonspeech and music/nonmusic sections." where the music and nonmusic samples include singing type speech signal samples and non-singing type speech signal samples, respectively; Schlüter, pg. 4, para. 3).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the digital media search and presentation system of Alakoye as modified by the song information display method of Duan to incorporate the teachings of Schlüter to include wherein the speech detection model is obtained by training a set deep learning model using singing type speech signal samples and non-singing type speech signal samples. The speech/music discrimination model improves accuracy over prior art systems in performing similar tasks, as recognized by Schlüter. (Schlüter, pg. 1, para. 6).

Regarding claim 4, the rejection of claim 2 is incorporated. Alakoye and Duan disclose all of the elements of the current invention as stated above. However, Alakoye and Duan fail to expressly recite before the inputting the speech signal within the set duration of the at least one live broadcast room into the speech detection model to obtain the speech signal that satisfies the set type condition, further comprising: respectively obtaining singing type speech signal samples and non-singing type speech signal samples; and training a set deep learning model using the singing type speech signal samples and the non-singing type speech signal samples to obtain the speech detection model.
The relevance of Schlüter is described above with relation to claim 3. Regarding claim 4, Schlüter teaches before the inputting the speech signal within the set duration of the at least one live broadcast room into the speech detection model to obtain the speech signal that satisfies the set type condition, further comprising (The training occurs before inputting the speech signal; Schlüter, pg. 4, para. 3): respectively obtaining singing type speech signal samples and non-singing type speech signal samples (Discloses obtaining a "dataset consist[ing] of 42 hours of radio broadcasts finely segmented (with a resolution of 200 ms) into speech/nonspeech and music/nonmusic sections" where the annotation is performed in advance by paid students; Schlüter, pg. 4, para. 3); and training a set deep learning model using the singing type speech signal samples and the non-singing type speech signal samples to obtain the speech detection model (The system includes a deep learning model (mcRBM), trained prior to use, using a "dataset consist[ing] of 42 hours of radio broadcasts finely segmented (with a resolution of 200 ms) into speech/nonspeech and music/nonmusic sections." where the music and nonmusic samples include singing type speech signal samples and non-singing type speech signal samples, respectively; Schlüter, pg. 4, para. 3).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the digital media search and presentation system of Alakoye as modified by the song information display method of Duan to incorporate the teachings of Schlüter to include before the inputting the speech signal within the set duration of the at least one live broadcast room into the speech detection model to obtain the speech signal that satisfies the set type condition, further comprising: respectively obtaining singing type speech signal samples and non-singing type speech signal samples; and training a set deep learning model using the singing type speech signal samples and the non-singing type speech signal samples to obtain the speech detection model. The speech/music discrimination model improves accuracy over prior art systems in performing similar tasks, as recognized by Schlüter. (Schlüter, pg. 1, para. 6).

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Alakoye, Duan, and Schlüter as applied to claim 4 above, and further in view of Alhakimi (U.S. Pat. App. Pub. No. 2015/0310107, hereinafter Alhakimi).

Regarding claim 5, the rejection of claim 4 is incorporated. Alakoye, Duan, and Schlüter disclose all of the elements of the current invention as stated above. However, Alakoye and Duan fail to expressly recite wherein the respectively obtaining the singing type speech signal samples and the non-singing type speech signal samples comprises: calling a search engine interface to search for and download a plurality of audio files matched with set keywords corresponding to the singing type and the non-singing type respectively; randomly extracting a set number of audio files from a plurality of singing type audio files to configure as singing type speech signal samples; and randomly extracting a set number of non-singing type audio files from a plurality of non-singing type audio files to configure as non-singing type speech signal samples.
The relevance of Schlüter is described above with relation to claim 3. Regarding claim 5, Schlüter teaches wherein the respectively obtaining the singing type speech signal samples and the non-singing type speech signal samples comprises: ...download a plurality of audio files… corresponding to the singing type and the non-singing type respectively ("The remaining 12 hours" of the 42 hours of total radio broadcasts "have been captured from lower-bitrate web streams {download a plurality of audio files} of 4 Austrian radio stations (Ö1, Ö3, FM 4, Life Radio) as continuous 3-hour recordings, again covering different music styles and two languages: Austrian German and English {corresponding to the singing type and non-singing type respectively}."; Schlüter, pg. 4, para. 3); randomly extracting a set number of audio files from a plurality of singing type audio files to configure as singing type speech signal samples (The authors randomly extracted 15 hours of recording incorporating "453,120 training samples split into mini-batches of 128 data points" where "Speech and music detection were treated as two separate classification problems handled by two separately fine-tuned instances of the network" and "Each network was trained for 100 epochs, monitoring the classification error at threshold 0 on the validation set." which includes extracting a set number of audio files from the plurality of singing type audio files to configure as training samples {singing type speech signal samples}; Schlüter, pg. 4, paras. 3 and 5); and randomly extracting a set number of non-singing type audio files from a plurality of non-singing type audio files to configure as non-singing type speech signal samples (the above "453,120 training samples split into mini-batches of 128 data points" further includes extracting a set number of non-music audio files {non-singing type audio files} from the plurality of non-music audio files to configure as training samples {non-singing type speech signal samples}; Schlüter, pg. 4, paras. 3 and 5).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the digital media search and presentation system of Alakoye as modified by the song information display method of Duan to incorporate the teachings of Schlüter to include wherein the respectively obtaining the singing type speech signal samples and the non-singing type speech signal samples comprises: ...download a plurality of audio files… corresponding to the singing type and the non-singing type respectively; randomly extracting a set number of audio files from a plurality of singing type audio files to configure as singing type speech signal samples; and randomly extracting a set number of non-singing type audio files from a plurality of non-singing type audio files to configure as non-singing type speech signal samples. The speech/music discrimination model improves accuracy over prior art systems in performing similar tasks, as recognized by Schlüter. (Schlüter, pg. 1, para. 6). However, Alakoye, Duan, and Schlüter fail to expressly recite wherein the respectively obtaining the singing type speech signal samples and the non-singing type speech signal samples comprises: calling a search engine interface to search for and… matched with set keywords.
Alhakimi teaches systems and methods for web searching for video and audio content (Alhakimi, ¶ [0005]). Regarding claim 5, Alhakimi teaches wherein the respectively obtaining the singing type speech signal samples and the non-singing type speech signal samples comprises: calling a search engine interface to search for [downloading] and… matched with set keywords ("the search engine 200 includes an indexer 202 and a user search interface 204. Spiders 206 directed by the indexer 202 search the Web 1 for audio and video files or content (referred to herein as “A/V”) 2. The spiders 206 search, not just for titles and descriptions, but also for files 2A having associated or attached text, such as subtitles, captions, transcripts, or lyrics (collectively referred to herein as “associated text” or “text”). The associated text is indexed in the same way as content that is found in a text file. The indexed information is then stored in a data structure, such as a database 208, allowing a user 3 to use the associated text as a primary search field."; Alhakimi, ¶ [0017]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the digital media search and presentation system of Alakoye, as modified by the song information display method of Duan and the speech/music discrimination system of Schlüter, to incorporate the teachings of Alhakimi to include wherein the respectively obtaining the singing type speech signal samples and the non-singing type speech signal samples comprises: calling a search engine interface to search for and… matched with set keywords. The systems and methods of Alhakimi provide a means for searching “video and audio content… regardless of the search engine used,” where the video or audio does not include a “title of the file or any attached description or metadata,” thus overcoming a deficiency in prior art systems when searching for unannotated files. (Alhakimi, ¶¶ [0004]-[0005]).

Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Alakoye and Duan as applied to claim 6 above, and further in view of Wang (CN 107172498 A, hereinafter Wang).

Regarding claim 7, the rejection of claim 6 is incorporated. Alakoye and Duan disclose all of the elements of the current invention as stated above. Alakoye further discloses wherein the topping the target live broadcast room in the display interface corresponding to the target classification label comprises: acquiring a current speech signal of the target live broadcast room in real time ("The service may identify any number of audio content sources 304 {at least one live broadcast}, and it may monitor audio streams from the identified sources 305 {under a target classification label}" and "For each of the audio streams, when monitoring the stream the system will … [capture snippets of] the audio stream" where the audio stream comprises speech {acquiring a speech signal...} and the "snippet[s] of the audio stream" are of a limited duration "such as 1 second, 5 seconds, 30 seconds, 1 minute, 3 minutes, 5 minutes, or another time period {within a set duration of at least one live broadcast}"; Alakoye, ¶ [0027]). However, Alakoye fails to expressly recite wherein the topping the target live broadcast room in the display interface corresponding to the target classification label comprises: acquiring a current speech signal of the target live broadcast room in real time, and acquiring matched song content according to the current speech signal; scoring the target live broadcast room according to a matching degree between the current speech signal and an audio feature of the song content; and arranging the target broadcast room according to... [matching degree], and topping the arranged target broadcast room in the display interface corresponding to the target classification label.
The relevance of Duan is described above with relation to claim 1. Regarding claim 7, Duan teaches wherein the topping the target live broadcast room in the display interface corresponding to the target classification label comprises: acquiring a current speech signal of the target live broadcast room in real time ("the server collects at least one of live voice"; Duan, ¶ [0099]), and acquiring matched song content according to the current speech signal ("the song identification technology to identify the name of the song corresponding to each live voice, and storing the corresponding relation between name of and songs"; Duan, ¶ [0102]); scoring the target live broadcast room according to a matching degree between the current speech signal and an audio feature of the song content (The system compares elements of the song, such as keywords, as "corresponding to each live voice. {a matching degree between the current speech signal and an audio feature of the song content}" As searching for a match includes a hypothesis regarding the match, Duan implicitly discloses scoring of the target live broadcast rooms in determining the degree of match for each of the live broadcast rooms; Duan, ¶ [0102]); and arranging the target broadcast room according to... [matching degree] ("Optionally, in the preset time, the server transmits the presenter of each singing a song name ordering from high to low according to times, singing a song name list, live pop song name list in the song name list name of the front n, such as n=200 or n=100."; Duan, ¶ [0179]), and topping the arranged target broadcast room in the display interface corresponding to the target classification label (Duan shows topping of the arranged target broadcast room through the emphasized rooms depicted in FIGS. 3A, 4A, and 4B. The emphasis corresponds to the target classification label; Duan, FIGS. 3A, 4A, and 4B).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the digital media search and presentation system of Alakoye to incorporate the teachings of Duan to include wherein the topping the target live broadcast room in the display interface corresponding to the target classification label comprises: acquiring a current speech signal of the target live broadcast room in real time, and acquiring matched song content according to the current speech signal; scoring the target live broadcast room according to a matching degree between the current speech signal and an audio feature of the song content; and arranging the target broadcast room according to... [matching degree], and topping the arranged target broadcast room in the display interface corresponding to the target classification label. The identification system described in Duan “improves access rate” and “reduces the waste of server resources,” as recognized by Duan. (Duan, ¶ [0067]). However, Alakoye and Duan fail to expressly recite scoring the target live broadcast room according to a matching degree between the current speech signal and an audio feature of the song content; and arranging the target broadcast room according to the score.
Wang teaches systems and methods of live room display (Wang, Abstract). Regarding claim 7, Wang teaches scoring the target live broadcast room according to a matching degree between the current speech signal and an audio feature of the song content ("obtaining ordering metric parameter corresponding to the live room type" and "determining the ranking score of at least one live room corresponding to the type"; Wang, ¶¶ [0045]-[0049]); and arranging the target broadcast room according to the score ("ordering the at least one live room according to the sorting score of at least one live room."; Wang, ¶ [0055]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the digital media search and presentation system of Alakoye as modified by the song information display method of Duan, to incorporate the teachings of Wang to include scoring the target live broadcast room according to a matching degree between the current speech signal and an audio feature of the song content; and arranging the target broadcast room according to the score. The systems of Wang can allow for the selection of “a live broadcasting room with a high popularity” which can improve user engagement and entice longer viewing, as recognized by Wang. (Wang, ¶ [0003]).

Regarding claim 8, the rejection of claim 7 is incorporated. Alakoye, Duan, and Wang disclose all of the elements of the current invention as stated above. However, Alakoye fails to expressly recite after the acquiring the current speech signal of the target live broadcast room in real time, and acquiring the matched song content according to the current speech signal, further comprising: displaying a song name corresponding to the song content in an information display area corresponding to the target live broadcast room in the display interface corresponding to the target classification label.
The relevance of Duan is described above with relation to claim 1. Regarding claim 8, Duan teaches after the acquiring the current speech signal of the target live broadcast room in real time, and acquiring the matched song content according to the current speech signal, further comprising: displaying a song name corresponding to the song content in an information display area corresponding to the target live broadcast room in the display interface corresponding to the target classification label (at step 210e, thus after the acquiring the current speech signal of the target live broadcast room in real time, and acquiring the matched song content according to the current speech signal, "displayed in the name is a music play interface of the client end obtaining the song, the entrance of the corresponding is displayed on the display position corresponding to each song name, as shown in FIG. 4B," all of which corresponds to the singer target classification label depicted in FIG. 4B; Duan, ¶ [0161], FIG. 4B).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the digital media search and presentation system of Alakoye to incorporate the teachings of Duan to include after the acquiring the current speech signal of the target live broadcast room in real time, and acquiring the matched song content according to the current speech signal, further comprising: displaying a song name corresponding to the song content in an information display area corresponding to the target live broadcast room in the display interface corresponding to the target classification label. The identification system described in Duan “improves access rate” and “reduces the waste of server resources,” as recognized by Duan. (Duan, ¶ [0067]). 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Xu (U.S. Pat. App. Pub. No. 2018/0041783) discloses systems and methods for modifying live broadcasts by adding text information obtained from speech recognition into the broadcast data.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313)446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn can be reached on (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Sean E Serraguard/Patent Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657