DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

The applicant amended claims 1, 13 and 15-18, canceled claim 14 and added claims 19-23 in the amendment received on 8/26/2022.

Claims 1-13 and 15-23 are pending.

Response to Arguments
Applicant's arguments filed 8/26/2022 have been fully considered but they are not persuasive. 
I.	Applicants argue on page 9 of the remarks that Rivera fails to disclose "actuating a first microphone coupled to the first device to record a first audio clip, wherein the actuation is synchronized to the playback of the video content utilizing a first plurality of timestamps corresponding to the video content".
The Examiner respectfully disagrees with Applicant’s arguments because, as shown in the rejection below, Rivera teaches a method of recording a karaoke performance in which a karaoke performer sings a song through a first microphone connected to a jukebox that is playing the song. … The electronic device is caused to capture at least video of the karaoke performance mediated by the jukebox. The captured video is received from the electronic device at a network location. At the network location, the captured video is combined, with reference to the synchronized times, with high-quality audio captured by the first microphone connected to the jukebox and high-quality song audio corresponding to the song associated with the karaoke performance, in order to create a combined recording of the karaoke performance, ¶ 23.  This passage, along with other parts of Rivera, teaches the argued limitation.

II.	Applicants also argue on page 9 of the remarks that, Rivera fails to disclose "actuating a second microphone coupled to the second device to record a second audio clip, wherein the actuation is synchronized to the playback of the video content based on a second plurality of timestamps corresponding to the video content".
The Examiner respectfully disagrees with Applicant’s arguments because, as described above, Rivera does teach the use of timestamps to sync the recording of audio to video playback in ¶ 23, among others.  Further, the addition of a second device would have been obvious to one of ordinary skill in the art in view of ¶ 67 and figure 3.  There, Rivera describes accommodating network connections, such that the karaoke jukebox system 302 may include a network interface 318, which connects the karaoke jukebox system 302 to the AV network 304 and/or other outside resources. The network interface 318 of the karaoke jukebox system 302 also may accommodate connections to patrons' mobile devices 320.  Thus, there can be multiple user devices, including a second user device with a microphone performing the same functions as the first device.  Therefore, Rivera reads on the limitation.

III.	Applicants further argue on page 9 of the remarks that, Walker fails to disclose "wherein generating the compilation audio clip comprises synchronizing the audio from the first audio clip to at least one frame of the video content using the first plurality of timestamps and synchronizing the audio from the second audio clip to the at least one frame of the video content using the second plurality of timestamps.".
The Examiner respectfully disagrees with Applicant’s arguments because Walker teaches that an accurate reference clock common to all (music nodes) MNs in the session and timestamps made at each MN at recording starts can be utilized to help provide this synchronization. Each MN uses the common reference clock to timestamp each recording start with that clock time. With this reference clock timestamp, the following example algorithm can then be used to produce final mix:…[0233] 4. The delay is the time offset in recording R.sub.ai that must be skipped to bring the recording in alignment with that of the recording having the latest start. [0234] 5. R.sub.FINAL is then produced by discarding the delay worth of data associated with each recording with the set of recordings that does not have the latest start time, and then reading and mixing audio from the files from a time that will now match the latest start time. When the first end-of-file is reached, the mixing process stops [using the first plurality of timestamps and synchronizing the audio from the second audio clip to the at least one frame of the video content using the second plurality of timestamps], ¶s 229-234.  Further Walker teaches that the audio data from frames (e.g., audio data from audio data frames or audio plus video data frames) in packets received from multiple MNs can also be combined together by the NAAS server systems, and this combined audio data can be downloaded from the NAAS server systems to the MNs as a single UDP packet. This combining of audio data from communicated frames reduces the packet rate that is used to for processing by the MN router and also reduces bandwidth requirements on the receiving MN Internet service provider (ISP), ¶ 400.  Thus, Walker does teach the argued limitation.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-13 and 15-22 are rejected under 35 U.S.C. 103 as being unpatentable over Rivera et al. (U.S. Publication No. 2020/0228856 A1) in view of Walker et al. (U.S. Publication No. 2015/0256613 A1).
With respect to claim 1, Rivera discloses a method comprising: providing access to a master recording session to a first device corresponding to a first user and a second device corresponding to a second user (i.e., FIG. 1 shows an overview of an exemplary embodiment of a digital downloading jukebox system 10. As shown in FIG. 1, the jukebox system 10 includes a central server 12 that contains a master library of audio content (typically music), as well as or alternatively audiovisual content (typically music and associated video or graphics), that can be downloaded therefrom.  The jukebox system also includes a series of remote jukebox devices 16, 16a-16f Each of these jukebox devices are generally located in a bar, restaurant, club, or other desired location, and are operable to play music (e.g., from a suitable storage location such as, for example, from a local server, a central and potentially remote server, from local storage, etc.) in response to receiving a payment from a user, such as coins, bills, credit/debit card, etc., and having one or more songs selected by the user for play [first and second users and devices], ¶ 4.  In order to address this problem and increase revenue, jukebox systems have in the past provided a feature that enables the user to search for songs on the central server from the jukebox and request an immediate download of a desired song from the central server to the jukebox for an additional fee. This feature enables the user to play any song in the master library of songs maintained by the central server using the jukebox, regardless of whether or not the specific song is presently stored in the mass storage of the jukebox itself, ¶ 7.  An aspect of certain exemplary embodiments relates to providing a karaoke jukebox connected system with collaborative touch points (including, for example, user devices such as mobile phones, tablets, etc.; jukeboxes themselves; game or other fixed or portable terminals in a location; etc.) 
that define unique moments  [first and second users and devices], ¶ 17.  Audiovisual data captured from a user device is received, with the audiovisual data including first audio data and first video data. Audio-only data having a quality higher than the first audio data is received. The first audio data and the audio-only data are digitally combined such that the first audio data is at least partially replaced with the audio-only data in order to produce a new audiovisual data file with user-generated video content synchronized with high-quality audio content based on a common time reference value, ¶ 24). 
Rivera further discloses wherein the first device and the second device are in different physical locations (i.e., An aspect of certain exemplary embodiments relates to providing a karaoke jukebox connected system with collaborative touch points (including, for example, user devices such as mobile phones, tablets, etc.; jukeboxes themselves; game or other fixed or portable terminals in a location; etc.) that define unique moments [wherein the first device and the second device are in different physical locations], ¶ 17.  Another aspect of certain exemplary embodiments relates to defining moments in the experience by providing unique or signature interactions that help make the experience immersive and unique for patrons, whether they are performing or watching at the location or remote from the location, ¶ 19). 
Rivera also discloses initiating a playback of a video content at the first device and at the second device (i.e., This feature enables the user to play any song in the master library of songs maintained by the central server using the jukebox, regardless of whether or not the specific song is presently stored in the mass storage of the jukebox itself, ¶ 7). 
Rivera further discloses actuating a first microphone coupled to the first device to record a first audio clip, wherein the actuation is synchronized to the playback of the video content based on a first plurality of timestamps corresponding to the video content (i.e., In certain exemplary embodiments, a method of recording a karaoke performance in which a karaoke performer sings a song through a first microphone connected to a jukebox that is playing the song is provided. A user can check in to a site where the jukebox is located via an application running on a portable electronic device being operated by the user. The application has access to a karaoke queue maintained by the jukebox, with the karaoke queue indicating the songs that are being sung. …The electronic device is caused to capture at least video of the karaoke performance mediated by the jukebox. The captured video is received from the electronic device at a network location. At the network location, the captured video is combined, with reference to the synchronized times [actuating a first microphone coupled to the first device to record a first audio clip, wherein the actuation is synchronized to the playback of the video content], with high-quality audio captured by the first microphone connected to the jukebox and high-quality song audio corresponding to the song associated with the karaoke performance, in order to create a combined recording of the karaoke performance, ¶ 23.  Referring once again to FIG. 7b, the audio and/or video from the electronic device may be mixed with the retouched audio from the microphone that has been overlaid onto the underlying song audio in step S728. 
The combination of different audio streams from different sources may become possible because the electronic device was checked in to the location and was synchronized with the local karaoke jukebox system, e.g., thereby providing common timestamps so that the different audio streams can be overlaid in a coherent fashion, even if recording is started and stopped one or more times, the recording from the microphone and the recording from the electronic device start and/or stop at different times, etc. An audio and/or video selection arrangement may be triggered locally or remotely to determine the mixing conditions of each stream with a selected predetermined audio level [wherein the actuation is synchronized to the playback of the video content based on a first plurality of timestamps corresponding to the video content], ¶ 91.  Audio optionally may be captured along with the video, e.g., through a microphone provided to the jukebox and/or synced from a remote source (e.g., a mobile device of the patron). The synchronization may be facilitated by providing a common or shared timestamp service as between the various devices involved in the video and/or audio capture [wherein the actuation is synchronized to the playback of the video content based on a first plurality of timestamps corresponding to the video content], ¶ 197). 
Rivera also discloses actuating a second microphone coupled to the second device to record a second audio clip (i.e., In certain exemplary embodiments, a method of recording a karaoke performance in which a karaoke performer sings a song through a first microphone connected to a jukebox that is playing the song is provided. A user can check in to a site where the jukebox is located via an application running on a portable electronic device being operated by the user. … The electronic device is caused to capture at least video of the karaoke performance mediated by the jukebox. The captured video is received from the electronic device at a network location. At the network location, the captured video is combined, with reference to the synchronized times, with high-quality audio captured by the first microphone connected to the jukebox and high-quality song audio corresponding to the song associated with the karaoke performance, in order to create a combined recording of the karaoke performance, ¶ 23.  Auxiliary and/or microphone input ports 326 may facilitate one or more microphone connections, e.g., for karaoke, general announcement and/or other purposes, ¶ 658.  This can be done for any number of users of the system, including a second user). 
Rivera further discloses wherein the actuation is synchronized to the playback of the video content based on a second plurality of timestamps corresponding to the video content (i.e., There is a synchronizing of times as between the jukebox and the electronic device upon said check in [actuating a first microphone coupled to the first device to record a first audio clip, wherein the actuation is synchronized to the playback of the video content]. The electronic device is caused to capture at least video of the karaoke performance mediated by the jukebox. The captured video is received from the electronic device at a network location. At the network location, the captured video is combined, with reference to the synchronized times [wherein the actuation is synchronized to the playback of the video content], with high-quality audio captured by the first microphone connected to the jukebox and high-quality song audio corresponding to the song associated with the karaoke performance, in order to create a combined recording of the karaoke performance, ¶ 23.  Referring once again to FIG. 7b, the audio and/or video from the electronic device may be mixed with the retouched audio from the microphone that has been overlaid onto the underlying song audio in step S728. The combination of different audio streams from different sources may become possible because the electronic device was checked in to the location and was synchronized with the local karaoke jukebox system, e.g., thereby providing common timestamps so that the different audio streams can be overlaid in a coherent fashion, even if recording is started and stopped one or more times, the recording from the microphone and the recording from the electronic device start and/or stop at different times, etc. 
An audio and/or video selection arrangement may be triggered locally or remotely to determine the mixing conditions of each stream with a selected predetermined audio level [wherein the actuation is synchronized to the playback of the video content based on a second plurality of timestamps corresponding to the video content], ¶ 91.  Audio optionally may be captured along with the video, e.g., through a microphone provided to the jukebox and/or synced from a remote source (e.g., a mobile device of the patron). The synchronization may be facilitated by providing a common or shared timestamp service as between the various devices involved in the video and/or audio capture [wherein the actuation is synchronized to the playback of the video content based on a second plurality of timestamps corresponding to the video content], ¶ 197). 
Rivera also discloses receiving the first audio clip from the first device and the second audio clip from the second device (i.e., Audiovisual data captured from a user device is received, with the audiovisual data including first audio data and first video data, ¶ 24.  Certain exemplary embodiments enable the creation of a “mixed performance” that accepts audio from the karaoke jukebox microphone(s) input(s), ¶ 84). 
Rivera further discloses generating a compilation audio clip including audio from the first audio clip and audio from the second audio clip, wherein the compilation audio clip is synchronized to the video content (i.e., A mixer is configured to combine, with reference to the synchronized times, the captured video with high-quality audio captured by the first microphone connected to the jukebox and high-quality song audio corresponding to the song associated with the karaoke performance, in order to create a combined recording of the karaoke performance, ¶ 26.  This can be done for multiple user devices with multiple audio clip recordings). 
Rivera may not explicitly disclose wherein generating the compilation audio clip comprises synchronizing the audio from the first audio clip to at least one frame of the video content using the first plurality of timestamps and synchronizing the audio from the second audio clip to the at least one frame of the video content using the second plurality of timestamps.
However, Walker discloses wherein generating the compilation audio clip comprises synchronizing the audio from the first audio clip to at least one frame of the video content using the first plurality of timestamps and synchronizing the audio from the second audio clip to the at least one frame of the video content using the second plurality of timestamps (i.e., An accurate reference clock common to all MNs in the session and timestamps made at each MN at recording starts can be utilized to help provide this synchronization. Each MN uses the common reference clock to timestamp each recording start with that clock time. With this reference clock timestamp, the following example algorithm can then be used to produce final mix:…[0233] 4. The delay (t.sub.Di) is the time offset in recording R.sub.ai that must be skipped to bring the recording in alignment with that of the recording having the latest start. [0234] 5. R.sub.FINAL is then produced by discarding the delay (t.sub.Di) worth of data associated with each recording with the set of recordings (RA.sub.ai, RB.sub.ai, RC.sub.ai) that does not have the latest start time, and then reading and mixing audio from the files from a time that will now match the latest start time t.sub.OLD. When the first end-of-file is reached, the mixing process stops [using the first plurality of timestamps and synchronizing the audio from the second audio clip to the at least one frame of the video content using the second plurality of timestamps], ¶s 229-234.  The audio data from frames (e.g., audio data from audio data frames or audio plus video data frames) in packets received from multiple MNs can also be combined together by the NAAS server systems, and this combined audio data can be downloaded from the NAAS server systems to the MNs as a single UDP packet. 
This combining of audio data from communicated frames reduces the packet rate that is used to for processing by the MN router and also reduces bandwidth requirements on the receiving MN Internet service provider (ISP), ¶ 400) in order to provide an interactive music client system which captures audio data and processes the captured data to generate audio output data within an interactive music session (¶ 5).
Therefore, based on Rivera in view of Walker, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Walker to the system of Rivera in order to provide an interactive music client system which captures audio data and processes the captured data to generate audio output data within an interactive music session.

With respect to claim 2, Rivera discloses wherein the playback of the video content occurs at the first device at a first time and the playback of the video content occurs at the second device at a second time (i.e., The application has access to a karaoke queue maintained by the jukebox, with the karaoke queue indicating the songs that are being sung. There is a synchronizing of times as between the jukebox and the electronic device upon said check in. The electronic device is caused to capture at least video of the karaoke performance mediated by the jukebox. The captured video is received from the electronic device at a network location. At the network location, the captured video is combined, with reference to the synchronized times, with high-quality audio captured by the first microphone connected to the jukebox and high-quality song audio corresponding to the song associated with the karaoke performance, in order to create a combined recording of the karaoke performance, ¶ 23.  The first audio data and the audio-only data are digitally combined such that the first audio data is at least partially replaced with the audio-only data in order to produce a new audiovisual data file with user-generated video content synchronized with high-quality audio content based on a common time reference value, ¶ 24.  Also, it will be appreciated that one or more karaoke jukebox systems, displays, speakers, zones, mobile devices, remote devices, social networks, and/or the like, may be provided in different locations and that the numbers of the various elements shown in FIG. 3 are provided for explanatory purposes [wherein the playback of the video content occurs at the first device at a first time and the playback of the video content occurs at the second device at a second time]. In other words, more or fewer mobile devices, displays, remote devices, social networks, may be connected or interconnected in different embodiments, ¶ 62.  
Audio may be output from the karaoke jukebox system 302 using one or more audio output ports and/or circuits 322. The audio output 322 may support zoned output, e.g., to multiple speakers and/or speaker systems 324. The operating system of the jukebox may maintain separate song and/or karaoke queues for the various zones in certain exemplary embodiments. For instance, separate queues may be maintained so that songs may be played in certain zones even though other zones are playing different songs and/or are participating in karaoke…A video output circuit 330 may facilitate connections to multiple displays 332, e.g., through a switching device 334 as described herein [wherein the playback of the video content occurs at the first device at a first time and the playback of the video content occurs at the second device at a second time], ¶ 68). 

With respect to claim 3, Rivera discloses wherein the playback of the video content occurs at the first device and at the second device at substantially the same time (i.e., One or more TVs may be connected to the single switching device, thereby allowing the ability to cascade the display to many televisions, with potentially all rendering the same karaoke video signal, ¶ 74.  A virtual head-to-head competition can be made based on multiple performers singing the same song and prizes awarded accordingly, ¶ 79.  The scoring system may determine that one performer out of a group of performers singing at the same or different times is scoring higher and thus “doing better” than others, ¶ 102). 

With respect to claim 4, Rivera discloses wherein the first audio clip is stored with first edit metadata in a first location in a database and the second audio clip is stored with second edit metadata in a second location in the database (i.e., Metadata may be associated with songs in the karaoke database or catalog. Such metadata information may include, for example, lyrics of a song, rated difficulty, key, range, snippet (e.g., available for playback on a mobile device), an indication as to when the song was last played at a given location or venue, its popularity, frequency of play, etc, ¶ 65.  A program or program module corresponding to this smart queuing function may implement this dynamic reordering based on, for example, popularity scores associated with the songs, beat counts or known tempo data (e.g., retrieved from a metadata source including such information) saved in the database or otherwise known ahead of time, artist/album/song title data, and/or the like, ¶ 66.  Appropriate follow-on actions then may be taken on the performer's behalf such as, for example, copying media or metadata to a remote repository for subsequent playback, posting elements of the performance to a social network web service under the performance credentials, etc, ¶ 78.  A metadata database may be consulted by the computer system doing the retouching for such information and optionally to trigger such effects, ¶ 90). 
Rivera also discloses wherein the compilation audio clip is generated by accessing the first audio clip from the first location, the second audio clip from the second location (i.e., A mixer is configured to combine, with reference to the synchronized times, the captured video with high-quality audio captured by the first microphone connected to the jukebox [accessing the first audio clip from the first location] and high-quality song audio corresponding to the song associated with the karaoke performance [the second audio clip from the second location], in order to create a combined recording of the karaoke performance [wherein the compilation audio clip is generated], ¶ 26). 
Rivera further discloses applying the first edit metadata and the second edit metadata to the first audio clip and the second audio clip, respectively (i.e., Metadata may be associated with songs in the karaoke database or catalog. Such metadata information may include, for example, lyrics of a song, rated difficulty, key, range, snippet (e.g., available for playback on a mobile device), an indication as to when the song was last played at a given location or venue, its popularity, frequency of play, etc, ¶ 65.  A program or program module corresponding to this smart queuing function may implement this dynamic reordering based on, for example, popularity scores associated with the songs, beat counts or known tempo data (e.g., retrieved from a metadata source including such information) saved in the database or otherwise known ahead of time, artist/album/song title data, and/or the like [applying the first edit metadata and the second edit metadata to the first audio clip and the second audio clip, respectively], ¶ 66.  Although not expressly shown in FIG. 26, metadata may be associated with each of the segments 2608, 2610, and 2612, e.g., indicating timestamps (e.g., for start and end times), lengths, dates, the source of the data (e.g., from the mixer, raw input streams, from a user, etc.), and/or the like. The upload system that is located within the venue or at the event may submit data that is either stored as a contiguous data stream. If metadata or detectable audio cues are available, the data set may be identified with a segment ID and songs may be parsed with the identification of audio type [applying the first edit metadata and the second edit metadata to the first audio clip and the second audio clip, respectively]. Possible audio types include lead-in, actual song, lead-out, etc. The audio record may be time stamped based on information provided from the performance venue and the receipt time at the cloud server, ¶ 237). 

With respect to claim 5, Rivera discloses streaming a playback of the video content with the synchronized compilation audio clip to the first user device and the second user device (i.e., Certain exemplary embodiments enable the creation of a “mixed performance” that accepts audio from the karaoke jukebox microphone(s) input(s), as well as the backing music audio track. Thus, in certain exemplary embodiments, the karaoke jukebox offers real-time music mixing and manipulation arrangement by, for example, overlaying the inbound performer audio while also rendering the song music media. These two audio streams may be mixed by the mixing arrangement together to create a merged audio file, including both original music and audio captured in the venue. This resulting merged file may be considered a new work to be tracked for royalty and rights holder properties. However, the resulting merged file may be transferred and/or re-performed under the karaoke jukebox systems' control, e.g., to facilitate rights tracking and/or royalty sharing. Additional audio may be pre-pended or appended to the merged file, e.g., to include sponsored advertisements, information about the venue and/or performance, rights information, etc, ¶ 84.  Referring once again to FIG. 7b, the audio and/or video from the electronic device may be mixed with the retouched audio from the microphone that has been overlaid onto the underlying song audio in step S728. The combination of different audio streams from different sources may become possible because the electronic device was checked in to the location and was synchronized with the local karaoke jukebox system, e.g., thereby providing common timestamps so that the different audio streams can be overlaid in a coherent fashion, even if recording is started and stopped one or more times, the recording from the microphone and the recording from the electronic device start and/or stop at different times, etc. 
An audio and/or video selection arrangement may be triggered locally or remotely to determine the mixing conditions of each stream with a selected predetermined audio level. Selectively blending together the different audio streams in this way may help create high quality audio while blending in at least some of the ambient sounds for a more user-generated content (UGC), personalized, or do-it-yourself feel, providing a greater sense of personal ownership in the music-making process and a greater sense of connectedness to the venue and the particular musical event, potentially in a way that simulates a small or intimate “rock star like” performance, ¶ 91). 

With respect to claim 6, Rivera discloses wherein the first audio clip is received in a plurality of segments from the first device, wherein a first of the plurality of segments is received while the first microphone is actuated (i.e., The captured video is received from the electronic device at a network location. At the network location, the captured video is combined, with reference to the synchronized times, with high-quality audio captured by the first microphone connected to the jukebox and high-quality song audio corresponding to the song associated with the karaoke performance, in order to create a combined recording of the karaoke performance, ¶ 23.  A mixer is configured to combine, with reference to the synchronized times, the captured video with high-quality audio captured by the first microphone connected to the jukebox and high-quality song audio corresponding to the song associated with the karaoke performance, in order to create a combined recording of the karaoke performance, ¶ 26.  In different scenarios, a KJ may be a member of the venue staff, an enthusiastic audience participant, an operator, or an automated (machine) controller in different exemplary embodiments. A human user may, for example, operate a simple remote control or remote controlled equipped smart device, e.g., to manage the sequence of subsequent performers, make a simple audio adjustments, and provide spontaneous event related commentary or supplemental audio and video clips or segment initiation [plurality of segments received while microphone is actuated], ¶ 70). 

With respect to claim 7, Rivera discloses wherein the first audio clip is stored locally on the first device and the second audio clip is stored locally on the second device (i.e., Each of the jukebox devices includes a subset of the master library on a local storage device of the jukebox. The central server may be used to individually manage the contents of the jukebox device, by monitoring usage of and updating the subset of songs on each of the jukebox devices with the intent of maximizing the usage thereof, ¶ 60.  In certain exemplary embodiments, the audio and/or video data may be stored to the device and later transmitted to a network storage location (e.g., in the cloud) for subsequent process, e.g., as described below, ¶ 89.  It is noted that the replacement or blending may take place on a server in the cloud (e.g., where, or having access to, the audio from the electronic device is stored, after the audio from the karaoke jukebox device is transferred thereto or accessed therefrom, for example), on the karaoke jukebox device itself, or some other location, ¶ 91.  Thus, the captured audio clips can be stored locally until such transfer is initiated). 

With respect to claim 8, Rivera discloses terminating the master recording session (i.e., Referring once again to FIG. 7b, the audio and/or video from the electronic device may be mixed with the retouched audio from the microphone that has been overlaid onto the underlying song audio in step S728. The combination of different audio streams from different sources may become possible because the electronic device was checked in to the location and was synchronized with the local karaoke jukebox system, e.g., thereby providing common timestamps so that the different audio streams can be overlaid in a coherent fashion, even if recording is started and stopped one or more times, the recording from the microphone and the recording from the electronic device start and/or stop at different times, ¶ 91.  The upload device may forward either digital versions of the live audio feed, mixed songs (e.g., that have been post-processed after the performance or processed in real-time or substantially in real-time), or multiple unmixed or partially mixed audio streams for later assembly in final mixing. The upload device may be configured to identify the performance (e.g., with a name, date, time, and/or unique identifier) and to characterize the method and format of upload material (e.g., file format, whether the audio stream(s) is/are raw or fully or partially mixed, source device(s) for the audio stream(s), etc.). Video also may be gathered and uploaded via the same device in certain exemplary embodiments, and this type of and/or other related information may be provided. The uploaded data may be stored to a storage location in the cloud. The cloud service may be configured to archive the uploaded material for subsequent processing, ¶ 225). 
Rivera may not explicitly disclose wherein upon termination, the first audio clip is deleted from the first device and the second audio clip is deleted from the second device.
However, Walker discloses wherein upon termination, the first audio clip is deleted from the first device and the second audio clip is deleted from the second device (i.e., Finally, music node A leaves the session and being the creator of the session, it may choose to terminate the session with a message "stop session (S, Aid, A)". Otherwise it sends message "leave session (S, Aid, A)" to the server. Typically, the stop session is implicit, when the last node in the session leaves the session. When the server receives this message, it deletes the session object and by definition, the session ceases to exist, ¶ 524.  Similarly, when music node B leaves the session, messages to remove the rules in NAAS that allow communication with B are issued, and the bindings interface binding for B is dropped. Finally, music node A leaves the session by requesting a "session stop (A, Aid, A)". This causes all resources (e.g., forwarding rules and interface bindings) associated with session S at the NAAS to be released. The server also destroys the session object S, ¶ 536) in order to provide an interactive music client system which captures audio data and processes the captured data to generate audio output data within an interactive music session (¶ 5).
Therefore, based on Rivera in view of Walker, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Walker to the system of Rivera in order to provide an interactive music client system which captures audio data and processes the captured data to generate audio output data within an interactive music session.

With respect to claim 9, Rivera discloses providing access to an audible playback session at the same time as the master recording session, wherein the audible playback session provides audio corresponding to the first audio clip and the second audio clip during capture of the first audio clip by the first microphone and capture of the second audio clip by the second microphone (i.e., A data stream of lyrics, performance, and/or other data may be created for display on an associated display device. This information may be created by the jukebox, the switching device, a video server, or some other device, ¶ 75.  Certain exemplary embodiments enable the creation of a “mixed performance” that accepts audio from the karaoke jukebox microphone(s) input(s), as well as the backing music audio track. Thus, in certain exemplary embodiments, the karaoke jukebox offers real-time music mixing and manipulation arrangement by, for example, overlaying the inbound performer audio while also rendering the song music media. These two audio streams may be mixed by the mixing arrangement together to create a merged audio file, including both original music and audio captured in the venue. This resulting merged file may be considered a new work to be tracked for royalty and rights holder properties. However, the resulting merged file may be transferred and/or re-performed under the karaoke jukebox systems' control, e.g., to facilitate rights tracking and/or royalty sharing. Additional audio may be pre-pended or appended to the merged file, e.g., to include sponsored advertisements, information about the venue and/or performance, rights information, etc, ¶ 84.  
The combination of different audio streams from different sources may become possible because the electronic device was checked in to the location and was synchronized with the local karaoke jukebox system, e.g., thereby providing common timestamps so that the different audio streams can be overlaid in a coherent fashion [wherein the audible playback session provides audio corresponding to the first audio clip and the second audio clip during capture of the first audio clip by the first microphone and capture of the second audio clip by the second microphone], ¶ 91). 

With respect to claim 10, Rivera discloses wherein the initiating of the playback of the video content comprises transmitting a first playback command from a server to the first device and a second playback command from the server to the second device (i.e., Referring once again to FIG. 7b, the audio and/or video from the electronic device may be mixed with the retouched audio from the microphone that has been overlaid onto the underlying song audio in step S728. The combination of different audio streams from different sources may become possible because the electronic device was checked in to the location and was synchronized with the local karaoke jukebox system, e.g., thereby providing common timestamps so that the different audio streams can be overlaid in a coherent fashion, even if recording is started and stopped one or more times, the recording from the microphone and the recording from the electronic device start and/or stop at different times, etc. An audio and/or video selection arrangement may be triggered locally or remotely to determine the mixing conditions of each stream with a selected predetermined audio level [wherein the initiating of the playback of the video content comprises transmitting a first playback command from a server to the first device and a second playback command from the server to the second device], ¶ 91). 

With respect to claim 11, Rivera discloses deactivating the first microphone and the second microphone to cease recording of the first audio clip and the second audio clip, respectively (i.e., Referring once again to FIG. 7b, the audio and/or video from the electronic device may be mixed with the retouched audio from the microphone that has been overlaid onto the underlying song audio in step S728. The combination of different audio streams from different sources may become possible because the electronic device was checked in to the location and was synchronized with the local karaoke jukebox system, e.g., thereby providing common timestamps so that the different audio streams can be overlaid in a coherent fashion, even if recording is started and stopped one or more times, the recording from the microphone and the recording from the electronic device start and/or stop at different times [deactivating the first microphone and the second microphone to cease recording of the first audio clip and the second audio clip, respectively], etc, ¶ 91). 

With respect to claim 12, Rivera discloses receiving a first edit metadata corresponding to the first audio clip and the video clip playback from the first device; and receiving a second edit metadata corresponding to the second audio clip and the video content playback from the second device (i.e., Referring once again to FIG. 7b, the audio and/or video from the electronic device may be mixed with the retouched audio from the microphone that has been overlaid onto the underlying song audio in step S728. The combination of different audio streams from different sources [first and second devices] may become possible because the electronic device was checked in to the location and was synchronized with the local karaoke jukebox system [first and second audio clips and edit metadata], e.g., thereby providing common timestamps so that the different audio streams can be overlaid in a coherent fashion, even if recording is started and stopped one or more times, the recording from the microphone and the recording from the electronic device start and/or stop at different times, etc. An audio and/or video selection arrangement may be triggered locally or remotely to determine the mixing conditions of each stream with a selected predetermined audio level. Selectively blending together the different audio streams in this way may help create high quality audio while blending in at least some of the ambient sounds for a more user-generated content (UGC), personalized, or do-it-yourself feel, providing a greater sense of personal ownership in the music-making process and a greater sense of connectedness to the venue and the particular musical event, potentially in a way that simulates a small or intimate “rock star like” performance, ¶ 91). 
Rivera further discloses wherein the first edit metadata and the second edit metadata are used to synchronize the first audio clip and the second audio clip to the video content (i.e., At the network location, the captured video is combined, with reference to the synchronized times, with high-quality audio captured by the first microphone connected to the jukebox and high-quality song audio corresponding to the song associated with the karaoke performance, in order to create a combined recording of the karaoke performance, ¶ 23.  The first audio data and the audio-only data are digitally combined such that the first audio data is at least partially replaced with the audio-only data in order to produce a new audiovisual data file with user-generated video content synchronized with high-quality audio content based on a common time reference value [wherein the first edit metadata and the second edit metadata are used to synchronize the first audio clip and the second audio clip to the video content], ¶ 24.  Referring once again to FIG. 7b, the audio and/or video from the electronic device may be mixed with the retouched audio from the microphone that has been overlaid onto the underlying song audio in step S728. The combination of different audio streams from different sources may become possible because the electronic device was checked in to the location and was synchronized with the local karaoke jukebox system [wherein the first edit metadata and the second edit metadata are used to synchronize the first audio clip and the second audio clip to the video content], e.g., thereby providing common timestamps so that the different audio streams can be overlaid in a coherent fashion, ¶ 91). 

With respect to claim 13, Rivera discloses a system comprising: a database for storing audio clips (i.e., Performers also may want to “keep” or otherwise have access to their performances or data about their performances such as, for example, what song was sung, when they sang it, the location they sang it in, etc. This information may be stored in a database on the jukebox and/or in the audiovisual distribution record, e.g., as a part of a patron's personal karaoke record, ¶ 78.  In certain exemplary embodiments, the audio and/or video data may be stored to the device and later transmitted to a network storage location (e.g., in the cloud) for subsequent process, e.g., as described below. In certain exemplary embodiments, the audio and/or video data may be streamed to and/or stored directly on a network storage location, ¶ 89). 
Rivera further discloses a processing element associated with the database configured to: transmit a video content to a first user device and a second user device (i.e., As shown in FIG. 1, the jukebox system 10 includes a central server 12 that contains a master library of audio content (typically music), as well as or alternatively audiovisual content (typically music and associated video or graphics), that can be downloaded therefrom, ¶ 4). 
Rivera also discloses initiate a first local audio recording at the first device and a second local audio recording at the second user device based on the transmission of the video content to the first user device and the second user device (i.e., In certain exemplary embodiments, a method of recording a karaoke performance in which a karaoke performer sings a song through a first microphone connected to a jukebox that is playing the song is provided. A user can check in to a site where the jukebox is located via an application running on a portable electronic device being operated by the user, ¶ 23.  This can be done by multiple different users and devices). 
Rivera further discloses wherein the initiation of the first local audio recording is synchronized to playback of the video content at the first user device based on a first plurality of timestamps corresponding to the video content (i.e., In certain exemplary embodiments, a method of recording a karaoke performance in which a karaoke performer sings a song through a first microphone connected to a jukebox that is playing the song is provided. A user can check in to a site where the jukebox is located via an application running on a portable electronic device being operated by the user. The application has access to a karaoke queue maintained by the jukebox, with the karaoke queue indicating the songs that are being sung. There is a synchronizing of times as between the jukebox and the electronic device upon said check in [initiation of the first local audio recording is synchronized to playback of the video content at the first user device]. The electronic device is caused to capture at least video of the karaoke performance mediated by the jukebox. The captured video is received from the electronic device at a network location. At the network location, the captured video is combined, with reference to the synchronized times [wherein the initiation of the first local audio recording is synchronized to playback of the video content at the first user device based on a first plurality of timestamps corresponding to the video content], with high-quality audio captured by the first microphone connected to the jukebox and high-quality song audio corresponding to the song associated with the karaoke performance, in order to create a combined recording of the karaoke performance, ¶ 23.  Referring once again to FIG. 7b, the audio and/or video from the electronic device may be mixed with the retouched audio from the microphone that has been overlaid onto the underlying song audio in step S728. 
The combination of different audio streams from different sources may become possible because the electronic device was checked in to the location and was synchronized with the local karaoke jukebox system, e.g., thereby providing common timestamps so that the different audio streams can be overlaid in a coherent fashion, even if recording is started and stopped one or more times, the recording from the microphone and the recording from the electronic device start and/or stop at different times, etc. An audio and/or video selection arrangement may be triggered locally or remotely to determine the mixing conditions of each stream with a selected predetermined audio level [wherein the initiation of the first local audio recording is synchronized to playback of the video content at the first user device based on a first plurality of timestamps corresponding to the video content], ¶ 91.  Audio optionally may be captured along with the video, e.g., through a microphone provided to the jukebox and/or synced from a remote source (e.g., a mobile device of the patron). The synchronization may be facilitated by providing a common or shared timestamp service as between the various devices involved in the video and/or audio capture [wherein the initiation of the first local audio recording is synchronized to playback of the video content at the first user device based on a first plurality of timestamps corresponding to the video content], ¶ 197). 
Rivera further discloses wherein the initiation of the second local audio recording is synchronized to playback of the video content at the second user device based on a second plurality of timestamps corresponding to the video content (i.e., There is a synchronizing of times as between the jukebox and the electronic device upon said check in [initiation of the second local audio recording is synchronized to playback of the video content at the second user device]. The electronic device is caused to capture at least video of the karaoke performance mediated by the jukebox. The captured video is received from the electronic device at a network location. At the network location, the captured video is combined, with reference to the synchronized times [initiation of the second local audio recording is synchronized to playback of the video content at the second user device], with high-quality audio captured by the first microphone connected to the jukebox and high-quality song audio corresponding to the song associated with the karaoke performance, in order to create a combined recording of the karaoke performance, ¶ 23.  Referring once again to FIG. 7b, the audio and/or video from the electronic device may be mixed with the retouched audio from the microphone that has been overlaid onto the underlying song audio in step S728. The combination of different audio streams from different sources may become possible because the electronic device was checked in to the location and was synchronized with the local karaoke jukebox system, e.g., thereby providing common timestamps so that the different audio streams can be overlaid in a coherent fashion, even if recording is started and stopped one or more times, the recording from the microphone and the recording from the electronic device start and/or stop at different times, etc. 
An audio and/or video selection arrangement may be triggered locally or remotely to determine the mixing conditions of each stream with a selected predetermined audio level [wherein the initiation of the second local audio recording is synchronized to playback of the video content at the second user device based on a second plurality of timestamps corresponding to the video content], ¶ 91.  Audio optionally may be captured along with the video, e.g., through a microphone provided to the jukebox and/or synced from a remote source (e.g., a mobile device of the patron). The synchronization may be facilitated by providing a common or shared timestamp service as between the various devices involved in the video and/or audio capture [wherein the initiation of the second local audio recording is synchronized to playback of the video content at the second user device based on a second plurality of timestamps corresponding to the video content], ¶ 197). 
Rivera further discloses terminate the first local audio recording and the second local audio recording (i.e., Referring once again to FIG. 7b, the audio and/or video from the electronic device may be mixed with the retouched audio from the microphone that has been overlaid onto the underlying song audio in step S728. The combination of different audio streams from different sources may become possible because the electronic device was checked in to the location and was synchronized with the local karaoke jukebox system, e.g., thereby providing common timestamps so that the different audio streams can be overlaid in a coherent fashion, even if recording is started and stopped one or more times, the recording from the microphone and the recording from the electronic device start and/or stop at different times [terminate the first local audio recording and the second local audio recording], etc. An audio and/or video selection arrangement may be triggered locally or remotely to determine the mixing conditions of each stream with a selected predetermined audio level, ¶ 91). 
Rivera also discloses receive the first local audio recording from the first user device and the second local audio recording from the second user device (i.e., In certain exemplary embodiments, a method of recording a karaoke performance in which a karaoke performer sings a song through a first microphone connected to a jukebox that is playing the song is provided. A user can check in to a site where the jukebox is located via an application running on a portable electronic device being operated by the user. The application has access to a karaoke queue maintained by the jukebox, with the karaoke queue indicating the songs that are being sung. There is a synchronizing of times as between the jukebox and the electronic device upon said check in [receive the first local audio recording from the first user device and the second local audio recording from the second user device]. The electronic device is caused to capture at least video of the karaoke performance mediated by the jukebox, ¶ 23). 
Rivera further discloses store the first local audio recording at a first location within the database and the second local audio recording at a second location within the database (i.e., Appropriate follow-on actions then may be taken on the performer's behalf such as, for example, copying media or metadata to a remote repository for subsequent playback, posting elements of the performance to a social network web service under the performance credentials, etc, ¶ 78.  The upload system that is located within the venue or at the event may submit data that is either stored as a contiguous data stream. If metadata or detectable audio cues are available, the data set may be identified with a segment ID and songs may be parsed with the identification of audio type [store the first local audio recording at a first location within the database and the second local audio recording at a second location within the database]. Possible audio types include lead-in, actual song, lead-out, etc. The audio record may be time stamped based on information provided from the performance venue and the receipt time at the cloud server, ¶ 237). 
Rivera also discloses generate a compilation audio clip including audio from the first local audio recording and audio from the second local audio recording (i.e., A mixer is configured to combine, with reference to the synchronized times, the captured video with high-quality audio captured by the first microphone connected to the jukebox and high-quality song audio corresponding to the song associated with the karaoke performance, in order to create a combined recording of the karaoke performance, ¶ 26.  This can be done for multiple user devices with multiple audio clip recordings). 
Rivera further discloses wherein the compilation audio clip is synchronized to the video content (i.e., A mixer is configured to combine, with reference to the synchronized times, the captured video with high-quality audio captured by the first microphone connected to the jukebox and high-quality song audio corresponding to the song associated with the karaoke performance, in order to create a combined recording of the karaoke performance, ¶ 26.  This can be done for multiple user devices with multiple audio clip recordings). 
Rivera may not explicitly disclose wherein generating the compilation audio clip comprises synchronizing the audio from the first audio clip to at least one frame of the video content using the first plurality of timestamps and synchronizing the audio from the second audio clip to the at least one frame of the video content using the second plurality of timestamps.
However, Walker discloses wherein generating the compilation audio clip comprises synchronizing the audio from the first audio clip to at least one frame of the video content using the first plurality of timestamps and synchronizing the audio from the second audio clip to the at least one frame of the video content using the second plurality of timestamps (i.e., An accurate reference clock common to all MNs in the session and timestamps made at each MN at recording stars can be utilized to help provide this synchronization. Each MN uses the common reference clock to timestamp each recording start with that clock time. With this reference clock timestamp, the following example algorithm can then be used to produce final mix:…[0233] 4. The delay (t.sub.Di) is the time offset in recording R.sub.ai that must be skipped to bring the recording in alignment with that of the recording having the latest start. [0234] 5. R.sub.FINAL is then produced by discarding the delay (t.sub.Di) worth of data associated with each recording with the set of recordings (RA.sub.ai, RB.sub.ai, RC.sub.ai) that does not have the latest start time, and then reading and mixing audio from the files from a time that will now match the latest start time t.sub.OLD. When the first end-of-file is reached, the mixing process stops [synchronizing the audio from the first audio clip to at least one frame of the video content using the first plurality of timestamps and synchronizing the audio from the second audio clip to the at least one frame of the video content using the second plurality of timestamps], ¶s 229-234.  The audio data from frames (e.g., audio data from audio data frames or audio plus video data frames) in packets received from multiple MNs can also be combined together by the NAAS server systems, and this combined audio data can be downloaded from the NAAS server systems to the MNs as a single UDP packet. 
This combining of audio data from communicated frames reduces the packet rate that is used to for processing by the MN router and also reduces bandwidth requirements on the receiving MN Internet service provider (ISP), ¶ 400) in order to provide an interactive music client system which captures audio data and processes the captured data to generate audio output data within an interactive music session (¶ 5).
Therefore, based on Rivera in view of Walker, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Walker to the system of Rivera in order to provide an interactive music client system which captures audio data and processes the captured data to generate audio output data within an interactive music session.
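For illustration only, the alignment steps quoted above from Walker, ¶s 229-234, may be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Walker, Rivera, or the claimed invention: the names Recording and mix_aligned, the use of seconds on a shared reference clock, and the mixing by simple averaging are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Recording:
    # Hypothetical container; field names are assumptions for this sketch.
    start_time: float         # recording start on the shared reference clock, in seconds
    samples: Sequence[float]  # mono audio samples


def mix_aligned(recordings: List[Recording], sample_rate: int) -> List[float]:
    """Trim each recording to the latest start time, then mix by averaging."""
    if not recordings:
        return []
    # The recording with the latest start defines the common starting point.
    latest_start = max(r.start_time for r in recordings)
    trimmed = []
    for r in recordings:
        delay = latest_start - r.start_time   # corresponds to t_Di in the quoted passage
        skip = round(delay * sample_rate)     # number of leading samples to discard
        trimmed.append(r.samples[skip:])
    # Mixing stops when the first recording runs out of data.
    length = min(len(t) for t in trimmed)
    return [sum(t[i] for t in trimmed) / len(trimmed) for i in range(length)]
```

In this sketch, a recording that started 0.5 seconds before the latest one at a sample rate of 4 samples per second has its first 2 samples discarded before mixing, which mirrors the "discarding the delay worth of data" step in the quoted passage.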

With respect to claim 15, Rivera discloses wherein the first plurality of timestamps comprises a time start based on the initiation of the first local audio recording (i.e., The combination of different audio streams from different sources may become possible because the electronic device was checked in to the location and was synchronized with the local karaoke jukebox system, e.g., thereby providing common timestamps so that the different audio streams can be overlaid in a coherent fashion, even if recording is started and stopped one or more times, the recording from the microphone and the recording from the electronic device start and/or stop at different times [wherein the first plurality of timestamps comprises a time start based on the initiation of the first local audio recording], etc. An audio and/or video selection arrangement may be triggered locally or remotely to determine the mixing conditions of each stream with a selected predetermined audio level. Selectively blending together the different audio streams in this way may help create high quality audio while blending in at least some of the ambient sounds for a more user-generated content (UGC), personalized, or do-it-yourself feel, providing a greater sense of personal ownership in the music-making process and a greater sense of connectedness to the venue and the particular musical event, potentially in a way that simulates a small or intimate “rock star like” performance, ¶ 91.  Although not expressly shown in FIG. 26, metadata may be associated with each of the segments 2608, 2610, and 2612, e.g., indicating timestamps (e.g., for start and end times) [wherein the first plurality of timestamps comprises a time start], lengths, dates, the source of the data (e.g., from the mixer, raw input streams, from a user, etc.), and/or the like. 
The upload system that is located within the venue or at the event may submit data that is either stored as a contiguous data stream. If metadata or detectable audio cues are available, the data set may be identified with a segment ID and songs may be parsed with the identification of audio type. Possible audio types include lead-in, actual song, lead-out, etc. The audio record may be time stamped based on information provided from the performance venue and the receipt time at the cloud server, ¶ 237). 

With respect to claim 16, Rivera discloses wherein the processing element is further configured to playback the compilation audio clip by accessing the first local audio recording and the second local audio recording (i.e., This feature enables the user to play any song in the master library of songs maintained by the central server using the jukebox, regardless of whether or not the specific song is presently stored in the mass storage of the jukebox itself, ¶ 7.  Appropriate follow-on actions then may be taken on the performer's behalf such as, for example, copying media or metadata to a remote repository for subsequent playback [playback the compilation audio clip by accessing the first local audio recording and the second local audio recording], posting elements of the performance to a social network web service under the performance credentials, etc, ¶ 78). 

With respect to claim 17, Rivera discloses wherein the processing element is further configured to stream the playback of the compilation audio clip and the video content to the first user device and the second user device (i.e., Appropriate follow-on actions then may be taken on the performer's behalf such as, for example, copying media or metadata to a remote repository for subsequent playback, posting elements of the performance to a social network web service under the performance credentials, etc, ¶ 78.  Once the new video was created through a remix of the audio from the uploaded source and the mobile submitted source, the new video may be stored on a cloud server with a unique address (e.g., GUID). In some cases, the video may be controlled by the performer and only accessible through a subsequent transaction. For instance, the patron who had purchased the rights to record one or more segments of the performance may have access via a credential authentication to stream this clip to any device that supports authentication and audio/video playback, ¶ 244.  FIG. 30 is a schematic view showing a mobile video file being viewed from a cloud or other network location in accordance with certain exemplary embodiments. Unlike in FIG. 27, the mobile device 2704 is shown in a playback or “media ready” mode, e.g., after the audio and/or video clips are matched and mixed, and a coherent audio and/or video clip is generated. In that regard, the user may access the mixed song with the mobile video 3002 from the network storage location 3004 by accessing a link to the video stream or file 3006, ¶ 246). 

With respect to claim 18, Rivera discloses wherein the processing element initiates the first local audio recording and the second local audio recording by transmitting a command to actuate a microphone at both the first user device and the second user device (i.e., At the network location, the captured video is combined, with reference to the synchronized times, with high-quality audio captured by the first microphone connected to the jukebox and high-quality song audio corresponding to the song associated with the karaoke performance, in order to create a combined recording of the karaoke performance, ¶ 23.  This can be done for multiple performers/user devices). 

With respect to claim 19, Rivera discloses wherein providing access to the master recording session further comprises providing access to a third device corresponding to a third user, wherein the first device, the second device, and the third device are in different physical locations (i.e., FIG. 1 shows an overview of an exemplary embodiment of a digital downloading jukebox system 10. As shown in FIG. 1, the jukebox system 10 includes a central server 12 that contains a master library of audio content (typically music), as well as or alternatively audiovisual content (typically music and associated video or graphics), that can be downloaded therefrom [wherein providing access to the master recording session], 16a-16f, ¶ 4.  FIG. 3 is a schematic view of a combined jukebox karaoke system for use within a venue or location in accordance with certain exemplary embodiments. Elements shown below the dashed line in FIG. 3 are provided in a single venue or location, whereas elements shown above the dashed line are provided outside of that venue or location. It will be appreciated that multiple venues and/or locations may be connected to the AV Network, for example, although multiple such karaoke jukebox systems and/or related components are omitted from FIG. 3 for clarity purposes. Also, it will be appreciated that one or more karaoke jukebox systems, displays, speakers, zones, mobile devices, remote devices, social networks, and/or the like, may be provided in different locations and that the numbers of the various elements shown in FIG. 3 are provided for explanatory purposes. In other words, more or fewer mobile devices, displays, remote devices, social networks, may be connected or interconnected in different embodiments [providing access to a third device corresponding to a third user, wherein the first device, the second device, and the third device are in different physical locations], ¶ 62.  Referring once again to FIG. 7b, the audio and/or video from the electronic device may be mixed with the retouched audio from the microphone that has been overlaid onto the underlying song audio in step S728. The combination of different audio streams from different sources may become possible because the electronic device was checked in to the location and was synchronized with the local karaoke jukebox system, ¶ 91).

With respect to claim 20, Rivera discloses wherein the playback of the video content to the first and second device is initiated responsive to a command from the third device (i.e., In different scenarios, a KJ may be a member of the venue staff, an enthusiastic audience participant, an operator, or an automated (machine) controller in different exemplary embodiments. A human user may, for example, operate a simple remote control or remote controlled equipped smart device, e.g., to manage the sequence of subsequent performers, make a simple audio adjustments, and provide spontaneous event related commentary or supplemental audio and video clips or segment initiation [wherein the playback of the video content to the first and second device is initiated responsive to a command from the third device], ¶ 70.  In certain exemplary embodiments, the karaoke jukebox remote control may include a microphone connected to the karaoke jukebox system and/or speakers in the venue, e.g., so that KJ can make announcements, offer verbal words of encouragement, call the next performer, etc. It will be appreciated that the example remote control may have these and/or other buttons or switches for controlling the jukebox in either or both of jukebox and karaoke modes. In certain exemplary embodiments, the KJ's remote control may be a virtual remote control accessible via an electronic device such as, for example, a laptop, smart phone, tablet, or the like, ¶ 72.  In cases where there are multiple possible video sources, the performer, KJ, or other editor may select the appropriate clip(s) and/or image(s) and also indicate where they should be placed in the overall stream [wherein the playback of the video content to the first and second device is initiated responsive to a command from the third device], ¶ 88.  An example karaoke jukebox remote control usable in connection with certain exemplary embodiments is shown in FIG. 4. The example karaoke jukebox remote control shown in FIG. 4 may have a plurality of buttons or switches. Videos could also be paid for music credits or cash or credit or a virtual wallet account, etc., to allow patrons to take the video of the moment and to share this video via a social media website, email, and Internet link, and/or the like. Audio optionally may be captured along with the video, e.g., through a microphone provided to the jukebox and/or synced from a remote source (e.g., a mobile device of the patron) [wherein the playback of the video content to the first and second device is initiated responsive to a command from the third device]. The synchronization may be facilitated by providing a common or shared timestamp service as between the various devices involved in the video and/or audio capture, ¶ 197).

With respect to claim 21, Rivera discloses receiving, from the third device, a first command for the actuation of a first microphone coupled to the first device (i.e., To accommodate network connections, the karaoke jukebox system 302 may include a network interface 318, which connects the karaoke jukebox system 302 to the AV network 304 and/or other outside resources. The network interface 318 of the karaoke jukebox system 302 also may accommodate connections to patrons' mobile devices 320. It will be appreciated that such connections may be direct connections to the karaoke jukebox system 302 or indirect connections, e.g., mediated by the AV network 304, a local server, and/or the like [first, second or third devices], ¶ 67.  Also, it will be appreciated that one or more karaoke jukebox systems, displays, speakers, zones, mobile devices, remote devices, social networks, and/or the like, may be provided in different locations [remote from each other] and that the numbers of the various elements shown in FIG. 3 are provided for explanatory purposes. In other words, more or fewer mobile devices, displays, remote devices, social networks, may be connected or interconnected in different embodiments, ¶ 62.  One or more remote devices 342 may be able to connect with the AV network 304, the social networks 340, and/or the karaoke jukebox system 302 (e.g., directly or using the AV network 304 as an intermediary) in certain exemplary embodiments [first, second or third devices], ¶ 69.  In different scenarios, a KJ may be a member of the venue staff, an enthusiastic audience participant, an operator, or an automated (machine) controller in different exemplary embodiments. A human user may, for example, operate a simple remote control or remote controlled equipped smart device, e.g., to manage the sequence of subsequent performers, make a simple audio adjustments, and provide spontaneous event related commentary or supplemental audio and video clips or segment initiation. In example use cases, a control system may be used to move a nervous performer back in the queue, raise or lower the volume for a particular performer, play prerecorded applause or cheering, display encouraging or amusing comments on the video display systems, skip performers who are no longer present (e.g., because they have left the venue, are not available, have decided to skip their performance, etc.), and/or the like. Human KJs may be located at or remote from the venue in different example scenarios, ¶ 70.  An example karaoke jukebox remote control usable in connection with certain exemplary embodiments is shown in FIG. 4. The example karaoke jukebox remote control shown in FIG. 4 may have a plurality of buttons or switches. …The KJ may cause the performance to be recorded by pressing the record button 410, and may cause the music and/or recording to be paused by pressing the pause button 412 [receiving, from the third device, a first command for the actuation of a first microphone coupled to the first device]. …The karaoke jukebox remote control may have a remote transmitter that operates under infrared, RF, Bluetooth, or other frequencies to communicate with the karaoke jukebox system for these and/or other purposes [a remote for issuing a first command for the actuation of a first microphone coupled to the first device], ¶ 71.  In certain exemplary embodiments, the karaoke jukebox remote control may include a microphone connected to the karaoke jukebox system and/or speakers in the venue, e.g., so that KJ can make announcements, offer verbal words of encouragement, call the next performer, etc. It will be appreciated that the example remote control may have these and/or other buttons or switches for controlling the jukebox in either or both of jukebox and karaoke modes. In certain exemplary embodiments, the KJ's remote control may be a virtual remote control accessible via an electronic device such as, for example, a laptop, smart phone, tablet, or the like, ¶ 72).

With respect to claim 22, Rivera discloses receiving, from the third device, a first command for the actuation of the second microphone coupled to the second device (i.e., To accommodate network connections, the karaoke jukebox system 302 may include a network interface 318, which connects the karaoke jukebox system 302 to the AV network 304 and/or other outside resources. The network interface 318 of the karaoke jukebox system 302 also may accommodate connections to patrons' mobile devices 320. It will be appreciated that such connections may be direct connections to the karaoke jukebox system 302 or indirect connections, e.g., mediated by the AV network 304, a local server, and/or the like [first, second or third devices and their microphones], ¶ 67.  Also, it will be appreciated that one or more karaoke jukebox systems, displays, speakers, zones, mobile devices, remote devices, social networks, and/or the like, may be provided in different locations [remote from each other] and that the numbers of the various elements shown in FIG. 3 are provided for explanatory purposes. In other words, more or fewer mobile devices, displays, remote devices, social networks, may be connected or interconnected in different embodiments, ¶ 62.  One or more remote devices 342 may be able to connect with the AV network 304, the social networks 340, and/or the karaoke jukebox system 302 (e.g., directly or using the AV network 304 as an intermediary) in certain exemplary embodiments [first, second or third devices and their microphones], ¶ 69.  In different scenarios, a KJ may be a member of the venue staff, an enthusiastic audience participant, an operator, or an automated (machine) controller in different exemplary embodiments. A human user may, for example, operate a simple remote control or remote controlled equipped smart device, e.g., to manage the sequence of subsequent performers, make a simple audio adjustments, and provide spontaneous event related commentary or supplemental audio and video clips or segment initiation. In example use cases, a control system may be used to move a nervous performer back in the queue, raise or lower the volume for a particular performer, play prerecorded applause or cheering, display encouraging or amusing comments on the video display systems, skip performers who are no longer present (e.g., because they have left the venue, are not available, have decided to skip their performance, etc.), and/or the like. Human KJs may be located at or remote from the venue in different example scenarios, ¶ 70.  An example karaoke jukebox remote control usable in connection with certain exemplary embodiments is shown in FIG. 4. The example karaoke jukebox remote control shown in FIG. 4 may have a plurality of buttons or switches. …The KJ may cause the performance to be recorded by pressing the record button 410, and may cause the music and/or recording to be paused by pressing the pause button 412 [receiving, from the third device, a first command for the actuation of the second microphone coupled to the second device]. …The karaoke jukebox remote control may have a remote transmitter that operates under infrared, RF, Bluetooth, or other frequencies to communicate with the karaoke jukebox system for these and/or other purposes [receiving, from the third device, a first command for the actuation of the second microphone coupled to the second device], ¶ 71.  In certain exemplary embodiments, the karaoke jukebox remote control may include a microphone connected to the karaoke jukebox system and/or speakers in the venue, e.g., so that KJ can make announcements, offer verbal words of encouragement, call the next performer, etc. It will be appreciated that the example remote control may have these and/or other buttons or switches for controlling the jukebox in either or both of jukebox and karaoke modes. In certain exemplary embodiments, the KJ's remote control may be a virtual remote control accessible via an electronic device such as, for example, a laptop, smart phone, tablet, or the like, ¶ 72).

Claim 23 is rejected under 35 U.S.C. 103 as being unpatentable over Rivera et al. (U.S. Publication No. 2020/0228856 A1) in view of Walker et al. (U.S. Publication No. 2015/0256613 A1), and further in view of Garner et al. (U.S. Publication No. 2018/0047386 A1).
With respect to claim 23, Rivera and Walker may not explicitly disclose wherein the actuation of the first microphone and the actuation of the second microphone allow for capture of the first audio clip at the first device and the capture of the second audio clip at the second device at substantially the same time.
However, Garner discloses wherein the actuation of the first microphone and the actuation of the second microphone allow for capture of the first audio clip at the first device and the capture of the second audio clip at the second device at substantially the same time (i.e., Returning to FIG. 1, in an embodiment, microphones 104 may have a transmit functionality, by which they are able to wired or wirelessly transmit data to VPS 102. As such, microphones 104E and microphones 104A that are geographically located closer to VPS 102 (and thus have shorter transmit times than further places microphones 104B and 104C/104D) may have a synchronized delay in sending data 106 to VPS 102 to account for the variations in transmit times, such that all data 106 is received substantially simultaneously by VPS 102. For example, during a calibration period, VPS 102 may calibrate the transmit time to/from different microphones 104 and may synchronize delays such that data 106 is received from the different microphones 104 at or approximately the same time [wherein the actuation of the first microphone and the actuation of the second microphone allow for capture of the first audio clip at the first device and the capture of the second audio clip at the second device at substantially the same time]. VPS 102 may then process received within a specific time interval of each other together for combination, ¶ 43) in order to allow the operation of receiving audio data from microphones associated with a plurality of devices distributed across an area of interest (¶ 12).
Therefore, based on Rivera in view of Walker, and further in view of Garner, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Garner to the system of Rivera and Walker in order to allow the operation of receiving audio data from microphones associated with a plurality of devices distributed across an area of interest.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAREN M MEANS whose telephone number is (571) 270-7202.  The examiner can normally be reached from 12pm to 6pm ET.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Joon Hwang, can be reached at 571-272-4036.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

Jaren M. Means
/J.M.M./
Patent Examiner
Art Unit 2447	
12/3/2022

/JOON H HWANG/Supervisory Patent Examiner, Art Unit 2447