DETAILED ACTION
Applicant's submission filed on January 8, 2022 in response to Office Action dated October 29, 2021 has been entered. Claims 1-20 are pending in this application.
Applicant was offered an Examiner's Amendment to place this application in condition for allowance, but Applicant declined it and elected to receive this Office Action.

Response to Amendment
Applicant’s arguments with respect to claims 1-20 have been considered but are moot in view of the new grounds of rejection necessitated by the claim amendments.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 20 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 20 recites “wherein the first set of metadata is accessed from header information in audio frames of the initial audio segment” (emphasis added) in lines 28-30. However, it also recites “access a first set of metadata that corresponds to a last audio frame of the initial audio segment” (emphasis added) in lines 7-8, indicating accessing only one frame instead of [...] frames in the subsequent audio [Application No.: 15/931,442; Attorney's Docket No.: 010704.0037U1] segment is accessed to determine the audio characteristics of the subsequent audio segment” (emphasis added) in lines 32-34. However, it also recites “access a second set of metadata that corresponds to the first audio frame of the subsequent audio segment, the second set of metadata including information indicating one or more audio characteristics of the first audio frame of the subsequent audio segment” (emphasis added) in lines 10-13, indicating accessing only one frame instead of multiple frames.
Further, it recites “wherein the inserted audio frames are inserted into a detected gap between playback of the initial audio segment and playback of the subsequent audio segment until subsequent header information from audio frames in the subsequent audio segment is accessed to determine the audio characteristics of the subsequent audio segment” (emphasis added) in lines 30-34. It is not clear how inserting audio frames until audio frames in the subsequent audio segment are accessed (i.e., “until” interpreted as inserting audio frames before audio frames in the subsequent audio segment are accessed) is possible, since the claim recites “access a second set of metadata that corresponds to the first audio frame of the subsequent audio segment, the second set of metadata including information indicating one or more audio characteristics of the first audio frame of the subsequent audio segment” to “generate, based on the first and second sets of metadata, a new set of metadata that is based on both the audio characteristics of the last audio frame in the initial audio segment and the audio characteristics of the first audio frame in the subsequent audio segment” which is used in determining number of audio frames to be

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-10, 12-13, 17-18, 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zalon (US Patent No. 10,409,546), and further in view of Baumgarte (US Patent Application Publication No. 2017/0094409).
Regarding claim 1, Zalon teaches a computer-implemented method comprising:

accessing a first set of metadata that corresponds to a last audio frame of the initial audio segment, the first set of metadata including information indicating one or more audio characteristics of the last audio frame of the initial audio segment (col. 8 ll. 42-col. 10 ll. 7, col. 11 ll. 22-30, col. 13 ll. 8-55, col. 14 ll. 1-46, col. 15 ll. 5-38 generating, storing and accessing previous frame audio characteristics);
accessing a second set of metadata that corresponds to the first audio frame of the subsequent audio segment, the second set of metadata including information indicating one or more audio characteristics of the first audio frame of the subsequent audio segment (col. 8 ll. 42-col. 10 ll. 7, col. 11 ll. 22-30, col. 13 ll. 8-55, col. 14 ll. 1-46, col. 15 ll. 5-38 generating, storing and accessing next frame audio characteristics);
generating, based on the first and second sets of metadata, a new set of metadata that is based on both the audio characteristics of the last audio frame in the initial audio segment and the audio characteristics of the first audio frame in the subsequent audio segment; determining that a transition between the last audio frame in the initial audio segment and the first audio frame in the subsequent audio segment is to span at least a specified minimum number of audio frames (Fig. 22, col. 22 ll. 47-col. 23 ll. 12, col. 26 ll. 50-col. 27 ll. 20, col. 27 ll. 50-col. 29 ll. 24 determining number of elements with corresponding levels);
inserting a dynamically variable number of new audio frames between the last audio frame of the initial audio segment and the first audio frame of the subsequent 
Zalon teaches analyzing audio frame by frame (Figs. 6-8), tagging audio segments, and extracting metadata from a data packet (col. 9 ll. 17-18), but Zalon does not explicitly teach metadata for an audio frame.
However, in a similar field, Baumgarte teaches using metadata for individual audio frames (Paragraphs 0017-0021). (Note: Zalon and Baumgarte refer to the same industry standard, ITU-R BS.1770-3.)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the present invention to modify Zalon to use metadata for
Regarding claim 2, Zalon teaches the initial audio segment and the subsequent audio segment are part of the same media item (Fig. 2, 9-19 showing portions of same media, col. 8 ll. 62-65 section/ portion of item, col. 11 ll. 35-38, Figs.).
Regarding claim 3, Zalon teaches the media item comprises an interactive media item that allows out-of-order playback of audio segments (col. 6 ll. 21-24, ll. 39-43 advancing, rewinding or skipping).
Regarding claim 4, Zalon teaches the subsequent audio segment comprises an out-of-order audio segment within the media item (col. 6 ll. 21-24, ll. 39-43 advancing, rewinding or skipping).
Regarding claim 5, Zalon teaches the initial audio segment and the subsequent audio segment are each part of different media items that are being spliced together (col. 7 ll. 29-col. 8 ll. 41, col. 19 ll. 39-col. 20 ll. 16).
Regarding claim 6, Zalon teaches the generated new set of metadata comprises adaptive metadata configured to adapt to the audio characteristics of the last audio frame in the initial audio segment and to the audio characteristics of the first audio 
Regarding claim 7, Zalon teaches the new audio frame includes at least two sub-portions over which the audio characteristics of the last audio frame in the initial audio segment are transitioned to the audio characteristics of the first audio frame in the subsequent audio segment using the adaptive metadata (col. 22 ll. 47-col. 23 ll. 12, col. 24 ll. 31-41 gap/stitch element multiple portions transitioning to next element level).
Regarding claim 8, Zalon teaches the at least one new audio frame comprises at least two new audio frames over which the audio characteristics of the last audio frame in the initial audio segment are transitioned to the audio characteristics of the first audio frame in the subsequent audio segment using the adaptive metadata (col. 22 ll. 47-col. 23 ll. 12, col. 24 ll. 31-41 gap/stitch element with multiple frames transitioning to next element level).
Regarding claim 9, Zalon teaches the adaptive metadata is dynamically inserted into a string of inserted audio frames until the first audio frame of the subsequent audio segment is reached (col. 22 ll. 47-col. 23 ll. 12, col. 24 ll. 31-41 gap/stitch element with multiple frames with varying levels transitioning to next element level).
Regarding claim 10, Zalon teaches the number of inserted audio frames having adaptive metadata depends on a length of time between playback of the last audio frame in the initial audio segment and the first audio frame in the subsequent audio segment (col. 23 ll. 43-col. 24 ll. 15 calculate and select multiple items/frames to smoothly bridge/blend, col. 22 ll. 47-col. 23 ll. 12 adaptive metadata with multiple levels).
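For illustration only, the dependence recited in claim 10 (the number of inserted frames depending on the length of time between the two frames) can be sketched as a simple calculation. This is a hypothetical sketch, not code from Zalon, Baumgarte, or the application; the function name, the millisecond units, and the fixed frame duration are all assumptions made for the example.

```python
def frames_to_insert(gap_ms: int, frame_ms: int, min_frames: int) -> int:
    """Return how many frames to insert to cover a playback gap.

    Hypothetical helper: assumes every inserted frame has the same
    duration (frame_ms) and that a minimum transition length
    (min_frames) must always be honored.
    """
    if gap_ms < 0 or frame_ms <= 0 or min_frames < 0:
        raise ValueError("gap_ms and min_frames must be >= 0, frame_ms > 0")
    # Ceiling division with integers avoids floating-point rounding.
    needed = -(-gap_ms // frame_ms)
    # Never go below the specified minimum number of frames.
    return max(needed, min_frames)


if __name__ == "__main__":
    # A 100 ms gap with 32 ms frames needs 4 frames to be fully covered.
    print(frames_to_insert(100, 32, 2))
```

Under these assumptions a longer gap yields more inserted frames, while a very short gap still yields the minimum count.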
Regarding claim 12, Zalon teaches a system comprising:
at least one physical processor; and physical memory comprising computer-executable instructions that, when executed by the physical processor, cause the physical processor (col. 29 ll. 25-col. 31 ll. 30, col. 46 ll. 12-60) to:
identify, within at least one media item that includes a plurality of audio segments, an initial audio segment (Fig. 2 item 215) and a subsequent audio segment (Fig. 2 item 230) that follows the initial audio segment (col. 10 ll. 62-col. 11 ll. 2);
access a first set of metadata that corresponds to a last audio frame of the initial audio segment, the first set of metadata including information indicating one or more audio characteristics of the last audio frame of the initial audio segment (col. 8 ll. 42-col. 10 ll. 7, col. 11 ll. 22-30, col. 13 ll. 8-55, col. 14 ll. 1-46, col. 15 ll. 5-38 generating, storing and accessing previous frame audio characteristics);
access a second set of metadata that corresponds to the first audio frame of the subsequent audio segment, the second set of metadata including information indicating one or more audio characteristics of the first audio frame of the subsequent audio segment (col. 8 ll. 42-col. 10 ll. 7, col. 11 ll. 22-30, col. 13 ll. 8-55, col. 14 ll. 1-46, col. 15 ll. 5-38 generating, storing and accessing next frame audio characteristics);
generate, based on the first and second sets of metadata, a new set of metadata that is based on both the audio characteristics of the last audio frame in the initial audio segment and the audio characteristics of the first audio frame in the subsequent audio segment; determine that a transition between the last audio frame in the initial audio segment and the first audio frame in the subsequent audio segment is to span at least a specified minimum number of audio frames (Fig. 22, col. 22 ll. 47-col. 23 ll. 12, col. 26 ll. 
insert a dynamically variable number of new audio frames between the last audio frame of the initial audio segment and the first audio frame of the subsequent audio segment that meets at least the specified minimum number of audio frames, wherein the metadata of each new audio frame includes proportionally fewer audio characteristics (volume, rhythmic level) of the last audio frame in the initial audio segment (less effect of previous frame volume, rhythmic level based on fade out) and proportionally more audio characteristics of the first audio frame in the subsequent audio segment (more effect of next frame volume, rhythmic level based on fade in), wherein the proportional changes in audio characteristics in the metadata correspond to the dynamically variable number of inserted new audio frames (each inserted frame volume dependent on step value determined by number of frames inserted); and apply the new set of metadata to the new audio frames as the new audio frames are dynamically generated using the dynamically variable number of new audio frames (control the volume adjustment and duration for inserted audio frames) (col. 2 ll. 24-31, col. 3 ll. 27-61, col. 9 ll. 3-11, col. 10 ll. 8-10, col. 10 ll. 62-col. 11 ll. 2, col. 11 ll. 14-22, col. 19 ll. 39-53, col. 24 ll. 31-41, col. 26 ll. 50-col. 27 ll. 20, col. 27 ll. 50-col. 29 ll. 24 adding glue element, stitch data with control messages).
Zalon teaches analyzing audio frame by frame (Figs. 6-8), tagging audio segments, and extracting metadata from a data packet (col. 9 ll. 17-18), but Zalon does not explicitly teach metadata for an audio frame.

It would have been obvious to a person of ordinary skill in the art before the effective filing date of the present invention to modify Zalon to use metadata for individual audio frames as taught by Baumgarte so that “The metadata may be transferred along with the audio content (indicated as "audio in" in the figures) to the playback or decoding side that is shown, being the decoding and playback system, e.g., via an Internet download or via Internet streaming. At the decoding or playback side, no additional delay is incurred since the instantaneous loudness values are in the metadata, and so a loudness estimation process at the playback side is not necessary.” thus resulting in “The improved smoothness, the reduced decoder complexity, and the lack of additional delay” (Baumgarte, Paragraph 0018).
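For illustration only, the claim 12 limitation that each new frame's metadata carries proportionally less of the last frame's characteristics and proportionally more of the first frame's characteristics, with the step size set by the number of inserted frames, can be sketched as a linear blend. This is a hypothetical sketch under stated assumptions: linear weighting is one possible reading, the key name `loudness_db` and the function name are invented for the example, and nothing here is taken from Zalon, Baumgarte, or the application.

```python
from typing import Dict, List


def blend_metadata(last: Dict[str, float],
                   first: Dict[str, float],
                   n_frames: int) -> List[Dict[str, float]]:
    """Build per-frame metadata for n_frames inserted frames.

    Hypothetical helper: frame i (1-based) weights the subsequent
    segment's first frame by i / (n_frames + 1) and the initial
    segment's last frame by the remainder, so the step size depends
    on how many frames are inserted.
    """
    if n_frames < 1:
        raise ValueError("n_frames must be at least 1")
    # Only characteristics present in both frames can be blended.
    keys = sorted(set(last) & set(first))
    blended = []
    for i in range(1, n_frames + 1):
        w = i / (n_frames + 1)  # share taken from the subsequent segment
        blended.append({k: (1.0 - w) * last[k] + w * first[k] for k in keys})
    return blended


if __name__ == "__main__":
    steps = blend_metadata({"loudness_db": -24.0}, {"loudness_db": -16.0}, 3)
    for step in steps:
        print(step)
```

With three inserted frames, the assumed loudness value moves from the last frame's level toward the first frame's level in equal steps; inserting more frames makes each step smaller.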
Regarding claim 13, Zalon teaches the initial audio segment and the subsequent audio segment are inserted into a pass-through device (Fig. 1A, col. 8 ll. 42-col. 10 ll. 17).
Regarding claim 17, Zalon teaches detecting a gap length in time between playback of the initial audio segment and playback of the subsequent audio segment (Fig. 23 gap between 2805 “OUT” song and 2810 “IN” song).
Regarding claim 18, Zalon teaches calculating a number of audio frames that are to be inserted to fill the detected gap length; and inserting the calculated number of audio frames between the initial audio segment and the subsequent audio segment (col. 
Regarding claim 20, Zalon teaches a non-transitory computer-readable medium comprising one or more computer- executable instructions that, when executed by at least one processor of a computing device (col. 29 ll. 25-col. 31 ll. 30, col. 46 ll. 12-60), cause the computing device to:
identify, within at least one media item that includes a plurality of audio segments, an initial audio segment (Fig. 2 item 215) and a subsequent audio segment (Fig. 2 item 230) that follows the initial audio segment (col. 10 ll. 62-col. 11 ll. 2);
access a first set of metadata that corresponds to a last audio frame of the initial audio segment, the first set of metadata including information indicating one or more audio characteristics of the last audio frame of the initial audio segment (col. 8 ll. 42-col. 10 ll. 7, col. 11 ll. 22-30, col. 13 ll. 8-55, col. 14 ll. 1-46, col. 15 ll. 5-38 generating, storing and accessing previous frame audio characteristics);
access a second set of metadata that corresponds to the first audio frame of the subsequent audio segment, the second set of metadata including information indicating one or more audio characteristics of the first audio frame of the subsequent audio segment (col. 8 ll. 42-col. 10 ll. 7, col. 11 ll. 22-30, col. 13 ll. 8-55, col. 14 ll. 1-46, col. 15 ll. 5-38 generating, storing and accessing next frame audio characteristics);
generate, based on the first and second sets of metadata, a new set of metadata that is based on both the audio characteristics of the last audio frame in the initial audio segment and the audio characteristics of the first audio frame in the subsequent audio segment; 

insert a dynamically variable number of new audio frames between the last audio frame of the initial audio segment and the first audio frame of the subsequent audio segment that meets at least the specified minimum number of audio frames, wherein the metadata of each new audio frame includes proportionally fewer audio characteristics (volume, rhythmic level) of the last audio frame in the initial audio segment (less effect of previous frame volume, rhythmic level based on fade out) and proportionally more audio characteristics of the first audio frame in the subsequent audio segment (more effect of next frame volume, rhythmic level based on fade in), wherein the proportional changes in audio characteristics in the metadata correspond to the dynamically variable number of inserted new audio frames (each inserted frame volume dependent on step value determined by number of frames inserted), and wherein the inserted audio frames are inserted into a detected gap between playback of the initial audio segment and playback of the subsequent audio segment until subsequent header information from audio frames in the subsequent audio segment is accessed to determine the audio characteristics of the subsequent audio segment (Note: this limitation is rejected under 35 USC 112 above.) (Fig. 22 audio frames inserted until reaching subsequent audio segment level); and apply the new set of metadata to the new audio frames as the new audio frames are dynamically generated
Zalon teaches analyzing audio frame by frame (Fig. 6-8) and tagging audio segments and extracting metadata from data packet (col. 9 ll. 17-18), but Zalon does not explicitly teach metadata for audio frame, and wherein the first set of metadata is accessed from header information in audio frames of the initial audio segment. (Note: It was well known to a person of ordinary skill in the art that packet headers include metadata regarding actual payload (audio) data in a packet or a frame. The applicant is advised to refer to Bhattacharya (US Patent Application Publication No. 2010/0042740) Paragraph 0005, Kraemer (US Patent Application Publication No. 2011/0040397) Paragraphs 0030, 0061, Dressler (US Patent Application Publication No. 2012/0232910) Paragraph 0033, Begen (US Patent Application Publication No. 2013/0042015) Paragraph 0040, Amble (US Patent Application Publication No. 2014/0275851) Paragraph 0083 for such common knowledge.)
However, in a similar field, Baumgarte teaches using metadata for individual audio frames (Paragraphs 0017-0021), and wherein the metadata is accessed from header information in audio frames (Paragraphs 0018, 0020). (Note: Zalon and Baumgarte refer to the same industry standard, ITU-R BS.1770-3.)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the present invention to modify Zalon to use metadata for .

Allowable Subject Matter
Claims 11, 14-16, 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The above objection(s) is (are) based on the claim(s) as presently set forth in its (their) totality. It should not be interpreted as indicating that amended claim(s) broadly reciting certain limitations would be allowable. A more detailed reason(s) for allowance may be set forth in a subsequent Notice of Allowance if and when all claims in the application are put into a condition for allowance. 

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HEMANT PATEL whose telephone number is (571)272-8620. The examiner can normally be reached M-F 8:00 AM - 4:30 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fan Tsang, can be reached at 571-272-7547. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is 

HEMANT PATEL
Primary Examiner
Art Unit 2653



/HEMANT S PATEL/           Primary Examiner, Art Unit 2653