DETAILED ACTION
Claims 1-15, 17 are pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 5-9, 12-15, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Gibbon et al. (US 2007/0098350) in view of Quinn et al. (US 2011/0267419), and further in view of Mincher et al. (US 8913189), and further in view of Nomura et al. (US 2009/0148133).

Claim 1, Gibbon teaches a method of processing media content comprising video content and associated audio content, the method comprising: 
receiving the video content and the associated audio content (p. 0023); 
determining, based on the analysis (i.e. images and text of speech), one or more navigation points (i.e. jump locations) for enabling navigation of the media content, the one or more navigation points indicating points of interest in the associated audio content (i.e. speech is analyzed by captions) for short-term rewinding (i.e. jumping backward) (p. 0025-0027); 
embedding the one or more navigation points into metadata for the media content (p. 0026); and 
outputting the video content, the associated audio content, and the metadata (fig. 2B, p. 0024-0026).
Gibbon is silent regarding a method of processing media content comprising video content and associated audio content, the method comprising:
analyzing the associated audio content; 
modifying the media content for replaying the media content with improved intelligibility of the associated audio content, wherein the modifying is performed in response to a user instruction instructing replay from one of the one or more navigation points, wherein the modifying comprises performing dialog enhancement to boost and clarify a dialog and wherein the dialog enhancement is faded out once a subsequent navigation point is reached;
wherein analyzing the audio content involves applying speech detection to the audio content; 
wherein the one or more navigation points are placed at respective starting points of spoken utterances included in the associated audio content. 
Quinn teaches a method of processing media content comprising video content and associated audio content, the method comprising:
analyzing the associated audio content (p. 0093); 
wherein analyzing the audio content involves applying speech detection to the audio content (i.e. speech of participants) (p. 0093); 
wherein the one or more navigation points are placed at respective starting points of spoken utterances included in the associated audio content (i.e. speaker begins speaking) (p. 0093). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to have provided audio analysis as taught by Quinn to the system of Gibbon to locate speech points in a data stream (p. 0093).
Mincher teaches a method of processing media content comprising video content and associated audio content, the method comprising:
“modifying the media content for replaying the media content with improved intelligibility (i.e. through use of audio processing function) of the associated audio content, wherein the modifying is performed in response to a user instruction instructing replay (i.e. playing content with events) from one of the one or more navigation points” (col. 4, lines 20-41, col. 5, lines 35-59).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to have provided audio enhancement as taught by Mincher to the system of Gibbon to hear audio more clearly (col. 5, lines 35-59).
Nomura teaches the specific feature of:
“wherein the modifying comprises performing dialog enhancement (i.e. increase volume) to boost and clarify a dialog and wherein the dialog enhancement is faded out once a subsequent navigation point is reached” (i.e. according to a segment) (fig. 21; p. 0260-0272).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to have provided audio enhancement as taught by Nomura to the system of Gibbon to allow for segments of increased volume (p. 0269).

Claim 3, Gibbon teaches the method of claim 1, wherein modifying the media content is selectively applied to a section of the media content of particular interest to the user (p. 0025-0027). 

Claim 5, Gibbon teaches the method of claim 4, wherein the level of modifying the media content commences to return to zero at a subsequent navigation point (i.e. the duration of the playback has an end point) (p. 0025-0027). 

Claim 6, Gibbon teaches the method of claim 1, wherein the one or more navigation points indicate respective offsets from a starting point of a respective current frame (i.e. scene 1-3 are according to a time line) (p. 0006, 0025-0027). 

Claim 7, Gibbon teaches the method of claim 1, wherein the metadata is time-aligned with the associated audio content (i.e. according to a timeline) (p. 0006, 0025-0027). 
Claim 8, Gibbon teaches the method of claim 1, wherein the method is performed at an encoder (100) for encoding the media content (p. 0012); and 
the method further comprises receiving an input of one or more additional navigation points (i.e. automatically extracted points or manual by user) (p. 0028). 

Claim 9, Gibbon teaches the method of claim 1, further comprising: 
generating an audio-visual representation of the media content based on the video content, the associated audio content, and the metadata (fig. 2B). 

Claim 12, Gibbon teaches the method of claim 9, further comprising: 
providing a fast-forward replay mode (i.e. jumping to navigation points) in which respective portions of the media content are replayed starting from respective ones of the one or more navigation points (p. 0026-0027). 

Claim 13, Gibbon teaches the method of claim 9, further comprising: 
resuming playback after a pause of the replay at a timing indicated by a most recent one of the one or more navigation points (i.e. the ability to pause playback of content is envisioned) (p. 0005). 

Claim 14, Gibbon teaches an encoder (100) comprising a processor and a memory storing instructions for causing the processor to perform the operations of claim 1 (p. 0012, 0021).
 
Claim 15, Gibbon teaches a decoder (100) comprising a processor and a memory storing instructions for causing the processor to perform the operations of claim 1 (p. 0012). 

Claim 17, Gibbon teaches a computer-readable storage medium storing a program for causing a computer to perform the operations of claim 1 when performed on the computer (p. 0012). 

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Gibbon et al. (US 2007/0098350) in view of Quinn et al. (US 2011/0267419), and further in view of Mincher et al. (US 8913189), and further in view of Nomura et al. (US 2009/0148133), and further in view of Everett et al. (US 2018/0376187).

Claim 2, Gibbon is silent regarding the method of claim 1 wherein modifying the media content further comprises one or more of increasing program playback loudness, muting non-dialog audio tracks, and enabling of subtitles. 
Everett teaches the method of claim 1 wherein modifying the media content further comprises one or more of performing dialog enhancement, increasing program playback loudness (i.e. increase volume), muting non-dialog audio tracks, and enabling of subtitles (p. 0026).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to have provided volume adjustment as taught by Everett to the system of Gibbon to improve user engagement (p. 0026).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Gibbon et al. (US 2007/0098350) in view of Quinn et al. (US 2011/0267419), and further in view of Mincher et al. (US 8913189), and further in view of Nomura et al. (US 2009/0148133), and further in view of Everett et al. (US 2018/0376187), and further in view of Berwick et al. (US 2018/0048831).

Claim 4, Gibbon is silent regarding the method of claim 2, wherein a level of modifying the media content is fading out over time. 
Berwick teaches the method of claim 2, wherein a level of modifying the media content is fading out over time (p. 0033).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to have provided fade out transitions as taught by Berwick to the system of Gibbon to generate combined video (p. 0033).

Claims 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Gibbon et al. (US 2007/0098350) in view of Quinn et al. (US 2011/0267419), and further in view of Mincher et al. (US 8913189), and further in view of Nomura et al. (US 2009/0148133), and further in view of White et al. (US 2015/0373281).

Claim 10, Gibbon is silent regarding the method of claim 9, further comprising: 
setting a scan rate for scanning through the media content based on a density of the one or more navigation points over time, wherein a higher density of navigation points over time is indicative of more interesting media content and a lower density of navigation points over time is indicative of less interesting media content. 
White teaches the method of claim 9, further comprising: 
setting a scan rate (i.e. user scrolling through points of interest) for scanning through the media content based on a density of the one or more navigation points over time (i.e. points of interest), wherein a higher density of navigation points over time is indicative of more interesting media content and a lower density of navigation points over time is indicative of less interesting media content (p. 0066, 0094-0098). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to have provided density of navigation points as taught by White to the system of Gibbon to visually represent points of interest across a timeline (p. 0066).

Claim 11, Gibbon is silent regarding the method of claim 10, further comprising: 
setting a correspondence between points on a visual representation of a scan bar and points in time in the video content at least in part based on the density of the one or more navigation points over time. 
White teaches the method of claim 10, further comprising: 
setting a correspondence between points on a visual representation of a scan bar and points in time in the video content at least in part based on the density of the one or more navigation points over time (p. 0066, 0094-0098). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to have provided density of navigation points as taught by White to the system of Gibbon to visually represent points of interest across a timeline (p. 0066).

Response to Arguments
Applicant's arguments filed 2/25/2022 have been fully considered but they are not persuasive.

Claim 1, Applicant argues that the Office relies on Mincher as allegedly teaching “modifying the media content for replaying the media content with improved intelligibility of the associated audio content, wherein the modifying is performed in response to a user instruction instructing replay from one of the one or more navigation points.” See Office Action at p. 4-5. As a first matter, the Office fails to address the arguments presented by the Applicant in the Response After Final Action dated December 21, 2021 that Mincher fails to teach or suggest the above recited limitation, as required by MPEP 707.07(f) ¶ 7.38 (“The examiner must, however, address any arguments presented by the applicant which are still relevant to any references being applied.”).
The Examiner respectfully disagrees.  The Response After Final Action dated December 21, 2021 presented amendments that changed the scope of the claim, which would require further search and consideration.  This was noted by the Examiner when addressing the limitation in the Advisory Action dated 12/13/2021.

	Applicant further argues that Mincher fails to teach or suggest at least “modifying the media content for replaying the media content with improved intelligibility of the associated audio content, wherein the modifying is performed in response to a user instruction instructing replay from one of the one or more navigation points.”
Mincher describes a system for synchronizing video data and audio data based in part on the audible events and the visual events in the data. See Mincher at Abstract. As described in Mincher, “the audio processing module 138 is configured to use one or more audio processing functions 140(1), 140(2), ... 140(F) to modify at least a portion of the audio data 104 and generate processed audio data 142.” See Mincher at 5:25-27. The audio processing module 138 processes the audio data 104 based on one or more visual events 128 corresponding to said audio data 104. See Mincher at 5:35-45. Then, said processed audio data and, in some cases, the corresponding video data is presented to the end user. See Mincher at, e.g., FIG. 5, FIG. 7, 12:3-9. In other words, the audio data presented to the end user is (1) already modified audio data that (2) has been modified based on (e.g., in response to) the visual event(s), and not in response to a user instruction instructing replay from one of the one or more navigation points.
	The Examiner respectfully disagrees.  Mincher teaches that the audio is processed for presentation.  Mincher also teaches sync data which is made for presentation and transmission to the user (col. 10, lines 9-25).  Therefore, the modified audio data is processed in response to a user request to replay media because the audio processing is synchronized with the presentation of the video and audio data.  When audio or video data is to be presented on a device, the audio processing will be performed and synchronized.  This is interpreted as “in response to a user instruction instructing replay…”.

	Applicant further argues that the Office relies on Nomura as allegedly teaching “wherein the modifying comprises performing dialog enhancement to boost and clarify a dialog and wherein the dialog enhancement is faded out once a subsequent navigation point is reached,” as recited by claim 1. The Applicant disagrees with this assertion.
“When at least one user exists who has not seen the television program corresponding to the program ID is judged to exist, the playback segment selection unit 115 modifies each sound intensity included in the read dialog segment table to be 1.5 times greater, and temporarily stores the modified dialog segment table. When no users exist who have not seen the television program corresponding to the program ID, the playback segment selection unit 115 temporarily stores the dialog segment table without modifying the sound intensity.”
(Emphasis added). See Nomura at [0264]-[0265]. In other words, Nomura describes a system that increases the sound intensity of a playback segment when a user has not seen the segment before, and does not modify the sound intensity when replaying a segment for a user (e.g., the user has seen the segment before). Furthermore, the unseen segments that have increased sound intensity are favored to be picked for playback, as they are more likely to exceed the threshold values of the overview evaluation function. See Nomura at [0270]-[0275] and Fig. 21.
Accordingly, the proposed combination of Gibbon, Quinn, Mincher, and Nomura, as suggested by the Office, would render Nomura inoperable for its intended purpose. The Office suggests that “it would have been obvious to one of ordinary skill in the art... to have provided audio enhancement as taught by Nomura to the system of Gibbon to allow for segments of increased volume.” See Office Action at p. 5. However, the proposed combination would require Nomura to increase the sound intensity for segments that the user has seen before, which is contradictory to the system described in paragraphs [0264]-[0265]. This would also render the playback segment selection unit inoperable for its intended purpose, as the unseen segments (e.g., with increased sound intensity) would no longer be favored to be picked for playback. See Nomura at [0270]-[0275].

	In response to Applicant's argument, the Applicant has misconstrued the Examiner’s position.  Nomura is merely relied upon for teaching “wherein the modifying comprises performing dialog enhancement to boost and clarify a dialog and wherein the dialog enhancement is faded out once a subsequent navigation point is reached”, that is, for teaching the claimed dialog enhancements to the audio.  The motivation to combine Nomura with Quinn and Mincher is to allow segments to have audio enhancements, i.e. volume increases.  Quinn and Mincher are relied upon for teaching features relating to enhancing audio for seen and unseen segments as shown in the Office Action above.

Conclusion
Claims 1-15, 17 are rejected.
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Inquiries
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MUSHFIKH I ALAM whose telephone number is (571)270-1710.  The examiner can normally be reached from 1:00PM to 9:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nasser Goodarzi, can be reached at 571-272-4195.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


MUSHFIKH I. ALAM
Primary Examiner
Art Unit 2426



/MUSHFIKH I ALAM/Primary Examiner, Art Unit 2426    8/26/2022