DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments with respect to claims 1, 15, and 19 and the claims depending from them have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 8, 10, 13-15, 19-20, and 23-32 are rejected under 35 U.S.C. 103 as being unpatentable over US 2014/0071342 by Winograd et al. in view of US 2020/0074992 by Xiong et al. and US 2013/0124984 by Kuspa.

Regarding claim 1, Winograd et al. discloses a computer-implemented method, comprising: 
accessing audio of an audio track that is associated with a video recording (paragraph 0034 teaches “The received audio, video and/or still image signals can be processed (e.g., converted from analog to digital, color correction, sub-sampled, evaluated to detect embedded watermarks, analyzed to obtain fingerprints etc.) under the control of the processor 204.”); 
detecting, within an identified section of the audio track presence of one or more segments (paragraph 0039 teaches “In some examples, such a specifically designed sound track can include one or more sections (e.g., dialog sections) with a higher volume than a normal audio content, a frequency-adjusted audio content that is tailored to compensate for deficiencies in an individual's auditory system, a dialog-only audio portion, and/or an audio description, which can include a narrator describing the scenes of the first content, including what transpires during silent or non-dialog portions of the first content.  In another example, the detection of identification information can trigger the presentation of a content on the user device that is specially tailored for visually impaired persons.”, fig. 4, At 404, one or more segments of the first content is received at a second device.  For example, the one or more segments may correspond to one or more audio segments of the first content that are acoustically transmitted by the first device and received via a microphone of the second device.  In another example, the one or more segments may correspond to one or more video frames of the first content that are optically transmitted by the first device and received via a camera of the second device. Continuing with the operations 400 of FIG. 4, at 406 identification information is extracted from the received segments of the first content at the second device.  Such identification information can include a content identifier, which can uniquely identify the content, as well as one or more timecodes that, for example, identify temporal locations of the received segments within the first content.  At 408, the extracted identification information is used to retrieve a corresponding second content (or multiple second contents) that is necessary for full comprehension of the first content.” Winograd et al. 
discloses that a specifically designed sound track can include one or more dialog sections with a higher volume than normal audio content. The sound track can further include a frequency-adjusted audio content that is tailored to compensate for deficiencies in an individual's auditory system, a dialog-only audio portion, and/or an audio description, which can include a narrator describing the scenes of the first content, including what transpires during silent or non-dialog portions of the first content. Herein, Winograd et al. teaches based 
in response to detecting the presence of the one or more segments within the identified section of the audio track, reducing a volume level of the audio track in the identified section or in an additional section of the audio track following the identified section (in addition to discussion above, paragraph 0039 teaches “In some examples, such a specifically designed sound track can include one or more sections (e.g., dialog sections) with a higher volume than a normal audio content, a frequency-adjusted audio content that is tailored to compensate for deficiencies in an individual's auditory system, a dialog-only audio portion, and/or an audio description, which can include a narrator describing the scenes of the first content, including what transpires during silent or non-dialog portions of the first content. In another example, the detection of identification information can trigger the presentation of a content on the user device that is specially tailored for visually impaired persons.”, fig. 4, paragraphs 0047-0048 (as discussed above), paragraph 0060 teaches “It should be noted that in some embodiments where the second content provides an alternate audio content (whether or not related to parental control), the original audio associated with the first content may be muted.”);
accessing an audio segment that includes additional spoken dialog, the additional spoken dialog being a different language than the spoken dialog detected in the identified section of the audio track (in addition to discussion above, paragraph 0039 teaches designed sound track, paragraph 0059 teaches “In some embodiments, the second content can include a speech track (or a closed-caption track) that is in a different language than the spoken language in the first content that is being presented by the first device.”); and 
modifying the audio track by inserting the accessed audio segment into the identified section or the additional section of the audio track, the inserted audio segment having a higher volume level than the reduced volume level of the audio track in the identified section or the additional section (in addition to discussion above, paragraph 0039, 0060-0061 teaches second content (alternate audio content/designed sound track) inserted having higher volume than first content).
Winograd et al. fails to disclose detecting the presence of spoken dialog within the identified section of the audio track; reducing a volume level of the audio track in the identified section, the inserted audio segment having a higher volume level than the reduced volume level of the audio track in the identified section.
Xiong et al. discloses detecting the presence of spoken dialog within the section of the audio track (paragraph 0016).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the ability to include detecting the 
Winograd et al. and Xiong et al. fail to disclose reducing a volume level of the audio track in the identified section, the inserted audio segment having a higher volume level than the reduced volume level of the audio track in the identified section.
Kuspa discloses reducing a volume level of the audio track in the identified section, the inserted audio segment having a higher volume level than the reduced volume level of the audio track in the identified section (paragraph 0202 teaches “In some embodiments, video description content may be allowed to overlap certain portions of the audio track.  For example, a user may have the option of modifying the video description content to overlap seemingly less important portions of the dialogue, music, sound effects, or the like.  In some embodiments, the main audio recorded dialogue, music, sound effects, or the like may be dipped (e.g., reduced) in volume so that the video description may be heard more clearly.  For example, the volume of music may be lowered while the video description content is being recited.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the ability to include reducing a volume level of the audio track in the identified section, the inserted audio segment having a higher volume level than the reduced volume level of the audio track in the identified section, as taught by Kuspa into the system of Winograd et al. and Xiong et 

Regarding claim 8, the computer-implemented method wherein detecting the presence of the spoken dialog comprises automatically detecting the presence of the spoken dialog (Xiong et al., paragraph 0016).
	The motivation for combining the references has been discussed with respect to the independent claim above.

Regarding claim 10, the computer-implemented method further comprising processing the audio segment to alter a length of time of the audio segment prior to inserting the audio segment into the identified section of the audio track (in addition to discussion above, Winograd et al., paragraph 0043 teaches “The embedded timecodes, once detected at a watermark extractor, can be used in conjunction with the detected identification information to retrieve and present the second content in synchronization with the corresponding segments of the first content.  To facilitate access and retrieval of the second content, the stored secondary content can be indexed using, for example, a combination of the identification information and the timecode.” Herein, timecode helps to insert the audio segment into the identified section).

Regarding claim 13, the computer-implemented method wherein an amount of reduction in the volume level of the audio track in the identified section depends upon at least one characteristic of the identified section (in addition to discussion above, Winograd et al., paragraphs 0039 and 0060 teach that alternate audio is received based on the identification).

Regarding claim 14, the computer-implemented method wherein detecting the presence of spoken dialog within the identified section of the audio track is performed, at least in part, by training a machine learning model to classify samples of audio tracks as containing or not containing spoken dialog and classify the identified section of the audio track using the trained machine learning model (in addition to discussion above, Winograd et al., paragraphs 0039 and 0060 teach that alternate audio is received based on the identification; Xiong et al., fig. 1, paragraphs 0016 and 0031).
	The motivation for combining the references has been discussed with respect to the independent claim above.

Claim 15 is rejected for the same reason as discussed in the corresponding claim 1 above.
Claim 19 is rejected for the same reason as discussed in the corresponding claim 1 above.
Claim 20 is rejected for the same reason as discussed in the corresponding claim 10 above.

Regarding claim 23, the computer-implemented method wherein: the characteristic of the identified section comprises presence of a soundtrack within the identified section; and the amount of reduction in the volume level of the audio track in the identified section depends upon the presence of the soundtrack (Winograd et al., paragraphs 0039 and 0060; Kuspa, paragraph 0202).
	The motivation for combining the references has been discussed with respect to the independent claim above.

Regarding claim 24, the computer-implemented method wherein detecting the presence of spoken dialog within the identified section of the audio track is performed, at least in part, by a Voice Activity Detection (VAD) technique (Xiong et al., paragraph 0016).
	The motivation for combining the references has been discussed with respect to the independent claim above.
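
For illustration only, the general idea behind a Voice Activity Detection (VAD) technique can be sketched as a per-frame decision on whether audio activity is present. The sketch below is a generic energy-threshold approach and is an assumption made for this example; it is not taken from Xiong et al., Winograd et al., Kuspa, or the application, and a practical VAD would also need to distinguish speech from music and other non-speech sound. The function names, frame length, and threshold are hypothetical.

```python
import math


def frame_rms(samples):
    """Root-mean-square level of one frame of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_active_frames(samples, frame_len=160, threshold=0.1):
    """Return one boolean per frame: True where the frame's RMS level
    meets the (hypothetical) threshold, False otherwise.

    This is a crude energy detector, not a speech classifier.
    """
    flags = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_rms(frame) >= threshold)
    return flags


# One silent frame followed by one frame at a constant level of 0.5.
example = [0.0] * 160 + [0.5] * 160
flags = detect_active_frames(example)
```

Frames flagged True would correspond to candidate sections in which activity was detected; the threshold would have to be tuned for real material.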

Regarding claim 25, the computer-implemented method wherein: the volume level of the audio track is reduced in the identified section of the audio track; and the accessed audio segment is inserted into the identified section of the audio track (Winograd et al., paragraph 0039 teaches “In some examples, such a specifically designed sound track can include one or more sections (e.g., dialog sections) with a higher volume than a normal audio content, a frequency-adjusted audio content that is tailored to compensate for deficiencies in an individual's auditory system, a dialog-only audio portion, and/or an audio description, which can include a narrator describing the scenes of the first content, including what transpires during silent or non-dialog portions of the first content.  In another example, the detection of identification information can trigger the presentation of a content on the user device that is specially tailored for visually impaired persons.”, paragraph 0060 teaches “It should be noted that in some embodiments where the second content provides an alternate audio content (whether or not related to parental control), the original audio associated with the first content may be muted.”; Kuspa, paragraph 0202 teaches “In some embodiments, video description content may be allowed to overlap certain portions of the audio track.  For example, a user may have the option of modifying the video description content to overlap seemingly less important portions of the dialogue, music, sound effects, or the like.  In some embodiments, the main audio recorded dialogue, music, sound effects, or the like may be dipped (e.g., reduced) in volume so that the video description may be heard more clearly.  For example, the volume of music may be lowered while the video description content is being recited.”).
	The motivation for combining the references has been discussed with respect to the independent claim above.
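
For illustration only, the combination recited in claim 25 (reducing the volume of the audio track in the identified section and inserting the accessed audio segment into that same section) can be sketched as a simple mix. This is a minimal sketch under stated assumptions: audio is represented as a list of float samples, the gain value is arbitrary, and the function name `duck_and_insert` is hypothetical. It is not the implementation of the application or of any cited reference.

```python
def duck_and_insert(track, segment, start, duck_gain=0.25):
    """Return a copy of `track` in which the samples overlapped by
    `segment` (beginning at index `start`) are scaled by `duck_gain`
    and the segment is mixed on top at full level.

    With duck_gain < 1.0 the inserted segment is louder than the
    reduced original audio in the affected section.
    """
    out = list(track)
    end = min(start + len(segment), len(out))
    for i in range(start, end):
        out[i] = out[i] * duck_gain + segment[i - start]
    return out


# A constant-level track with a two-sample segment inserted at index 2.
mixed = duck_and_insert([1.0] * 6, [0.5, 0.5], start=2)
```

Setting `duck_gain` to 0.0 in this sketch would correspond to muting the original audio in the section, as in the passage of Winograd et al. quoted above.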

Regarding claim 26, the computer-implemented method wherein: the volume level of the audio track is reduced in the additional section of the audio track following the identified section of the audio track; and the accessed audio segment is inserted into the additional section of the audio track following the identified section of the audio track (Winograd et al., paragraph 0039 teaches “In some examples, such a specifically designed sound track can include one or more sections (e.g., dialog sections) with a higher volume than a normal audio content, a frequency-adjusted audio content that is tailored to compensate for deficiencies in an individual's auditory system, a dialog-only audio portion, and/or an audio description, which can include a narrator describing the scenes of the first content, including what transpires during silent or non-dialog portions of the first content. In another example, the detection of identification information can trigger the presentation of a content on the user device that is specially tailored for visually impaired persons.”, paragraph 0060 teaches “It should be noted that in some embodiments where the second content provides an alternate audio content (whether or not related to parental control), the original audio associated with the first content may be muted.”, herein, the designed sound track can include one or more dialog sections and an audio description, and thus meets the claimed additional section; Kuspa, paragraph 0202 teaches “In some embodiments, video description content may be allowed to overlap certain portions of the audio track. For example, a user may have the option of modifying the video description content to overlap seemingly less important portions of the dialogue, music, sound effects, or the like. In some embodiments, the main audio recorded dialogue, music, sound effects, or the like may be dipped (e.g., reduced) in volume so that the video description may be heard more clearly. For example, the volume of music may be lowered while the video description content is being recited.”).
	The motivation for combining the references has been discussed with respect to the independent claim above.

Regarding claim 27, the computer-implemented method wherein the additional spoken dialog is a translation of the spoken dialog (in addition to discussion above, Winograd et al., paragraph 0039 teaches designed sound track, paragraph 0059 teaches “In some embodiments, the second content can include a speech track (or a closed-caption track) that is in a different language than the spoken language in the first content that is being presented by the first device.”).

Regarding claim 28, the computer-implemented method wherein reducing the volume level of the audio track in the identified section or in the additional section comprises: determining an amount to reduce the volume level of the audio track in the identified section that causes the additional spoken dialog to be comprehensible to a listener hearing both the additional spoken dialog and the spoken dialog; reducing the volume level of the audio track in the identified section by the amount (Winograd et al., paragraph 0060 teaches “It should be noted that in some embodiments where the second content provides an alternate audio content (whether or not related to parental control), the original audio associated with the first content may be muted.”, Herein, the volume of the first content is reduced to mute so that the alternate audio content can be heard; Kuspa, paragraph 0202 teaches “In some embodiments, video description content may be allowed to overlap certain portions of the audio track. For example, a user may have the option of modifying the video description content to overlap seemingly less important portions of the dialogue, music, sound effects, or the like. In some embodiments, the main audio recorded dialogue, music, sound effects, or the like may be dipped (e.g., reduced) in volume so that the video description may be heard more clearly. For example, the volume of music may be lowered while the video description content is being recited.” Herein, the volume is reduced by an amount that allows the video description to be heard more clearly).
	The motivation for combining the references has been discussed with respect to the independent claim above.

Regarding claim 29, the computer-implemented method further comprising increasing, before inserting the accessed audio segment into the identified section of the audio track, the volume level of the accessed audio segment by: determining an amount to increase the volume level of accessed audio segment that causes the additional spoken dialog to be comprehensible to a listener hearing both the additional spoken dialog and the spoken dialog; and increasing the volume level of the accessed audio segment by the amount (Winograd et al., paragraph 0060 teaches “It should be noted that in some embodiments where the second content provides an alternate audio content (whether or not related to parental control), the original audio associated with the first content may be muted.”, Herein, the volume of the first content is reduced to mute so that the alternate audio content can be heard; Kuspa, paragraph 0202 teaches “In some embodiments, video description content may be allowed to overlap certain portions of the audio track. For example, a user may have the option of modifying the video description content to overlap seemingly less important portions of the dialogue, music, sound effects, or the like. In some embodiments, the main audio recorded dialogue, music, sound effects, or the like may be dipped (e.g., reduced) in volume so that the video description may be heard more clearly. For example, the volume of music may be lowered while the video description content is being recited.” Herein, the volume is reduced by an amount that allows the video description to be heard more clearly).
	The motivation for combining the references has been discussed with respect to the independent claim above.

Regarding claim 30, the system wherein the volume level of the audio track in the identified section is reduced an amount that causes the additional spoken dialog to be comprehensible to a listener hearing both the additional spoken dialog and the spoken dialog (Winograd et al., paragraph 0060 teaches “It should be noted that in some embodiments where the second content provides an alternate audio content (whether or not related to parental control), the original audio associated with the first content may be muted.”, Herein, the volume of the first content is reduced to mute so that the alternate audio content can be heard; Kuspa, paragraph 0202 teaches “In some embodiments, video description content may be allowed to overlap certain portions of the audio track. For example, a user may have the option of modifying the video description content to overlap seemingly less important portions of the dialogue, music, sound effects, or the like. In some embodiments, the main audio recorded dialogue, music, sound effects, or the like may be dipped (e.g., reduced) in volume so that the video description may be heard more clearly. For example, the volume of music may be lowered while the video description content is being recited.” Herein, the volume is reduced by an amount that allows the video description to be heard more clearly).
	The motivation for combining the references has been discussed with respect to the independent claim above.

Regarding claim 31, the system wherein the physical electronic memory comprises additional computer-executable instructions that, when executed by the physical electronic processor, cause the physical electronic processor to: determining, before the accessed audio segment is inserted into the identified section of the audio track, an amount to increase the volume level of the accessed audio segment that causes the additional spoken dialog to be comprehensible to a listener hearing both the additional spoken dialog and the spoken dialog; and increase the volume level of the accessed audio segment by the amount (Winograd et al., paragraph 0060 teaches “It should be noted that in some embodiments where the second content provides an alternate audio content (whether or not related to parental control), the original audio associated with the first content may be muted.”, Herein, the volume of the first content is reduced to mute so that the alternate audio content is heard at a relatively higher volume; decreasing the volume of the first content increases the relative volume level of the alternate audio content; Kuspa, paragraph 0202 teaches “In some embodiments, video description content may be allowed to overlap certain portions of the audio track. For example, a user may have the option of modifying the video description content to overlap seemingly less important portions of the dialogue, music, sound effects, or the like. In some embodiments, the main audio recorded dialogue, music, sound effects, or the like may be dipped (e.g., reduced) in volume so that the video description may be heard more clearly. For example, the volume of music may be lowered while the video description content is being recited.”).
	The motivation for combining the references has been discussed with respect to the independent claim above.

Regarding claim 32, the system wherein the additional spoken dialog is a translation of the spoken dialog (in addition to discussion above, Winograd et al., paragraph 0039 teaches designed sound track, paragraph 0059 teaches “In some embodiments, the second content can include a speech track (or a closed-caption track) that is in a different language than the spoken language in the first content that is being presented by the first device.”).

Claims 11 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over US 2014/0071342 by Winograd et al., US 2020/0074992 by Xiong et al. and US 2013/0124984 by Kuspa in view of US 2018/0358052 by Miller et al.

Regarding claim 11, Winograd et al. discloses inserting the accessed audio segment into the identified section of the audio track, Xiong et al. discloses detecting the presence of spoken dialog within the section of the audio track, Kuspa discloses the inserted audio segment having a higher volume level than the reduced volume level of the audio track in the identified section, but fail to disclose the computer-implemented method wherein the processing of the audio segment to alter the length of time of the audio segment in time further comprises at least one of increasing or decreasing the length of time of the audio segment to match a length of time of the identified section of the audio track.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the ability to include the computer-implemented method wherein the processing of the audio segment to alter the length of time of the audio segment in time further comprises at least one of increasing or decreasing the length of time of the audio segment to match a length of time of the identified section of the audio track, as taught by Miller et al. into the system of Winograd et al., Xiong et al. and Kuspa, because such incorporation would allow more options to a user during playback of the video with audio, thus increasing user flexibility of the system.
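
For illustration only, increasing or decreasing the length of an audio segment so that it matches the length of an identified section can be sketched as resampling by linear interpolation. This is an assumption made for the example and is not taken from Miller et al. or the application; plain resampling also shifts pitch, which a practical time-stretching technique would typically avoid. The function name `stretch_to_length` is hypothetical.

```python
def stretch_to_length(segment, target_len):
    """Return `segment` resampled to exactly `target_len` samples using
    linear interpolation between neighbouring samples.

    A longer target lengthens the segment; a shorter target shortens it.
    """
    if target_len <= 0 or not segment:
        return []
    if len(segment) == 1 or target_len == 1:
        return [segment[0]] * target_len
    # Map output index i onto a fractional position in the input.
    scale = (len(segment) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(segment) - 1)
        frac = pos - lo
        out.append(segment[lo] * (1 - frac) + segment[hi] * frac)
    return out


# Lengthen a two-sample segment to fill a three-sample section.
stretched = stretch_to_length([0.0, 1.0], 3)
```

The target length in such a sketch would be the number of samples in the identified section of the audio track.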

Claim 18 is rejected for the same reason as discussed in the corresponding claim 11 above.
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NIGAR CHOWDHURY whose telephone number is (571)272-8890.  The examiner can normally be reached on Monday-Friday 9AM-5PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Thai Tran can be reached on 571-272-7382.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/NIGAR CHOWDHURY/Primary Examiner, Art Unit 2484