DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Acknowledgement is made of the amendment papers filed on June 28, 2021. The amendment has been entered. Claims 1, 21, 24, 28, and 31-33 have been amended. Claims 6 and 10 have been canceled. Claims 1, 4, 5, 7, 8, 11-17, and 21-33 remain pending.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 4, 5, 7, 8, 11-17, 21-23, and 28-33 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Gilson (US 10,726,871 B2).
In re Claim 1, Gilson discloses a computer-implemented method (see FIG. 2, COL. 6: LINES 7-59, where the methods and systems can be implemented on a computer 201, COL. 7: LINES 5-60, COL. 9: LINES 7-54, where the disclosure relates to systems that enable a user to consume content at a slower or faster rate than normal, and to adjusting the speed of content playback with, optionally, audio pitch shifting, and COL. 13: LINES 54-60), comprising: 
receiving, by at least one computer processor (see FIG. 2: 203 and COL. 6: LINES 50-59; and see COL. 9: LINES 44-54), an audio signal representing audio content (see FIG. 4, FIG. 5, FIG. 6, and COLS. 10-11: LINES 21-67; see also COL. 7: LINES 16-60); 
determining, by at least one computer processor (see FIG. 2: 203), a speech tempo of speech in the audio content as the audio content is being played (see FIG. 4, FIG. 5, FIG. 6, and COLS. 10-11: LINES 21-67), wherein the speech tempo is a measure of a number of syllables of speech in the audio content per unit of time as the audio content is being played (Id., where the playback factor can comprise one or more of a content type, a user preference profile, number of spoken syllables or words per unit of time, frequency of spoken words; see also FIG. 7 and COLS. 12-13: LINES 1-12, where playback speed can be adjusted to meet user defined rate of syllables or words per second, and where the number of words or syllables per second of specific timeframes can be calculated throughout the show); and 
in response to the determining the speech tempo of speech in the audio content (see FIG. 4, FIG. 5, and FIG. 6), automatically adjusting, by at least one computer processor (see FIG. 2: 203), a playback speed of the audio content as the audio content is being played based on the determined speech tempo of the speech in the audio content (see FIG. 4, FIG. 5, FIG. 6, and COLS. 10-11: LINES 21-67), wherein the automatically adjusting the playback speed of the audio content as the audio content is being played based on the determined speech tempo of the speech in the audio content includes:
storing a database including a plurality of selectable playback speeds, each selectable playback speed of the plurality of selectable playback speeds corresponding to a different speech tempo range of a plurality of different speech tempo ranges (see COL. 7: LINES 24-60, COL. 9: LINES 44-54, and COLS. 10-11: LINES 21-67, where the user preference profile can comprise a playback speed preferred by an end user and/or parameters selected by the user based on playback factors, e.g. a user preference profile can comprise one or more of a slow motion, normal, fast, faster, and fastest profile, and where user preferences can be associated with one or more segments of content, content type, and the like, and where the content playback profile can comprise a minimum playback speed, an average playback speed, a maximum playback speed, and the like; see also COL. 12: LINES 1-43, COL. 13: LINES 26-60, and COLS. 13-14: LINES 61-64); 
determining in which speech tempo range of the plurality of different speech tempo ranges the determined speech tempo of speech in the audio content falls (see FIG. 4, FIG. 5, FIG. 6, and COLS. 10-11: LINES 21-67, where the playback factor can comprise one or more of a content type, a user preference profile, number of spoken syllables or words per unit of time, frequency of spoken words, etc., and where the user preference profile can comprise a playback speed preferred by an end user and/or parameters selected by the user based on playback factors; see also FIG. 7 and COLS. 12-13: LINES 1-41, where provided are methods for translating a user friendly playback setting into an actual playback speed, and where playback speed can be adjusted to meet user defined rate of syllables or words per second); 
selecting the speech tempo range of the plurality of different speech tempo ranges in which the determined speech tempo of speech in the audio content falls (see FIG. 4, FIG. 5, FIG. 6, and COLS. 10-11: LINES 21-67, where the playback factor can comprise one or more of a content type, a user preference profile, number of spoken syllables or words per unit of time, frequency of spoken words, etc., and where the user preference profile can comprise a playback speed preferred by an end user and/or parameters selected by the user based on playback factors; and COLS. 12-13: LINES 1-60, where provided are methods for translating a user friendly playback setting into an actual playback speed, and playback speed can be adjusted to meet user defined rate of syllables or words per second); and 
changing the playback speed of the audio content as the audio content is being played to be the selectable playback speed corresponding to the selected speech tempo range of the plurality of different speech tempo ranges (see FIG. 4, FIG. 5, FIG. 6, and COLS. 10-11: LINES 21-67; see also FIG. 7 and COLS. 12-13: LINES 1-60, where playback speed can be adjusted to meet user defined rate of syllables or words per second, and whereby the number of words or syllables per second of specific timeframes can be calculated throughout the show; and see COLS. 13-14: LINES 61-51).
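For illustration only, the range-lookup limitations recited above (storing speeds keyed to tempo ranges, determining and selecting the range, and changing to the corresponding speed) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Gilson or from the application: the boundary values, the speed values, and the names `RANGE_UPPER_BOUNDS`, `PLAYBACK_SPEEDS`, `select_range`, and `playback_speed_for` are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical stored table (illustrative values only).
# Boundaries in syllables per second split tempo into four ranges:
#   [0, 3), [3, 5), [5, 7), [7, inf)
RANGE_UPPER_BOUNDS = [3.0, 5.0, 7.0]
# One selectable playback speed per range; slow speech plays faster.
PLAYBACK_SPEEDS = [1.5, 1.25, 1.0, 0.8]


def select_range(tempo: float) -> int:
    """Return the index of the tempo range in which `tempo` falls."""
    return bisect_right(RANGE_UPPER_BOUNDS, tempo)


def playback_speed_for(tempo: float) -> float:
    """Return the selectable playback speed stored for the selected range."""
    return PLAYBACK_SPEEDS[select_range(tempo)]
```

Under these assumed values, a measured tempo of 2.0 syllables per second falls in the first range and maps to a speed of 1.5, while a tempo of 9.0 falls in the last range and maps to 0.8.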

In re Claim 4, Gilson discloses wherein automatically adjusting the playback speed of the audio content as the audio content is being played based on the determined speech tempo of the speech in the audio content includes: 
determining whether the determined speech tempo of speech in the audio content falls below a threshold (see FIG. 4: 402, FIG. 5: 502-503, FIG. 6: 602-604, and cols. 10-11: ll. 21-67); and 
automatically increasing the playback speed of the audio content as the audio content is being played based on the determination of whether the determined speech tempo of speech in the audio content falls below the threshold (see FIG. 4: 403, FIG. 5: 502-503, FIG. 6: 603-604, and cols. 10-11: ll. 21-67; see also FIG. 7 and cols. 12-13: ll. 1-60, whereby playback speed can be adjusted to meet user defined rate of syllables or words per second, and whereby the number of words or syllables per second of specific timeframes can be calculated throughout the show; and see cols. 13-14: ll. 61-51).

In re Claim 5, Gilson discloses wherein automatically adjusting the playback speed of the audio content as the audio content is being played based on the determined speech tempo of the speech in the audio content includes: 
determining whether the determined speech tempo of speech in the audio content surpasses a threshold (see FIG. 4: 402, FIG. 5: 502-503, FIG. 6: 602-604, and cols. 10-11: ll. 21-67); and 
automatically decreasing the playback speed of the audio content as the audio content is being played based on the determination of whether the determined speech tempo of speech in the audio content surpasses the threshold (see FIG. 4: 403, FIG. 5: 502-503, FIG. 6: 603-604, and cols. 10-11: ll. 21-67; see also FIG. 7 and cols. 12-13: ll. 1-60, whereby playback speed can be adjusted to meet user defined rate of syllables or words per second, and whereby the number of words or syllables per second of specific timeframes can be calculated throughout the show; and see cols. 13-14: ll. 61-51).

In re Claim 7, Gilson discloses wherein automatically adjusting the playback speed of the audio content as the audio content is being played based on the determined speech tempo of the speech in the audio content includes automatically increasing the playback speed of the audio content as the audio content is being played based on the determined speech tempo of the speech in the audio content (see FIG. 4: 403, FIG. 5: 502-503, FIG. 6: 603-604, and cols. 10-11: ll. 21-67; see also FIG. 7, cols. 12-13: ll. 1-60, and cols. 13-14: ll. 61-51).

In re Claim 8, Gilson discloses wherein automatically adjusting the playback speed of the audio content as the audio content is being played based on the determined speech tempo of the speech in the audio content includes automatically decreasing the playback speed of the audio content as the audio content is being played based on the determined speech tempo of the speech in the audio content (see FIG. 4: 403, FIG. 5: 502-503, FIG. 6: 603-604, and cols. 10-11: ll. 21-67; see also FIG. 7, cols. 12-13: ll. 1-60, and cols. 13-14: ll. 61-51).

In re Claim 11, Gilson discloses wherein determining the speech tempo of speech in the audio content as the audio content is being played (see FIG. 4: 402, FIG. 5: 501, and FIG. 6: 601-603) includes:
detecting syllables spoken in the audio content as the audio content is being played (see cols. 10-11: ll. 21-67, whereby the playback factor can comprise one or more of a content type, a user preference profile, number of spoken syllables or words per unit of time, frequency of spoken words; and see FIG. 7 and cols. 12-13: ll. 1-60, whereby playback speed can be adjusted to meet user defined rate of syllables or words per second, and whereby the number of words or syllables per second of specific timeframes can be calculated throughout the show);
determining a first number of syllables spoken in the audio content as the audio content is being played over a first period of time based on detecting syllables spoken in the audio content as the audio content is being played (see FIGS. 4-7 and cols. 12-13: ll. 1-60, whereby playback speed can be adjusted to meet user defined rate of syllables or words per second, and whereby the number of words or syllables per second of specific timeframes can be calculated throughout the show); and
determining the speech tempo of speech in the audio content as the audio content is being played based on the determined first number of syllables spoken in the audio content as the audio content is being played over the first period of time (see FIGS. 4-7 and cols. 12-13: ll. 1-60, whereby playback speed can be adjusted to meet user defined rate of syllables or words per second, and whereby the number of words or syllables per second of specific timeframes can be calculated throughout the show).
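For illustration only, the computation recited above (counting detected syllables over a first period of time and deriving a tempo from that count) can be sketched as follows. The sketch assumes that syllable detection has already produced a list of onset timestamps in seconds; the function name `speech_tempo` and its parameters are hypothetical and are not drawn from Gilson or from the application.

```python
def speech_tempo(onsets: list[float], window_start: float, window_end: float) -> float:
    """Return syllables per second over the half-open window [window_start, window_end).

    `onsets` holds the timestamps (in seconds) of detected syllables; how they
    are detected is outside the scope of this sketch.
    """
    duration = window_end - window_start
    if duration <= 0:
        raise ValueError("window must have positive duration")
    # Count the syllables whose onset falls inside the window.
    count = sum(1 for t in onsets if window_start <= t < window_end)
    return count / duration
```

Calling the same function on a later window yields an updated tempo, which corresponds to the second-period update addressed in Claim 12.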

In re Claim 12, Gilson discloses wherein determining the speech tempo of speech in the audio content as the audio content is being played further includes: 
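determining a second number of syllables spoken in the audio content as the audio content is being played over a second period of time (see FIGS. 4-7 and cols. 12-13: ll. 1-60, whereby playback speed can be adjusted to meet user defined rate of syllables or words per second, and whereby the number of words or syllables per second of specific timeframes can be calculated throughout the show); and 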
updating the determined speech tempo of speech in the audio content as the audio content is being played based on the determined second number of syllables spoken in the audio content as the audio content is being played over the second period of time (Id., and see cols. 13-14: ll. 61-64).

In re Claim 13, Gilson discloses wherein automatically adjusting the playback speed of the audio content as the audio content is being played based on the determined speech tempo of the speech in the audio content includes: 
detecting a silent segment in the audio content (see cols. 10-11: ll. 21-67, whereby a user can specify a faster, slower, or normal playback speed for credit rolls, silent scenes, fight scenes, mature scenes, and the like); and 
changing the playback speed of the audio content as the audio content is being played in response to detection of the silent segment in the audio content (Id., and see FIGS. 4-6, col. 12: ll. 30-43, col. 13: ll. 26-41, and cols. 13-14: ll. 61-64).

In re Claim 14, Gilson discloses wherein automatically adjusting the playback speed of the audio content as the audio content is being played based on the determined speech tempo of the speech in the audio content includes:
determining whether to increase or decrease the playback speed of the audio content as the audio content is being played (see FIG. 4: 402-403, FIG. 5: 501-503, and FIG. 6: 602-604; and see FIG. 7 and cols. 12-13: ll. 1-60, whereby playback speed can be adjusted to meet user defined rate of syllables or words per second, and whereby the number of words or syllables per second of specific timeframes can be calculated throughout the show; see also cols. 13-14: ll. 61-51) in response to each detected corresponding incremental change in a current speech tempo of the speech in the audio content as the audio content is being played (Id., and see cols. 10-11: ll. 21-67).

In re Claim 15, Gilson discloses wherein automatically adjusting the playback speed of the audio content as the audio content is being played based on the determined speech tempo of the speech in the audio content includes:
determining a type of content of media including the audio content (see cols. 10-11: ll. 21-67, whereby the playback factor can comprise one or more of a content type, a user preference profile, a third party playback profile, number of spoken syllables or words per unit of time, frequency of spoken words, comprehension difficulty level, or a rate of spoken words, and whereby content type can comprise a movie, a commercial, a television program, a music video, or audio only; and see cols. 13-14: ll. 61-51, whereby a plurality of different playback speeds can be utilized for a single piece of content, enabling the methods provided to determine when to apply each playback speed);
determining whether or to what extent to automatically adjust the playback speed of the audio content as the audio content is being played based on the type of content of the media (see cols. 10-11: ll. 21-67, whereby the user preference profile can comprise a playback speed preferred by an end user and/or parameters selected by the user based on playback factors, and whereby the speed of playback can be sped up and slowed down depending on the segment of content according to user preferences, and whereby user preferences can be associated with one or more segments of content, content type, and the like, e.g., a user can specify a faster, slower, or normal playback speed for credit rolls, silent scenes, fight scenes, mature scenes, and the like, and a user can specify a faster, slower, or normal playback speed for commercials, sitcoms, news reports, weather reports, infomercials; and see cols. 12-13: ll. 1-60, whereby the number of words or syllables per second of specific timeframes can be calculated throughout the show or specific types of scenes throughout the show); and
automatically adjusting the playback speed of the audio content as the audio content is being played based on the determined speech tempo of the speech in the audio content and the determination of whether to automatically adjust the playback speed of the audio content as the audio content is being played based on the type of content of the media (Id., and see FIG. 4: 403, FIG. 5: 502-503, and FIG. 6: 603-604).

In re Claim 16, Gilson discloses wherein the audio content is part of audiovisual content (see cols. 10-11: ll. 21-67, whereby content type can comprise a movie, a commercial, a television program, a music video, or audio only; and see cols. 12-13: ll. 1-60, whereby the number of words or syllables per second of specific timeframes can be calculated throughout the show or specific types of scenes throughout the show).

In re Claim 17, Gilson discloses:
resampling, by at least one computer processor (see FIG. 2: 203), the audio content to diminish changes in pitch of the speech when the playback speed of the audio content is automatically adjusted as the audio content is being played (see col. 9: ll. 7-54, whereby the disclosure relates to adjusting the speed of content playback with audio pitch shifting; and see col. 10: ll. 21-54, whereby providing content to a display device can further comprise pitch shifted audio and/or an increased frame rate, and whereby the methods and systems provided can automatically adjust not only video frame rate, but also the pitch of associated audio, so as to prevent audio from sounding very high pitched in the event that playback speed is increased and to prevent audio from sounding very low pitched in the event that playback speed is decreased).

Claim 21 essentially recites the same limitations as claims 1 and 15, and is rejected for similar reasons. Therefore, Gilson anticipates all limitations of the claim.

In re Claim 22, Gilson discloses wherein the computer executable instructions, when executed by a computer processor, further cause the following to be performed:
determining to not automatically adjust playback speed of the audio content as the media is being played (see FIG. 5: 502-503 and col. 11: ll. 41-58) in response to a determination that the type of content of media is sports or a music performance (see col. 9: ll. 23-43, whereby dynamically adjusting playback speed based on content type… and user preferences enables viewers to more easily understand the audio tracks while still maintaining a total play time similar to, or less than, what they would expect with a constant playback value; and cols. 10-11: ll. 21-67, whereby the playback factor can comprise one or more of a content type, a user preference profile… number of spoken syllables or words per unit of time… or a rate of spoken words, and the content type can comprise a movie, a commercial, a television program, a music video, or audio only, and whereby the user preference profile can comprise a playback speed preferred by an end user and/or parameters selected by the user based on playback factors, e.g., a user preference profile can comprise one or more of a slow motion, normal speed, fast, faster, fastest profile, and depending on the segment of content, the speed of playback can be sped up and slowed down according to user preferences, and whereby normal speed can mean the content takes the same amount of time to consume as originally intended, and further whereby user preferences can be associated with one or more segments of content, content type, and the like, e.g., a user can specify a faster, slower, or normal playback speed for credit rolls, silent scenes, fight scenes, mature scenes, and the like, and e.g., a user can specify a faster, slower, or normal playback speed for commercials, sitcoms, news reports, weather reports, infomercials, and the like; see also cols. 12-14: ll. 1-64, whereby the at least one playback factor can comprise one or more of a content type, a user preference profile, a third party playback profile, a rate of spoken words, a number of spoken words, a number of spoken syllables, and the like).

In re Claim 23, Gilson discloses wherein the determined type of content of media is sports (see cols. 10-11: ll. 21-67, whereby the content type can comprise, e.g., a movie, a commercial, a television program, a music video, or audio only, and depending on the segment of content, the speed of playback can be sped up and slowed down according to user preferences, and whereby normal speed can mean the content takes the same amount of time to consume as originally intended, and further whereby user preferences can be associated with one or more segments of content, content type, and the like, e.g., a user can specify a faster, slower, or normal playback speed for credit rolls, silent scenes, fight scenes, mature scenes, and the like, and e.g., a user can specify a faster, slower, or normal playback speed for commercials, sitcoms, news reports, weather reports, infomercials, and the like) and the computer executable instructions, when executed by a computer processor, further cause the following to be performed:
determine that the current speech tempo of speech in the audio content as the media is being played falls above a threshold (see FIGS. 4-6 and cols. 10-11: ll. 21-67); and
determine to automatically adjust the playback speed of the audio content as the media is being played based on the determination that the current speech tempo of speech in the audio content as the media is being played falls above the threshold and a determination that the type of content of media is not sports and is not a music performance (see FIGS. 4-6 and cols. 10-11: ll. 21-67, whereby the playback factor can comprise one or more of a content type, a user preference profile, number of spoken syllables or words per unit of time, or a rate of spoken words, and the user preference profile can comprise a playback speed preferred by an end user and/or parameters selected by the user based on playback factors, and depending on the segment of content, the speed of playback can be sped up and slowed down according to user preferences, and normal speed can mean the content takes the same amount of time to consume as originally intended, and whereby user preferences can be associated with one or more segments of content, content type, and the like, e.g., a user can specify a faster, slower, or normal playback speed for credit rolls, silent scenes, fight scenes, mature scenes, and the like, and a user can specify a faster, slower, or normal playback speed for commercials, sitcoms, news reports, weather reports, infomercials, and the like; see also cols. 12-14: ll. 1-64, whereby the at least one playback factor can comprise one or more of a content type, a user preference profile, a third party playback profile, a rate of spoken words, a number of spoken words, a number of spoken syllables, and the like).
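For illustration only, the content-type gating recited in Claims 22 and 23 (no automatic adjustment for sports or a music performance, and adjustment otherwise when the current tempo falls above a threshold) can be sketched as follows. This is a minimal sketch under stated assumptions: the type labels, the threshold value used in any call, and the names `EXCLUDED_TYPES` and `should_adjust` are hypothetical and do not appear in Gilson or in the application.

```python
# Hypothetical labels for content types that are never auto-adjusted.
EXCLUDED_TYPES = {"sports", "music performance"}


def should_adjust(content_type: str, tempo: float, threshold: float) -> bool:
    """Decide whether playback speed should be automatically adjusted.

    Returns False for excluded content types regardless of tempo; otherwise
    returns True only when the current tempo is above the threshold.
    """
    if content_type in EXCLUDED_TYPES:
        return False
    return tempo > threshold
```

With an assumed threshold of 6.0 syllables per second, a news report measured at 7.0 would be adjusted, while a sports broadcast at the same tempo would not.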

In re Claim 28, Gilson discloses:
receiving by at least one computer processor (FIG. 2: 203), a selection of a target speech tempo from a user (see cols. 10-11: ll. 21-67, whereby the playback factor can comprise a user preference profile, and the user preference profile can comprise a playback speed preferred by an end user and/or parameters selected by the user based on playback factors; see also FIG. 7 and cols. 12-13: ll. 1-41, whereby provided are methods for translating a user friendly playback setting into an actual playback speed, and whereby playback speed can be adjusted to meet user defined rate of syllables or words per second); and
changing by at least one computer processor (203), the playback speed of the audio content as the audio content is being played in order to have the audio played back with a resulting target speech tempo of the selected target speech tempo (see FIG. 4: 403, FIG. 5: 503, FIG. 6: 604, and cols. 10-11: ll. 21-67, by way of dynamically adjusting the playback speed of the content; see also cols. 13-14: ll. 61-51).

In re Claim 29, Gilson discloses wherein the selection of the target speech tempo from the user is received via a settings menu graphical user interface generated and provided by a receiving device operation and playback manager generated by the at least one computer processor (see col. 12: ll. 13-62, whereby provided are methods for translating a user friendly playback setting, e.g. a selection on a scale from 1 to 10, into an actual playback speed; col. 9: ll. 44-54, whereby playback speed adjustment can be performed at a user’s device; and cols. 7-8: ll. 16-21, whereby the system memory 212 contains playback data 207 and program modules such as operating system 205 and variable playback software 206 that are immediately accessible to and are presently operated on by the processing unit 203, and where variable playback software 206 can comprise standalone software and/or software integrated into existing content players, and where the user can enter commands and information into the computer 201 via an input device, and whereby any step and/or result of the methods can be output in any form to an output device, and such output can be any form of visual representation, including, but not limited to, textual, graphical, animation, and the like).

In re Claim 30, Gilson discloses:
continuously determining, by at least one computer processor (FIG. 2: 203), whether to increase or decrease playback speed of the audio content as the audio content is being played (see FIG. 4: 402-403, FIG. 5: 501-503, and FIG. 6: 602-604; and see FIG. 7 and cols. 12-13: ll. 1-60, whereby playback speed can be adjusted to meet user defined rate of syllables or words per second, and whereby the number of words or syllables per second of specific timeframes can be calculated throughout the show; see also cols. 13-14: ll. 61-51) for each detectable corresponding incremental change in the current speech tempo of the audio content (Id., and see cols. 10-11: ll. 21-67, by way of dynamically adjusting the playback speed).

In re Claim 31, Gilson discloses wherein a relationship between the detected speech tempo and a corresponding increase or decrease of playback speed is linear (see cols. 10-11: ll. 21-67, whereby a user preference profile can comprise one or more of a slow motion, normal speed, fast, faster, fastest profile, and, depending on the segment of content, the speed of playback can be sped up and slowed down according to user preferences, and there is no requirement for a linear change between profiles; cols. 12-13: ll. 8-60, whereby settings may be calculated; and cols. 13-14: ll. 61-64, whereby playback speeds can be adjusted automatically by monitoring transitions between segments and segment-types, and whereby a plurality of different playback speeds can be utilized for a single piece of content enabling the methods provided to determine when to apply each playback speed, and transitions between playback speeds can be immediate or can be gradual).

In re Claim 32, Gilson discloses wherein a relationship between the detected speech tempo and a corresponding increase or decrease of playback speed is logarithmic (see cols. 10-11: ll. 21-67, whereby a user preference profile can comprise one or more of a slow motion, normal speed, fast, faster, fastest profile, and, depending on the segment of content, the speed of playback can be sped up and slowed down according to user preferences, and there is no requirement for a linear change between profiles; cols. 12-13: ll. 8-60, whereby settings may be calculated; and cols. 13-14: ll. 61-64, whereby playback speeds can be adjusted automatically by monitoring transitions between segments and segment-types, and whereby a plurality of different playback speeds can be utilized for a single piece of content enabling the methods provided to determine when to apply each playback speed, and transitions between playback speeds can be immediate or can be gradual).

In re Claim 33, Gilson discloses wherein a relationship between the detected speech tempo and a corresponding increase or decrease of playback speed is exponential (see cols. 10-11: ll. 21-67, whereby a user preference profile can comprise one or more of a slow motion, normal speed, fast, faster, fastest profile, and, depending on the segment of content, the speed of playback can be sped up and slowed down according to user preferences, and there is no requirement for a linear change between profiles; cols. 12-13: ll. 8-60, whereby settings may be calculated; and cols. 13-14: ll. 61-64, whereby playback speeds can be adjusted automatically by monitoring transitions between segments and segment-types, and whereby a plurality of different playback speeds can be utilized for a single piece of content enabling the methods provided to determine when to apply each playback speed, and transitions between playback speeds can be immediate or can be gradual).
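For illustration only, the three relationships recited in Claims 31-33 (linear, logarithmic, and exponential) between detected speech tempo and playback speed can be contrasted as follows. This is a minimal sketch under stated assumptions: the reference tempo `ref`, the gain `k`, the direction of the adjustment (faster speech yields slower playback), and the function names are all hypothetical and are not drawn from Gilson or from the application.

```python
import math


def linear_speed(tempo: float, ref: float = 5.0, k: float = 0.1) -> float:
    """Playback speed falls in direct proportion to tempo above `ref`."""
    return 1.0 - k * (tempo - ref)


def logarithmic_speed(tempo: float, ref: float = 5.0, k: float = 0.5) -> float:
    """Playback speed changes with the logarithm of the tempo ratio (tempo > 0)."""
    return 1.0 - k * math.log(tempo / ref)


def exponential_speed(tempo: float, ref: float = 5.0, k: float = 0.1) -> float:
    """Playback speed decays exponentially as tempo rises above `ref`."""
    return math.exp(-k * (tempo - ref))
```

All three functions return a speed of 1.0 at the reference tempo and differ only in how quickly the speed departs from 1.0 as the tempo moves away from that reference.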
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.
Claims 24 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Gilson (US 10,726,871 B2) in view of Lukin et al. (US 7,974,838 B1).
In re Claim 24, Gilson discloses a computer-implemented method for compressing digital media data (see FIG. 2, COL. 6: LINES 7-59, where the methods and systems can be implemented on a computer 201, COL. 7: LINES 5-60, COL. 9: LINES 7-54, where the disclosure relates to systems that enable a user to consume content at a slower or faster rate than normal, and to adjusting the speed of content playback with, optionally, audio pitch shifting, and COL. 13: LINES 54-60), comprising:
receiving, by at least one computer processor (see FIG. 2: 203 and COL. 6: LINES 50-59; and see COL. 9: LINES 44-54), an audio signal representing audio content of the digital media data (see FIG. 4, FIG. 5, FIG. 6, and COLS. 10-11: LINES 21-67; see also COL. 7: LINES 16-60);
determining, by at least one computer processor (see FIG. 2: 203), a speech tempo of speech in the audio content (see FIG. 4, FIG. 5, FIG. 6, and COLS. 10-11: LINES 21-67), wherein the speech tempo is a measure of a number of syllables of speech in the audio content per unit of time as the audio content is being played (Id., where the playback factor can comprise one or more of a content type, a user preference profile, number of spoken syllables or words per unit of time, frequency of spoken words; see also FIG. 7 and COLS. 12-13: LINES 1-12, where playback speed can be adjusted to meet user defined rate of syllables or words per second, and where the number of words or syllables per second of specific timeframes can be calculated throughout the show); and
in response to the determining the speech tempo of speech in the audio content (see FIG. 4, FIG. 5, and FIG. 6), compressing, by at least one computer processor (see FIG. 2: 203), the digital media data based on the determined speech tempo of the speech in the audio content of the digital media data (see FIG. 4, FIG. 5, FIG. 6, and COLS. 10-11: LINES 21-67; see also FIG. 7 and COLS. 12-13: LINES 1-12); and
automatically adjusting a playback speed of the audio content as the audio content is being played based on the determined speech tempo of the speech in the audio content (see FIG. 4, FIG. 5, FIG. 6, and COLS. 10-11: LINES 21-67), wherein the automatically adjusting the playback speed of the audio content as the audio content is being played based on the determined speech tempo of the speech in the audio content includes: 
storing a database including a plurality of selectable playback speeds, each selectable playback speed of the plurality of selectable playback speeds corresponding to a different speech tempo range of a plurality of different speech tempo ranges (see COL. 7: LINES 24-60, COL. 9: LINES 44-54, and COLS. 10-11: LINES 21-67, where the user preference profile can comprise a playback speed preferred by an end user and/or parameters selected by the user based on playback factors, e.g. a user preference profile can comprise one or more of a slow motion, normal, fast, faster, and fastest profile, and where user preferences can be associated with one or more segments of content, content type, and the like, and where the content playback profile can comprise a minimum playback speed, an average playback speed, a maximum playback speed, and the like; see also COL. 12: LINES 1-43, COL. 13: LINES 26-60, and COLS. 13-14: LINES 61-64); 
determining in which speech tempo range of the plurality of different speech tempo ranges the determined speech tempo of speech in the audio content falls (see FIG. 4, FIG. 5, FIG. 6, and COLS. 10-11: LINES 21-67, where the playback factor can comprise one or more of a content type, a user preference profile, number of spoken syllables or words per unit of time, frequency of spoken words, etc., and where the user preference profile can comprise a playback speed preferred by an end user and/or parameters selected by the user based on playback factors; see also FIG. 7 and COLS. 12-13: LINES 1-41, where provided are methods for translating a user friendly playback setting into an actual playback speed, and where playback speed can be adjusted to meet user defined rate of syllables or words per second); 
selecting the speech tempo range of the plurality of different speech tempo ranges in which the determined speech tempo of speech in the audio content falls (see FIG. 4, FIG. 5, FIG. 6, and COLS. 10-11: LINES 21-67, where the playback factor can comprise one or more of a content type, a user preference profile, number of spoken syllables or words per unit of time, frequency of spoken words, etc., and where the user preference profile can comprise a playback speed preferred by an end user and/or parameters selected by the user based on playback factors; and COLS. 12-13: LINES 1-60, where provided are methods for translating a user friendly playback setting into an actual playback speed, and playback speed can be adjusted to meet user defined rate of syllables or words per second); and 
changing the playback speed of the audio content as the audio content is being played to be the selectable playback speed corresponding to the selected speech tempo range of the plurality of different speech tempo ranges (see FIG. 4, FIG. 5, FIG. 6, and COLS. 10-11: LINES 21-67; see also FIG. 7 and COLS. 12-13: LINES 1-60, where playback speed can be adjusted to meet user defined rate of syllables or words per second, and whereby the number of words or syllables per second of specific timeframes can be calculated throughout the show; and see COLS. 13-14: LINES 61-51).
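For illustration only, the storing, determining, selecting, and changing limitations mapped above can be sketched as a range lookup. This is a minimal sketch and not a characterization of Gilson's actual implementation: the table values and the names RANGE_LOWER_BOUNDS, PLAYBACK_SPEEDS, speech_tempo, and select_playback_speed are hypothetical and do not appear in the claims or in any cited reference.

```python
from bisect import bisect_right

# Hypothetical "database": each speech tempo range (syllables per second)
# starts at a lower bound and maps to one selectable playback speed.
RANGE_LOWER_BOUNDS = [0.0, 2.0, 4.0, 6.0]   # assumed range boundaries
PLAYBACK_SPEEDS = [2.0, 1.5, 1.0, 0.75]     # assumed speed for each range


def speech_tempo(syllable_count: int, seconds: float) -> float:
    """Speech tempo as the number of syllables per unit of time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return syllable_count / seconds


def select_playback_speed(tempo: float) -> float:
    """Find the range in which the tempo falls and return its playback speed."""
    if tempo < 0:
        raise ValueError("tempo must be non-negative")
    # bisect_right returns the count of lower bounds <= tempo, so
    # subtracting one gives the index of the range containing the tempo.
    index = bisect_right(RANGE_LOWER_BOUNDS, tempo) - 1
    return PLAYBACK_SPEEDS[index]


if __name__ == "__main__":
    tempo = speech_tempo(syllable_count=9, seconds=2.0)
    print(tempo, select_playback_speed(tempo))
```

Under these assumed values, slower speech falls in a lower range and is played faster, while faster speech is played at or below normal speed.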
Gilson relates to adjusting the speed of content playback with audio pitch shifting (see col. 9: ll. 23-43). Gilson notes that merely increasing the playback speed by a set amount would cause, at certain times (depending on the talking speed of the speaker, the amount of scene/camera cuts, background noise, the number of people speaking, and the like), difficulty in understanding what was said (Id.). As such, (Id.). In Gilson, playback speed adjustment can be performed at a user's device, at the content creator's facility, at the content provider's facility, at a third party vendor facility, or any combination thereof (see col. 9: ll. 44-54). Similarly, preferences and profiles for playback speeds can be stored on content media, a user's device, at the content creator's facility, at the content provider's facility, at a third party vendor facility, or any combination thereof (Id.). Moreover, preferences and playback profiles for content can be stored and downloaded through the Internet (Id.).
In Gilson’s operating environment, the computer 201 comprises a system memory 212 and a mass storage device 204 (see FIG. 2). The system memory 212 contains data such as playback data 207 and/or program modules such as operating system 205 and variable playback software 206 that are immediately accessible to and/or are presently operated on by the processing unit 203 (see col. 7: ll. 16-29). Optionally, any number of program modules can be stored on the mass storage device 204, including operating system 205 and variable playback software 206 (see col. 7: ll. 44-60). Moreover, variable playback software 206 can comprise standalone software and/or software integrated into existing content players, for example, Windows Media Player, Realplayer, iTunes, and the like (Id.). In addition, playback data 207 can also be stored on the mass storage device 204 (Id.).
Further in Gilson, MPEG encoders, such as encoder 112, are included for encoding local content or a video camera 109 feed (see col. 4: ll. 7-9); the control system 118 can provide input to the modulators for setting operating parameters, such as system specific MPEG table packet organization or conditional access information (see col. 4: ll. 24-34); the methods can utilize digital audio/video compression such as MPEG, or any other type of compression, and in an MPEG encoded transmission, content and other data (see cols. 4-5: ll. 58-10); the output of a single MPEG audio and/or video coder is called a transport stream comprised of one or more elementary streams, where an elementary stream is an endless near real-time signal (see col. 5: ll. 11-56); and the multi program transport stream carries many different programs and each may use a different compression factor and a bit rate that can change dynamically even though the overall bit rate stays constant (see cols. 5-6: ll. 57-59).
To this end, Gilson fully enables the claimed method for compressing digital media data, but does not explicitly detail that the digital media data is compressed by re-encoding the digital media data content. However, one of ordinary skill in the art would recognize that such a feature of digital media compression is known in the art, as evidenced below.
In a similar variable speed playback endeavor, Lukin is directed to a system and method for analysis and adjustment of vocal qualities, potentially in real-time (see FIGS. 1-2 and col. 1: ll. 10-13). For example, FIG. 2 illustrates an embodiment 20 of the disclosed invention capable of performing pitch adjustment in real time (see col. 4: ll. 1-17). However, the vocal extraction and pitch detection may be performed in advance, with the pitch information stored for later use (Id.). Alternatively, a latency may be used with the audio source to allow the required processing, such latency not being discernible by the singer or audience (Id.). In any case, Lukin teaches that should the algorithm include artifacts arising from a time-frequency transformation with a fixed window size, an adaptive multi-resolution processing technique may be utilized, which comprises processing source material with several different time-frequency resolutions and combining results in a transience-adaptive manner (see col. 5: ll. 14-22).
In Lukin, a PSOLA-type (Pitch-synchronous Overlap and Add) algorithm is used for pitch shifting (see col. 8: ll. 14-45). The PSOLA algorithm is combined with sampling rate conversion (resampling) to achieve pitch shifting, as known in the prior art (Id.). For example, to achieve pitch shifting by the factor of x[t], the embodiment applies a PSOLA time stretching by the factor x[t], and then resamples the resulting signal to the original duration (i.e. by 1/x[t] times) (Id.). The resampling operation synchronously changes pitch and duration of the signal, which produces the desired pitch shifting effect (Id.).
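For illustration only, the stretch-then-resample sequence described above can be demonstrated on a pure tone. This is a minimal sketch under stated assumptions and is not Lukin's implementation: the PSOLA stage is replaced by an idealized stand-in that simply generates the same tone for a longer duration, the resampler uses linear interpolation rather than polyphase FIR filtering, and all function names are hypothetical.

```python
import math


def ideal_time_stretch(freq_hz, duration_s, factor, sample_rate):
    """Stand-in for PSOLA on a pure tone: same pitch, duration scaled by factor."""
    n = round(duration_s * factor * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def resample_to_length(signal, target_len):
    """Resample to exactly target_len samples using linear interpolation."""
    if target_len < 2 or len(signal) < 2:
        raise ValueError("need at least two samples")
    step = (len(signal) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out


def estimate_freq(signal, sample_rate):
    """Rough frequency estimate from the count of upward zero crossings."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(signal)


def pitch_shift(freq_hz, duration_s, factor, sample_rate=8000):
    """Stretch by factor, then resample back to the original duration."""
    stretched = ideal_time_stretch(freq_hz, duration_s, factor, sample_rate)
    original_len = round(duration_s * sample_rate)
    return resample_to_length(stretched, original_len)


if __name__ == "__main__":
    shifted = pitch_shift(200.0, 1.0, 1.5)
    print(len(shifted), estimate_freq(shifted, 8000))
```

In this sketch a 200 Hz tone stretched by a factor of 1.5 and resampled back to its original length keeps its duration while its estimated frequency rises to roughly 300 Hz, which is the pitch shifting effect attributed to the combination.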
In this way, Lukin teaches:
compressing, by at least one computer processor (see cols. 2-3: ll. 65-22), the digital media data by re-encoding the digital media data content based on the determined speech tempo of the speech in the audio content of the digital media data (see FIGS. 1-2 and col. 8: ll. 12-30, whereby the PSOLA algorithm is combined with sampling rate conversion/resampling to achieve pitch shifting, and where the resampling operation synchronously changes pitch and duration of the signal, which produces the desired pitch shifting effect; col. 8: ll. 31-42, where the PSOLA algorithm for time scale modification breaks the signal into windowed time granules, division of the signal into granules is guided by pitch detection, certain granules of the input signal are duplicated in the output signal in order to achieve time stretching, and certain granules of the input signal are discarded from the output signal in order to achieve time compression; and col. 8: ll. 43-63, whereby for resampling, a polyphase FIR filtering approach may be used, which reverts the signal to its original time duration but now at the desired pitch).
Based on the foregoing, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Gilson’s digital media data compression method by incorporating Lukin’s pitch shifting technique as it amounts to nothing more than routine experimentation while yielding predictable results. One motivation would have been to synchronously change pitch and duration of the signal thereby producing the desired pitch shifting effect (see Lukin, col. 8: ll. 14-45).

In re Claim 25, Lukin further teaches wherein the re-encoding the audio content based on the determined speech tempo of the speech in the audio content of the digital media data includes:
(see FIGS. 1-2 and col. 8: ll. 12-30, whereby the PSOLA algorithm is combined with sampling rate conversion/resampling to achieve pitch shifting, and whereby to achieve pitch shifting by the factor of x[t], the embodiment applies a PSOLA time stretching by the factor x[t], and then resamples the resulting signal to the original duration (i.e. by 1/x[t] times), and whereby the resampling operation synchronously changes pitch and duration of the signal, which produces the desired pitch shifting effect);
downsampling the audio content at the determined downsampling rate to remove the non-perceptible information from the audio content of the digital media data (Id., and see col. 8: ll. 31-42, whereby the PSOLA algorithm for time scale modification breaks the signal into windowed time granules with 2-times overlap, division of the signal into granules is guided by pitch detection where each granule has the length of 2 pitch periods, and then, in order to achieve time stretching by a fractional factor k, 1<k<2, every (k-1)N granules out of N are duplicated in the output signal according to their pitch period, or conversely, in order to achieve time compression, certain granules of the input signal are discarded from the output signal); and
re-encoding the downsampled audio content to generate a compressed version of the audio content (Id., and see col. 8: ll. 43-63, whereby for resampling, a polyphase FIR filtering approach may be used, which reverts the signal to its original time duration but now at the desired pitch).
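For illustration only, the granule duplication and discarding relied upon above can be sketched as an index mapping. This is a minimal sketch and not Lukin's implementation: it omits pitch detection, windowing, and overlap-add, treats granules as opaque list items, and the function name time_scale_granules is hypothetical.

```python
def time_scale_granules(granules, k):
    """Scale a granule sequence by factor k by duplicating or discarding granules.

    For 1 < k < 2 some granules are repeated (time stretching); for
    0 < k < 1 some granules are dropped (time compression).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    n = len(granules)
    out_len = round(k * n)
    out = []
    for j in range(out_len):
        # Map each output slot back to the input granule covering it.
        src = min(int(j / k), n - 1)
        out.append(granules[src])
    return out


if __name__ == "__main__":
    granules = list(range(10))
    print(time_scale_granules(granules, 1.5))
    print(time_scale_granules(granules, 0.5))
```

With N = 10 granules and k = 1.5, the sketch emits 15 granules in which (k-1)N = 5 of the inputs appear twice; with k = 0.5, every other granule is discarded.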
Claims 26 and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Gilson (US 10,726,871 B2) in view of Lukin et al. (US 7,974,838 B1), and in further view of Luo et al. (US 10,339,974 B1).
In re Claim 26, Gilson in view of Lukin discloses the method of claim 24 as applied above. The combination of Gilson and Lukin also teaches wherein the re-encoding of the audio content based on the determined speech tempo of the speech in the audio content of the digital media data includes:
detecting silent regions present in the audio content based on the determined speech tempo of speech in the audio content (see Gilson—FIG. 4: 402, FIG. 5: 501, FIG. 6: 601-603, and cols. 10-11: ll. 21-67, whereby the playback factor can comprise one or more of a content type, a user preference profile, number of spoken syllables or words per unit of time, frequency of spoken words, etc., and whereby a user can specify a faster, slower, or normal playback speed for credit rolls, silent scenes, fight scenes, mature scenes, and the like; FIG. 7 and cols. 12-13: ll. 1-41, whereby playback speed can be adjusted to meet user defined rate of syllables or words per second, and whereby the number of words or syllables per second of specific timeframes can be calculated throughout the show; and cols. 13-14: ll. 61-64); and
re-encoding the digital media data content without the detected silent regions (see Gilson—FIG. 4: 403, FIG. 5: 502-503, FIG. 6: 603-604, cols. 10-11: ll. 21-67, and cols. 12-13: ll. 1-41; also see Lukin—FIGS. 1-2 and col. 8: ll. 12-63, the PSOLA algorithm is combined with sampling rate conversion/resampling to achieve pitch shifting, and the resampling operation synchronously changes pitch and duration of the signal which produces the desired pitch shifting effect, certain granules of the input signal are duplicated in the output signal in order to achieve time stretching, certain granules of the input signal are discarded from the output signal in order to achieve time compression, and for resampling, a polyphase FIR filtering approach may be used which reverts the signal to its original time duration but now at the desired pitch).
To this end, the combination of Gilson and Lukin does not explicitly teach: removing the detected silent regions from the audio content of the digital media data.
In a similar variable speed playback endeavor, Luo is directed to an improved device and process for controlling playback operation associated with an audio content (see cols. 1-2: ll. 66-23). Luo provides a method (see FIG. 2 and col. 2: ll. 40-52) of operating an audio controller device (see FIG. 1) that includes: (see col. 4: ll. 8-39); identifying a plurality of audio segments from the audio data based on a plurality of contextual parameters associated with the audio data (see FIG. 3 and cols. 4-6: ll. 40-42); associating each of the plurality of audio segments to one of a plurality of primary audio control interfaces provided at the audio controller device (see FIGS. 4-5 and cols. 6-7: ll. 43-30); and controlling a playback operation associated with respective one of the audio segments when an input is received at one or more of the primary audio control interfaces (Id., and see cols. 7-8: ll. 31-18).
In Luo’s method, audio controller device 100 determines contextual parameters 126 associated with the stored audio data 124 by one or more of: processing the content of the audio data 124; processing data received from machine learning algorithms; and processing input identifying user preferences (see FIG. 2: 220 and cols. 9-10: ll. 55-13). The contextual parameters associated with the audio data 124 may be determined based on one or more of: speech portion/silent portion in the audio data 124, volume level associated with different portions of the audio data 124, language/tone/accent/length of speech/rate of speech of different portions of the audio data 124, user profile identifying a speaker corresponding to the speech content associated with different portions of the audio data 124, and user profile identifying a potential listener of the audio data 124 (Id.).
Thereafter, Luo’s audio controller device 100 identifies a plurality of audio segments from the audio data 124 based on the plurality of contextual parameters associated with the audio data 124 (see FIG. 2: 230 and cols. 10-11: ll. 14-5). The audio controller device 100 further sets an optimal playback speed rate for each of the audio segments based on the one or more contextual parameters that are identified as affecting the respective one of the audio segments (Id.). The audio controller device 100 may determine optimal playback speed rate for each of the audio segments based on one or more of: pre-determined mapping of different optimal playback speed rates to different contextual parameters, user preferences, and input from machine learning algorithms that determine optimal playback speed rate based on playback speed rates used by listeners with different user profiles for different combination of (Id.). The optimal playback speed rates for different audio segments may be different (Id.).
Thereafter, Luo’s audio controller device 100 associates each of the plurality of identified audio segments to a respective one of a plurality of primary audio control interfaces 134 (see FIG. 2: 240 and col. 11: ll. 6-51), determines whether an input is received at one or more of the primary audio control interfaces 134 (see FIG. 2: 250 and cols. 11-12: ll. 52-48), and then when an input is received at one or more of the primary audio control interfaces 134, controls the playback operation corresponding to the audio segments for which input is received at one or more of the primary audio control interfaces 134 (see FIG. 2: 260 and cols. 11-12: ll. 52-48).
In this way, Luo teaches:
removing the detected silent regions from the audio content of the digital media data (see FIG. 2, cols. 4-5: ll. 40-34, and cols. 9-12: ll. 36-48, whereby the contextual parameters associated with the audio data 124 may be determined based on speech portion/silent portion, volume level associated with different portions, language/tone/accent/length of speech/rate of speech of different portions; FIG. 5 and cols. 13-14: ll. 3-67, whereby graphical user interface component 510-1 is associated with audio segment '1' that is identified based on contextual parameter 'silent portion', and for audio segment '1', the user selected 'skip' via the selection button 515-1 because it appears from the contextual parameter 'silent portion' that audio segment '1' does not include any audio data that may be of interest to the user and therefore the user would have wanted to skip listening to the audio segment; and see cols. 6-7: ll. 54-21, whereby the playback operation parameters for each audio segment include varying a playback speed rate, skipping a playback operation of the respective audio segments, and varying a speaker volume level, and whereby the electronic processor 116 automatically applies the playback speed rate to a playback duration (between a start frame and an end frame) of the corresponding audio segment, and the playback speed rate is automatically adjusted for the next audio segment within the audio file in accordance with the playback speed rate that is set for the next audio segment).
Based on the foregoing, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Gilson and Lukin’s digital media data compression method by incorporating Luo’s audio segment skipping technique as it amounts to nothing more than routine experimentation while yielding predictable results. At least one motivation would have been to provide users with the ability to skip silent audio segments that do not include any audio data that may be of interest to the user (see Luo, FIG. 5 and col. 14: ll. 8-67).

In re Claim 27, Luo teaches wherein the detecting silent regions present in the audio content based on the determined speech tempo of speech in the audio content (see FIG. 2) includes determining that regions in the audio content with a detected speech tempo of zero are silent regions (see cols. 4-5: ll. 40-34 and cols. 9-12: ll. 36-48, whereby the contextual parameters associated with the audio data 124 may be determined at least based on speech portion/silent portion, and/or volume level associated with different portions; FIG. 5 and cols. 13-14: ll. 3-67, whereby for audio segment '1', the user selected 'skip' because it appears from the contextual parameter 'silent portion' that audio segment '1' does not include any audio data that may be of interest to the user; and see cols. 6-7: ll. 54-21, whereby the electronic processor 116 automatically applies the playback speed rate to a playback duration (between a start frame and an end frame) of the corresponding audio segment, and the playback speed rate is automatically adjusted for the next audio segment within the audio file in accordance with the playback speed rate that is set for the next audio segment).
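For illustration only, the limitations of claims 26 and 27 (treating regions with a detected speech tempo of zero as silent and omitting them before re-encoding) can be sketched as follows. This is a minimal sketch and not a characterization of any cited reference: the fixed-length windowing, the per-window syllable counts, and the names find_silent_windows and drop_silent_windows are hypothetical.

```python
def find_silent_windows(syllables_per_window, window_seconds=1.0):
    """Return indices of windows whose speech tempo (syllables/second) is zero."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return [
        i
        for i, count in enumerate(syllables_per_window)
        if count / window_seconds == 0
    ]


def drop_silent_windows(windows, syllables_per_window, window_seconds=1.0):
    """Keep only the windows that contain speech, preserving their order."""
    if len(windows) != len(syllables_per_window):
        raise ValueError("one syllable count is required per window")
    silent = set(find_silent_windows(syllables_per_window, window_seconds))
    return [w for i, w in enumerate(windows) if i not in silent]


if __name__ == "__main__":
    windows = ["w0", "w1", "w2", "w3"]   # placeholder audio windows
    counts = [3, 0, 0, 5]                # assumed syllables detected per window
    print(find_silent_windows(counts))
    print(drop_silent_windows(windows, counts))
```

The retained windows would then be passed to whatever encoder is in use; the encoding step itself is outside the scope of this sketch.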


Response to Arguments
Applicant's arguments filed June 28, 2021 have been fully considered but they are not persuasive.
Gilson explicitly teaches playback data 207 can be stored in any of one or more databases known in the art, where such databases can be centralized or distributed across multiple systems (see COL. 7: LINES 5-60). 
Examiner has detailed above the manner in which the prior art enables the claimed invention.
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEXANDER L ELJAIEK whose telephone number is (571)272-5474.  The examiner can normally be reached on Monday-Thursday, 9:00am-3:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  
For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/ALEXANDER L. ELJAIEK/
Examiner
Art Unit 2651



/DUC NGUYEN/
Supervisory Patent Examiner, Art Unit 2651