DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on January 18, 2022 has been entered. Claims 1 – 9 and 11 – 21 are currently pending and considered below.

Response to Arguments
Applicant's arguments filed January 18, 2022 have been fully considered but they are not persuasive. On page 13 of the Remarks, Applicant argues with regard to Kahn that “Kahn does not remedy the above-noted deficiencies of Johnston.” The Examiner respectfully disagrees and has provided citations from Kahn relevant to the newly amended features in the rejection below.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 – 9, 11, 12, and 14 – 21 are rejected under 35 U.S.C. 103 as being unpatentable over Johnston et al. (US 2015/0301788 A1), hereinafter Johnston, in view of Kahn (US 2019/0069045 A1), hereinafter Kahn.

Claim 1: Johnston discloses a media control device, comprising: 
a first output port (see at least, “a shared sound system 124, such as a home theater audio entertainment system including one or more speakers,” [0021], and “In some embodiments, the equipment of the first user 226a also includes a personal sound system, e.g., similar to the personal sound system 228 of the second user 226b,” [0031]);
a second output port different from the first output port (see at least, “personal sound system 228 of the second user 226b,” [0031]); and 
circuitry, coupled to the first output port and the second output port, wherein the circuitry is configured to (see at least, “Media Processor 216,” including e.g., “Controller 260,” “User Interface 262,” etc., FIG. 2): 
receive a user input indicative of disability information of a user (see at least, “Determine whether a selection for a second audio portion has been received at 410,” [0046], “By way of non-limiting example, such alternative audio presentations can include presentations adapted to individuals having hearing impairments,” [0049]); 
control the first output port to output a first audio portion of media content (see at least, “The first decoded audio portion is forwarded to equipment of a first user at 408,” [0045], e.g., “a dialog portion of a multimedia program,” [0064], “verbal portion,” [0093]), wherein the first audio portion is associated with an image portion of the audio content (see at least, “Preferably, the forwarded decoded 
retrieve a second audio portion based on the received user input (see at least, “To the extent that the selection was received at 410, a second encoded audio portion is extracted at 412 and decoded at 414. The second decoded audio portion is forwarded to equipment of a second user in sync with the detected video at 416,” [0048]); and 
control the second output port to output each of the first audio portion and the second audio portion based on the received user input (see at least, “Alternative audio content can be prepared, e.g., with the dialog accentuated with respect to other background sounds and/or musical scores of a particular program. Such preparations can be produced by adjusting or otherwise mixing different audio tracks accordingly. Alternatively or in addition, a separate dialog, distinct from an original program production, can be prepared or otherwise separately recorded,” [0050], “Alternative audio content can also include post-processed versions of the original audio content of a particular program. For example, the sound track can be filtered to accentuate or otherwise emphasis one or more of high frequency content or low frequency content. Other techniques can include filtering to reduce or otherwise eliminate impulsive noise,” [0051]).
Johnston does not explicitly disclose wherein the second audio portion is an audio form of a description of content of the image portion; identify a gap in the first audio portion during which the second audio portion is playable, wherein the gap is identified based on a duration of the second audio portion; adjust a timing of output of the second audio portion based on the identified gap; and control output based on the adjusted timing of the second audio portion. However, Kahn discloses automatic generation of descriptive video service tracks similar to the alternative audio disclosed by Johnston (see at least, “Such post-production synchronization markers can be used to synchronize other content, such as subtitles and/or descriptive audio, to one or more of the video or the soundtrack,” Johnston [0038]). Kahn discloses that, in order to create descriptive audio, “The device 102 is configured to receive any given program, such as program 100, and to process the program to produce DVS tracks for the program,” Kahn [0028]. Kahn further explicitly discloses wherein the audio portion is an audio form of a description of content of the image portion (see at least, “The video and metadata may also be provided to an object recognition unit 110 that recognizes the objects in each scene. This information may then be passed onto a descriptive text generator 112. The descriptive text may then be provided to a text-to-speech convertor 114,” Kahn [0031], “An example of a component for converting text-to-speech may be as follows. After items in a scene are identified by unit 110, the system may generate a string containing the text for each item in unit 112. As an example: "There is a cat and a bookcase in the room." The string of words may be tokenized into words, and each word may be divided into phonemes. The phonemes are sound samples roughly corresponding to a syllable. The phonemes may be concatenated together, along with silence breaks between each word/token of the string, and an audio file, such as a PCM (Pulse code modulated) file is generated of the DVS speech fragment. In some embodiments, a locale preference may be received to further customize the generated speech based on the locale. For example, Canadian English may pronounce some words slightly differently than American English. If a Canadian locale is selected, a different set of phonemes may be used to alter the pronunciation to be more in line with Canadian speech,” Kahn [0032]); 
identify a gap in the first audio portion during which the second audio portion is playable, wherein the gap is identified based on a duration of the second audio portion (see at least, “An example of a component, such as component 106, for detecting gaps in audio may be provided via a sound amplitude scheme which is configured to detect periods of silence within a movie/video asset. In embodiments, a normalized scale may be used, with a level of zero being complete silence, and a level of 100 being maximum volume. In embodiments, a predetermined threshold may be established for determining a period of silence. In embodiments, a predetermined duration may be established for 
adjust a timing of output of the second audio portion based on the identified gap (see at least, “The device 102 is configured to construct a database or table 116 in which the scenes, gaps in silence in the dialogue, and DVS audio fragments are indexed and stored with timecodes or the like. This information may be accessed by an alignment module 118 which receives the original video 100, aligns the DVS speech fragments with the main dialogue of the program 100 at appropriate times of eligible silence in the main dialogue within the relevant scene. A multiplexer 120 then mixes the DVS speech fragments with the main dialogue of the program 100 to produce a DVS track and thereby provides a modified video program 122 having a DVS track. Alternatively, the DVS track may be stored apart from the video program 100 and only accessed at a time, when desired,” [0033]); and 

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the technique of automatic generation of descriptive video service tracks as disclosed by Kahn in the invention of Johnston thereby allowing for the advantage of descriptive audio for the visually impaired (see at least, “Descriptive Video Service (DVS) provides enhanced access to traditional television programs, movies, and other video programs for viewers who are blind or visually-impaired. The service provides descriptive narration of key visual elements visually appearing in a video. This extra narration is inserted for playback during natural pauses occurring in the original dialogue of the program to aid low-vision viewers in better understanding the story or subject matter of a program or scene within a video program,” Kahn [0001]) while also having the advantages Kahn provides, e.g., “any of the above discussed embodiments may be used on existing or older programs such that many archives of older programs may be processed in a batch mode to quickly build a library of programs with DVS tracks to help sight-impaired viewers enjoy more television, movie, and like programming. Thus, the embodiments disclose automatic generation of DVS tracks, as opposed to a costly manual process. Because of the production effort required for a DVS track, the amount of content with available DVS has been limited. The embodiments discussed above provide easier, quicker, and less expensive manners of generating DVS tracks,” Kahn [0025] and “The newly generated audio which is descriptive of the video should not overlap or otherwise interfere with the original audio or dialogue of the program,” Kahn [0016].
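
For illustration only, the silence-detection and alignment steps quoted from Kahn above (a normalized 0 to 100 amplitude scale, a silence threshold, a minimum silence duration, and placement of a DVS fragment in an eligible gap) can be sketched as follows. This sketch is not taken from Kahn or Johnston and is not the claimed implementation; the names `Gap`, `find_gaps`, and `schedule_fragment`, the per-frame level list, and all parameter values are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gap:
    """A period of silence in the main audio, in seconds (hypothetical type)."""
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def find_gaps(levels: List[float], frame_sec: float,
              silence_threshold: float, min_gap_sec: float) -> List[Gap]:
    """Return runs of frames at or below the threshold lasting at least min_gap_sec.

    `levels` is assumed to hold one normalized amplitude (0 = silence,
    100 = maximum volume) per frame of length `frame_sec`.
    """
    gaps: List[Gap] = []
    run_start: Optional[int] = None
    # Append a sentinel loud frame so a trailing silent run is also closed.
    for i, level in enumerate(list(levels) + [100.0]):
        if level <= silence_threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            gap = Gap(run_start * frame_sec, i * frame_sec)
            if gap.duration >= min_gap_sec:
                gaps.append(gap)
            run_start = None
    return gaps


def schedule_fragment(gaps: List[Gap], fragment_sec: float,
                      not_before: float = 0.0) -> Optional[float]:
    """Return the start time of the first gap that can hold the fragment.

    The gap is chosen based on the fragment's duration; None means no
    eligible gap exists at or after `not_before`.
    """
    for gap in gaps:
        start = max(gap.start, not_before)
        if gap.end - start >= fragment_sec:
            return start
    return None


# Example: 0.5-second frames, threshold of 10, minimum silence of 1 second.
levels = [80, 70, 0, 0, 0, 0, 90, 5, 5, 60, 0, 0, 0, 0, 0, 0]
gaps = find_gaps(levels, frame_sec=0.5, silence_threshold=10, min_gap_sec=1.0)
start_time = schedule_fragment(gaps, fragment_sec=2.5)
```

In this sketch a 2.5-second description fragment skips the two shorter silences and is scheduled at the start of the first gap long enough to hold it, which mirrors the idea that the description should not overlap the original dialogue.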

Claim 2: Johnston and Kahn disclose the media control device according to claim 1, wherein the second audio portion includes the first audio portion (see at least, “For example, the first decoded audio portion can include a first verbal portion and a first non-verbal portion, and the second decoded audio portion can include a second verbal portion and a second non-verbal portion, where one of the second verbal portion, the second non-verbal portion, or both differ from a corresponding one of the first verbal portion, the first non-verbal portion, or both according to a hearing impairment. Other embodiments can be used in the subject disclosure,” Johnston [0093], “To the extent that the selection was received at 410, a second encoded audio portion is extracted at 412 and decoded at 414. The second decoded audio portion is forwarded to equipment of a second user in sync with the detected video at 416,” Johnston [0048], “Such preparations can be produced by adjusting or otherwise mixing different audio tracks accordingly. Alternatively or in addition, a separate dialog, distinct from an original program production, can be prepared or otherwise separately recorded,” Johnston [0050], A multiplexer 120 then mixes the DVS speech fragments with the main dialogue of the program 100 to produce a DVS track and thereby provides a modified video program 122 having a DVS track. Alternatively, the DVS Kahn [0033]).

Claim 3: Johnston and Kahn disclose the media control device according to claim 1, wherein the output, via the first output port, excludes the second audio portion (see at least, “For example, the first decoded audio portion can include a first verbal portion and a first non-verbal portion, and the second decoded audio portion can include a second verbal portion and a second non-verbal portion, where one of the second verbal portion, the second non-verbal portion, or both differ from a corresponding one of the first verbal portion, the first non-verbal portion, or both according to a hearing impairment. Other embodiments can be used in the subject disclosure,” Johnston [0093], e.g., “wherein the first decoded audio portion comprises a language and the second decoded audio portion comprises a different language, each of the language and the different language corresponding to the presentation of the decoded video portion,” Johnston claim 3).

Claim 4: Johnston and Kahn disclose the media control device according to claim 1, further comprising: an internal audio reproduction device connected to the first output port, wherein the circuitry is further configured to: control the internal audio reproduction device to output the first audio portion of the media content, via the first output port (see at least, “Similarly, a first audio portion 108a' of the same multimedia program is extracted from the first program stream 102' and processed, as required, for presentation at a shared sound system 124, such as a home theater audio entertainment system including one or more speakers,” Johnston [0021]); and wirelessly output each of the first audio portion and the second audio portion to an external audio reproduction device, via the second output port, wherein the output of each of the first audio portion and the second audio portion is based on the received user input (see at least, “The second audio portion determined by the media processor 116 can be distributed to the headphones 130 directly via a cabled connection between the media processor 116 and the headphones. For example, a connector portion of cabled headphones 130 can be plugged into a corresponding connection portion of an audio interface of the media processor 116 providing the second audio portion. Alternatively or in addition, the second audio portion can be distributed to the headphones via a wireless link 132,” Johnston [0024]).

Claim 5: Johnston and Kahn disclose the media control device according to claim 1, further comprising: an internal audio reproduction device connected to the first output port, wherein the circuitry is further configured to: control the internal audio reproduction device to output the first audio portion of the media content, via the first output port (see at least, “Similarly, a first audio portion 108a' of the same multimedia program is extracted from the first program stream 102' and processed, as required, for presentation at a shared sound system 124, such as a home theater audio entertainment system including one or more speakers,” Johnston [0021]); and output each of the first audio portion and the second audio portion to an external audio reproduction device, via the second output port, wherein the output of each of the first audio portion and the second audio portion is based on the received user input (see at least, “The second audio portion determined by the media processor 116 can be distributed to the headphones 130 directly via a cabled connection between the media processor 116 and the headphones. For example, a connector portion of cabled headphones 130 can be plugged into a corresponding connection portion of an audio interface of the media processor 116 providing the second audio portion. Alternatively or in addition, the second audio portion can be distributed to the headphones via a wireless link 132,” Johnston [0024]).

Claim 6: Johnston and Kahn disclose the media control device according to claim 1, wherein the circuitry is further configured to: wirelessly output the first audio portion to a first external audio reproduction Johnston [0036], “In some embodiments, the equipment of the first user 226a also includes a personal sound system, e.g., similar to the personal sound system 228 of the second user 226b. Multiple personal sound systems 228 can be included with or without a shared audio system 224. By way of non-limiting example, a system 200 having one or more personal audio systems 228 with or without a shared audio system 224 can be used in media presentations to large groups, e.g., in a projection theater, a class room, a business meeting, and the like,” Johnston [0031]).

Claim 7: Johnston and Kahn disclose the media control device according to claim 1, wherein the circuitry is further configured to: control the output of the first audio portion, via the first output port; and control the output of the first audio portion and the second audio portion, via the second output port, concurrently (see at least, “The decoded video portion is ultimately forwarded to the video display Johnston [0029]).

Claim 8: Johnston and Kahn disclose the media control device according to claim 1, further comprising: a plurality of output ports which include the second output port, wherein the circuitry is further configured to: receive a plurality of user inputs, wherein each of the plurality of user inputs is indicative of the disability information of a plurality of users; and control the plurality of output ports to output each of the first audio portion and the second audio portion based on the received plurality of user inputs (see at least, “In some embodiments, the equipment of the first user 226a also includes a personal sound system, e.g., similar to the personal sound system 228 of the second user 226b. Multiple personal sound systems 228 can be included with or without a shared audio system 224. By way of non-limiting example, a system 200 having one or more personal audio systems 228 with or without a shared audio system 224 can be used in media presentations to large groups, e.g., in a projection theater, a class room, a business meeting, and the like. Individual users or groups of users at a common video presentation can be presented with an alternative audio presentation without detracting from another alternative audio presentation and/or a default audio presentation,” Johnston [0031]).

Claim 9: Johnston and Kahn disclose the media control device according to claim 1, further comprising: a memory configured to store: the media content that includes the first audio portion and the first image portion (see at least, “A digitally encoded data stream 118,218,318 is received at 402. The digitally Johnston [0042]), and text information which describes the first image portion of the media content, wherein the circuitry is further configured to convert the stored text information into the second audio portion (see at least, “The video and metadata may also be provided to an object recognition unit 110 that recognizes the objects in each scene. This information may then be passed onto a descriptive text generator 112. The descriptive text may then be provided to a text-to-speech convertor 114,” Kahn [0031], “An example of a component for converting text-to-speech may be as follows. After items in a scene are identified by unit 110, the system may generate a string containing the text for each item in unit 112. As an example: "There is a cat and a bookcase in the room." The string of words may be tokenized into words, and each word may be divided into phonemes. The phonemes are sound samples roughly corresponding to a syllable. The phonemes may be concatenated together, along with silence breaks between each word/token of the string, and an audio file, such as a PCM (Pulse code modulated) file is generated of the DVS speech fragment. In some embodiments, a locale preference may be received to further customize the generated speech based on the locale. For example, Canadian English may pronounce some words slightly differently than American English. If a Canadian locale is selected, a different set of phonemes may be used to alter the pronunciation to be more in line with Canadian speech,” Kahn [0032], “The device 102 is configured to construct a database or table 116 in which the scenes, gaps in silence in the dialogue, and DVS audio fragments are indexed and stored with timecodes or the like. This information may be accessed by an alignment module 118 which receives the original video 100, aligns the DVS speech fragments with the main dialogue of the program 100 at appropriate times of eligible Kahn [0033]). 

Claim 11: Johnston and Kahn disclose the media control device according to claim 1, wherein the circuitry is further configured to synchronize the first audio portion and the image portion, the first audio portion is outputted via the first output port and the second output port, and the image portion is outputted via a display screen (see at least, “Reduction of timing errors, or audio-video synchronization, can be addressed by any one of various generally well understood techniques. By way of illustrative example, the media processor 316 optionally includes a synchronization control module 370, and three delay devices 372a, 372b, 372c. A first delay device 372a can be positioned between the first audio processor 358a and the program demultiplexer 352, as shown, to delay the default audio stream 308a'. Alternatively or in addition, the first delay device 372a can be positioned between the first audio processor 358a and the directional audio system 324 to delay the default audio content 344. In some embodiments, a first audio delay introduced by the first delay device 372 is fixed, e.g., according to a calibrated or otherwise determinable delay. Such delay can result from processing delay differences, e.g., between the video processor 356 and the first audio processor 358a. Alternatively, the first audio delay can be adjusted according to a control signal or similar command from the synchronization control module. Similar delays can be introduced in like manners to one or more of the video stream 306', the video content 342, the first alternate audio stream 308b or the first alternate audio content 346,” Johnston [0040]).

Claim 12: Johnston and Kahn disclose the media control device according to claim 1, wherein the circuitry is further configured to receive the user input from a visually impaired person, as the user (see at least, “In step 22 a modified program stream is constructed which contains the DVS track created as discussed above. The DVS track may be on a separate audio PID (i.e., Packet Identifier) such that it is only played and audible when a viewer wishes to hear it. For example, customer premise equipment (CPE) can be configured, if desired, to mix the DVS track with the main audio so that a sight-impaired viewer can hear both tracks simultaneously, i.e., the DVS track has audio that plays in the gaps of the main audio. In addition, the DVS audio track may be provided and encoded in different languages (i.e., English, Spanish, etc.); thus, the selection of a desired language may also be provided as an option to the viewer of the content,” Kahn [0019]).

Claims 14 – 19 are substantially similar in scope to claims 1, 3, 7, and 4 – 6, respectively, and therefore are rejected for the same reasons.

Claim 20 is substantially similar in scope to claim 1 and therefore is rejected for the same reasons (see also at least, “Yet another embodiment of the subject disclosure includes a machine-readable storage medium, including executable instructions which, responsive to being executed by a processor, cause the processor to facilitate performance of operations,” Johnston [0017]).

Claim 21: Johnston and Kahn disclose the media control device according to claim 1, wherein the content of the image portion includes at least one of a plurality of entities in the image portion, one of aesthetics or decor in the image portion, a scene in the image portion, a text in the scene, a title of the media content, environmental condition in the image portion, an emotion of a character in the image portion, one of physical attributes of the character, facial expressions of the character, or clothing of the Kahn [0002], “As discussed above, Descriptive Video Service (DVS) provides contextual information for television and like programming that is intended to benefit sight-impaired viewers of the program. A DVS track typically describes scenes and conveys information that may be difficult to infer solely from listening to the main audio track of the program,” Kahn [0010], “As an example, a scene of a video program may include a person silently reading a note. The text of the note may be visible in the frames of the video of the scene, but not necessarily read aloud as part of the main audio track. Thus, a sight-impaired viewer may not be able to fully appreciate the content of the note and thereby may not fully understand the significance of this part of the program. The DVS audio track alleviates this problem because the content of the note would be included in the audio (i.e., the audio would include a reading of the note) thereby permitting a sight-impaired viewer to be able to better follow along with the program and have a full appreciation of the content of the note. Of course, this provides only one example and any object or the like appearing in video may be subject to description,” Kahn [0011]).

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Johnston and Kahn in view of Shintani et al. (US 2015/0078595 A1), hereinafter Shintani.

Claim 13: Johnston and Kahn disclose the media control device according to claim 1, further comprising: an image capturing device (see at least, “The UI 804 can further include an image sensor 813 such as a Johnston [0085]) and control the second output port to output the first audio portion and the second audio portion based on the determined disability information (see at least, “For example, the first decoded audio portion can include a first verbal portion and a first non-verbal portion, and the second decoded audio portion can include a second verbal portion and a second non-verbal portion, where one of the second verbal portion, the second non-verbal portion, or both differ from a corresponding one of the first verbal portion, the first non-verbal portion, or both according to a hearing impairment. Other embodiments can be used in the subject disclosure,” Johnston [0093]), but do not disclose wherein the circuitry is further configured to: control the image capturing device to capture an image of the user; determine the disability information of the user based on the captured image. However, Shintani discloses in regards to audio accessibility, circuitry configured to: control the image capturing device to capture an image of the user; determine the disability information of the user based on the captured image (see at least, “Accordingly, delivery of the audio to a listener can be tailored to the individual's hearing characteristics, and in conjunction with ultrasonic delivery, the individualized audio can be directed to an individual. Furthermore, the individual can be identified by a camera, using image recognition and then the tailored sound can be directed to the identified individual,” Shintani [0020], “In the preferred implementation, a camera or other image capture device is used to locate and identify listeners using facial recognition and stored listener profiles, and to spatially characterize each listener,” Shintani [0022], “In order to customize the audio experience of each of the listeners, a profile can be established for each listener, and a default or guest profile can be provided for unrecognized listeners. The camera 24, by imaging the listening area, can be used to provide images that upon analysis can determine 1) the location of each listener, 2) the location of the head and ears of each listener, 3) recognize each registered and profiled listener, or assign the listener to be a guest, 4) to track movements of the listeners, 5) to note movements that are of significance to the listening experience in the listeners, and Shintani [0024]). It would have been obvious to utilize the audio accessibility features disclosed by Shintani in the invention of Johnston and Kahn since doing so has the advantage of providing “steps to try to improve the presentation of audio to a person who has a hearing disability,” Shintani [0002].

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOSEPH SAUNDERS whose telephone number is (571)270-1063. The examiner can normally be reached Monday-Thursday, 9:00 a.m. - 4 p.m., EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ahmad Matar, can be reached at (571)272-7488. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JOSEPH SAUNDERS JR/Primary Examiner, Art Unit 2652