United States Patent and Trademark Office    

Commissioner for Patents
United States Patent and Trademark Office
P.O. Box 1450
Alexandria, VA 22313-1450
www.uspto.gov











BEFORE THE PATENT TRIAL AND APPEAL BOARD


Application Number: 16/683,369
Filing Date: 14 Nov 2019
Appellant(s): AHMAD HAMBALY et al.
                       

__________________
Preetam B. Pagar
Attorney for Appellant(s)


EXAMINER’S ANSWER




This is in response to the appeal brief filed June 24, 2021.


(1) Grounds of Rejection to be Reviewed on Appeal
Every ground of rejection set forth in the Office action dated January 11, 2021 from which the appeal is taken is being maintained by the examiner except for the grounds of rejection (if any) listed under the subheading “WITHDRAWN REJECTIONS.”  New grounds of rejection (if any) are provided under the subheading “NEW GROUNDS OF REJECTION.”


(2) Response to Argument

The appellant on page 6 of the Appeal Brief begins the arguments under section III and, for claim 1, first argues against Choe with respect to the portion of the limitation that is underlined and in bold letters, as follows:
 Supplementing the audio communication to include portions of the description of the element that were not included in the audio communication from the sender when it is determined that the at least one participant of the talk group is not currently visually engaged.

The appellant argues that these limitations are not disclosed. However, the examiner previously mapped each and every limitation and highlighted specific sections of the citations to emphasize the correlation. Based on the arguments, it appears that the appellant is interpreting the claim very narrowly, relying on limitations that the examiner cannot read into the claim language from the specification. “supplementing the audio communication to include portions of the description of the a video file including embedded text and/or voice is created”. Then, as further cited, “additional voice” READS on “supplementing audio” (explained further below). The text description and/or audio description that is generated describes the video scene; thus, it READS on (portion of the description of the element), as cited in [0074]: “The file may be created by a file generator configured to generate video files along with text and/or along with additional voice.” Please also note [0074]: “The voice, if included, may be created using a text-to-speech system”. The words “may be” indicate that text-to-speech is one of the options for adding voice, although, because the claim contains no limitation requiring a human-captured voice, the examiner did not need to show a human voice. Additionally, [0074] discloses "inserted in the video sequence based on the associated time. For example, the sentence, such as, "There is a possible failure to yield violation between 00:42 to 00:51," can be embedded as text and/or voice in frames that include video images that correspond to the possible failure to yield violation when it occurred". Thus, the supplemental audio (voice) may be generated by text-to-speech in frames that include video images that correspond to “portions of the description of the element”.
Please note that each supplemental audio (voice) is for each portion of the description of the element in specific video frames, and such audio is different from any other descriptive audio in previous frames. Thus it meets the limitation “(supplementing the audio) …… were not included in the audio communication”, because the previous frames had different audio unrelated to the audio for the scene in the current frames of video. Furthermore, “the audio communication from the sender” under BRI is any audio (human voice, computer-generated voice, or sound), and the sender under BRI could be a human or a computer that transmits the video/audio, as detailed on page 6 of the Final Office action using [0061, 95, 96], wherein a portable device / smart phone and server provide the communication and video / audio is transmitted, as follows:
      [0061, 95, 96] a smart phone camera may capture the image and send it to the server which is in communication with one or more cameras; also a smart phone may receive the images and description (text and audio), wherein per [0061] description (text and audio) may be created at either the server or at the camera (if it has processing power, such as a smart phone camera). The receiving smart phone per [0095, 96] receives image, video and the description (text and audio) and it also may search for supplemental data. Thus, "intended to be sent to at least one participant of a talk group" is met by the smart phone receiving the image, video and description (text and audio); the "talk group" is met by a smart phone communication in a network [0095] "In one embodiment, the portable device is a smart phone or other hand-held device that can connect to a wireless network".

The appellant on page 8, argues the underlined bold section of the limitation “monitoring audio communication of a sender of the at least one image frame to identify a reference to an element of the plurality of elements”, against Kalampoukas and states that the abstract, [0012] and [0014] do not teach this limitation. First the appellant argues “The cited portion of [0012] Kalampoukas describes comparing a content element to a reference. Kalampoukas, at best, describes capturing a content element, which is audio, and then comparing it with a reference signal to determine if they are the same. (Kalampoukas, P [0014]). In other words, Kalampoukas describes capturing the content element itself, not a reference to the content element”. 
   The examiner respectfully disagrees. One of ordinary skill in the art would understand that the limitation “monitoring audio communication of a sender of the at least one image frame to identify a reference to an element of the plurality of elements”, under BRI, means that audio communication is monitored to identify audio reference words. In light of specification [0012] and Fig 2, and without reading any additional feature into the claim language, such words could be, for example, “1 dead body” (Fig 2, element 217). The words (audio) serving as the reference are then compared to understand what they mean. As the appellant agrees in the passage noted above, this content recognition for audio is met by [0012] and [0014]. Furthermore, the action that is taken (not claimed), such as spec Fig 2, element 255, in which the supplemental information “male on floor, has blood stains” is based on recognizing “1 dead body”, could be generated based on image frame content recognition, which although not claimed could be met by [0012] which samples “audio” 
Then the appellant states “It is understood that rejections based on a combination of references cannot be overcome by attacking the references individually. However, the FOA has not alleged that the specific features described above were the result of the combination of the references, but rather that they were disclosed specifically by Choe and Kalampoukas respectively. If the Examiner believed such features are disclosed by the combination of the references, it is respectfully submitted that no explanation has been provided as to how the combination results in audio, which did not previously exist, being supplemented or detecting a reference to an element when the art appears to describe detecting the element itself”.
   This argument is not persuasive because it relies on assumptions and conclusions by the appellant. As noted in detail above, the audio description is supplemented per scene and is related to the scene, so each audio description is supplemental to the audio in the previous frames of the video. The examiner therefore disagrees with the argument stating “the audio, which did not previously exist”; only the limitations that are missing from the main reference are met in the combination. The argument “detecting a reference to an element when the art appears to describe detecting the element itself” was addressed by the examiner above and is an incorrect conclusion by the appellant. The office action at the bottom of page 8 noted the limitations that Choe does not explicitly disclose, and on page 9 noted the limitation that Kalampoukas teaches, provided the citations and added a motivation to combine, thus 

The appellant on page 9, under the heading “2. Proposed Modification Renders Primary Reference Unsuitable for its Intended Use”, attacks Finkelstein [0098] and argues that eye gaze detection is insufficient to meet the claim language for “determination of if the user is visually engaged or not”. This argument is not persuasive because specification para [0031] identifies eye gaze detection as one of the methods for determining whether the user is visually engaged.
Specification para [0031]: In the example environment 100, remote recipient 112 may be categorized as visually engaged 118. Visually engaged means that remote recipient 112 is in possession of a communications device 113 that is capable of receiving video. Furthermore, being visually engaged also includes that the recipient is viewing the display screen. There are many known techniques to determine if a user is engaged with a display screen (e.g. eye gaze detection, screen interaction detection, device position detection, screen powered on detection, etc.). The techniques described herein are not dependent on any particular technique to determine if a user is viewing a display screen.

Then the appellant continues “Even if, for the sake of argument, it is assumed that Finkelstein does describe determining if the user is currently visually engaged, the proposed modification would render the primary reference, Choe, unsatisfactory for its intended use. At page 8-9 of the FOA, it is alleged that the combination would allow for the verbal description of the scene to be inserted when the user is not looking at the This argument is not persuasive because the office action on page 8 stated “It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Choe as taught in Finkelstein to provide "eye gaze detection" for the limitations “determining that the at least one participant of the talk group is not currently visually engaged” and “when it is determined that the at least one participant of the talk group is not currently visually engaged”, for the purpose of being able to insert the additional (supplemental) voice/audio for the scene description generated by Choe in the regular video/audio timeline of the scene depending on the gaze of the participant viewer so that the participant viewer can benefit from additional audio descriptive of Choe at the best time when the participant viewer is not looking at the display, otherwise when the recipient is already watching the display he can see the image / video and hear the regular audio and it would be less beneficial to hear the additional audio narrative.” Thus, the office action did not state what the appellant attributes to it.

The appellant continues, “However, one of the stated purposes of Choe is to use the text / audio description of the scene in order to create a searchable database. (Choe, P [0079]). In order to do this, the text description (and therefore the audio description since it is simply a text to speech conversion of the text description) must always be included, regardless of if someone is currently visually engaged or not”.
   Respectfully, this argument is incorrect, because Choe [0074, 96] states that the text description and/or voice (audio) is needed for search. That is, “and/or” 

The appellant continues "Otherwise, according to the proposed combination, if a user was actually visually engaged with the image, the text description would not need to be generated." Respectfully, this argument does not appear to have any factual basis, as it is a matter of opinion. For example, many people activate subtitles (description) while watching a movie, news, etc. Thus, a user who is actually visually engaged with the image may choose to have the text description, and its generation does not harm the combination. For a visually engaged user, the combination could create the supplemental text and/or voice (audio), and for a user who is not visually engaged it could create supplemental voice (audio) via text-to-speech conversion. Therefore, the combination of Choe as taught in Finkelstein, which establishes the prima facie case of obviousness, is suitable for its intended use, as noted by the motivational statement.



For the above reasons, it is believed that the rejections should be sustained.
Respectfully submitted,

/Sean N. Haiem/
Examiner, Art Unit 2422

Conferees:


/JOHN W MILLER/
Supervisory Patent Examiner, Art Unit 2422


Requirement to pay appeal forwarding fee.  In order to avoid dismissal of the instant appeal in any application or ex parte reexamination proceeding, 37 CFR 41.45 requires payment of an appeal forwarding fee within the time permitted by 37 CFR 41.45(a), unless appellant had timely paid the fee for filing a brief required by 37 CFR 41.20(b) in effect on March 18, 2013.