DETAILED ACTION
This final rejection is responsive to the amendment filed 16 August 2021.  Claims 1-30 are pending.  Claims 1 and 22 are independent claims.  Claims 1 and 22 are amended.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Remarks
Applicant’s prior art arguments have been fully considered but they are not persuasive.
Applicant argues (pgs. 7-9) that the cited references do not teach key groups.
Examiner respectfully disagrees.  The instant specification defines key groups (¶[0033]) as any fraction of the writing or of the key frames.  Accordingly, Cutler teaches finding key frames that summarize the key points in a meeting, which is used for indexing (e.g. whiteboard content just prior to erasure).  The claims do not specify any further functionality of key groups that would distinguish them over the teachings of Cutler.
The foregoing applies to all independent claims and their dependent claims.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7 and 10-20 are rejected under 35 U.S.C. 103 as being unpatentable over Cutler (US 2004/0263636 A1) hereinafter known as Cutler in view of Zhang (US 2007/0156816 A1) hereinafter known as Zhang.

Regarding independent claim 1, Cutler teaches:
inputting a media stream of video and audio data of a presentation to a compute server; and  (Cutler: Fig. 2 and ¶[0054] and ¶[0062]; Cutler teaches a meeting room server receiving video and audio inputs of a presentation.)
performing a conversion of the media stream into a notetaking resource, the conversion comprising: detecting in the video data at least one of a writing surface and a displayed image;  (Cutler: Fig. 2 and ¶[0064]; Cutler teaches receiving images of the whiteboard camera which captures images.)
detecting in the video data writing on the at least one writing surface and displayed image;  (Cutler: Fig. 2 and ¶[0064], ¶[0068], ¶[0160]; Cutler teaches receiving images of the whiteboard camera which captures images consisting of writings on a whiteboard.)
...
identifying at least one key frame in the writing;  (Cutler: ¶[0017]-¶[0018] and ¶[0064] and ¶[0085] and ¶[0099]-¶[0100] and ¶[0131]; Cutler teaches processing key frames from the whiteboard camera.)
identifying at least one key group in the at least one key frames;  (Cutler: ¶[0141]; Cutler teaches finding key frames that summarize the key points in a meeting, which is used for indexing (e.g. whiteboard content just prior to erasure).  The foregoing is interpreted as a key group.)
associating a time stamp metadata to one or more elements of the at least one key frames and key groups;  (Cutler: ¶[0017]-¶[0018], ¶[0064], ¶[0085], ¶[0099]-¶[0100], ¶[0131], ¶[0139], and ¶[0159]; Cutler teaches processing key frames from the whiteboard camera along with timestamps for each pen stroke and key frame for the whiteboard images.)
time ordering the one or more elements of the at least one key frames and key groups; and  (Cutler: Fig. 10 and ¶[0168]-¶[0170]; Cutler teaches a UI with a timeline control wherein the keyframes are displayed along with timestamps.) 
generating a composite user interface with one or more panes for playing at least one of the video and audio data, and a pane for displaying the time ordered one or more elements of the (Cutler: Fig. 10 and ¶[0168]-¶[0170]; Cutler teaches a UI with a timeline control wherein the keyframes are displayed along with timestamps.  The UI also contains the video and audio data of the presentation.)

Cutler does not explicitly teach:
at least one of removing artifacts and enhancing the writing;  (Cutler: ¶[0099]; Cutler teaches the whiteboard camera output being fed into an image enhancer.)

However, Zhang teaches:
at least one of removing artifacts and enhancing the writing;  (Zhang: ¶[0019]-¶[0020]; Zhang teaches white-balancing and color enhancement so that the remote participant can view the contents of the whiteboard better.)

Cutler and Zhang are in the same field of endeavor as the present invention, as the references are directed to capturing presentation content.  It would have been obvious, before the effective filing date of the claimed invention, to a person of ordinary skill in the art, to combine a system of extracting audio and visual content of a presentation and displaying to the user in an interface as taught in Cutler with further enhancing the writing on the whiteboard through white-balancing and color enhancement as taught in Zhang.  Cutler already teaches capturing whiteboard writing.  However, Cutler does not explicitly teach further enhancing the writing on the whiteboard through white-balancing and color enhancement.  Zhang provides this additional functionality.  As such, it would have been obvious to one of ordinary skill in the art to modify the teachings of Cutler to include teachings of Zhang because the combination would allow the remote user to clearly see the whiteboard content, as suggested by Zhang: ¶[0020].






Regarding claim 2, Cutler in view of Zhang teaches the method of claim 1 (as cited above):

Cutler further teaches:
further comprising, at least one of converting the key frames into key groups and interspersing other key grouped media with the time ordered one or more elements.  (The instant specification defines key groups (¶[0033]) as any fraction of the writing or of the key frames.  Accordingly, Cutler: ¶[0064] and ¶[0085], ¶[0099]-¶[0100]; Cutler teaches the keyframes summarizing the key points of the contents written on the whiteboard and further performing pen stroke analysis.  Further, ¶[0171] teaches that clicking on a key frame thumbnail synchronizes all the audio and video streams to the time the key frame was captured.)







Regarding claim 3, Cutler in view of Zhang teaches the method of claim 1 (as cited above):

Cutler further teaches:
further comprising, during playback, in the user interface highlighting the time ordered one or more elements when a time stamp metadata of the matches a corresponding time in the at least one of the video and audio data.  (Cutler: ¶[0170]-¶[0171]; Cutler teaches displaying future pen strokes in gray and pen strokes in the past shown in their full color.)







Regarding claim 4, Cutler in view of Zhang teaches the method of claim 1 (as cited above):

Cutler further teaches:
further comprising, enabling the user, in the user interface to watch a user-selected time of the at least one of the video and audio data with a matching time ordered one or more elements, or conversely a user-selected time ordered one or more elements with a matching time of the at least one of the video and audio data.  (Cutler: Fig. 10 and ¶[0168]-¶[0170]; Cutler teaches a UI with a timeline control wherein the keyframes are displayed along with timestamps.  The UI also contains the video and audio data of the presentation.)







Regarding claim 5, Cutler in view of Zhang teaches the method of claim 1 (as cited above):

Cutler further teaches:
wherein an arrangement of the time ordered one or more elements in a pane is altered from an original arrangement shown in the video data.  (Cutler: ¶[0170]-¶[0171]; Cutler teaches the user being able to select a keyframe and synchronizing all audio and video streams to the time the key frame was captured.)







Regarding claim 6, Cutler in view of Zhang teaches the method of claim 5 (as cited above):

Cutler further teaches:
wherein the arrangement is for improved readability or to match a display format.  (Cutler: ¶[0170]-¶[0171]; Cutler teaches the keyframes being displayed in a scrollable list.)







Regarding claim 7, Cutler in view of Zhang teaches the method of claim 1 (as cited above):

Cutler further teaches:
further comprising, detecting a presenter's speech in the audio data and time matching the presenter's speech with corresponding time ordered one or more elements, and providing a synchronous playback of the presenter's speech.  (Cutler: ¶[0170]-¶[0171]; Cutler teaches the user being able to select a keyframe and synchronizing all audio and video streams to the time the key frame was captured.  Moreover, ¶[0068] and ¶[0143] teach using speech recognition to transcribe the audio portions of the meeting.)







Regarding claim 10, Cutler in view of Zhang teaches the method of claim 1 (as cited above):

Cutler further teaches:
further including adding links in the notetaking resource to external non-presentation provided information.  (Cutler: ¶[0171]-¶[0173]; Cutler teaches the user being able to select pen strokes, keyframes, and speakers in the interface.)







Regarding claim 11, Cutler in view of Zhang teaches the method of claim 1 (as cited above):

Cutler further teaches:
further comprising, adding visible annotators in the displayed panes, to allow the user to control at least one of zoom, fast forward, reverse, scroll down, scroll up, page up, page down, collapse, open, skip, volume, time forward, and time back.  (Cutler: Fig. 10; Cutler teaches various controls on the interface that allow the user to play, reverse, fast forward, etc.)







Regarding claim 12, Cutler in view of Zhang teaches the method of claim 1 (as cited above):

Cutler further teaches:
further comprising, detecting in the video data a presenter and tracking at least one of a movement, gesture, hand position, arm position, direction of writing of the presenter.  (Cutler: ¶[0096]-¶[0097]; Cutler teaches tracking the presenter using motion.)







Regarding claim 13, Cutler in view of Zhang teaches the method of claim 1 (as cited above):

Cutler further teaches:
further comprising, at least one of altering an appearance or visibility of one or persons in the video data pane, modifying a background, and enhancing the writing is via denoising.  (Cutler: ¶[0096]-¶[0097]; Cutler teaches that the presenter view camera's virtual cameraman 542 tracks the presenter using motion and shape and outputs a video stream of the head and torso.  It essentially emulates a PTZ camera and person tracker with a fixed single high resolution camera; a smaller cropped view of the high resolution video is the output of the virtual cameraman for the presenter view camera.)







Regarding claim 14, Cutler in view of Zhang teaches the method of claim 1 (as cited above):

Cutler further teaches:
further comprising, distributing the notetaking resource to a user.  (Cutler: Fig. 10; Cutler teaches displaying an interface with all the audio and video data of the presentation.)




Regarding claim 15, Cutler in view of Zhang teaches the method of claim 1 (as cited above):

Cutler further teaches:
further comprising, at least one of storing the notetaking resource in a distribution server located on a cloud and dynamically compressing the video data in the event of a communication disruption.  (Cutler: ¶[0161]-¶[0163]; Cutler teaches a DM archive server which streams recorded meetings to remote clients and also compresses the data streams so that if the network connection becomes degraded, video quality degrades before audio quality.)




Regarding claim 16, Cutler in view of Zhang teaches the method of claim 1 (as cited above):

Cutler further teaches:
further comprising, generating the notetaking resource in realtime from a live presentation.  (Cutler: ¶[0054], ¶[0056], and ¶[0085]; Cutler teaches the DM system is a real-time communication and recording system for live meetings.)




Regarding claim 17, Cutler in view of Zhang teaches the method of claim 1 (as cited above):

Cutler further teaches:
further comprising: recording the presentation video via one or more cameras situated in a presentation room; recording the presentation audio via one or more microphones situated in the presentation room; merging the presentation video and audio into the media stream; and outputting the media stream.  (Cutler: Fig. 2; Cutler teaches combining inputs of all the cameras and the microphones and outputting the media stream.)




Regarding claim 18, Cutler in view of Zhang teaches the method of claim 1 (as cited above):

Cutler further teaches:
wherein the displayed image is either a projected image or an image from an image displaying device.  (Cutler: Fig. 2 and ¶[0015] and ¶[0054]-¶[0056]; Cutler teaches capturing images from cameras, e.g. whiteboard camera.)




Regarding claim 19, Cutler in view of Zhang teaches the method of claim 1 (as cited above):

Cutler further teaches:
further comprising a presentation auto start detection.  (Cutler: ¶[0068]; Cutler teaches automatically detecting a person/presenter entering the room.)




Regarding claim 20, Cutler in view of Zhang teaches the method of claim 1 (as cited above):

Cutler further teaches:
wherein the detected writing includes performing at least one of writing edge, ridge, line, stroke detection, and OCR.  (Cutler: ¶[0131]; Cutler teaches pen stroke detection.)




Claims 8 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Cutler (US 2004/0263636 A1) hereinafter known as Cutler in view of Zhang in view of Ochshorn (US 2020/0126559 A1) hereinafter known as Ochshorn.

Regarding claim 8, Cutler in view of Zhang teaches the method of claim 7 (as cited above):

Cutler further teaches:
further comprising, generating from the presenter's speech a transcript and time matching the transcript with corresponding time ordered one or more elements, and ...  (Cutler: ¶[0170]-¶[0171]; Cutler teaches the user being able to select a keyframe and synchronizing all audio and video streams to the time the key frame was captured.  Moreover, ¶[0068] and ¶[0143] teach using speech recognition to transcribe the audio portions of the meeting.)

Cutler does not explicitly teach: ... providing a transcript pane with synchronous highlighting of words in the transcript during playback.

However, Ochshorn does teach: ... providing a transcript pane with synchronous highlighting of words in the transcript during playback.  (Ochshorn: Fig. 17 and ¶[0102]; Ochshorn teaches highlighting the words in the transcript synchronously with playback.)
Cutler and Ochshorn are in the same field of endeavor as the present invention, as the references are directed to extracting presentation content.  It would have been obvious, before the effective filing date of the claimed invention, to a person of ordinary skill in the art, to combine a system of extracting audio and visual content of a presentation and displaying to the user in an interface wherein the system further transcribes the audio portions of the meetings as taught in Cutler with highlighting the words in the transcript synchronously with playback as taught in Ochshorn.  Cutler already teaches transcribing the audio content.  However, Cutler does not explicitly teach highlighting the words in the transcript synchronously with playback.  Ochshorn provides this additional functionality.  As such, it would have been obvious to one of ordinary skill in the art to modify the teachings of Cutler to include teachings of Ochshorn because the combination would allow the user to clearly follow the presentation.




Regarding claim 9, Cutler in view of Zhang in view of Ochshorn teaches the method of claim 8 (as cited above):

Cutler further teaches:
further comprising a word or topic search capability.  (Cutler: ¶[0143]; Cutler teaches key word searches to search the transcript.)




Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Cutler in view of Zhang in view of Lahade (US 2004/0239640 A1) hereinafter known as Lahade.

Regarding claim 21, Cutler in view of Zhang teaches the method of claim 1 (as cited above):

Cutler further teaches: 
further comprising detecting a writing surface ...  (Cutler: Fig. 2 and ¶[0064]; Cutler teaches receiving images of the whiteboard camera which captures images.)

However, Lahade teaches:
... with a sliding board.  (Lahade: ¶[0034]-¶[0035]; Lahade teaches a moving whiteboard.)
Lahade is analogous to the present invention, since it is reasonably pertinent to the problem faced by the inventor, i.e. using a whiteboard for presentations.  It would have been obvious, before the effective filing date of the claimed invention, to a person of ordinary skill in the art, to combine a system of extracting audio and visual content of a presentation from a whiteboard and displaying to the user in an interface as taught in Cutler with the whiteboard being movable as taught in Lahade.  Cutler already teaches using a whiteboard for the presentations.  However, Cutler does not explicitly teach a movable whiteboard.  Lahade provides this additional functionality.  As such, it would have been obvious to one of ordinary skill in the art to modify the .




Claims 22-30 are rejected under 35 U.S.C. 103 as being unpatentable over Cutler (US 2004/0263636 A1) hereinafter known as Cutler in view of Merril (US 2010/0328465 A1) hereinafter known as Merril.

Regarding claim 22, Cutler teaches:
a compute server with software modules to convert an input media stream into a notetaking resource, comprising: a writing surface analysis system, detecting a writing surface and ... from the media stream of writing on the writing surface and images displayed, and indexing detected text, wherein the detected text is organized into key frames and into key groups identified from the key frames, having associated time stamp metadata; and  (Cutler: Fig. 2 and ¶[0017]-¶[0018], ¶[0064], ¶[0085], ¶[0099]-¶[0100], ¶[0131], ¶[0139], and ¶[0159]; Cutler teaches processing key frames from the whiteboard camera along with timestamps for each pen stroke and key frame for the whiteboard images.  ¶[0141] further teaches finding key frames that summarize the key points in a meeting, which is used for indexing (e.g. whiteboard content just prior to erasure).  The foregoing is interpreted as a key group.)
a composite user interface with one or more panes for displaying one or more text and the media stream, the text and media stream being played in a time ordered manner.  (Cutler: Fig. 10 and ¶[0168]-¶[0170]; Cutler teaches a UI with a timeline control wherein the keyframes are displayed along with timestamps.  The UI also contains the video and audio data of the presentation.)



However, Merril does teach extracting text from slides during a presentation.  (Merril: ¶[0043])

Cutler and Merril are in the same field of endeavor as the present invention, as the references are directed to extracting presentation content.  It would have been obvious, before the effective filing date of the claimed invention, to a person of ordinary skill in the art, to combine a system of extracting audio and visual content of a presentation and displaying to the user in an interface wherein the content comprises pen strokes on a whiteboard as taught in Cutler with the pen strokes consisting of text as taught in Merril.  Cutler already teaches extracting pen strokes on a whiteboard.  However, Cutler does not explicitly teach extracting text.  Merril provides this additional functionality.  As such, it would have been obvious to one of ordinary skill in the art to modify the teachings of Cutler to include teachings of Merril because the combination would allow the user to view all of the presentation’s content.




Regarding claim 23, Cutler in view of Merril further teaches the system of claim 22 (as cited above).

Cutler further teaches:
further comprising, a digital media analysis system, ... , analyzing, and indexing digital media elements, wherein the extracted text is also organized into at least one of key frames and key groups, having an associated time stamp metadata.  (Cutler: Figs. 2 and 10 and ¶[0017]-¶[0018], ¶[0064], ¶[0085], ¶[0099]-¶[0100], ¶[0131], ¶[0139], and ¶[0159]; Cutler teaches processing key frames from the whiteboard camera along with timestamps for each pen stroke and key frame for the whiteboard images.)

Cutler does not explicitly teach: detecting viewed transitions, extracting text.  Further, while Cutler does teach extracting pen strokes and organizing the keyframes using timestamp metadata, Cutler does not explicitly teach that the pen strokes are text.

However, Merril teaches detecting transitions (¶[0152]) and extracting text (¶[0043]).




Regarding claim 24, Cutler in view of Merril further teaches the system of claim 22 (as cited above).

Cutler further teaches:
further comprising, a room analysis system, detecting and indexing viewed room elements.  (Cutler: Fig. 2 and ¶[0096]-¶[0097]; Cutler teaches tracking the presenter using motion.  Further, Cutler also teaches combining all the input data from all the elements in the room, such as the cameras and microphones and combining it into an indexed interface.)




Regarding claim 25, Cutler in view of Merril further teaches the system of claim 22 (as cited above).

Cutler further teaches:
further comprising, a human(s) analysis system, detecting, tracking, and indexing viewed person(s) elements.  (Cutler: Fig. 2 and ¶[0096]-¶[0097]; Cutler teaches tracking the presenter using motion.)




Regarding claim 26, Cutler in view of Merril further teaches the system of claim 25 (as cited above).

Cutler further teaches:
wherein a pane of the user interface includes a time synchronous display of one or more indexed viewed person(s) elements.  (Cutler: Figs. 2 and 10 and ¶[0096]-¶[0097] and ¶[0169]; Cutler teaches tracking the presenter using motion and outputs a video stream of the head and torso.)




Regarding claim 27, Cutler in view of Merril further teaches the system of claim 22 (as cited above).

Cutler further teaches:
further comprising, a voice analysis system, detecting human voice, generating speech-to-text transcription, detecting important phrases, and indexing speech elements, wherein a pane of the user interface includes a time synchronous display of the transcription.  (Cutler: ¶[0018] and ¶[0143]; Cutler teaches speech recognition to transcribe the audio of the meeting so that the transcript is searchable by the user.)

In addition, Merril teaches enabling text search of the presentations (Fig. 9 and ¶[0170]-¶[0171] and ¶[0190]).




Regarding claim 28, Cutler in view of Merril further teaches the system of claim 22 (as cited above).

Cutler further teaches:
further comprising, a distribution server, providing a combined image of indexed viewed writing elements and indexed digital media elements to a user's device.  (Cutler: ¶[0161]-¶[0163]; Cutler teaches a DM archive server which streams recorded meetings to remote clients.)




Regarding claim 29, Cutler in view of Merril further teaches the system of claim 22 (as cited above).

Cutler further teaches:
further comprising, a video+audio muxer joining video and audio data to form the media stream.  (Cutler: Fig. 2 and ¶[0062]-¶[0068]; Cutler teaches combining audio and video signals of the presentation.)




Regarding claim 30, Cutler in view of Merril further teaches the system of claim 22 (as cited above).

Cutler further teaches:
further comprising, a microphone device, video camera device, and display device, the devices providing input data for the video and audio data.  (Cutler: Fig. 2 and ¶[0062]-¶[0068] and ¶[0082]; Cutler teaches combining audio and video signals of the presentation.  Specifically, microphones, cameras, and display/monitor.)




Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
The prior art made of record and not relied upon is considered pertinent to Applicants’ disclosure.  Applicants are required under 37 C.F.R. § 1.111(c) to consider these references fully when responding to this action.
It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way.  A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. In re Heck, 699 F.2d 1331, 1332-33, 216 U.S.P.Q. 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 U.S.P.Q. 275, 277 (C.C.P.A. 1968)).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEX OLSHANNIKOV whose telephone number is (571)270-0667.  The examiner can normally be reached on M-F 9:30-6.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Scott Baderman can be reached on 571-272-3644.  The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/ALEKSEY OLSHANNIKOV/ Primary Examiner, Art Unit 2145