DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 32-42 and 55-67 are rejected under 35 U.S.C. 103 as being unpatentable over Kurtz et al. (US Pub No. 20080298571 A1) in view of Karvounis et al. (US Pub No. 20150304634 A1). 

Regarding Claim 32,

	Kurtz discloses A method for identifying, predicting, or controlling a characteristic of an object, said method comprising: obtaining a video of said object; (Kurtz, Abstract, discloses A video communication system and method for operating a video communication system are provided. The video communication system has a video communication device, having an image display device and at least one image capture device, wherein the at least one image capture device acquires video images of a local environment and an individual therein, according to defined video capture settings, an audio system having an audio emission device and an audio capture device; and a computer operable to interact with a contextual interface, a privacy interface, an image processor, and a communication controller to enable a communication event including at least one video scene in which outgoing video images are sent to a remote site. Wherein the contextual interface includes scene analysis algorithms for identifying potential scene transitions and capture management algorithms for providing changes in video capture settings appropriate to any identified scene transitions; and wherein the privacy interface provides privacy settings to control the capture, transmission, display, or recording of video image content from the local environment; video images of object are obtained)

using an artificial intelligence (AI) algorithm to process said video to generate an output indicative of said characteristic of said object; (Kurtz, [0104], discloses the previously discussed contextual interface 450 includes an intelligent agent or artificial intelligence (AI) or set of algorithms that adaptively responds (and perhaps anticipates) user activities, and modifies the video capture process to improve the video experience. Contextual interface 450 can also be a learning system, that progressively gains understanding of user activities and communication needs. These algorithms, and the supporting system data that enables their operation, are outlined in Table 3; the AI algorithms process the video images and output the activities (characteristics) of the users (object)) and


presenting said output of said video on a user interface of an electronic device of a user, wherein said augmented derivative of said video is usable to explain said output of said AI algorithm. (Kurtz, [0104], discloses the previously discussed contextual interface 450 includes an intelligent agent or artificial intelligence (AI) or set of algorithms that adaptively responds (and perhaps anticipates) user activities, and modifies the video capture process to improve the video experience. Contextual interface 450 can also be a learning system, that progressively gains understanding of user activities and communication needs. These algorithms, and the supporting system data that enables their operation, are outlined in Table 3. As shown in FIGS. 7A and 7B, the contextual interface 450 includes the transition test 630, the transition process 640, scene capture analysis 655, and scene capture management 650, which are all aspects or manifestations of this system intelligence. FIGS. 7B and 7C expand upon the operational process steps for a communication event 600 depicted in FIG. 7A, showing greater details regarding scene capture management 650 and the transition process 640. As shown in FIG. 7B, during a video scene 620, the device 300 can perform communication event analysis 655 and transition testing 630, for example as parallel path activities. Both the communication event analysis 655 and the transition test 630 are scene analysis algorithms (intra-scene analysis and inter-scene analysis respectively) tasked with assessing data directly derived from the video data streams (and audio) and identifying the significance of detected changes in scene content and context. The analysis approach is multivariate, as the scene analysis algorithms evaluate video scenes using a combination of scene content metrics, scene analysis rules, contextual cues, and statistical measures. Likewise, the scene capture management algorithm (650) and capture transition algorithm (FIG. 7C, step 644) are video capture scene adjustment algorithms tasked with modifying ongoing video capture by adjusting the defined video capture settings. FIG. 7D also expands upon the operational flow of system activities for a communication event 600 shown in FIG. 7A, but with emphasis on the interaction and data flow (including transmission of video and audio signals) exchanged between two devices 300 of a video communication system 290 across the video communication link represented by network 365; AI algorithm outputs details of motion or movements of users in captured video images)

Kurtz does not explicitly disclose generating an augmented derivative of said object by augmenting one or more features of said video, or presenting said augmented derivative of said video.

Karvounis discloses generating an augmented derivative of said object by augmenting one or more features of said video; presenting augmented derivative of said video (Karvounis, [0152], discloses HAR-SLAM has two main stages, the first is Forgetful SLAM, and the second is a rippling update. Forgetful SLAM has to be modified to include a link between forgotten landmarks and past poses. This is done in Table 13, where removed landmarks are correlated only to the past pose. These correlations are important for updating landmarks once the pose changes, and for revising landmarks to understand how to modify the past pose. Past poses have correlations to each other forming a chain; when a pose is modified, an update routine modifies all connecting neighbors. The rippling update is simply the propagation of updates to every connected pose and landmark in a directed graph. The correlation algorithm is shown in Table 15, where an augmented state is updated and the information matrix between the past and current pose is extracted. It is important to augment the system so the newly observed landmarks can be properly associated with the current pose and at the same time update the past pose correlation; selected landmarks (features) of a pose (object) in the video are augmented and displayed using user interface devices as an augmented derivative)

Kurtz discloses the claimed invention except for the augmented derivative of features of the object in the video image. Karvounis teaches that it is known to augment features of objects, such as pose landmarks, obtained from video images using artificial intelligence algorithms, and to display the augmented features to a user. It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify Kurtz to generate and present an augmented derivative of the video, as taught by Karvounis, in order to selectively display the augmented features of objects in video images to users, for applications including tracking the motion of players in sporting events and augmenting selected movements for analysis by sporting event analysts.

Regarding Claim 33,
	The combination of Kurtz and Karvounis further discloses amplifying (Kurtz, [0046], discloses audio processor 325 or computer 340 can be used alone or in combination to provide enhancements including amplification, filtering, modulation or any other known enhancements; amplification is applied) said video. (Karvounis, [0152], discloses HAR-SLAM has two main stages, the first is Forgetful SLAM, and the second is a rippling update. Forgetful SLAM has to be modified to include a link between forgotten landmarks and past poses. This is done in Table 13, where removed landmarks are correlated only to the past pose. These correlations are important for updating landmarks once the pose changes, and for revising landmarks to understand how to modify the past pose. Past poses have correlations to each other forming a chain; when a pose is modified, an update routine modifies all connecting neighbors. The rippling update is simply the propagation of updates to every connected pose and landmark in a directed graph. The correlation algorithm is shown in Table 15, where an augmented state is updated and the information matrix between the past and current pose is extracted. It is important to augment the system so the newly observed landmarks can be properly associated with the current pose and at the same time update the past pose correlation; selected landmarks (features) of a pose (object) in the video are augmented and displayed using user interface devices as an augmented derivative). Additionally, the rationale and motivation applied to the rejection of claim 32 apply to this claim.

Regarding Claim 34, 
The combination of Kurtz and Karvounis further discloses wherein said characteristic of said object is an operational or structural defect. (Kurtz, [0014], [0125], discloses analyzes these cues to predict, and then pro-actively make a seamless transition in shifting the video-capture from a first speaker to a second speaker. These behavioral cues include acoustic cues (such as intonation patterns, pitch and loudness), visual cues (such as gaze, facial pose, body postures, hand gestures and facial expressions), or combinations of the foregoing, which are typically associated with an event; virtual environment images can be still images or video and the images can be stored as a library of virtual environment images in the device or obtained from other sites over a network. It can also be anticipated that some users 10 may potentially also desire that the appearance altering interface 490 have capabilities to alter personal appearance, relative to their skin, hair, clothing, or other aspects. For example, a user 10 may have the video communication device 300; through the appearance altering interface 490 of the contextual interface 450, for cosmetic reasons, change the appearance of their face 25, hair 40, or color of clothes. In such instances, it can be useful to use a reference image 460 of the user 10, in addition to current images of the same user 10, to create these effects. A comparable process can also be provided for altering the voice characteristics of the users 10; facial features are processed to identify the quality of the features and to determine any defect in said features, including hair, color, and skin). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.

Regarding Claim 35,
	The combination of Kurtz and Karvounis further discloses wherein said characteristic is a future state of said object, or a characteristic correlated with other objects, sensors, data sources, processes control systems, or actuators. (Kurtz, [0014], discloses analyzes these cues to predict, and then pro-actively make a seamless transition in shifting the video-capture from a first speaker to a second speaker. These behavioral cues include acoustic cues (such as intonation patterns, pitch and loudness), visual cues (such as gaze, facial pose, body postures, hand gestures and facial expressions), or combinations of the foregoing, which are typically associated with an event; facial pose or body postures of speakers are processed to predict a future pose). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.

Regarding Claim 36, 
The combination of Kurtz and Karvounis further discloses enabling said user to perform an action through said user interface if said output is indicative of a suboptimal future state of said object. (Kurtz, [0014], discloses analyzes these cues to predict, and then pro-actively make a seamless transition in shifting the video-capture from a first speaker to a second speaker. These behavioral cues include acoustic cues (such as intonation patterns, pitch and loudness), visual cues (such as gaze, facial pose, body postures, hand gestures and facial expressions), or combinations of the foregoing, which are typically associated with an event; facial pose or body postures of speakers are processed to predict a future pose (a prediction is an estimate and therefore suboptimal)). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.

Regarding Claim 37,
The combination of Kurtz and Karvounis further discloses obtaining additional video of said object; (Kurtz, [0131], discloses These additional image capture devices 120 or devices 100 can be networked together, and enable the device 300 to capture a more expanded field of view that users may move around in. They also can enable enhanced imaging by acquiring images from perspectives that may be too limited if the image capture devices 120 are positioned solely at the electronic imaging device 100. The image processor 320 can then generate an enhanced composite image; additional video images are captured)
using said AI algorithm to process said additional video to generate an additional output about said object; (Kurtz, [0104], discloses the previously discussed contextual interface 450 includes an intelligent agent or artificial intelligence (AI) or set of algorithms that adaptively responds (and perhaps anticipates) user activities, and modifies the video capture process to improve the video experience. Contextual interface 450 can also be a learning system, that progressively gains understanding of user activities and communication needs. These algorithms, and the supporting system data that enables their operation, are outlined in Table 3. As shown in FIGS. 7A and 7B, the contextual interface 450 includes the transition test 630, the transition process 640, scene capture analysis 655, and scene capture management 650, which are all aspects or manifestations of this system intelligence. FIGS. 7B and 7C expand upon the operational process steps for a communication event 600 depicted in FIG. 7A, showing greater details regarding scene capture management 650 and the transition process 640. As shown in FIG. 7B, during a video scene 620, the device 300 can perform communication event analysis 655 and transition testing 630, for example as parallel path activities; AI algorithms process additional video image data as input) and 

presenting said additional output and an augmented derivative of said additional video on user interface. (Kurtz, [0157], discloses a video communication system 300 has been described as a system that generates video imagery (basically the picture portion of a television signal) and the accompanying audio. It should be understood the system can also use digital still cameras or image processing to extract still images from a video stream. As an example, a key frame extraction algorithm that identifies the video frames that have the best composition and facial expressions can be used to create still images from the video output of system 290. The system 290 or device 300 can also generate metadata, including semantic data, which is stored with (or linked to) the image data, whether still or video. This metadata can include information such as the date, the identities of the local and remote participants, type of event data, key words extracted via voice recognition software, privacy settings for the communication event, and annotations or titles entered by users. This metadata can be useful in the archiving and recall of the video, still image, or audio data generated by the device 300 or system 290; the processed video image result is output by the system). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.


Regarding Claim 38,
	The combination of Kurtz and Karvounis further discloses using augmented reality, streamed video in real-time, or characteristics selected via an interactive database query. (Karvounis, [0152], discloses HAR-SLAM has two main stages, the first is Forgetful SLAM, and the second is a rippling update. Forgetful SLAM has to be modified to include a link between forgotten landmarks and past poses. This is done in Table 13, where removed landmarks are correlated only to the past pose. These correlations are important for updating landmarks once the pose changes, and for revising landmarks to understand how to modify the past pose. Past poses have correlations to each other forming a chain; when a pose is modified, an update routine modifies all connecting neighbors. The rippling update is simply the propagation of updates to every connected pose and landmark in a directed graph. The correlation algorithm is shown in Table 15, where an augmented state is updated and the information matrix between the past and current pose is extracted. It is important to augment the system so the newly observed landmarks can be properly associated with the current pose and at the same time update the past pose correlation; selected landmarks (features) of a pose (object) in the video are augmented and displayed using user interface devices as an augmented derivative). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.

Regarding Claim 39, 
The combination of Kurtz and Karvounis further discloses wherein said characteristic of said object comprises a spatial or temporal feature of said object. (Kurtz, [0014], discloses analyzes these cues to predict, and then pro-actively make a seamless transition in shifting the video-capture from a first speaker to a second speaker. These behavioral cues include acoustic cues (such as intonation patterns, pitch and loudness), visual cues (such as gaze, facial pose, body postures, hand gestures and facial expressions), or combinations of the foregoing, which are typically associated with an event; facial pose or body postures of speakers are processed to predict a future pose; facial motion is a temporal feature that varies with time across sequential frames). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.

Regarding Claim 40, 
The combination of Kurtz and Karvounis further discloses wherein said spatial or temporal feature comprises a vibration or movement of said object. (Kurtz, [0049] FIG. 3B, discloses One subsystem therein is the image capture system 310, which includes image capture devices 120 and image processor 320. Another subsystem is the audio system, which includes microphones 144, speakers 125, and an audio processor 325. The computer 340 is operatively linked to the image capture system 310, image processor 320, the audio system and audio processor 325, and the system controller 330, as is shown by the dashed lines. While the dashed lines indicate a variety of other important interconnects (wired or wireless) within the video communications system 300, the illustration of interconnects is merely representative, and numerous interconnects that are not shown will be needed to support various power leads, internal signals, and data paths. The computer 340 also is linked to a user tracking process 480, which can be an algorithm operated within the computer 340, using motion detection data acquired from a motion detector 142. Likewise, the computer 340 can access a user identification process 470, which again can be an algorithm operated within the computer 340. Similarly, the computer can access a gaze adaptive process 495, which can include both a gaze correction process and a gaze tracking process (or algorithms); motion (movement) in the video frames is processed and detected). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.

Regarding Claim 41, 
The combination of Kurtz and Karvounis further discloses wherein said vibration or movement is imperceptible to the naked eye. (Karvounis, [0086], discloses a method is described for managing objects in a multi-dimensional space when the objects are no longer visible in a collection of images captured from the multi-dimensional space over time. The method comprises the operations of identifying one or more objects visible in a first image among the collection of images, adding the identified objects to a first list of tracked identified objects, tracking the identified objects from the first list of objects in one or more subsequent images among the collection of images, the one or more subsequent images being captured after capturing the first image, determining whether the tracked identified objects are absent in the one or more subsequent images, and removing, based on the determination, the tracked identified objects from the first list; objects are not visible in some set of images and are therefore not visible to the naked eye). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.

Regarding Claim 42, 
The combination of Kurtz and Karvounis further discloses wherein said spatial or temporal feature comprises a color change of said object. (Kurtz, [0127], discloses light source spectrum or model can also be compared to prior spectral data and color correction data that could be maintained and updated for capture from a given electronic imaging device 100. The reference images 460 can also be used as targets for providing acceptable image quality, by adjusting the current color values towards the expectation color values present in these images. Color changes can be tracked with an appropriate color space model, such as CIELAB; a color change of the subject in the video frames is determined). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.

Regarding Claim 55, 
The combination of Kurtz and Karvounis further discloses wherein amplifying said video comprises processing said video using one or more of video acceleration magnification or Eulerian video magnification. (Kurtz, [0112], discloses scene capture management 650 receives the identified change data from communication event analysis 655, and any associated intra-scene adjustment confidence values, and then applies a capture transition algorithm to determine how intra-scene video capture and processing scene adjustments will be made by the device 300. This algorithm includes a set of scene adjustment rules, based upon factors including event classification, privacy settings, temporal issues (rate and frequency of the capture changes compared to the rate of change of the local activities and the remote viewers perception of change), the magnitude of the changes, or intra-scene adjustment confidence values. For example, as the local user 10a of video scene 620 of FIG. 4C moves in his chair, scene capture management 650 can cause the device 300 to make changes in the capture FOV 420 and image focus over the space of a few frame times or many seconds, depending on what level of remote user awareness is desirable for a given change. As another example, rules based weighting factors can prioritize changes in FOV 420 and focus during a lock and track event to occur prior to other changes in image quality (color) or gaze correction; magnitude of video images and their changes are tracked and amplified or modulated according to the user requirements). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.

Regarding Claim 56, 
The combination of Kurtz and Karvounis further discloses wherein (c) comprises processing said video using a phase-based motion estimation algorithm or an object edge tracking algorithm. (Karvounis, [0208], discloses software framework, the ASL Framework, was created at the Autonomous Systems Lab. This framework was written in C# and works with all Windows Platforms. The ASL Framework makes use of several third-party libraries. Within this framework is an image processing suite based on Intel's OpenCV library. For use in C#, a wrapper library called EMGU is used; Image processing capabilities include color filters, edge detectors, blurring, corner detection with Harris corners, face tracking, and feature tracking with Lucas-Kanade Optic Flow; edge tracking algorithms are utilized). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.

Regarding Claim 57, 
The combination of Kurtz and Karvounis further discloses wherein (c) comprises selectively filtering one or more frequencies in said video. (Karvounis, [0258], discloses there are two more synchronizer functional units. The vision and inertial synchronizer matches feature packets from LK-SURF to inertial data from the TRX INU. A matching routine similar to the stereo synchronizer is used to pair packets. The paired packets are sent to the last synchronizer. The last synchronizer combines actuator data from the Xbox controller, integrated encoder data from the encoder integrator, and the pair of inertial and visual data from the previous synchronizer. The wheel encoder data is up sampled to match the frequency of the visual and inertial data packet. The INU, Xbox controller, and cameras operate at 20 Hz, but the wheel encoders operate at 15 Hz. Using the integrated wheel encoder results, the encoder information is up sampled using linear interpolation to match the 20 Hz signal. The final synchronizer sends a single packet, containing all the incoming data types, to the Forgetful SLAM functional unit; frequencies of incoming video frames are filtered (up sampled)). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.

Regarding Claim 58, 
The combination of Kurtz and Karvounis further discloses wherein (c) comprises decomposing said video into a plurality of different spatial scales and orientations and processing each of said plurality of different spatial scales and orientations using a computer vision or machine learning algorithm. (Kurtz, [0129], discloses it is also noted that a residence may have multiple electronic imaging devices 100, with multiple displays 110 and cameras 120, linked in an internal network 360, as part of local video communications device 300. The multiple electronic imaging devices 100 can be used either simultaneously (such as multiple users 10) or sequentially (such as room to room) during a communication event 600. For example, as a user 10 moves from one room with an electronic imaging device 100 to another, the video capture of a communication event can track and follow the change in activity from room to room. A video capture mode with this activity following function can be either automatic or manually controlled, presumably by a local user. It can also be expected that users 10 may provide different privacy settings for different rooms (local environments 415) in their residence, which can effect how the device 300 responds when following activity from room to room. While a networked electronic imaging device 100 may be in use for a given communication event, that does not mean that electronic imaging devices 100 in other rooms are likewise on, and capturing or transmitting audio or video data. However, if this is occurring, the local displays 110 can show multiple split screen images 410 depicting image capture in each of the local environments 415. The contextual interface 450 can also apply video context knowledge of activity or event type, user classification, or user identity, as well as remote viewer identity or classification, to determine which captured content is captured and transmitted; video frames are split or divided into different sets to accommodate all scene information of the object in order to obtain the orientations and angles of objects moving in the video frames). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.

Regarding Claim 59, 
The combination of Kurtz and Karvounis further discloses wherein (c) comprises identifying a region of interest in said video and performing temporal analysis on said region of interest. (Karvounis, [0069], discloses LK-SURF is an image processing technique that combines Lucas-Kanade feature tracking with Speeded-Up Robust Features to perform spatial and temporal tracking. Typical stereo correspondence techniques fail at providing descriptors for features, or fail at temporal tracking. Feature trackers typically only work on a single image for features in 2D space, but LK-SURF tracks features over 3D space. This new tracker allows stereo images to produce 3D features can be tracked and identified. Several calibration and modeling techniques are also described, including calibrating stereo cameras, aligning stereo cameras to an inertial system, and making neural net system models. These methods help to improve the quality of the data and images acquired for the SLAM process; temporal tracking of an object in images is performed). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.

Regarding Claim 60, 
The combination of Kurtz and Karvounis further discloses transmitting, in real-time, an alert or status indicator that indicates that said object is predicted to have said characteristic. (Kurtz, [0122], discloses the Bayesian and Markovian probabilistic inference methods may be used individually or in combination (as a hybrid) to enable the contextual interface 450 to manage event transitions. For example, the Bayesian probabilistic inference method can be used in scene analysis of current and prior video imagery to identify a potential inter-scene transition (transition test 630), testing whether an activity change is, or is not, a transition. If a transition is identified as an inter-scene transition, than a directional Markov model can be used to determine the likely event classification for the new video scene 620'. Much as before, confidence values can be tabulated, to measure the certainty of the Bayesian inter-scene transition inference or the Markovian inter-scene event classification inference. These can again be used for validation tests, perhaps resulting in the use of interim event settings. Such an approach, using a Bayesian model, can be considered to be pro-active or anticipatory, as it attempts to predict a new event state (and video capture mode), based on belief models of what may occur. It may be more difficult to implement as compared to the previously discussed statistical approach, which was more reactive; a new state of the object is predicted in the video frames). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.

Regarding Claim 61, 
The combination of Kurtz and Karvounis further discloses wherein said object is inside or outside a wind turbine, a nuclear reactor, a chemical reactor, a semiconductor fabrication system, an airfoil, a plasma system, a flame, a flow, an engine, a biological system, a medical imaging system, or a data source for a financial trading system. (Kurtz, [0161], discloses there are concepts for smart medical homes, in which individuals, and particularly the elderly may be monitored relative to their health status as they live in their residence. Accordingly, a variety of sensors may be distributed about the residence, including sensors in the furniture, flooring, appliances, and medicine cabinet. Cameras may also be used to monitor the individuals, but the individuals may find them too invasive. While cameras may be hidden behind electronic picture frames to make them more unobtrusive, the mere presence of the cameras may leave the individuals uneasy about being monitored; an object of a medical imaging system is disclosed). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.

Regarding Claim 62, 
The combination of Kurtz and Karvounis further discloses wherein said AI algorithm is a deep neural network, a reservoir computing algorithm, a reinforcement learning algorithm, an adaptive learning algorithm, or a generative adversarial network. (Kurtz, [0049], discloses the computer can access a gaze adaptive process 495, which can include both a gaze correction process and a gaze tracking process (or algorithms); adaptive algorithms are disclosed). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.

Regarding Claim 63, 
The combination of Kurtz and Karvounis further discloses wherein said AI algorithm has been trained on training examples comprising video of said object or video of objects of a same type as said object. (Kurtz, [0104], discloses the previously discussed contextual interface 450 includes an intelligent agent or artificial intelligence (AI) or set of algorithms that adaptively responds (and perhaps anticipates) user activities, and modifies the video capture process to improve the video experience. Contextual interface 450 can also be a learning system, that progressively gains understanding of user activities and communication needs. These algorithms, and the supporting system data that enables their operation, are outlined in Table 3; the AI algorithm uses training examples to train the machine learning algorithm for detecting objects). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.

Regarding Claim 64, 
The combination of Kurtz and Karvounis further discloses wherein (d) presenting said video on said user interface. (Kurtz, [0049], [0157], discloses the present invention for a video communication system 300 has been described as a system that generates video imagery (basically the picture portion of a television signal) and the accompanying audio. It should be understood the system can also use digital still cameras or image processing to extract still images from a video stream. As an example, a key frame extraction algorithm that identifies the video frames that have the best composition and facial expressions can be used to create still images from the video output of system 290. The system 290 or device 300 can also generate metadata, including semantic data, which is stored with (or linked to) the image data, whether still or video. This metadata can include information such as the date, the identities of the local and remote participants, type of event data, key words extracted via voice recognition software, privacy settings for the communication event, and annotations or titles entered by users. This metadata can be useful in the archiving and recall of the video, still image, or audio data generated by the device 300 or system 290; The computer 340 also accesses or is linked to a user interface 440. This user interface 440 includes interface controls 190, which can take many physical forms, including a keyboard, joystick, a mouse, a touch screen, push buttons, or a graphical user interface. Screen 115 can also be a functional element in the operation of the interface controls 190. The user interface 440 also includes a privacy interface 400 and a contextual interface 450, and may further include an appearance-altering interface 490. The user interface 440 can also include a cue-based interface, which can be a portion of the contextual interface 450. The cue-based interface essentially observes cues, including speech commands, voice cues (intonation, pitch, etc.), gestures, body pose, and other interpretive cues, and then derives or determines responsive actions for the video communication system 300. These interfaces combine database, analysis, and control functions, which are enabled by the computer 340, the memory 345, the display 110, the image capture devices 120, the interface controls 190, and various other device components; output is displayed on the user interface). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.

Regarding Claim 65, 
The combination of Kurtz and Karvounis further discloses wherein said augmented derivative of said video is overlaid on said video. (Kurtz, [0157], discloses a video communication system 300 has been described as a system that generates video imagery (basically the picture portion of a television signal) and the accompanying audio. It should be understood the system can also use digital still cameras or image processing to extract still images from a video stream. As an example, a key frame extraction algorithm that identifies the video frames that have the best composition and facial expressions can be used to create still images from the video output of system 290. The system 290 or device 300 can also generate metadata, including semantic data, which is stored with (or linked to) the image data, whether still or video. This metadata can include information such as the date, the identities of the local and remote participants, type of event data, key words extracted via voice recognition software, privacy settings for the communication event, and annotations or titles entered by users. This metadata can be useful in the archiving and recall of the video, still image, or audio data generated by the device 300 or system 290; a processed video image result is output by the system). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.


Regarding Claim 66, 
The combination of Kurtz and Karvounis further discloses wherein said object is or is associated with a dynamical system. (Kurtz, [0139], discloses both image capture of the local user to provide eye gaze perception for a remote viewer, and image display of the remote viewer with eye gaze correction relative to the local user are complicated by the variable geometrical relationships of users 10, displays 110, and cameras 120. Indeed, both eye contact image capture and eye contact image display may need to change dynamically as users 10 move around, effectively requiring eye gaze tracking for both image capture and display. However, the relevance of these issues depend on the degree to which user's 10 accept the fact that they are engaged in a video communication event, compared to the extent to which they would prefer to have the sense of "almost being there", as if they were just looking through a window into the other environment. The relevance of these issues also depends on the video context, and particularly the event classifications, as user expectations for eye contact will vary with event type; a dynamic system is disclosed). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.

Regarding Claim 67, 
The combination of Kurtz and Karvounis further discloses wherein (c) is performed prior to (b). (Kurtz, [0157], discloses a video communication system 300 has been described as a system that generates video imagery (basically the picture portion of a television signal) and the accompanying audio. It should be understood the system can also use digital still cameras or image processing to extract still images from a video stream. As an example, a key frame extraction algorithm that identifies the video frames that have the best composition and facial expressions can be used to create still images from the video output of system 290. The system 290 or device 300 can also generate metadata, including semantic data, which is stored with (or linked to) the image data, whether still or video. This metadata can include information such as the date, the identities of the local and remote participants, type of event data, key words extracted via voice recognition software, privacy settings for the communication event, and annotations or titles entered by users. This metadata can be useful in the archiving and recall of the video, still image, or audio data generated by the device 300 or system 290; a processed video image result is output by the system). Additionally, the rationale and motivation to combine the references Kurtz and Karvounis as applied in the rejection of claim 32 apply to this claim.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
US 20160189397 A1
US 20170334066 A1

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PINALBEN V PATEL whose telephone number is (571)270-5872. The examiner can normally be reached M-F: 10am - 8pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph, can be reached at (571)272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Pinalben Patel/
Examiner, Art Unit 2661