DETAILED ACTION

Response to Arguments
Applicant’s arguments with respect to claim 1 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Because the new ground of rejection relies on a newly cited reference, Rasheed et al. (US Pub. No. 2016/0165193 A1), this Office action is made Non-Final.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
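3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.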

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 3-5, 7, 8, 10-12, 14, 15, 17-19, and 21-27 are rejected under 35 U.S.C. 103 as being unpatentable over Eledath et al. (US Pub. No. 2016/0378861 A1) in view of Venetianer et al. (US Pub. No. 2008/0100704 A1) and in further view of Rasheed et al. (US Pub. No. 2016/0165193 A1).
Regarding claim 1, Eledath discloses, an apparatus comprising a processor and a memory storing executable instructions that, in response to execution by the processor, cause the apparatus to: (See Eledath ¶152, “includes a plurality of instructions embodied in memory accessible by a processor of at least one of the computing devices.”)
receive a video feed; process the video feed in real-time as the video feed is received, including the apparatus being caused to: (See Eledath ¶36, “Among other things, embodiments of the disclosed technologies can utilize computer vision 
perform object detection and recognition on the video feed to detect and classify objects therein, perform activity recognition to detect and classify activities of at least some of the objects, and output classified objects and classified activities in the video feed; (See Eledath ¶49, “A scene-understanding server (e.g., scene understanding services 220) provides interfaces to modules that recognize classes and specific instances of objects (vehicles, people etc.), locales and activities being performed (e.g., FIGS. 33-34).”)
generate natural language text that describes the video feed from the classified objects and activities; (See Eledath ¶105, “Based on the system 110's semantic understanding of the feature 608, the system 110 generates and displays virtual element 610, which identifies retrieved information about the vehicle depicted in the image. (i.e. “Gray Van Owned by John Doe”) Based on the system 110's semantic understanding of the feature 612, the system 110 generates and displays virtual element 614, which identifies the person depicted in the image as well as employment information about the person.” (i.e. “Jim Jones Works with John Doe at ABC Co.”))
produce a semantic network including a graph with vertices that represent the classified objects, and edges that connect the vertices and represent semantic relationships between the classified objects, at least some of the semantic relationships corresponding to respective ones of the classified activities; (See Eledath ¶54, “Data collected in the system 110 can be stored and organized for situational awareness, analysis and reasoning by automated algorithms and human users … In a graph 
and store the video feed, classified objects and classified activities, natural language text, and semantic network in a knowledge base; (See Eledath ¶97, “The video 122 may be stored in computer memory as a video file and analyzed by the system 110 as disclosed herein.” Further see Eledath ¶105, “the system 110 creates a link between the features 608, 612 and stores the link and related information in the knowledge base 106 or other databases or searchable storage locations.”)
and generate a graphical user interface (GUI) configured to enable queries of the knowledge base, and presentation of selections of the video feed, classified objects and classified activities, natural language text, and semantic network. (See Eledath ¶128, “For example, the reasoning module 1804 may infer based on the user intent and processing performed by the inference module 1812 that there is a need to perform a query on a certain database to find the information the user is looking for. … converts the result of task flow/workflow execution and/or other processing initiated by the reasoning module 1804 into suitable output, e.g., graphical/textual overlays, system-generated natural language, etc., and sends the output to the appropriate output device (e.g., display, speaker), as illustrated by augmented image 1808.”)

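Eledath discloses the above limitations but fails to explicitly disclose that the video feed comprises video from multiple sources, or that the classified activities comprise an interaction between a classified object and a geographic area in the video feed.
However, Venetianer discloses, receive a video feed comprising video from multiple sources; (See Venetianer ¶92, “The video sensors 14 provide source video to the computer system 11.  Each video sensor 14 can be coupled to the computer system 11 using, for example, a direct connection (e.g., a firewire digital camera interface) or a network.”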
Further see Venetianer ¶97, “In block 21, the video surveillance system is set up as discussed for FIG. 1.  Each video sensor 14 is orientated to a location for video surveillance.  The computer system 11 is connected to the video feeds from the video equipment 14 and 15.”
Further see Venetianer ¶166, “The event discriminators may also use other types of primitives, as discussed above, and/or combine video primitives from multiple video sources to detect event occurrences.”)
wherein the classified activities comprise an interaction between one or more of the classified objects and a geographic area in the video feed; (See Venetianer ¶128, “Activity detectors correspond to a behavior related to an area of the video scene.  They describe how an object might interact with a location in the scene.  FIG. 18 illustrates three exemplary activity detectors.  FIG. 18a represents the behavior of crossing a perimeter in a particular direction using a virtual video tripwire … FIG. 18b represents the behavior of loitering for a period of time on a railway track.  FIG. 18c represents the behavior of taking something away from a section of wall …Other exemplary activity detectors may include detecting a person falling, detecting a person changing direction
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the multiple video sources, and the classification of activities that include an interaction between objects and a geographic location, as suggested by Venetianer into Eledath’s surveillance apparatus using known engineering techniques, with a reasonable expectation of success. The motivation for doing so is to obtain more information by classifying objects from multiple viewpoints and to determine many types of activities more accurately based on their interactions and locations.
Eledath and Venetianer disclose the above limitations, in which multiple streaming video sources are used for surveillance, but they fail to disclose that one of the sources can be an unmanned aerial vehicle.
However, Rasheed discloses, receive a video feed comprising video from multiple sources, including a moving aerial source; (See Rasheed ¶27, “The group of sensors 110-113 (and/or their computing device 118) may respectively stream video and metadata 115-117 to the video processing engine 120.” Further see Rasheed ¶29, “This typically includes geographically large areas that cannot be covered by a single camera or sensor and which are therefore covered by multiple fixed cameras and/or which may be covered by mobile cameras mounted on vehicles, including for example aerial vehicles.”)
geo-register the classified objects with respective geographic locations; (See Rasheed ¶65, “At 420, the process 400 transforms the video information from pixel 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the geo-referenced aerial surveillance video streams suggested by Rasheed into the surveillance apparatus of Eledath and Venetianer using known engineering techniques, with a reasonable expectation of success. The motivation for doing so is to obtain aerial images from multiple viewpoints and to let the user know the geographic locations of the objects being viewed.

Regarding claim 2, Eledath, Venetianer, and Rasheed disclose, the apparatus of claim 1, wherein at least some of the multiple sources are moving sources. (See Eledath ¶98, “For example, if the camera 114 is supported by the person 104 (e.g., as a component of a wearable or body-mounted device).”)

Regarding claim 3, Eledath, Venetianer, and Rasheed disclose, the apparatus of claim 1, wherein the apparatus being caused to geo-register the classified objects 
and wherein the GUI is further configured to present an aerial image or map of a scene in the video feed, identifying thereon the classified objects at the respective geographic locations (See Rasheed ¶66, “In 430, the process 400 may identify, recognize, locate, classify, and/or otherwise detect one or more objects and/or movements in the video, and may associate, label, or otherwise describe the location of the objects and/or movements in terms of the geo-registered space. For example, a car shown in a video feed from the EO camera 110 may be tagged.”)
Rasheed discloses the above limitations and discloses (see ¶66) tagging a car with multiple locations as it moves, but fails to specifically disclose showing a trajectory of the object.
However, Venetianer discloses, includes respective trajectories of any moving ones of the classified objects, and with the respective trajectories of the moving ones of the classified objects. (See Venetianer ¶169, “In block 62, an activity record is generated for each event occurrence that occurred.  The activity record includes, for example: details of a trajectory of an object;” Further see Venetianer ¶170, “In block 63, output is generated.  The output is based on the event occurrences extracted in block 44 and a direct feed of the source video from block 41.”)

Regarding claim 4, Eledath, Venetianer, and Rasheed disclose, the apparatus of claim 1, wherein the apparatus being caused to perform object detection and recognition includes being caused to assign respective unique identifiers to the classified objects, and the presentation of selections of the video feed in the GUI includes identifying the classified objects on the video feed and including the respective unique identifiers. (See Eledath Fig. 6, which shows the identifiers “Gray Van” and “Jim Jones” within the video feed in the user’s GUI.)

Regarding claim 5, Eledath, Venetianer, and Rasheed disclose, the apparatus of claim 1, wherein at least some of the objects are moving objects, and the apparatus being caused to perform object detection and recognition includes being caused to detect and classify the moving objects using convolutional neural networks. (See Eledath ¶67, “Some embodiments utilize parts-based deformable models, convolutional neural networks and subspace image embeddings for object detection.”)
However, Venetianer discloses, classify the moving objects using motion compensation, (See Venetianer ¶157, “The motion detection technique of block 51 and the change detection technique of block 52 are complimentary techniques, where each technique advantageously addresses deficiencies in the other technique.”
Further see Venetianer ¶158, “As an option, if the video sensor 14 has motion (e.g., a video camera that sweeps, zooms, and/or translates), an additional block can be inserted before blocks between blocks 51 and 52 to provide input to blocks 51 and 52 
background subtraction (See Venetianer ¶156, “In block 52, objects are detected via change.  Any change detection algorithm for detecting changes from a background model can be used for this block. … As an example, a stochastic background modeling technique, such as dynamically adaptive background subtraction, can be used.”)

Regarding claim 7, Eledath, Venetianer, and Rasheed disclose, the apparatus of claim 1, wherein the apparatus being caused to generate the GUI includes being caused to generate the GUI configured to enable queries of the knowledge base based on similarity between a user-specified object and one or more of the classified objects in the video feed. (See Eledath ¶131, “For example, in response to a user asking "who is that?" the reasoner 1600 may need to analyze gesture and/or gaze data to determine the person in the scene to whom the user is referring as "that", and then initiate a face recognition algorithm to identify such person, and then initiate a search query to determine additional details about the person (e.g., residence, employment status, etc.).  The dialog boxes 1618, 1620, 1622 illustrate examples of output intents that may be produced by the reasoner 1600.” Also see Figs. 5-7, which show queries by a user and the displayed output in a GUI.)

Regarding claim 8, Eledath, Venetianer, and Rasheed disclose, a method of intelligent video analysis, the method comprising: receiving a video feed comprising video from multiple sources, including a moving aerial source; processing the video feed 

Regarding claim 10, Eledath, Venetianer, and Rasheed disclose, the method of claim 8, wherein geo-registering the classified objects including respective trajectories of any moving ones of the classified objects, and wherein the GUI is further configured to present an aerial image or map of a scene in the video feed, identifying thereon the classified objects at the respective geographic locations and with the respective trajectories of the

Regarding claim 11, Eledath, Venetianer, and Rasheed disclose, the method of claim 8, wherein performing object detection and recognition includes assigning respective unique identifiers to the classified objects, and the presentation of selections of the video feed in the GUI includes identifying the classified objects on the video feed and including the respective unique identifiers. (See the rejection of claim 4 as it is equally applicable for claim 11 as well.)

Regarding claim 12, Eledath, Venetianer, and Rasheed disclose, the method of claim 8, wherein at least some of the objects are moving objects, and performing object detection and recognition includes detecting and classifying the moving objects using motion compensation, background subtraction and convolutional neural networks. (See the rejection of claim 5 as it is equally applicable for claim 12 as well.)

Regarding claim 14, Eledath, Venetianer, and Rasheed disclose, the method of claim 8, wherein generating the GUI includes generating the GUI configured to enable queries of the knowledge base based on similarity between a user-specified object and one or more of the classified objects in the video feed. (See the rejection of claim 7 as it is equally applicable for claim 14 as well.)

Regarding claim 15, Eledath, Venetianer, and Rasheed disclose, a non-transitory computer-readable storage medium having computer-readable program code stored therein that in response to execution by a processor, causes an apparatus to: (See Eledath ¶161, “Embodiments may also be implemented as instructions stored using one or more machine-readable media, which may be read and executed by one or more processors.”)
receive a video feed comprising video from multiple sources, including a moving aerial source; process the video feed in real-time as the video feed is received, including the apparatus being caused to: perform object detection and recognition on the video feed to detect and classify objects therein, perform activity recognition to detect and classify activities of at least some of the objects, and output classified objects and classified activities in the video feed, wherein the classified activities comprise an interaction between one or more of the classified objects and a geographic area in the video feed; generate natural language text that describes the video feed from the classified objects and activities; produce a semantic network including a graph with vertices that represent the classified objects, and edges that connect the vertices and represent semantic relationships between the classified objects, at least some of the semantic relationships corresponding to respective ones of the classified activities; geo-register the classified objects with respective geographic locations; and store the video feed, classified objects and classified activities, natural language text, and semantic network in a knowledge base; and generate a graphical user interface (GUI) configured to enable queries of the knowledge base, and presentation of selections of the video feed, classified objects and classified activities, natural language text, and 

Regarding claim 17, Eledath, Venetianer, and Rasheed disclose, the computer-readable storage medium of claim 15, wherein the apparatus being caused to geo-register the classified objects includes respective trajectories of any moving ones of the classified objects, wherein the GUI is further configured to present an aerial image or map of a scene in the video feed, identifying thereon the classified objects at the respective geographic locations and with the respective trajectories of the moving ones of the classified objects. (See the rejection of claim 3 as it is equally applicable for claim 17 as well.)

Regarding claim 18, Eledath, Venetianer, and Rasheed disclose, the computer-readable storage medium of claim 15, wherein the apparatus being caused to perform object detection and recognition includes being caused to assign respective unique identifiers to the classified objects, and the presentation of selections of the video feed in the GUI includes identifying the classified objects on the video feed and including the respective unique identifiers. (See the rejection of claim 4 as it is equally applicable for claim 18 as well.)

Regarding claim 19, Eledath, Venetianer, and Rasheed disclose, the computer-readable storage medium of claim 15, wherein at least some of the objects are moving objects, and the apparatus being caused to perform object detection and recognition includes being caused to detect and classify the moving objects using motion 

Regarding claim 21, Eledath, Venetianer, and Rasheed disclose, the computer-readable storage medium of claim 15, wherein the apparatus being caused to generate the GUI includes being caused to generate the GUI configured to enable queries of the knowledge base based on similarity between a user-specified object and one or more of the classified objects in the video feed. (See the rejection of claim 7 as it is equally applicable for claim 21 as well.)

Regarding claim 22, Eledath, Venetianer, and Rasheed disclose, the apparatus of claim 1, wherein the moving aerial source is an unmanned aerial vehicle. (See Rasheed ¶27, “a mobile camera 113 such as an aerial video camera mounted on a flying vehicle, such as a remotely piloted vehicle, other unmanned aerial vehicle (UAV), a manned aircraft.”)

Regarding claim 23, Eledath, Venetianer, and Rasheed disclose, the apparatus of claim 1, wherein the GUI is further configured to present an aerial image or map identifying a geographic location of at least one of the multiple sources.  (See Rasheed ¶35, “In some embodiments, geo-registration allows display of the image sensors 110-113 themselves on the map-based user interface 140.  For example, the metadata associated with each image sensor 110-113 may provide location data that allows 
interface 140.”)

Regarding claim 24, Eledath, Venetianer, and Rasheed disclose, the method of claim 8, wherein the moving aerial source is an unmanned aerial vehicle. (See the rejection of claim 22 as it is equally applicable for claim 24 as well.)

Regarding claim 25, Eledath, Venetianer, and Rasheed disclose, the method of claim 8, wherein the GUI is further configured to present an aerial image or map identifying a geographic location of at least one of the multiple sources. (See the rejection of claim 23 as it is equally applicable for claim 25 as well.)

Regarding claim 26, Eledath, Venetianer, and Rasheed disclose, the computer-readable storage medium of claim 15, wherein the moving aerial source is an unmanned aerial vehicle. (See the rejection of claim 22 as it is equally applicable for claim 26 as well.)

Regarding claim 27, Eledath, Venetianer, and Rasheed disclose, the computer-readable storage medium of claim 15, wherein the GUI is further configured to present an aerial image or map identifying a geographic location of at least one of the multiple sources. (See the rejection of claim 23 as it is equally applicable for claim 27 as well.)

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID PERLMAN whose telephone number is (571)270-1417.  The examiner can normally be reached on Monday - Friday; 10:00am - 6:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz, can be reached at (571) 272-3638.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.