DETAILED ACTION

Allowable Subject Matter
Claims 1, 3-5, 7, 8, 10-15, 17-19, and 21-33 are allowed.
The following is an examiner’s statement of reasons for allowance: See the Response to Arguments below, in which some of applicant’s arguments are found persuasive. Specifically, Eledath fails to disclose the following limitation of independent claims 1, 8, and 15: “produce a semantic network including a graph with vertices that represent the classified objects, and edges that connect the vertices and represent semantic relationships between the classified objects, at least some of the semantic relationships corresponding to respective ones of the classified activities.”
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Additional references were found that more clearly disclose some of the limitations in the independent claims and new claims 28-33; however, they do not disclose all of the limitations. The references are as follows:
Cheng et al. (US Pub. 2014/0328570 A1) ¶59 discloses dividing a video into multiple time segments, and ¶26-27 discloses using NLP to textually describe each segment.

Xu et al. (“Semantic based representing and organizing surveillance big data using video structural description technology”) discloses creating semantic networks for surveillance events that can be used for object querying; see, for example, Fig. 3.

Response to Arguments
Some of Applicant’s arguments, see Remarks, filed on 09/23/2021, have been fully considered and are persuasive. Therefore, the rejection of claims 1, 3-5, 7, 8, 10-12, 14, 15, 17-19, 21-27 over Eledath et al. (US Pub. No. 2016/0378861 A1) in view of Venetianer et al. (US Pub. No. 2008/0100704 A1) and in further view of Rasheed et al. (US Pub. No. 2016/0165193 A1) has been withdrawn.

Applicant has argued as follows: 1.  Eledath Does not Generate Natural Language Text That Describes the Video Feed From the Classified Objects and Activities 
In the Office Action, the Examiner finds that Eledath teaches the "generate natural language text that describes the video feed from the classified objects and activities" feature of claim 1. Applicant respectfully disagrees with the Examiner's findings. Eledath is directed to a vision-based user interface platform for augmented reality purposes. According to paragraph [0105] and FIG. 6, the disclosure in Eledath does indicate that text with identification information about a person or object is generated and displayed on virtual element 610 and 614. However, Applicant lassified objects, and classified activities of the van or person. 
Additionally, it appears that Eledath merely overlays text over the video and does not actually enable querying the knowledge base to display video feed corresponding to particular objects or activities. Eledath is directed to providing an augmented reality service to the user, whereas the pending claims generate a knowledge base of video that can be queried based on objects and activities that have occurred in the past. 
Accordingly, Applicant respectfully submits that the augmented reality system of Eledath does not disclose the natural language generation feature of the pending claims. 
Examiner’s Response: It is agreed that the generated text identifies only objects and their spatiotemporal relationships (see Eledath ¶89); therefore, Eledath does not disclose generating text for the classified activity relationships between objects.

Applicant has argued as follows: 2.  Eledath Does Not Produce a Semantic Network According to the Pending Claims 
The Examiner finds that Eledath teaches the "produce a semantic network including a graph with vertices that represent the classified objects, and edges that connect the vertices and represent semantic relationships between the classified objects, at least some of the semantic relationships corresponding to respective ones of the classified activities" feature of pending claim 1. The Examiner cites to paragraphs 
While paragraph [0054] of Eledath does disclose that "[d]ata collected in the system 110 can be stored and organized for situational awareness, analysis and reasoning by automated algorithms and human users .... In a graph representation, nodes represent the objects of interest along with their attributes, and edges between the nodes represent inter-object relationships" the Examiner has overlooked another teaching from paragraph [0054] that appears to contradict the conclusion. Namely, paragraph [0054] further states that "[i]n a triple store, data objects - entities, events, the relations between them, attributes, etc. - are stored as subject-predicate object triples." This teaching in Eledath appears to indicate that entities, events, and relations between them (i.e., classified objects and the activities relating them) are data objects. These data objects are nodes of the graph, according to the cited portion of paragraph [0054]. 
In other words, Eledath appears to teach that the objects, including events/activities relating the objects, make up the vertices of the graph and inter-object relationships are edges between the nodes. This is not the case in the pending claims in which the edges represent semantic relationships between objects, and at least some of these semantic relationships correspond to respective ones of the classified activities. This does not appear to happen in Eledath. 
Accordingly, Applicant respectfully submits that the augmented reality system of Eledath does not disclose the semantic network production feature of the pending claims. 
Examiner’s Response: It is agreed that Eledath, in ¶54, discloses only a semantic graph with spatiotemporal relationships between objects; he fails to disclose a semantic graph in which the objects are vertices and the activities are edges.

Applicant has argued as follows: 3.  One Having Ordinary Skill in the Art Would Not Combine the Teachings of Eledath and Venetianer 
Applicant respectfully submits that those having ordinary skill in the art would not think it obvious to combine the teachings of Venetianer with the teachings of Eledath. In the Office Action, the Examiner argues that "[i]t would have been obvious...to include multiple video sources and classifying activities that include an interaction between objects and geography location as suggested by Venetianer to Eledath's surveillance apparatus...in order to obtain more information by classifying objects from multiple viewpoints as well as more accurately determining many types of activities based on their interactions and locations." See Office Action dated July 2, 2021, page 7. Applicant respectfully disagrees. 
The rationale given by the Examiner for combining the references does not make sense in the context of Eledath's invention. Eledath is directed to augmented reality devices. In other words, Eledath's invention is directed to a surveillance system with a single viewpoint: the user's. Eledath's system, like almost every other augmented reality system, lends itself to a single viewpoint because that is the purpose of augmented reality systems: they add virtual features to a user's viewpoint to add extra meaning (or entertainment) to the reality the user sees before them. Those having ordinary skill in the art would not think it obvious to combine the multiple viewpoints teachings from 
Examiner’s Response: This point is not agreed upon because, while some of the embodiments of Eledath use augmented reality cameras, others, such as in ¶97, use “fixed-location cameras (such as "stand-off" cameras that are installed in walls or ceilings)” and would therefore lend themselves to obtaining video from multiple camera sources.

Applicant has argued as follows: Moreover, the Examiner's rationale for combining the references includes the argument that it would be obvious to modify Eledath per Venetianer to "obtain more information by classifying objects from multiples [sic] viewpoints..." This does not appear to be the case. While paragraph [0166] does state that "[t]he event discriminators may also use other types of primitives as discussed above, and /or combine video primitives from multiple video sources to detect event occurrences," there is no indication that the device actually classifies objects based on multiple viewpoints. The event discriminators appear less focused on multiple viewpoints and more dependent on different video primitives, which could include different types of cameras as opposed to different viewpoints. In any event, the description in Venetianer is hardly clear enough for the Examiner to conclude that Venetianer classifies objects from multiple viewpoints and therefore, the Examiner's rationale for combining Venetianer and Eledath is flawed.
Examiner’s Response: This point is not agreed upon, since Venetianer does disclose using multiple video sources from multiple viewpoints (¶97) as well as obtaining primitives, or “activity description metadata” (¶100-101), from each video from all of the sources. Additionally, ¶135 discloses object detection, and since multiple sources are used to obtain the primitives, all subsequent detections of objects and events must therefore be based on multiple camera sources.

Applicant has argued as follows: Finally, the Examiner's argument that the combination would result in more accurate determinations of many types of activities is unsupported. The Examiner merely makes a conclusory statement that the combination would lead to more accurate results without providing any evidence to support this conclusion. The Examiner's argument is therefore unsupported and fails to provide a reason for combining the teachings of the two references. 
Accordingly, Applicant respectfully submits that those having ordinary skill in the art would not find it obvious to combine the teachings of Venetianer with those of Eledath to arrive at the features of the pending disclosure because there is no rational basis for doing so. 
Examiner’s Response: This point is not agreed upon. Using multiple video viewpoints of an activity is obviously better than using only a single viewpoint, since combining information from every direction yields more information and therefore provides better results. A secondary reference supporting this can certainly be provided, but one was not considered necessary because the benefit should be obvious.

Applicant has argued as follows: 4.  One Having Ordinary Skill in the Art Would Not Combine the Teachings of Eledath in View of Venetianer per Rasheed 
The Examiner argues that it would be obvious to modify Eledath in view of Venetianer, per Rasheed, to "obtain aerial images from multiple viewpoints and for the fixed-location cameras (such as "stand-off" cameras that are installed in walls or ceilings), user to know the geographical locations of the objects they are viewing." This rationale is circular and does not establish a logical underpinning for combining Rasheed with Eledath in view of Venetianer. That is, the Examiner effectively concludes that one skilled in the art would be motivated to combine the cited references for the purpose of producing the combination. This is simply insufficient and does nothing to establish why one skilled in the art would be motivated to produce the combination in the first place.
Examiner’s Response: While the motivation provided could certainly have been more detailed, it should be self-explanatory: using aerial images from multiple viewpoints provides a larger viewing area for surveillance purposes and enables better classification, since nothing is hidden from the camera view. Additionally, a viewer of a surveillance video will want to know the location of every object that is classified, and geo-referencing allows the object location data to be clearly displayed to the user.

Applicant has argued as follows: The Examiner attempts to argue that the motivation for the combination is also for the user to know the geographical locations of the objects they are viewing. However, this too fails to establish an obvious reason for 
Examiner’s Response: This point is not agreed upon; there are certainly good reasons to improve upon the geolocation disclosed by Eledath. Geolocation provides locations only for some points, whereas geo-registration is better since it provides a location for every point on a map.

Applicant has argued as follows: Moreover, it appears that Rasheed uses geographic referencing so that its surveillance rules are not dependent on a particular image sensor, not because the device in Rasheed is interested in geographic locations of the objects. Therefore, a person having ordinary skill in the art would not look to Rasheed to solve the problems associated with the pending claims because Rasheed is directed to solving an entirely different problem. 
Accordingly, there would be no reason, and the Examiner has provided no logical reason, for a person having ordinary skill in the art to look to Rasheed to modify the teachings of Eledath and Venetianer to arrive at the pending claims. 
Examiner’s Response: Even if Rasheed were solving a different problem, which has not been established, his teaching is still in the same field of art, namely surveillance imaging.

Applicant has argued as follows: 5.  Eledath Does not Render Obvious Enabling Queries of the Knowledge Base Based on Similarity Between a User-Specified Object and One or More of the Classified Objects in the Video Feed 
Claim 7 depends from independent claim 1, and recites that the GUI is generated to enable queries of the knowledge base based on similarity between a user-specified object and one or more of the classified objects in the video feed. In the Office Action, on page 11, the Examiner finds that paragraph [0131] of Eledath teaches the subject matter of claim 7. Specifically, the Examiner points to the following passage in the rejection: "in response to a user asking "who is that?" (with respect to a person on video surveillance) the reasoner 1600 may need to analyze gesture and/or gaze data to determine the person in the scene to whom the user is referring as "that", and then initiate a face recognition algorithm to identify such person, and then initiate a search query to determine additional details about the person (e.g., residence, employment status, etc.). The dialog boxes 1618, 1620, and 1622 illustrate examples of output intents that may be produced by the reasoner 1600." The Examiner further points to FIGS. 5-7 of Eledath as teaching the claimed feature. Applicant respectfully disagrees that this passage, or any other passage from Eledath, teaches the claimed feature.
The claimed feature is that the GUI is configured to enable queries of the knowledge base based on similarity between a user-specified object and one or more of the classified objects in the video feed. That is, if the user queries the knowledge base by selecting or indicating an object on the GUI, the apparatus will search the knowledge base for another object with similar features. As disclosed in paragraph [0048] of the publication of the pending application, an example of this feature includes, "as indicated 
Examiner’s Response: Eledath, in ¶138, clearly discloses that the database is queried based on an object at which the user is pointing, so this object is considered to be the object that is input by the user. The features in ¶48 of applicant’s specification are not recited in the claim and need not be disclosed by Eledath. Eledath uses a GUI that accepts gesture- and voice-based input; it is not necessary to use touchscreen-based selection input, as applicant appears to use. Furthermore, Eledath, in ¶44, does mention that a touch-based interface could be used for initiating queries.

Applicant has argued as follows: Eledath merely discloses a user querying who the person is on the video feed, and the device in Eledath is configured to determine more information about the person. Eledath is not disclosing finding similar objects or persons to the one selected, like the pending claims. It is instead just providing more details about the selected person. Therefore, Eledath fails to teach, suggest, or otherwise render obvious at least this feature. Applicant further respectfully submits that Venetianer and Rasheed also fail to teach this feature and they are not cited for that purpose.
Examiner’s Response: While the example embodiments are directed to retrieving text, Eledath does disclose in ¶95 that “‘knowledge’ may refer to any type of query-retrievable stored content, including a document file, an image file, a video file”, so he potentially does include retrieving similar items; however, it is agreed that Eledath


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID PERLMAN whose telephone number is (571)270-1417.  The examiner can normally be reached on Monday - Friday; 10:00am - 6:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz can be reached on (571) 272-3638.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/DAVID PERLMAN/Primary Examiner, Art Unit 2662