DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s remarks filed 23 May 2022 have been fully considered.
The rejections under section 101 are withdrawn in light of the amendment. The arrangement of the central hub and remote cameras integrates the recited judicial exceptions into a practical application.
Applicant’s remarks regarding patentability over the prior art with respect to claim 1 are moot in view of the new grounds of rejection presented below.
Applicant argues that Rothschild does not teach identifying an inference for a user query based on non-video time-stamped metadata.  Examiner respectfully disagrees.  Rothschild states, “Video corresponding to the biometric event can then be displayed.”  This teaches that the biometric event is used to determine an inference for a video search query, as it describes a process that uses biometric data, i.e., non-video data, to locate video data.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Shen et al., US 2019/0311743 A1 (hereinafter “Shen”), in view of Jiang et al., US 2011/0208730 A1 (hereinafter “Jiang”).

As per claim 1, Shen teaches:
receiving, at a central hub, time-stamped metadata for each of the one or more video streams captured by a camera at a remote site that is physically remote from the central hub (Shen Fig. 1), the time-stamped metadata for each video stream identifying one or more objects and/or events occurring in the corresponding video stream as well as an identifier that uniquely identifies the corresponding video stream (Shen ¶¶ 0037-40), where a metadata database provides time-coded metadata identifying objects in a media asset;
receiving, at the central hub, a sequence of two or more user queries including a latest user query entered by a user via a user device (Shen ¶ 0050), where input is received, where the disclosed natural language processing techniques are a cognitive model as claimed (Specification [0055], “The analytics model store 32 may be a centralized storage repository for storing a plurality of analytical and/or machine learning models related to the video cognitive services 29, which enable the video query engine 25 to understand natural language.”), where any number of queries can be received;
the central hub sequentially processing the sequence of two or more user queries via a video query engine, wherein the video query engine includes one or more cognitive models (Shen ¶ 0050), where natural language processing is performed;
the video query engine processing the sequence of two or more user queries using the one or more cognitive models to identify an inference for the latest user query (Shen ¶ 0050), where natural language processing infers the meaning of the query;
the video query engine building a search query based at least in part on the latest user query and the identified inference (Shen ¶ 0050), where searching is performed based on the natural language processed input;
the video query engine applying the search query to the time-stamped metadata via the video query engine to search for one or more objects and/or events in the one or more video streams that match the search query (Shen ¶ 0050), where the metadata is searched using the output of the natural language processing techniques – the claimed applying;
the video query engine returning a search result to the user device, wherein the search result identifies one or more matching objects and/or events in the one or more video streams that match the search query, and for each matching object and/or event that matches the search query, providing a reference to the corresponding video stream and a reference time in the corresponding video stream that includes the matching object and/or event (Shen ¶ 0050), where the video at the relevant time-code identified by the metadata (“one or more frames of video”) is returned; and
for at least one of the matching object and/or event that matches the search query, using the reference to the corresponding video stream and the reference time to identify a video clip that includes the matching object and/or event (Shen ¶¶ 0059-60), where the matching video clip is played, and therefore identified; and
displaying on the user device the identified video clip that includes the matching object and/or event (Shen ¶¶ 0059-60), where the matching video clip is played.
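
For illustration only, the applying, returning, and clip-identifying steps mapped above can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Shen or from the application: the names `MetadataRecord`, `search_metadata`, and `clip_window`, the sample records, and the five-second padding are all hypothetical. It shows only how a search over time-stamped metadata can yield a stream reference and a reference time from which a clip is identified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical time-stamped metadata entry for one video stream."""
    stream_id: str        # identifier that uniquely identifies the stream
    timestamp_s: float    # reference time within the stream, in seconds
    labels: frozenset     # objects and/or events detected at that time


def search_metadata(records, terms):
    """Return (stream_id, timestamp_s) for every record containing all terms."""
    wanted = {t.lower() for t in terms}
    hits = [r for r in records if wanted <= r.labels]
    hits.sort(key=lambda r: (r.stream_id, r.timestamp_s))
    return [(r.stream_id, r.timestamp_s) for r in hits]


def clip_window(reference_time_s, pad_s=5.0):
    """Identify a clip as a padded window around the reference time (assumed padding)."""
    start = max(0.0, reference_time_s - pad_s)
    return (start, reference_time_s + pad_s)


# Assumed sample data, for illustration only.
RECORDS = [
    MetadataRecord("cam-01", 12.0, frozenset({"person", "door"})),
    MetadataRecord("cam-02", 3.0, frozenset({"vehicle"})),
    MetadataRecord("cam-01", 47.5, frozenset({"person", "vehicle"})),
]

if __name__ == "__main__":
    for stream_id, t in search_metadata(RECORDS, ["person"]):
        print(stream_id, t, clip_window(t))
```

The sketch deliberately omits the cognitive-model processing of the query; the search terms are assumed to be the output of that earlier step.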

Shen, however, does not teach:
wherein the inference is to a situational context under which the latest user query was entered by the user and is based at least in part on one or more user queries of the sequence of two or more user queries prior to the latest user query.

The analogous and compatible art of Jiang, however, teaches inferring a query context based on a prior query in a sequence of queries (Jiang ¶ 0021).

It would therefore have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Shen with those of Jiang to infer a context of a latest query based on a sequence of prior queries as claimed in order to produce more accurate search results as in Jiang.

As per claim 2, the rejection of claim 1 is incorporated, but Shen does not teach:
wherein the one or more cognitive models is used to determine a user’s intent of the latest user query.

The analogous and compatible art of Jiang, however, teaches inferring a query context based on a prior query in a sequence of queries (Jiang ¶ 0021).

It would therefore have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Shen with those of Jiang to use the model of Jiang to infer a context of a latest query based on a sequence of prior queries as claimed in order to produce more accurate search results as in Jiang.

As per claim 5, the rejection of claim 1 is incorporated, and Shen further teaches:
wherein the one or more cognitive models is used to determine which entities are the primary objects and/or events of interest to the user that entered the latest user query (Shen ¶¶ 0059-60), where entities (e.g., “President Kim”) are inferred using natural language processing.

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Shen et al., US 2019/0311743 A1 (hereinafter “Shen”), in view of Jiang et al., US 2011/0208730 A1 (hereinafter “Jiang”), and further in view of Reese et al., US 2014/0258270 A1 (hereinafter “Reese”).

As per claim 3, the rejection of claim 1 is incorporated, but Shen does not explicitly teach:
wherein the one or more cognitive models is used to determine an emotional state of the user that entered the latest user query.

The analogous and compatible art of Reese, however, teaches inferring a user’s query emotion using natural language processing (Reese ¶¶ 0506-57).

It would therefore have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Reese with those of Shen to perform the natural language processing of Shen to infer a user emotion as in Reese in order to produce better search results.

Claims 6-8 are rejected under 35 U.S.C. 103 as being unpatentable over Shen et al., US 2019/0311743 A1 (hereinafter “Shen”), in view of Jiang et al., US 2011/0208730 A1 (hereinafter “Jiang”), and further in view of Songfack, US 2007/0219980 A1 (hereinafter “Songfack”).

As per claim 6, the rejection of claim 1 is incorporated, but Shen does not teach:
wherein the one or more cognitive models of the video query engine are refined using machine learning over time.

The analogous and compatible art of Songfack, however, teaches refining a natural language parser using machine learning (Songfack ¶ 0013).

It would therefore have been obvious to one of ordinary skill in the art to combine the teachings of Songfack with those of Shen to refine the natural language parser of Shen using machine learning as in Songfack in order to more accurately parse the user input.

As per claim 7, the rejection of claim 6 is incorporated, but Shen does not teach:
wherein the one or more cognitive models of the video query engine are refined using machine learning over time based on user feedback.

The analogous and compatible art of Songfack, however, teaches refining a natural language parser using machine learning based on user feedback (Songfack ¶ 0013).

It would therefore have been obvious to one of ordinary skill in the art to combine the teachings of Songfack with those of Shen to refine the natural language parser of Shen using machine learning based on user feedback as in Songfack in order to more accurately parse the user input.

As per claim 8, the rejection of claim 6 is incorporated, but Shen does not teach:
wherein the user feedback includes a subsequent user query that is entered after the video query engine returns the search result to the user in order to refine the latest user query.

The analogous and compatible art of Songfack, however, teaches refining a natural language parser using machine learning based on user feedback including subsequent queries (Songfack ¶ 0013), where the log contains subsequent queries.

It would therefore have been obvious to one of ordinary skill in the art to combine the teachings of Songfack with those of Shen to refine the natural language parser of Shen using machine learning based on user feedback including subsequent queries as in Songfack in order to more accurately parse the user input.

Claims 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over Shen et al., US 2019/0311743 A1 (hereinafter “Shen”), in view of Jiang et al., US 2011/0208730 A1 (hereinafter “Jiang”), and further in view of Rothschild et al., US 2017/0229149 A1 (hereinafter “Rothschild”).

As per claim 9, the rejection of claim 1 is incorporated, but Shen does not teach:
receiving time-stamped data generated by one or more non-video based devices; and
the one or more cognitive models using the time-stamped data generated by one or more non-video based devices to identify the inference for the latest user query.

The analogous and compatible art of Rothschild, however, teaches receiving time-stamped data generated by non-video based devices, and using this data to generate an inference for a query (Rothschild Abstract).

It would therefore have been obvious to one of ordinary skill in the art to combine the teachings of Rothschild with those of Shen to use time-stamped non-video data to generate an inference for search in order to locate video relevant to, e.g., a biometric event.

As per claim 10, the rejection of claim 9 is incorporated, but Shen does not teach:
wherein the one or more cognitive models use the time-stamped data generated by one or more non-video based devices and time-stamped metadata for one or more of the one or more video streams to identify the inference for the latest user query.

The analogous and compatible art of Rothschild, however, teaches receiving time-stamped data generated by non-video based devices, and using this data along with time-stamped video data to generate an inference for a query (Rothschild Abstract).

It would therefore have been obvious to one of ordinary skill in the art to combine the teachings of Rothschild with those of Shen to use time-stamped non-video data to generate an inference for search in order to locate video relevant to, e.g., a biometric event.

As per claim 11, the rejection of claim 9 is incorporated, but Shen does not teach:
wherein one or more non-video based devices comprises one or more security sensors.

The analogous and compatible art of Rothschild, however, teaches receiving time-stamped data generated by non-video based devices such as biometric sensors, a security sensor as claimed under a broadest reasonable interpretation, and using this data to generate an inference for a query (Rothschild Abstract).

It would therefore have been obvious to one of ordinary skill in the art to combine the teachings of Rothschild with those of Shen to use time-stamped non-video data to generate an inference for search in order to locate video relevant to, e.g., a biometric event.

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Shen et al., US 2019/0311743 A1 (hereinafter “Shen”), in view of Jiang et al., US 2011/0208730 A1 (hereinafter “Jiang”), and further in view of Neely et al., US 2012/0259895 A1 (hereinafter “Neely”).

As per claim 12, the rejection of claim 1 is incorporated, but Shen does not teach:
processing the time-stamped metadata to identify contextual relationships between objects and/or events occurring in the one or more video streams before entering the latest user query.

The analogous and compatible art of Neely, however, teaches pre-processing time-stamped metadata to identify contextual relationships between objects and/or events in a video (Neely ¶ 0014).

It would therefore have been obvious to one of ordinary skill in the art to combine the teachings of Neely with those of Shen to pre-process the time-stamped metadata to identify contextual relationships between objects and/or events in the video streams in order to produce more relevant search results.

Claims 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Shen et al., US 2019/0311743 A1 (hereinafter “Shen”), in view of Jiang et al., US 2011/0208730 A1 (hereinafter “Jiang”), and further in view of Amer et al., US 2019/0304157 A1 (hereinafter “Amer”).
	
As per claim 13, the rejection of claim 1 is incorporated, but Shen does not teach:
wherein using the reference to the corresponding video stream and the reference time to identify and display the video clip that includes the matching object and/or event is initiated automatically upon the video query engine returning the search result.

The analogous and compatible art of Amer, however, teaches automatically playing video search results (Amer ¶ 0170).

It would therefore have been obvious to one of ordinary skill in the art to combine the teachings of Amer with those of Shen to automatically play a search result in order to produce a better user experience as an obvious design choice.

As per claim 14, the rejection of claim 1 is incorporated, but Shen does not teach:
wherein using the reference to the corresponding video stream and the reference time to identify and display the video clip that includes the matching object and/or event is initiated manually by a user after the video query engine returns the search result.

The analogous and compatible art of Amer, however, teaches playing video search results manually after a user selection (Amer ¶ 0170).

It would therefore have been obvious to one of ordinary skill in the art to combine the teachings of Amer with those of Shen to play a search result manually after a user selection in order to produce a better user experience as an obvious design choice.

Claims 15 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Shen et al., US 2019/0311743 A1 (hereinafter “Shen”), in view of Rothschild et al., US 2017/0229149 A1 (hereinafter “Rothschild”).

As per claim 15, Shen teaches:
a memory for storing time-stamped metadata for each of the one or more video streams, the time-stamped metadata for each video stream identifying one or more objects and/or events occurring in the corresponding video stream as well as an identifier that uniquely identifies the corresponding video stream (Shen ¶¶ 0037-40), where a metadata database provides time-coded metadata identifying objects in a media asset;
a video query engine that includes one or more cognitive models (Shen ¶ 0050), where the disclosed natural language processing techniques are a cognitive model as claimed (Specification [0055], “The analytics model store 32 may be a centralized storage repository for storing a plurality of analytical and/or machine learning models related to the video cognitive services 29, which enable the video query engine 25 to understand natural language.”), the video query engine configured to:
receive a user query from a user (Shen ¶ 0050), where input is received;
process the user query using the one or more cognitive models to identify an inference for the user query (Shen ¶ 0050), where natural language processing is performed;
build a search query based at least in part on the user query and the identified inference (Shen ¶ 0050), where searching is performed based on the natural language processed input;
apply the search query to the time-stamped metadata for each of the one or more video stream stored in the memory to search for one or more objects and/or events in the one or more video streams that match the search query (Shen ¶ 0050), where the metadata is searched using the output of the natural language processing techniques – the claimed applying;
return a search result to the user, wherein the search result identifies one or more matching objects and/or events in the one or more video streams that match the search query, and for each matching object and/or event that matches the search query, providing a reference to the corresponding video stream and a reference time in the corresponding video stream that includes the matching object and/or event (Shen ¶ 0050), where the video at the relevant time-code identified by the metadata (“one or more frames of video”) is returned; and
a user interface for displaying a video clip that includes a matching object and/or event (Shen ¶¶ 0059-60), where the matching video clip is played.

Shen, however, does not teach:
the memory further storing time-stamped data generated by one or more non-video devices; or
the video query engine configured to process the user query and at least some of the time-stamped data generated by one or more non-video based devices.

The analogous and compatible art of Rothschild, however, teaches receiving time-stamped data generated by non-video based devices, and using this data to generate an inference for a query (Rothschild Abstract).

It would therefore have been obvious to one of ordinary skill in the art to combine the teachings of Rothschild with those of Shen to use time-stamped non-video data to generate an inference for search in order to locate video relevant to, e.g., a biometric event.

As per claim 18, the rejection of claim 15 is incorporated, and Shen further teaches:
wherein the inference is to one or more of:
a user's intent of the user query, where this is an optional feature as claimed;
an emotional state of the user that entered the user query, where this is an optional feature as claimed;
a situational context in which the user query was entered, where this is an optional feature as claimed; and
which entities are the primary objects and/or events of interest to the user that entered the user query (Shen ¶¶ 0059-60), where entities (e.g., “President Kim”) are inferred using natural language processing.

As per claim 19, Shen teaches:
receiving time-stamped metadata for a video stream, the time-stamped metadata identifying one or more objects and/or events occurring in the video stream (Shen ¶¶ 0037-40), where a metadata database provides time-coded metadata identifying objects in a media asset;
entering a user query by a user into a video query engine, wherein the video query engine includes one or more cognitive models (Shen ¶ 0050), where input is received, where the disclosed natural language processing techniques are a cognitive model as claimed (Specification [0055], “The analytics model store 32 may be a centralized storage repository for storing a plurality of analytical and/or machine learning models related to the video cognitive services 29, which enable the video query engine 25 to understand natural language.”);
the video query engine processing the user query using the one or more cognitive models to build a search query (Shen ¶ 0050), where natural language processing is performed;
the video query engine applying the search query to the time-stamped metadata for a video stream via the video query engine to search for one or more objects and/or events in the video stream that matches the search query (Shen ¶ 0050), where the metadata is searched using the output of the natural language processing techniques – the claimed applying;
the video query engine returning a search result to the user, wherein the search result identifies one or more matching objects and/or events in the video stream that match the search query (Shen ¶ 0050), where the video at the relevant time-code identified by the metadata (“one or more frames of video”) is returned; and
displaying a video clip that includes at least one of the one or more matching objects and/or events (Shen ¶¶ 0059-60), where the matching video clip is played.

Shen, however, does not teach:
receiving time-stamped metadata generated by one or more non-video based devices; or
the video query engine processing the user query and at least some of the time-stamped metadata generated by one or more non-video based devices, wherein the one or more cognitive models use user query and at least some of the time-stamped metadata generated by one or more non-video based devices to build a search query.

The analogous and compatible art of Rothschild, however, teaches receiving time-stamped data generated by non-video based devices, and using this data to generate an inference for a query (Rothschild Abstract).

It would therefore have been obvious to one of ordinary skill in the art to combine the teachings of Rothschild with those of Shen to use time-stamped non-video data to generate an inference for search in order to locate video relevant to, e.g., a biometric event.

Claims 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Shen et al., US 2019/0311743 A1 (hereinafter “Shen”), in view of Rothschild et al., US 2017/0229149 A1 (hereinafter “Rothschild”), and further in view of Songfack, US 2007/0219980 A1 (hereinafter “Songfack”).

As per claim 16, the rejection of claim 15 is incorporated, but Shen does not teach:
wherein the one or more cognitive models of the video query engine are refined using machine learning over time.

The analogous and compatible art of Songfack, however, teaches refining a natural language parser using machine learning (Songfack ¶ 0013).

It would therefore have been obvious to one of ordinary skill in the art to combine the teachings of Songfack with those of Shen to refine the natural language parser of Shen using machine learning as in Songfack in order to more accurately parse the user input.

As per claim 17, the rejection of claim 15 is incorporated, but Shen does not teach:
wherein the one or more cognitive models of the video query engine are refined using machine learning over time based on user feedback, wherein the user feedback includes a subsequent user query that is entered after the video query engine returns the search result to the user in order to refine the user query.

The analogous and compatible art of Songfack, however, teaches refining a natural language parser using machine learning based on user feedback including subsequent queries (Songfack ¶ 0013), where the log contains subsequent queries.

It would therefore have been obvious to one of ordinary skill in the art to combine the teachings of Songfack with those of Shen to refine the natural language parser of Shen using machine learning based on user feedback including subsequent queries as in Songfack in order to more accurately parse the user input.

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Shen et al., US 2019/0311743 A1 (hereinafter “Shen”), in view of Rothschild et al., US 2017/0229149 A1 (hereinafter “Rothschild”), and further in view of Neely et al., US 2012/0259895 A1 (hereinafter “Neely”).

As per claim 20, the rejection of claim 19 is incorporated, but Shen does not teach:
processing the time-stamped metadata to identify contextual relationships between objects and/or events occurring in the video stream before entering the latest user query.

The analogous and compatible art of Neely, however, teaches pre-processing time-stamped metadata to identify contextual relationships between objects and/or events in a video (Neely ¶ 0014).

It would therefore have been obvious to one of ordinary skill in the art to combine the teachings of Neely with those of Shen to pre-process the time-stamped metadata to identify contextual relationships between objects and/or events in the video streams in order to produce more relevant search results.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM SPIELER whose telephone number is (571)270-3883. The examiner can normally be reached Monday-Friday, 11-3.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela Reyes, can be reached at 571-270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

WILLIAM SPIELER
Primary Examiner
Art Unit 2159



/WILLIAM SPIELER/               Primary Examiner, Art Unit 2159