DETAILED ACTION
This action is in response to claims filed 04 March, 2020 for application 17/001336 filed 24 August, 2020. Currently claims 1-30 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 1, 14 and 28 are objected to because of the following informalities:  the line including “a health of the at least one detected target object, and” has an unnecessary “and” as the following limitation has been deleted.  Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 28-30 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because the modules in the claims can be interpreted as comprising software per se. See [0058] of the instant specification and the preamble of claim 28. Claims cannot consist solely of software.
Claims 1, 3-7, 9-14, 17-21, 23-28 and 29 are rejected under 35 U.S.C. 101 because the claimed invention is directed to the abstract idea of analyzing frames of streams to assess and annotate objects within the stream without significantly more. 

This judicial exception is not integrated into a practical application. In particular, the claims recite only the additional elements “computer readable medium”, “software module”, and “AI engine”, which are recited at a high level of generality (i.e., as a generic processor performing a generic computer function of generating an index) such that they amount to no more than mere instructions to apply the exception using a generic computer component.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “computer readable medium”, “software module”, and “AI engine” to perform the receiving, analyzing, generating and presenting steps amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Please see MPEP §2106.05(b). Additionally, merely collecting and transmitting video/images amounts to extra-solution activity that cannot provide an inventive concept. The claims are not patent eligible.
Claims 3-7, 9-13, 17-27 and 29 recite additional steps which include further communicating, specifying an alert parameter and communicating an alert, specific alert parameters, a capturing device, defining a zone and associating a content stream with the zone, and further communicating. The additional elements are recited at a high level of generality (i.e., as a generic processor performing a generic computer function of generating an index) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because it does not 
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Please see MPEP §2106.05(b). The claim is not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-6, 9-20, 23-28 and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Ananthanarayanan et al. (Real-Time Video Analytics: The Killer App for Edge Computing) (hereinafter “Ananth”) in view of Brust et al. (Towards Automated Visual Monitoring of Individual Gorillas in the Wild) (hereinafter “Brust”).

Regarding claim 1, Ananth discloses: A method comprising: 
establishing at least one target object to detect within a content stream, wherein establishing the at least one target object to detect comprises (“The video pipeline optimizer converts high-level video queries to video-processing pipelines composed of many vision modules; for example, a video decoder, followed by an object detector and then an object tracker. Each module implements predefined interfaces to receive and process events (or data) and then sends its results downstream.” P61 §Rocket: Video analytics software stack ¶2): 
identifying at least one target object profile from a database of target object profiles (“Generating the resource–accuracy profile is challenging. Unlike SQL queries, there are no well-known analytical models to capture resource–accuracy relationships as they often depend on the specific camera view. Figure 3b shows how different object detection implementations can change efficacy in different cameras. Likewise, reducing video resolution might not reduce the accuracy of a license plate reader if the camera is zoomed in enough, but it would otherwise impact accuracy. Therefore, we need to generate the profile using a labeled dataset of ground truths for each video query pipeline for each camera. However, exhaustively running through all the configurations can be prohibitive. For instance, generating the profile for a license plate query consumed 20 CPU days.” P63 ¶2); 
establishing at least one parameter for assessing the at least one target object (“We profiled 300 parameter configurations (such as frame sampling, resolution, and implementation choices) and compared it to the ground truth of the tracks obtained using crowdsourcing. Note the vast spread in accuracy as well as the CPU demand and data rates (which represent network demands). For each of the configurations, if the allocated resource is less than the demand, the analytics cannot keep up with the video stream’s incoming rate.” P62 last ¶), wherein establishing the at least one parameter comprises: 
analyzing the at least one frame associated with the content stream for the at least one target object (“As mentioned earlier, the resource demands of certain vision algorithms can make running them on every frame prohibitively expensive. Fortunately, most video streams have temporal redundancy among frames, making it possible to sub-sample them without losing accuracy. We developed a suite of content-aware and application-aware techniques to sample only a few key frames for processing without compromising the application’s accuracy.” §intelligent frame selection ¶1);
(“Each frame in each video stream has some number of objects. Our system’s goal is to use the minimum amount of bandwidth while maximizing the number of query-specified objects (of interest in the scene) delivered to the cloud. A smart traffic-scheduling algorithm uploads frames from only the cameras that have video frames containing the greatest number of relevant objects to the user’s query.” §intelligent feed selection ¶3, note: the system can be in the configuration profile for detecting vehicles in this example.); and 
communicating target object detection data, wherein communicating the target object detection data (“The video pipeline optimizer converts high-level video queries to video-processing pipelines composed of many vision modules; for example, a video decoder, followed by an object detector and then an object tracker. Each module implements predefined interfaces to receive and process events (or data) and then sends its results downstream.” P61 §Rocket: Video analytics software stack ¶2) comprises at least one of the following: 
transmitting the at least one frame along with annotations associated with the detected at least one target object, wherein the annotations correspond to the at least one parameter (“Our traffic analytics solutions based on the Rocket software stack (shown in Figure 2) are being actively deployed. Since December 2016, a multimodal object counter has run 24/7 using live traffic cameras in Bellevue, Washington, to help the city understand and track volumes of cars, pedestrians, and bikes. The system raises alerts on anomalous traffic patterns used by traffic control operators.” P65 §traffic video analytics ¶1, Figure 4, P63 ¶2).

However, Ananth does not explicitly disclose: specifying at least one of the following: 
a species of the at least one detected target object, a sub-species of the at least one detected target object, a gender of the at least one target object, an age of the at least one target object, a health of the at least one target object, 
further specifying a score based on a character of the physical attributes for the at least one target object; and
tracking, based on the at least one frame and the annotations associated with the detected at least one target object, the detected at least one target object as it moves from the content stream to another content stream.

Brust teaches: a species of the at least one detected target object, a sub-species of the at least one detected target object, a gender of the at least one target object, an age of the at least one target object, a health of the at least one target object (“We propose two heuristics: selecting the face with the highest detection score and selecting the bounding box with the largest area.” Fig 1), 
further specifying a score based on a character of the physical attributes for the at least one target object (“We propose two heuristics: selecting the face with the highest detection score and selecting the bounding box with the largest area.” P2826 §7.2 ¶2); and
tracking, based on the at least one frame and the annotations associated with the detected at least one target object, the detected at least one target object as it moves from the content stream to another content stream (“Building effective detection and identification frameworks are only first steps towards integrating these computational tools into field practitioners day-to-day work. Whilst speeding up the processing of incoming photographic datasets and allowing for quicker identification of encountered individuals may be the main immediate purpose for using visual animal biometric systems, the availability of independent filtering and validation procedures for accuracy, misclassifications and completeness of encounters provides a further tool for building and maintaining socio-demographic datasets. In particular, the integration of spatially-explicit data from camera trap monitoring with capture-recapture or distance sampling approaches via animal biometric systems may provide an opportunity to generate important and conservation-relevant information on population status, trends and socio-ecology for Mbeli Bai as well as in other settings.” P2827 §8.3, see also §8.4 where individuals can be tracked between different spatial areas over time which is interpreted as separate content streams).

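Ananth and Brust are both in the same field of endeavor of using neural networks to detect objects, identify attributes of those objects and annotate frames including the object and are analogous. Ananth discloses identifying target objects using profiles and Brust explicitly teaches the use of features and updating profiles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the object profiles as taught by Ananth with features and updating as taught by Brust to yield predictable results.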



Regarding claim 2, Ananth discloses: The method of Claim 1, further comprising: retrieving the at least one target object profile from a database of learned target object profiles, wherein the at least one learned target object profile is associated with the at least one target object to detect, and wherein the database of learned target object profiles is associated with target objects that have been trained for detection within at least one frame of the content stream (“Generating the resource–accuracy profile is challenging. Unlike SQL queries, there are no well-known analytical models to capture resource–accuracy relationships as they often depend on the specific camera view. Figure 3b shows how different object detection implementations can change efficacy in different cameras. Likewise, reducing video resolution might not reduce the accuracy of a license plate reader if the camera is zoomed in enough, but it would otherwise impact accuracy. Therefore, we need to generate the profile using a labeled dataset of ground truths for each video query pipeline for each camera. However, exhaustively running through all the configurations can be prohibitive. For instance, generating the profile for a license plate query consumed 20 CPU days.” P63 ¶2).

Regarding claim 3, Ananth discloses: The method of Claim 1, wherein communicating the target object detection data comprises communicating the target object detection data when the at least one parameter is met (“Our traffic analytics solutions based on the Rocket software stack (shown in Figure 2) are being actively deployed. Since December 2016, a multimodal object counter has run 24/7 using live traffic cameras in Bellevue, Washington, to help the city understand and track volumes of cars, pedestrians, and bikes. The system raises alerts on anomalous traffic patterns used by traffic control operators.” P65 §traffic video analytics ¶1, Figure 4, P63 ¶2, note: results are transmitted when there is sufficient accuracy).

Regarding claim 4, Ananth discloses: The method of Claim 1, further comprising: specifying at least one alert parameter, and communicating the target object detection data comprises communicating the target object detection data when the at least one parameter is met (“Our traffic analytics solutions based on the Rocket software stack (shown in Figure 2) are being actively deployed. Since December 2016, a multimodal object counter has run 24/7 using live traffic cameras in Bellevue, Washington, to help the city understand and track volumes of cars, pedestrians, and bikes. The system raises alerts on anomalous traffic patterns used by traffic control operators.” P65 §traffic video analytics ¶1).

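Regarding claim 5, Ananth discloses an alert parameter, however, does not explicitly disclose: wherein specifying the at least one alert parameter comprises defining at least one of minimum, maximum, and exact age of the at least one detected target object.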

Brust teaches: wherein specifying the at least one alert parameter comprises defining at least one of minimum, maximum, and exact age of the at least one detected target object (Fig. 1).

Ananth and Brust are both in the same field of endeavor of using neural networks to detect objects, identify attributes of those objects and annotate frames including the object and are analogous. Ananth discloses issuing an alert when a parameter is met. Brust teaches specific parameters for identifying objects such as wildlife. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the parameters for alert as taught by Ananth with specific parameters for specific objects as taught by Brust to yield predictable results.

Regarding claim 6, Ananth discloses an alert parameter, however, does not explicitly disclose: wherein specifying the at least one alert parameter comprises defining at least one of minimum, maximum, and exact score of the at least one detected target object.

Brust teaches: wherein specifying the at least one alert parameter comprises defining at least one of minimum, maximum, and exact score of the at least one detected target object (Fig. 1).

Ananth and Brust are both in the same field of endeavor of using neural networks to detect objects, identify attributes of those objects and annotate frames including the object and are analogous. Ananth discloses issuing an alert when a parameter is met. Brust teaches specific parameters for identifying objects such as wildlife. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the parameters for alert as taught by Ananth with specific parameters for specific objects as taught by Brust to yield predictable results.

Regarding claim 9, Ananth discloses an alert parameter, however, does not explicitly disclose: wherein specifying the at least one alert parameter comprises defining at least one of the following: the species, the sub-species, and gender of the at least one detected target object.

Brust teaches: wherein specifying the at least one alert parameter comprises defining at least one of the following: the species, the sub-species, and gender of the at least one detected target object (Fig. 1).

Ananth and Brust are both in the same field of endeavor of using neural networks to detect objects, identify attributes of those objects and annotate frames including the object and are analogous. Ananth discloses issuing an alert when a parameter is met. Brust teaches specific parameters for identifying objects such as wildlife. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the parameters for alert as taught by Ananth with specific parameters for specific objects as taught by Brust to yield predictable results.

Regarding claim 10, Ananth discloses an alert parameter, however, does not explicitly disclose: wherein specifying the at least one alert parameter comprises defining at least one of minimum, maximum, and exact confidence level of the at least one detected target object.

Brust teaches: wherein specifying the at least one alert parameter comprises defining at least one of minimum, maximum, and exact confidence level of the at least one detected target object (Fig. 1).

Ananth and Brust are both in the same field of endeavor of using neural networks to detect objects, identify attributes of those objects and annotate frames including the object and are analogous. Ananth discloses issuing an alert when a parameter is met. Brust teaches specific parameters for identifying objects such as wildlife. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the parameters for alert as taught by Ananth with specific parameters for specific objects as taught by Brust to yield predictable results.

Regarding claim 11, Ananth discloses: The method of Claim 1, further comprising: receiving the content stream from a content source, the content source comprising at least one of the following: a capturing device, and a uniform resource locator (“The video pipeline optimizer converts high-level video queries to video-processing pipelines composed of many vision modules; for example, a video decoder, followed by an object detector and then an object tracker. Each module implements predefined interfaces to receive and process events (or data) and then sends its results downstream.” P61 §Rocket: Video analytics software stack ¶2, Figure 1).

Regarding claim 12, Ananth discloses: The method of Claim 9, further comprising: defining at least one zone, wherein defining the at least one zone comprises: specifying at least one content source for association with the at least one zone, and specifying at least one content stream associated with the at least one content source, the specified at least one content stream to be processed by an AI engine for the at least one zone (Figs. 1, 3 and 4).

Regarding claim 13, Ananth discloses: The method of Claim 12, further comprising: specifying at least one alert parameter for each of a plurality of zones, and communicating the target object detection data comprises communicating the target object detection data when the at least one parameter is met for the at least one content stream associated with the at least one zone (“Our traffic analytics solutions based on the Rocket software stack (shown in Figure 2) are being actively deployed. Since December 2016, a multimodal object counter has run 24/7 using live traffic cameras in Bellevue, Washington, to help the city understand and track volumes of cars, pedestrians, and bikes. The system raises alerts on anomalous traffic patterns used by traffic control operators.” P65 §traffic video analytics ¶1).

Regarding claim 14, Ananth discloses: A non-transitory computer readable medium comprising a set of instructions which when executed by a computer perform a method, the method comprising: 
receiving a plurality of content streams (“The video pipeline optimizer converts high-level video queries to video-processing pipelines composed of many vision modules; for example, a video decoder, followed by an object detector and then an object tracker. Each module implements predefined interfaces to receive and process events (or data) and then sends its results downstream.” P61 §Rocket: Video analytics software stack ¶2, Figure 3&4); 
defining at least one zone to be associated with each of the plurality of content streams (Figure 1, 3, 4, note: the circled area in the intersection is interpreted as a zone); 
specifying at least one target object to detect within each zone (Figs 3&4, “Our traffic analytics solutions based on the Rocket software stack (shown in Figure 2) are being actively deployed. Since December 2016, a multimodal object counter has run 24/7 using live traffic cameras in Bellevue, Washington, to help the city understand and track volumes of cars, pedestrians, and bikes. The system raises alerts on anomalous traffic patterns used by traffic control operators.” P65 §traffic video analytics ¶1); 
(“We profiled 300 parameter configurations (such as frame sampling, resolution, and implementation choices) and compared it to the ground truth of the tracks obtained using crowdsourcing. Note the vast spread in accuracy as well as the CPU demand and data rates (which represent network demands). For each of the configurations, if the allocated resource is less than the demand, the analytics cannot keep up with the video stream’s incoming rate.” P62 last ¶): 
analyzing a content stream within the at least one zone in accordance to at least one target object profile associated with the at least one target object specified to be detected within the at least one zone (“As mentioned earlier, the resource demands of certain vision algorithms can make running them on every frame prohibitively expensive. Fortunately, most video streams have temporal redundancy among frames, making it possible to sub-sample them without losing accuracy. We developed a suite of content-aware and application-aware techniques to sample only a few key frames for processing without compromising the application’s accuracy.” §intelligent frame selection ¶1); 
detecting the at least one target object within at least one frame of by matching aspects of the at least one frame to aspects of the at least one target object profile (“Each frame in each video stream has some number of objects. Our system’s goal is to use the minimum amount of bandwidth while maximizing the number of query-specified objects (of interest in the scene) delivered to the cloud. A smart traffic-scheduling algorithm uploads frames from only the cameras that have video frames containing the greatest number of relevant objects to the user’s query.” §intelligent feed selection ¶3, note: the system can be in the configuration profile for detecting vehicles in this example.); and 
communicating target object detection data, wherein communicating the target object detection data comprises (“The video pipeline optimizer converts high-level video queries to video-processing pipelines composed of many vision modules; for example, a video decoder, followed by an object detector and then an object tracker. Each module implements predefined interfaces to receive and process events (or data) and then sends its results downstream.” P61 §Rocket: Video analytics software stack ¶2):
transmitting the at least one frame along with annotations associated with the detected at least one target object, wherein the annotations correspond to the at least one parameter (“Our traffic analytics solutions based on the Rocket software stack (shown in Figure 2) are being actively deployed. Since December 2016, a multimodal object counter has run 24/7 using live traffic cameras in Bellevue, Washington, to help the city understand and track volumes of cars, pedestrians, and bikes. The system raises alerts on anomalous traffic patterns used by traffic control operators.” P65 §traffic video analytics ¶1, Figure 4, P63 ¶2).

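However, Ananth does not explicitly disclose: specifying at least one of the following: 
a species of the at least one detected target object, a sub-species of the at least one detected target object, a gender of the at least one target object, an age of the at least one target object, a health of the at least one target object, 
further specifying a score based on a character of the physical attributes for the at least one target object.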
Brust teaches: a species of the at least one detected target object, a sub-species of the at least one detected target object, a gender of the at least one target object, an age of the at least one target object, a health of the at least one target object (“We propose two heuristics: selecting the face with the highest detection score and selecting the bounding box with the largest area.” Fig 1), 
further specifying a score based on a character of the physical attributes for the at least one target object (“We propose two heuristics: selecting the face with the highest detection score and selecting the bounding box with the largest area.” P2826 §7.2 ¶2).

Ananth and Brust are both in the same field of endeavor of using neural networks to detect objects, identify attributes of those objects and annotate frames including the object and are analogous. Ananth discloses identifying target objects using profiles and Brust explicitly teaches the use of features and updating profiles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the object profiles as taught by Ananth with features and updating as taught by Brust to yield predictable results.

Regarding claim 15, Ananth discloses: The computer readable medium of Claim 14, further comprising processing the at least one detected target object through a neural net for a detection of learned features associated with the at least one detected target object (“Rocket, our video analytics software stack. The video pipeline optimizer converts video queries to pipelines of vision modules invoking deep neural network (DNN) models, along with estimating the resource–accuracy profiles of the pipelines. The pipelines and their corresponding profiles are then passed on to the resource manager, which executes them over the geo-distributed collection of edges, private clusters, and the cloud.” Fig 2), determine at least one of the following:
the sub-species of the at least one detected target object, the gender of the at least one detected target object, the age of the at least one detected target object, the health of the at least one detected target object, the score for the at least one detected target object (“Generating the resource–accuracy profile is challenging. Unlike SQL queries, there are no well-known analytical models to capture resource–accuracy relationships as they often depend on the specific camera view. Figure 3b shows how different object detection implementations can change efficacy in different cameras. Likewise, reducing video resolution might not reduce the accuracy of a license plate reader if the camera is zoomed in enough, but it would otherwise impact accuracy. Therefore, we need to generate the profile using a labeled dataset of ground truths for each video query pipeline for each camera. However, exhaustively running through all the configurations can be prohibitive. For instance, generating the profile for a license plate query consumed 20 CPU days.” P63 ¶2, note: accuracy in this implementation is interpreted as a score for the target object. Applicant may choose to remove the alternative language to overcome this interpretation.).

Ananth does not explicitly disclose: wherein the learned features are specified by the at least one learned target object profile associated with the at least one detected target object 
update the at least one learned target object profile with the detected learned features.

However, Brust teaches: wherein the learned features are specified by the at least one learned target object profile associated with the at least one detected target object (“Our identification pipeline consists of two sequential components (see Fig. 2): first, a detector based on the YOLO model [60] detects and locates gorilla faces in images. In a second step, each candidate face region is processed up to the pool5 layer of the BVLC AlexNet Model [28] for feature extraction, before a linear SVM [11] component trained on facial reference images of the gorilla population performing classification of the extracted features to yield a ranked list of identification proposals.” §6 ¶1)
the species of the at least one detected target object, the gender of the at least one detected target object, the age of the at least one detected target object (Fig 1),
update the at least one learned target object profile with the detected learned features (“The results presented exemplify that deep learning pipelines constructed for a biometric entity, species and setup (e.g. chimpanzee identification on bounding box labelled face images [16]) open up new possibilities to transfer both system design and parameterisation parts across to similar species and application scenarios (e.g. gorilla facial identification without bounding box information).” P2826 §8.1 ¶1)

Ananth and Brust are both in the same field of endeavor of using neural networks to detect objects, identify attributes of those objects and annotate frames including the object and are analogous. Ananth discloses identifying target objects using profiles and Brust explicitly teaches the use of features and updating profiles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the object profiles as taught by Ananth with features and updating as taught by Brust to yield predictable results.

Regarding claim 16, Ananth discloses: The computer readable medium of Claim 14, further comprising: retrieving the at least one target object profile from a database of learned target object profiles, wherein the at least one learned target object profile is associated with the at least one target object to detect, and wherein the database of learned target object profiles is associated with target objects that have been trained for detection within at least one frame of the content stream (“Generating the resource–accuracy profile is challenging. Unlike SQL queries, there are no well-known analytical models to capture resource–accuracy relationships as they often depend on the specific camera view. Figure 3b shows how different object detection implementations can change efficacy in different cameras. Likewise, reducing video resolution might not reduce the accuracy of a license plate reader if the camera is zoomed in enough, but it would otherwise impact accuracy. Therefore, we need to generate the profile using a labeled dataset of ground truths for each video query pipeline for each camera. However, exhaustively running through all the configurations can be prohibitive. For instance, generating the profile for a license plate query consumed 20 CPU days.” P63 ¶2).

Regarding claim 17, Ananth discloses: The computer readable medium of Claim 14, wherein communicating the target object detection data comprises communicating the target object detection data when the at least one parameter is met (“Our traffic analytics solutions based on the Rocket software stack (shown in Figure 2) are being actively deployed. Since December 2016, a multimodal object counter has run 24/7 using live traffic cameras in Bellevue, Washington, to help the city understand and track volumes of cars, pedestrians, and bikes. The system raises alerts on anomalous traffic patterns used by traffic control operators.” P65 §traffic video analytics ¶1, Figure 4, P63 ¶2, note: results are transmitted when there is sufficient accuracy).

Regarding claim 18, Ananth discloses: The computer readable medium of Claim 14, further comprising: specifying at least one alert parameter, and communicating the target object detection data comprises communicating the target object detection data when the at least one parameter is met  (“Our traffic analytics solutions based on the Rocket software stack (shown in Figure 2) are being actively deployed. Since December 2016, a multimodal object counter has run 24/7 using live traffic cameras in Bellevue, Washington, to help the city understand and track volumes of cars, pedestrians, and bikes. The system raises alerts on anomalous traffic patterns used by traffic control operators.” P65 §traffic video analytics ¶1).

Claims 19, 20, 23, and 24 recite largely the same subject matter as claims 5, 6, 9 and 10 and are rejected under the same reasoning.

Regarding claim 25, Ananth discloses: The computer readable medium of Claim 17, further comprising: receiving the content stream from a content source, the content source comprising at least one of the following: a capturing device, and a uniform resource locator (“The video pipeline optimizer converts high-level video queries to video-processing pipelines composed of many vision modules; for example, a video decoder, followed by an object detector and then an object tracker. Each module implements predefined interfaces to receive and process events (or data) and then sends its results downstream.” P61 §Rocket: Video analytics software stack ¶2, Figure 1).

Regarding claim 27, Ananth discloses: The method of Claim 9, further comprising: defining at least one zone, wherein defining the at least one zone comprises: specifying at least one content source for association with the at least one zone, and specifying at least one content stream associated with the at least one (Figs. 1, 3 and 4).

Regarding claim 28, Ananth discloses: A system comprised of a plurality of software modules, the system comprising: 
at least one end-user device module configured (Fig 4) to: 
select from a plurality of content sources for providing a content stream associated with each of the plurality of content sources (“The video pipeline optimizer converts high-level video queries to video-processing pipelines composed of many vision modules; for example, a video decoder, followed by an object detector and then an object tracker. Each module implements predefined interfaces to receive and process events (or data) and then sends its results downstream.” P61 §Rocket: Video analytics software stack ¶2, Figs 1, 3 and 4), 
specify at least one zone for each selected content source (Figs 1, 3 and 4, note: the circled area in the intersection is interpreted as a zone), 
specify at least one content source for association with the at least one zone (Fig 4, one camera (content source) has an associated zone), and 
specify a first zone detection parameter, wherein the first zone parameter is specifying at least one target object from a plurality of selectable target object designations for detection within the at least one zone, the target object designations being associated with a plurality of learned target object profiles trained by the AI engine (“Generating the resource–accuracy profile is challenging. Unlike SQL queries, there are no well-known analytical models to capture resource–accuracy relationships as they often depend on the specific camera view. Figure 3b shows how different object detection implementations can change efficacy in different cameras. Likewise, reducing video resolution might not reduce the accuracy of a license plate reader if the camera is zoomed in enough, but it would otherwise impact accuracy. Therefore, we need to generate the profile using a labeled dataset of ground truths for each video query pipeline for each camera. However, exhaustively running through all the configurations can be prohibitive. For instance, generating the profile for a license plate query consumed 20 CPU days.” P63 ¶2); and 
an analysis module configured to:
process at least one frame of the content stream for a detection of [data] associated with the at least one target object (“As mentioned earlier, the resource demands of certain vision algorithms can make running them on every frame prohibitively expensive. Fortunately, most video streams have temporal redundancy among frames, making it possible to sub-sample them without losing accuracy. We developed a suite of content-aware and application-aware techniques to sample only a few key frames for processing without compromising the application’s accuracy.” §intelligent frame selection ¶1),
detect the at least one target object within at least one frame of by matching aspects of the at least one frame to aspects of the at least one target object profile (“Each frame in each video stream has some number of objects. Our system’s goal is to use the minimum amount of bandwidth while maximizing the number of query-specified objects (of interest in the scene) delivered to the cloud. A smart traffic-scheduling algorithm uploads frames from only the cameras that have video frames containing the greatest number of relevant objects to the user’s query.” §intelligent feed selection ¶3, note: the system can be in the configuration profile for detecting vehicles in this example.) 


However, Ananth does not explicitly disclose: learned features
wherein the learned features are specified by at least one learned target object profile associated with the at least one target object.
determine, based on the processing, at least one of the following attributes of the at least one detected target object:
a species of the at least one detected target object, a sub-species of the at least one detected target object, a gender of the at least one target object, an age of the at least one target object, a health of the at least one target object, 
further specifying a score based on a character of the physical attributes for the at least one target object; and
Tracking based on the at least one frame and the annotations associated with the detected at least one target object the detected at least one target object as it moves from the content stream to another content stream.

Brust teaches: 
learned features (“Our identification pipeline consists of two sequential components (see Fig. 2): first, a detector based on the YOLO model [60] detects and locates gorilla faces in images. In a second step, each candidate face region is processed up to the pool5 layer of the BVLC AlexNet Model [28] for feature extraction, before a linear SVM [11] component trained on facial reference images of the gorilla population performing classification of the extracted features to yield a ranked list of identification proposals.” §6 ¶1)
wherein the learned features are specified by at least one learned target object profile associated with the at least one target object (“Our identification pipeline consists of two sequential components (see Fig. 2): first, a detector based on the YOLO model [60] detects and locates gorilla faces in images. In a second step, each candidate face region is processed up to the pool5 layer of the BVLC AlexNet Model [28] for feature extraction, before a linear SVM [11] component trained on facial reference images of the gorilla population performing classification of the extracted features to yield a ranked list of identification proposals.” §6 ¶1),
a species of the at least one detected target object, a sub-species of the at least one detected target object, a gender of the at least one target object, an age of the at least one target object, a health of the at least one target object (“We propose two heuristics: selecting the face with the highest detection score and selecting the bounding box with the largest area.” Fig 1), 
further specifying a score based on a character of the physical attributes for the at least one target object (“We propose two heuristics: selecting the face with the highest detection score and selecting the bounding box with the largest area.” P2826 §7.2 ¶2); and
tracking, based on the at least one frame and the annotations associated with the detected at least one target object, the detected at least one target object as it moves from the content stream to another content stream (“Building effective detection and identification frameworks are only first steps towards integrating these computational tools into field practitioners day-to-day work. Whilst speeding up the processing of incoming photographic datasets and allowing for quicker identification of encountered individuals may be the main immediate purpose for using visual animal biometric systems, the availability of independent filtering and validation procedures for accuracy, misclassifications and completeness of encounters provides a further tool for building and maintaining socio-demographic datasets. In particular, the integration of spatially-explicit data from camera trap monitoring with capture-recapture or distance sampling approaches via animal biometric systems may provide an opportunity to generate important and conservation-relevant information on population status, trends and socio-ecology for Mbeli Bai as well as in other settings.” P2827 §8.3, see also §8.4 where individuals can be tracked between different spatial areas over time which is interpreted as separate content streams)

Ananth and Brust are both in the same field of endeavor of using neural networks to detect objects, identify attributes of those objects and annotate frames including the object, and are therefore analogous. Ananth discloses identifying target objects using profiles and Brust explicitly teaches the use of features and updating profiles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the object profiles as taught by Ananth with features and updating as taught by Brust to yield predictable results.

Regarding claim 30, Ananth discloses:
The system of Claim 28, further comprising: 
an AI module (Fig 4) configured to: 
match aspects of the content stream to at least one learned target object profile in a database of the plurality of learned target object profiles trained by the AI engine to detect target objects within the content (“Generating the resource–accuracy profile is challenging. Unlike SQL queries, there are no well-known analytical models to capture resource–accuracy relationships as they often depend on the specific camera view. Figure 3b shows how different object detection implementations can change efficacy in different cameras. Likewise, reducing video resolution might not reduce the accuracy of a license plate reader if the camera is zoomed in enough, but it would otherwise impact accuracy. Therefore, we need to generate the profile using a labeled dataset of ground truths for each video query pipeline for each camera. However, exhaustively running through all the configurations can be prohibitive. For instance, generating the profile for a license plate query consumed 20 CPU days.” P63 ¶2), and 
upon a determination that at least one of the detected target objects corresponds to the at least one learned target object profile: 
classify the at least one detected target object based on the at least one learned target object profile (“Generating the resource–accuracy profile is challenging. Unlike SQL queries, there are no well-known analytical models to capture resource–accuracy relationships as they often depend on the specific camera view. Figure 3b shows how different object detection implementations can change efficacy in different cameras. Likewise, reducing video resolution might not reduce the accuracy of a license plate reader if the camera is zoomed in enough, but it would otherwise impact accuracy. Therefore, we need to generate the profile using a labeled dataset of ground truths for each video query pipeline for each camera. However, exhaustively running through all the configurations can be prohibitive. For instance, generating the profile for a license plate query consumed 20 CPU days.” P63 ¶2, Fig 4. “object classifier”), and 
determine whether the at least one detected target object corresponds to at least one of the target object designations associated with the zone specified at the end-user device (“Generating the resource–accuracy profile is challenging. Unlike SQL queries, there are no well-known analytical models to capture resource–accuracy relationships as they often depend on the specific camera view. Figure 3b shows how different object detection implementations can change efficacy in different cameras. Likewise, reducing video resolution might not reduce the accuracy of a license plate reader if the camera is zoomed in enough, but it would otherwise impact accuracy. Therefore, we need to generate the profile using a labeled dataset of ground truths for each video query pipeline for each camera. However, exhaustively running through all the configurations can be prohibitive. For instance, generating the profile for a license plate query consumed 20 CPU days.” P63 ¶2, Figs 3 and 4), and 
(“Our traffic analytics solutions based on the Rocket software stack (shown in Figure 2) are being actively deployed. Since December 2016, a multimodal object counter has run 24/7 using live traffic cameras in Bellevue, Washington, to help the city understand and track volumes of cars, pedestrians, and bikes. The system raises alerts on anomalous traffic patterns used by traffic control operators.” P65 §traffic video analytics ¶1).

However, Ananth does not explicitly disclose: update the at least one learned target object profile with at least one aspect of the at least one detected target object; 
update the learned target object profile with the detected learned features.

Brust teaches: update the at least one learned target object profile with at least one aspect of the at least one detected target object (“Our identification pipeline consists of two sequential components (see Fig. 2): first, a detector based on the YOLO model [60] detects and locates gorilla faces in images. In a second step, each candidate face region is processed up to the pool5 layer of the BVLC AlexNet Model [28] for feature extraction, before a linear SVM [11] component trained on facial reference images of the gorilla population performing classification of the extracted features to yield a ranked list of identification proposals.” Brust §6 ¶1, “The results presented exemplify that deep learning pipelines constructed for a biometric entity, species and setup (e.g. chimpanzee identification on bounding box labelled face images [16]) open up new possibilities to transfer both system design and parameterisation parts across to similar species and application scenarios (e.g. gorilla facial identification without bounding box information).” P2826 §8.1 ¶1); 
update the learned target object profile with the detected learned features (“The results presented exemplify that deep learning pipelines constructed for a biometric entity, species and setup (e.g. chimpanzee identification on bounding box labelled face images [16]) open up new possibilities to transfer both system design and parameterisation parts across to similar species and application scenarios (e.g. gorilla facial identification without bounding box information).” P2826 §8.1 ¶1).

Ananth and Brust are both in the same field of endeavor of using neural networks to detect objects, identify attributes of those objects and annotate frames including the object, and are therefore analogous. Ananth discloses identifying target objects using profiles and Brust explicitly teaches the use of features and updating profiles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the object profiles as taught by Ananth with features and updating as taught by Brust to yield predictable results.

Claims 7 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Ananth in view of Brust and further in view of Christin et al. (Applications for Deep Learning in Ecology).



Christin teaches: wherein specifying the at least one alert parameter comprises defining a diseased status of the at least one detected target object (“Detecting symptoms of diseases is a large potential provided by deep learning. For example, CNNs already help detect plant diseases in olive trees41, cassavas (Manihot esculenta) 42 or various crops43. While the primary use has been directed towards agricultural applications, this could also be widely applied to wild plant and animal populations to help find hints of scars, malnutrition or the presence of visible diseases like mange.” P7 lines 143-147).

Ananth, Brust and Christin are all in the same field of endeavor of using neural networks to detect objects, identify attributes of those objects and annotate frames including the object, and are therefore analogous. Ananth discloses issuing an alert when a parameter is met. Christin teaches specific parameters regarding a disease status. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the alert parameters as taught by Ananth with specific parameters for disease status as taught by Christin to yield predictable results. Christin provides motivation, as deep learning can be used to find signs of disease for ecological applications (Abstract, P7 lines 143-147).
 
Claim 29 is rejected under 35 U.S.C. 103 as being unpatentable over Ananth in view of Brust and further in view of Ashani (US 20160139977 A1).

Regarding claim 29, Ananth discloses: The system of Claim 28, wherein the at least one end-user device module is further configured to: specify at least one alert parameter from a plurality of alert parameters for the at least one zone, wherein the alert parameters comprise: triggers for an issuance of an alert, recipients that receive the alert, actions to be performed when an alert is triggered (“Our traffic analytics solutions based on the Rocket software stack (shown in Figure 2) are being actively deployed. Since December 2016, a multimodal object counter has run 24/7 using live traffic cameras in Bellevue, Washington, to help the city understand and track volumes of cars, pedestrians, and bikes. The system raises alerts on anomalous traffic patterns used by traffic control operators.” P65 §traffic video analytics ¶1),
receive the alert from the AI engine, and display the detected target object related data associated with the alert (“Our traffic analytics solutions based on the Rocket software stack (shown in Figure 2) are being actively deployed. Since December 2016, a multimodal object counter has run 24/7 using live traffic cameras in Bellevue, Washington, to help the city understand and track volumes of cars, pedestrians, and bikes. The system raises alerts on anomalous traffic patterns used by traffic control operators.” P65 §traffic video analytics ¶1, Fig 4), 
wherein the detected target object related data comprises at least one frame from the at least one content stream (“As mentioned earlier, the resource demands of certain vision algorithms can make running them on every frame prohibitively expensive. Fortunately, most video streams have temporal redundancy among frames, making it possible to sub-sample them without losing accuracy. We developed a suite of content-aware and application-aware techniques to sample only a few key frames for processing without compromising the application’s accuracy.” §intelligent frame selection ¶1).

However, Ananth does not explicitly disclose: and restrictions on issuing the alert.
Ashani teaches: restrictions on issuing the alert (“Additionally, the present invention provides a unique thresholding technique enabling accurate abnormality detection for control over a certain number of detected abnormal data pieces.  The thresholding technique may utilize statistical parameters of input data pieces, as well as flow rate of input data to determine an abnormality threshold to be used by at least one of the modules, and preferably all of the modules in the cascade.  For example, the system may be configured to generate a desired rate of abnormality alerts to thereby avoid alarm overflowing, e.g. 2-20 alert per day.  To this end the abnormality detection system may set an appropriate abnormality detection threshold in accordance with the desired average (or maximal) alerts per day and an average input rate of data pieces.” [0006]).


Response to Arguments
Applicant’s arguments, see Items I and III on p11-12, filed 04 March, 2021, with respect to the objection of claim 14 and rejection of claims 1-30 under 35 USC 112(b) have been fully considered and are persuasive.  The objection of claim 14 and rejection of claims 1-30 under 35 USC 112(b) have been withdrawn. 
Applicant’s arguments, see Item II on p11-12, filed 04 March, 2021, with respect to the rejection of claims 14-25 under 35 USC 101 for signal per se are persuasive, and the rejection has been withdrawn. 
Applicant’s arguments, see Item II on p11-12, filed 04 March, 2021, with respect to the rejection of claims 28-30 under 35 USC 101 for software per se have been considered but are not persuasive, and the rejection is maintained. Software per se is not a patent eligible statutory category. Please see MPEP 2106.I, “Non-limiting examples of claims that are not directed to one of the statutory categories:

Applicant has not explicitly argued the abstract idea rejection beyond a cursory statement that the claims have been amended. However, upon further review, the examiner has reconsidered the rejection for claims 2, 16, 17 and 30 and has withdrawn the abstract idea rejection for those claims.
Applicant’s arguments, see Items IV and V on p12-16, filed 04 March, 2021, with respect to the rejection of claims 1-6, 9-20, 23-28 and 30 under 35 USC 102(a)(1) and 103 have been considered and are persuasive. However, a new ground of rejection over Ananth and Brust has been made, as necessitated by the amendments. Please see the above rejections for details.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC NILSSON whose telephone number is (571)272-5246.  The examiner can normally be reached on M-F: 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571)-272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access 






/ERIC NILSSON/Primary Examiner, Art Unit 2122