DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “A video analytics system having metadata based anomaly detection to detect … a metadata anomaly detection module configured to receive … an instantaneous metrics extraction module configured to sequentially receive … a statistical model update module configured to sequentially receive … and an anomaly formulation module configured to sequentially receive” in claim 1 (and by dependency claims 2-20).

If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4, and 6-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hogg et al. (US 20160042621 A1) in view of Alcock et al. (IDS: US 20180285633 A1).
Regarding claim 1, Hogg et al. disclose a video analytics system having metadata based anomaly detection to detect an anomaly within a scene of a video based on metadata associated with corresponding frames of the video, the video analytics system comprising: a metadata anomaly detection module configured to receive, for each of a plurality of frames of a first video, corresponding target-related metadata (metadata about moving objects in the video is generated in step with the video, [0051], The approach as described thus far works well if, for example, each person that walks up the front pathway stays on main part of the pathway.  In practice, some people don't walk down the middle of the path, but instead cut corners.  Similarly, someone stepping momentarily on your front lawn to let a car pass would trigger an unwanted notification.  In both examples, you would not want to be notified about a minor incursion, [0118], intrusion factor, [0149], car exceeds speed, [0257]), the target-related metadata including, for each target identified by the target-related metadata in a particular frame of a plurality of frames of the first video: target classification identifying a type of the target, target location identifying a location of the target, and a first target feature of the target (“One example of metadata generated by the video analytics processor, but not limited to, is the size and position of any moving object(s) detected.  In the example in FIG. 1, a delivery person 001 has been detected moving across the field of view of the camera by the video analytics processor and metadata has been generated that describes the delivery person as an object in terms of a rectangular box with width and height located at a specific location in the camera's field of view.  This metadata is then illustrated in the image in FIG. 
1 by a white rectangle outline 002 using the height, width and x,y location position description of the object as determined by the video analytics processor”, [0055], “The video analytics processor determined that an object had moved in to the field of view and generated metadata describing the moving object detected.  In FIG. 3A, the position and size of the detected object is shown 
metadata and any other source could also be used to update the learning map”, [0076],  accumulate information from past motion events, which is then used to analyze or compare information from a new motion event, [0081]-[0084], statistics about object, determine car exceeds speed, [0257]);  and an anomaly formulation module configured to sequentially receive 

While Hogg et al. describe using statistics, and the creation and updating of learning maps likely necessitate using a statistical model, another reference is provided to further teach a statistical model update module configured to sequentially receive the instantaneous metadata metrics associated with different frame sets of the first video from the instantaneous metrics extraction model, and to provide statistical models derived from the instantaneous metadata metrics associated with the different frame sets of the first video for each of the plurality of cells dividing the scene of the first video.

Alcock et al. teach a statistical model update module configured to sequentially receive the instantaneous metadata metrics associated with different frame sets of the first video from the instantaneous metrics extraction model, and to provide statistical models derived from the instantaneous metadata metrics associated with the different frame sets of the first video for each of the plurality of cells dividing the scene of the first video (“In this embodiment, the recorded videos 925 are stored with associated metadata of unusual motions detected in the 

Hogg et al. and Alcock et al. are in the same art of using metadata to analyze video (Hogg et al., [0051]; Alcock et al., [0062]). The combination of Alcock et al. with Hogg et al. will enable the use of a statistical model. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the statistical model of Alcock et al. with the invention of Hogg et al. because this was known at the time of filing, the combination would have predictable results, and, as Alcock et al. indicate, “It would assist such security personnel if alerts or indications are generated in real-time, when events of interest, such as unusual motions, are detected” ([0004]), which the statistics help with ([0006]), indicating the efficiency improvement to the surveillance application described by Hogg et al.

Regarding claim 2, Hogg et al. and Alcock et al. disclose the video analytics system of claim 1. Hogg et al. and Alcock et al. further disclose the instantaneous metrics extraction module is configured to generate at the cell level, with respect to each of the different frame sets, a corresponding first instantaneous metadata metric reflecting its most recent value within the timeline of the first video (Hogg et al., video analytics processor operates in real or near real-
 
Regarding claim 4, Hogg et al. and Alcock et al. disclose the video analytics system of claim 2. Hogg et al. and Alcock et al. further disclose the first target feature comprises speed and the first instantaneous metadata metric of each of the different frame sets represents speeds of a first target type in each cell in a most recent predetermined interval within the timeline of the first video (Hogg et al., provide a more detailed description of objects detected including 
properties such as but not limited to speed, velocity, acceleration, [0056], [0074], ten second motion event, finds speed, [0078], speed of an object travelling between two distances, [0253] collect speed statistics on any object driving by over a minimum speed, [0257]; Alcock et al., The pattern interval #1 (750) may be, for example, the morning rush hour (8-9 am) and the evening rush hour (5-6 pm), which share similar activity or motion patterns if motion direction is not of interest, but speed of motion is of interest, [0106], The system learns the motion probability histograms for each cell, one for motion direction and another for motion speed.  If the probability of current motion direction or motion speed for a cell is lower than a pre-defined threshold, the current motion is treated as an anomaly, i.e. unusual for that cell, in which case the cell is considered an unusual motion block, [0114], The time intervals or time periods of the statistical intervals and pattern intervals may be selected using another interface (not shown).  The search results 920 may further be filtered by activities, for example unusual speed 940, [0116], speed, [0158]) [recent time interval also described above in claim 2]
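
The per-cell test quoted from Alcock et al. at [0114] (a learned histogram per cell, with motion treated as unusual when its probability falls below a pre-defined threshold) can be sketched as follows. This is a minimal illustration only, limited to speed; the class name, bin width, and threshold value are assumptions made for the example and are not drawn from either reference.

```python
from collections import defaultdict


class CellSpeedHistogram:
    """Hypothetical per-cell histogram of observed speeds."""

    def __init__(self, bin_width=1.0, threshold=0.05):
        self.bin_width = bin_width      # assumed histogram bin width
        self.threshold = threshold      # assumed pre-defined probability threshold
        # counts[cell][bin_index] -> number of observations in that bin
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)  # total observations per cell

    def _bin(self, speed):
        return int(speed // self.bin_width)

    def update(self, cell, speed):
        """Learn from one observed speed in the given cell."""
        self.counts[cell][self._bin(speed)] += 1
        self.totals[cell] += 1

    def probability(self, cell, speed):
        """Empirical probability of this speed bin in this cell."""
        total = self.totals.get(cell, 0)
        if total == 0:
            return 0.0
        return self.counts[cell].get(self._bin(speed), 0) / total

    def is_unusual(self, cell, speed):
        """True when the learned probability is below the threshold."""
        return self.probability(cell, speed) < self.threshold
```

In this sketch a cell with no learned observations returns probability 0.0 and is therefore always flagged; a practical system would likely handle the untrained case separately.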

Regarding claim 6, Hogg et al. and Alcock et al. disclose the video analytics system of claim 1. Hogg et al. and Alcock et al. further disclose the instantaneous metadata metrics associated with a first frame set of the different frame sets comprises, at the cell level, the first target feature for each instance of several different target types present in each cell within a first predetermined duration preceding the frame corresponding to the first frame set (Hogg et al., 

Regarding claim 7, Hogg et al. and Alcock et al. disclose the video analytics system of claim 6. Hogg et al. and Alcock et al. further disclose the first target feature is one of target location, target velocity, target trajectory, target speed, target size, target orientation, target appearance and target disappearance (Hogg et al., a description of the size and position of the detected object(s), [0009], size and position of any moving object(s) detected, a delivery person 001 has been detected moving across the field of view of the camera by the video analytics processor and metadata has been generated that describes the delivery person as an object in terms of a rectangular box with width and height located at a specific location in the camera's field of view.  This metadata is then illustrated in the image in FIG. 1 by a white rectangle outline 002 using the height, width and x,y location position description of the object as determined by the video analytics processor, [0055], detailed description of objects detected including properties such as but not limited to speed, velocity, acceleration, colour, temperature, texture, or position in the third axis if a 3D camera were used.  Additional information generated by the video analytics processor could also include a more accurate object size description using more 
Regarding claim 8, Hogg et al. and Alcock et al. disclose the video analytics system of claim 1. Hogg et al. further disclose the instantaneous metadata metrics associated with a first frame set of the different frame sets comprises, at the cell level, the first target feature for each instance of a first target type present in each cell within a first predetermined duration corresponding to the first frame set, and wherein the first target feature describes a relationship of each of instance of a first target to other features and/or events identified in the video (“An embodiment of this invention is that with the exception of flying and hovering objects, there exists a one to one relationship between the lower edge of a detected object and its placement in the scene being captured by the camera's field of view.  This relationship allows the description and characterization of moving objects in a specific location in the camera's image frame to be used as a basis for comparison with other objects detected to be moving at that same location in the camera's image frame without specific knowledge of the scene being observed.  Hence, an advantageous aspect of this invention is that the camera's monitoring and learning algorithms do not require knowledge of the scene being monitored.  One example of 
 
Regarding claim 9, Hogg et al. and Alcock et al. disclose the video analytics system of claim 8. Alcock et al. further teach the first target feature is one of object ported by target, object left behind by target, target entering, target exiting, target loitering, target lying down, target running, target walking and target waiting in queue (Other examples of determinations made by the video analytics module 224 may include one or more of foreground/background segmentation, object detection, object tracking, motion detection, object classification, virtual tripwire, anomaly detection, facial detection, facial recognition, license plate recognition, identifying objects "left behind", monitoring objects (i.e. to protect from stealing), and business intelligence, [0070]).

Regarding claim 10, Hogg et al. and Alcock et al. disclose the video analytics system of claim 1. Alcock et al. further teach the metadata anomaly detection module is configured to detect all anomalies in the scene of the first video based only on analysis of the received target-related metadata (The search for unusual motion may only be a database search of the metadata instead of a time consuming processing of the video for the search results, [0119]).
 
Regarding claim 11, Hogg et al. and Alcock et al. disclose the video analytics system of claim 1. Alcock et al. further teach the metadata anomaly detection module is configured to detect all anomalies in the scene of the first video without analysis of images of the first video (In this embodiment, the recorded videos 925 are stored with associated metadata of unusual motions detected in the video and their time of detection.  The search for unusual motion may only be a database search of the metadata instead of a time consuming processing of the video for the search results, [0119]).

Regarding claim 12, Hogg et al. and Alcock et al. disclose the video analytics system of claim 1. Alcock et al. further teach the instantaneous metrics extraction module provides at least some of the received target-related metadata as instantaneous metadata metrics to the anomaly formulation module, and wherein the anomaly formulation module is configured to compare, at the cell level, target-related metadata with the statistical models provided by the statistical model update module to detect an anomaly in the scene of the first video (Statistical models of these example activities or motions (direction, magnitude, presence, and absence) are created (learned) over time from the motion vectors.  For a given example activity, a probability can be provided from the statistical model to indicate how common or uncommon is a given activity, [0090], If the day in the learning period is "the day" then the 

Regarding claim 13, Hogg et al. and Alcock et al. disclose the video analytics system of claim 12. Hogg et al. and Alcock et al. further disclose the anomaly formulation module is configured to identify an anomalous target as a target associated with target-related metadata responsible for the detection of an anomaly by the anomaly formulation module (Hogg et al., mathematical model consisting of an array of cells, or learning map, is used to describe the motion of any object(s) detected by the camera.  When an object(s) is detected, its positional location(s) for a period of time, or motion event, is recorded in a learning map.  This learning map is then compared to a reference learning map where the camera determines whether to alert the user or not that an object of interest was detected, abstract, “This invention anticipates that using multi-dimensional learning maps or multiple learning maps may also be advantageous.  This 

Regarding claim 14, Hogg et al. and Alcock et al. disclose the video analytics system of claim 1. Hogg et al. and Alcock et al. further indicate for at least a first target identified by the target-related metadata, the instantaneous metrics extraction model is configured to estimate a path of the first target from target-related metadata of the first target, and to associate target-related metadata of the first target for cells through which the path of the first target extends (Hogg et al., create a motion event learning map that describes the path that the vehicle took, [0106], To continue this example, when a second person walks up the pathway (using the same example illustrated in FIG. 1), but takes a slightly different route, the resulting second motion event learning map would have a slightly different described path than the first motion event learning map, [0116], “The approach as described thus far works well if, for example, each person that walks up the front pathway stays on main part of the pathway.  In practice, some people don't walk down the middle of the path, but instead cut corners.  Similarly, someone stepping momentarily on your front lawn to let a car pass would trigger an unwanted notification.  In both examples, you would not want to be notified about a minor incursion.  However it would also not be desirable to mark off part of the lawn as belonging to the road or pathway.  Thus a means is required to determine to what degree a motion event occurred inside an area of interest and respond appropriately.  For example, if a person took twenty steps up a pathway and stepped on the lawn once, it would be reasonable to not notify the user since the vast majority of time the person stayed on the walkway as you would prefer”, [0118]; Alcock et al., “An approach in detecting anomalies is to learn a statistical model based on features.  Features are information such as motion vectors, optical flow, detected object 

Regarding claim 15, Hogg et al. and Alcock et al. disclose the video analytics system of claim 1. Hogg et al. further indicate for at least a first target identified by the target-related metadata, the instantaneous metrics extraction model is configured: to estimate a path of the first target based upon a first target location of the first target within a first cell and a second target location of the first target within a second cell, the first and second target locations being identified in the target-related metadata received by the instantaneous metrics extraction model respectively associated with first and second frames of the plurality of frames, and to estimate a third target location of the first target within a third cell based on the estimated path of the first target (weighted master learning map in FIG. 11 after updating it with a third motion event learning map where a person walking up took yet another slightly different route, [0030], For example, a homeowner may want to be notified whenever someone walks up the front pathway, while a security company may only want to be notified when someone walks off the pathway, [0085], camera would still alert the user if a car drove by in the other lane, a pedestrian walked by on the far sidewalk or if a neighbor across the street were to drive up in to their own driveway, the user would update the master learning map every time a car or person passed by on or across the street is a fashion that wasn't previously captured, [0105], becomes less likely that someone walking up the path would step on a region not already marked as being on the pathway in the master learning map after each successive learning episode, learning from user's responses to viewing additional motion events, [0116], “The approach as described thus far works well if, for example, each person that walks up the front 

Regarding claim 16, Hogg et al. and Alcock et al. disclose the video analytics system of claim 15. Hogg et al. further indicate the estimation of the third target location within the third cell is used by the instantaneous metrics extraction module to generate instantaneous metadata metrics associated with the third cell (FIG. 10 is a graphical representation of a weighted master learning map with spatial coordinates aligned to a camera with a field of view shown in FIG. 1 after updating for the motion event learning map shown in FIG. 4, where the value of each cell in the weighted master learning map has been increased by a value of one where its 
 
Regarding claim 17, Hogg et al. and Alcock et al. disclose the video analytics system of claim 16. Hogg et al. further indicate the first target feature of the first target associated with the first frame and the first target feature of the first target associated with the second frame are used to estimate the first target feature of the first target within the third cell (“In an alternate embodiment of this invention, a sample set of measurements may be used to interpolate and extrapolate the appropriate value for all positions in the small object learning map.  For example, apparent size measurements of the dog in the example at the same distances from the camera or positions on the same learning map row would have the same apparent size.  Thus one embodiment would have one apparent size measurement being used for the value of all cells in a small object learning map row.  In an alternate embodiment, the apparent size of 
 
Regarding claim 18, Hogg et al. and Alcock et al. disclose the video analytics system of claim 17. Hogg et al. further indicate the first target feature of the first target within the third cell is estimated from an interpolation of at least the first target feature associated with the first frame and the first target feature associated with the second frame (“In an alternate embodiment of this invention, a sample set of measurements may be used to interpolate and extrapolate the appropriate value for all positions in the small object learning map.  For example, apparent size measurements of the dog in the example at the same distances from the camera or positions on the same learning map row would have the same apparent size.  Thus one embodiment would have one apparent size measurement being used for the value of all cells in a small object learning map row.  In an alternate embodiment, the apparent size of an object at different locations on a learning map could be calculated by taking two measurements of the same object's apparent size at two different locations and interpolating values using a linear or other arithmetic function between the two measured points”, [0188]).
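
The interpolation described in the passage quoted from Hogg et al. at [0188] (two measurements of the same object at two locations, with values between them obtained by a linear function) can be sketched as follows. This is a minimal illustration assuming a one-dimensional position such as a learning map row index; the function and parameter names are hypothetical and do not appear in the reference.

```python
def interpolate_feature(pos_a, value_a, pos_b, value_b, pos_query):
    """Linearly interpolate (or extrapolate) a feature value at pos_query.

    pos_a, pos_b: positions of the two measurements (assumed 1-D, e.g. row index)
    value_a, value_b: feature values measured there (e.g. apparent size or speed)
    """
    if pos_a == pos_b:
        raise ValueError("the two measurements must be at different positions")
    # Fraction of the way from pos_a to pos_b; values outside [0, 1] extrapolate.
    t = (pos_query - pos_a) / (pos_b - pos_a)
    return value_a + t * (value_b - value_a)
```

For example, with an apparent size of 40.0 at row 10 and 20.0 at row 30, the sketch yields 30.0 at row 20.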
 
Regarding claim 19, Hogg et al. and Alcock et al. disclose the video analytics system of claim 18. Hogg et al. further indicate the first target feature comprises speed (For example, a car could be driven down the street in front of a house at a constant speed, The speed of an object travelling between two distances from the camera could be interpolated from the two calibration points similar to calculating apparent size of an object as previously disclosed, 

Regarding claim 20, Hogg et al. and Alcock et al. disclose the video analytics system of claim 16. Hogg et al. and Alcock et al. further indicate the instantaneous metrics extraction module associates the target classification of the first target with the third cell upon estimating the first target being located within the third cell (Hogg et al., generated data about moving objects detected in the video frame is often referred to as metadata or data derived from data, [0051], When a motion event occurs, a preferred embodiment of this invention has the camera making a recording of the streaming video and associated metadata generated by the camera for the period of the motion event, as well as other information generated by the camera and associating them together under a common motion event record, [0059], Thus each cell in the motion event learning map would have a number recorded in it that is associated with the number of video frames an object was detected in that location, [0080]; Alcock et al., Temporal filtering may be used to check consistency of the motion vectors in location and time and to filter out the motion vectors that are noise and do not correspond to real moving objects, [0136]) [motion event vs not motion event of Hogg and real moving object vs noise interpreted as the target classification claimed].
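
The motion event learning map cited from Hogg et al. at [0080], in which each cell records the number of video frames in which an object was detected at that location, can be sketched as follows. This is a minimal illustration assuming one (x, y) object position per frame and a uniform square grid; the function name and grid parameters are hypothetical.

```python
def build_motion_event_map(detections, rows, cols, cell_size):
    """Count, per grid cell, the frames in which an object was detected there.

    detections: iterable of (x, y) positions, one per video frame (assumed format)
    rows, cols: grid dimensions; cell_size: side length of a square cell in pixels
    """
    grid = [[0] * cols for _ in range(rows)]
    for x, y in detections:
        r, c = int(y // cell_size), int(x // cell_size)
        # Ignore positions that fall outside the grid.
        if 0 <= r < rows and 0 <= c < cols:
            grid[r][c] += 1
    return grid
```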

Claims 3 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Hogg et al. (US 20160042621 A1) and Alcock et al. (IDS: US 20180285633 A1) as applied to claims 1 and 2 above, further in view of Wang et al. (IDS: US 20160133025 A1).

Regarding claim 3, Hogg et al. and Alcock et al. disclose the video analytics system of claim 2. Hogg et al. and Alcock et al. further imply the first metric of each of the different frame sets represents how many people were present in each cell in a most recent predetermined interval within the timeline of the first video (Hogg et al., Pedestrians walking up the home's walkway, along the side path and down the user's own driveway were identified as walking along a Pathway and denoted by a `P` 093 on the master learning map, [0109], monitoring automotive or pedestrian traffic flow, [0257]; Alcock et al., Other filters that may be selected include: crowd gathering, filter the displayed results by classification, for example as a person or vehicle, [0158]); however, another reference is added to make this limitation more explicit.

Wang et al. teach the first metric of each of the different frame sets represents how many people were present in each cell in a most recent predetermined interval within the timeline of the first video (“According to another aspect of the present invention, a method for detecting crowd density includes projecting a depth image obtained by photographing onto a height-top-view, the depth image including a crowd; dividing the height-top-view into cells with a predetermined size; for each cell, extracting a density detection feature indicating distribution of differences in height between pixels in the cell; and detecting, based on the density detection feature, using a density model previously constructed by a statistical learning method, number of persons in each of the cells”, [0009], number of persons in each cell is detected based on the density detection feature, using a density model previously constructed by a statistical learning method, [0047], [0055], moving speed of the crowd in the cell, [0100] [speed in Wang indicates relative to time, and Alcock above also teaches determining a parameter per set time interval]).
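
The claimed metric (how many people were present in each cell in a most recent predetermined interval) can be sketched as follows. This is a minimal illustration that counts distinct person targets directly from metadata records; it is a simplified stand-in and not the depth-image density model of Wang et al. The record layout, function name, and parameters are assumptions made for the example.

```python
from collections import Counter


def people_per_cell(records, now, window, cell_size):
    """Count distinct persons per cell within the interval (now - window, now].

    records: iterable of (timestamp, target_id, target_type, x, y) tuples
             (assumed metadata layout, not taken from any reference)
    """
    seen = set()
    for ts, target_id, target_type, x, y in records:
        if target_type != "person":
            continue
        if not (now - window < ts <= now):
            continue
        cell = (int(x // cell_size), int(y // cell_size))
        # A person is counted once per cell, however many frames report them.
        seen.add((cell, target_id))
    return Counter(cell for cell, _ in seen)
```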

Regarding claim 5, Hogg et al. and Alcock et al. disclose the video analytics system of claim 1. Hogg et al. and Alcock et al. partly further teach the instantaneous metadata metrics associated with a first frame set of the different frame sets comprises, for each cell of the scene 

Wang et al. teach the instantaneous metadata metrics associated with a first frame set of the different frame sets comprises, for each cell of the scene of the first video and for each of several different target types, a number of each of the different target types present in each cell within a first predetermined duration corresponding to the first frame set (“In step S1301, the density detection feature (the LBP feature) is extracted in the cell, and the LBP code is calculated for pixels in the cell.  It should be noted that, the binary codes of different pixels in the cell may be the same.  Here, classification and statistical processing are performed for the 

Hogg et al., Alcock et al., and Wang et al. are in the same art of detecting objects and motion (Hogg et al., abstract; Alcock et al., abstract; Wang et al., abstract). The combination of Wang et al. with Hogg et al. and Alcock et al. will enable the finding of target types. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the counting of target types of Wang et al. with the invention of Hogg et al. and Alcock et al. because this was known at the time of filing, the combination would have predictable results, and, as Wang et al. indicate, “The technology of detecting an interest degree of a crowd in a target position is widely used in many fields such as building internal layout, security monitoring, etc. For example, such technology is usually used in the field of building internal layout to detect an interest degree of a crowd of visitors in an exhibition item at a target position in an exhibition hall, by which an important basis for decision-making can be provided for a decision maker to rationally arrange display counters, perform crowd-control and efficiently utilize the space of the exhibition hall.  As another example, such technology is usually used in the field of security monitoring to detect an interest degree of a crowd in a target position of the security monitoring such as a train station or a government building, who gathers around the target position of the security monitoring; so that a basis for decision-making can be provided for a decision maker to determine whether such crowd might disturb public order (for example, hold a demonstration) or endanger public security” ([0004]), and in the described method the human body can be represented better and the crowd density can be accurately detected ([0041]), indicating the detection improvement to the surveillance application described by Hogg et al. and Alcock et al.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: US 20210279475 A1 (The engine outputs a real-time stream of metadata from the pixel stream, the metadata describing the instantaneous attributes or characteristics of each object in the scene it has been trained to search for; real-time metadata stream to accompany the output video stream providing an index of video content frame by frame; creation of a Track Record, which is the reformatting of real-time metadata into a per-object (per-person) record of their trajectory; pose (and, possibly, identity).  The Track Records are stored in a MySQL-type database, optionally correlated with a video database [0652]); US 20070248244 A1 (An object processing unit divides each image frame of received image data into block areas, extracts an object from each image frame, and extracts, as metadata, the features of this object for each block area, abstract; intrusion surveillance system, [0004], video source, [0038], object metadata, traveling path, [0043], holding metadata about the traveling path information, information about the traveling direction, the traveled distance, and the travel time is held in each element of an information array which corresponds to each of the plurality of block areas, [0045], [0110]; “In accordance with the present invention, there is provided an .
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M ENTEZARI HAUSMANN whose telephone number is (571)270-5084.  The examiner can normally be reached on 10-7 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VINCENT M RUDOLPH can be reached on (571)272-8243.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MICHELLE M ENTEZARI/Primary Examiner, Art Unit 2661