DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claim 13 is objected to because of the following informalities: the claim recites "The method of ." Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-5, 8, 10, 13-14, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Chen (US Patent 9123133) in view of Paul (US PG-Pub. 20200005468).
Regarding claim 1: 
Chen discloses: a surveillance system (FIG. 3) comprising: 
a video camera (column 8, lines 57-62; “The recording platform 350 may include a video source 301 which is an apparatus for capturing videos or motion pictures such as one or more surveillance cameras.”); 
a processing resource (column 9, lines 19-25; “The processing unit 312”); 
a non-transitory computer-readable medium, coupled to the processing resource, having stored therein instructions that when executed by the processing resource (column 9, lines 19-25; “The storage unit 314”) cause the processing resource to: 
receive a plurality of video frames captured by the video camera (FIG. 4, S402, column 9, lines 39-42; “Referring to both FIG. 3 and FIG. 4, the processing unit 312 of the moving object detection apparatus 310 may receive a time series of incoming frames of a fixed location (Step S402).”);
for each video frame of the plurality of video frames, partition a plurality of pixels of the video frame into a plurality of cells each representing an X x Y rectangular block of the plurality of pixels (FIG. 4, S406, column 10, lines 62-66; “the processing unit 312 may receive a current frame and partition the current frame into a plurality of current blocks (Step S406). The processing unit 312 may divide the current frame into N.times.N current blocks for eliminating unnecessary current blocks”); 
estimate background cells within a particular video frame of the plurality of video frames by comparing each of the plurality of cells of the particular video frame to a corresponding cell of the plurality of cells of one or more other video frames of the plurality of video frames (FIG. 4, S410, column 11, lines 2-5; “the processing unit 312 may classifying each of the current blocks as either a background block or a moving object block”; also see column 1, lines 54-58; “By the background subtraction technique, moving foreground objects would be able to be segmented from stationary or dynamic background scenes by comparing pixel differences between a current image and a reference background model of the previous image”); 
detect a number of regions of interest (ROIs) within the particular video frame by: identifying active cells within the particular video frame based on the estimated background cells (FIG. 4, S410, column 11, lines 2-5; “the processing unit 312 may classifying each of the current blocks as either a background block or a moving object block”; also see column 1, lines 54-58; “By the background subtraction technique, moving foreground objects would be able to be segmented from stationary or dynamic background scenes by comparing pixel differences between a current image and a reference background model of the previous image”);
Chen does not disclose: identifying the number of clusters of cells within the particular video frame by clustering the active cells; and causing object detection to be performed within the number of ROIs by feeding the number of ROIs to a machine learning model.
However, in a related field, Paul teaches: identifying the number of clusters of cells within the particular video frame by clustering the active cells (¶ [0036] “…The resulting clusters of events are pixel locations indicating motion and that are closely located on a frame [similar to active cells] but are still too disjoint to identify any particular object because the clusters are usually too small.”; ¶ [0045] “Process 300 may include “form cluster groups depending, at least in part, on the position of the clusters relative to each other on a grid of pixel locations [cells] forming the frames and without tracking all pixel locations forming the frames” 306.”; ¶ [0047] “While providing relatively cohesive areas of motion, the cluster groups still are not large enough to indicate object segments. Thus, the cluster groups are eventually formed into regions-of-interest (RoI).”); 
and cause object detection to be performed within the number of ROIs by feeding the number of ROIs to a machine learning model (¶ [0051] “Process 300 may include “provide the regions-of-interest to applications associated with object segmentation” 310. This may involve applications that finalize the object segmentation such as with DL algorithms that use the RoIs as input to neural networks and either confirm that an RoI is an object segment or group the RoIs to form an object segment. Other such applications that either may receive an RoI directly or from a finalizing object segmentation application may perform object recognition, such as those providing semantic labels, and object tracking that tracks the position of the segmented object from frame to frame”).
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Chen to incorporate the teachings of Paul by including: identifying the number of clusters of cells within the particular video frame by clustering the active cells, and causing object detection to be performed within the number of ROIs by feeding the number of ROIs to a machine learning model, in order to identify objects by merging neighboring blocks (clusters), since the blocks may be too small to provide meaningful information regarding the object.
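
For orientation only, the sequence of limitations mapped above (partitioning a frame into cells, comparing each cell with the corresponding cell of another frame, identifying active cells, and clustering them into ROIs) can be pictured with the following minimal sketch. It is not taken from Chen, Paul, or the application. The function names, the use of cell mean intensity as the comparison metric, the threshold value, and the 4-connected grouping rule are all assumptions chosen for illustration, and the machine learning model that would receive the ROIs is not shown.

```python
from collections import deque

def partition(frame, cell_h, cell_w):
    """Split a 2-D list of pixel intensities into a grid of cells.

    Returns a dict mapping (row, col) grid indices to the mean intensity
    of that cell. Edge pixels that do not fill a whole cell are ignored.
    """
    rows, cols = len(frame) // cell_h, len(frame[0]) // cell_w
    cells = {}
    for r in range(rows):
        for c in range(cols):
            block = [frame[y][x]
                     for y in range(r * cell_h, (r + 1) * cell_h)
                     for x in range(c * cell_w, (c + 1) * cell_w)]
            cells[(r, c)] = sum(block) / len(block)
    return cells

def active_cells(current, reference, threshold):
    """Cells whose mean differs from the reference cell by more than threshold."""
    return {pos for pos, mean in current.items()
            if abs(mean - reference[pos]) > threshold}

def cluster_rois(active):
    """Group 4-connected active cells; return one bounding box per cluster.

    Each box is (top, left, bottom, right) in grid coordinates, inclusive.
    """
    remaining, rois = set(active), []
    while remaining:
        start = remaining.pop()
        queue, cluster = deque([start]), [start]
        while queue:
            r, c = queue.popleft()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    queue.append(nb)
                    cluster.append(nb)
        rs = [p[0] for p in cluster]
        cs = [p[1] for p in cluster]
        rois.append((min(rs), min(cs), max(rs), max(cs)))
    return sorted(rois)

# Two 4x6 frames; bright objects appear in the top-left and bottom-right cells.
background = [[10] * 6 for _ in range(4)]
current = [row[:] for row in background]
for y in range(2):
    for x in range(2):
        current[y][x] = 200
        current[y + 2][x + 4] = 200

ref_cells = partition(background, 2, 2)
cur_cells = partition(current, 2, 2)
rois = cluster_rois(active_cells(cur_cells, ref_cells, threshold=25))
```

In this toy input the two changed cells are not adjacent, so the sketch reports two single-cell ROIs rather than one.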

Regarding claim 4: 
Chen in view of Paul teaches: the limitations of claim 1 as applied above. 
Paul further teaches: wherein the instructions further cause the processing resource to prior to partitioning, preprocess the plurality of video frames (¶ [0068] “The process 600 may include “pre-process image data at least sufficiently for event-driven object segmentation” 604. This may include any pre-processing necessary to provide pixel intensity values (or other image values being used to perform the analysis) such as de-mosaicing, noise reduction, lens shading correction, and so forth. Such pre-processing may be provided for other image quality or performance reasons as well.”).
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Chen to incorporate the teachings of Paul by including: wherein the instructions further cause the processing resource to prior to partitioning, preprocess the plurality of video frames in order to improve the image quality or for performance reasons as disclosed by Paul. Pre-processing images is a standard practice and is well-understood by a person of ordinary skill in the art.


Regarding claim 5: 
Chen in view of Paul teaches: the limitations of claim 4 as applied above. 
Chen in view of Paul does not specifically teach: wherein preprocessing of the plurality of video frames comprises for each video frame of the plurality of video frames: converting Red, Green, Blue (RGB) values to grayscale; performing image smoothing; and performing whitening.
However, preprocessing an image, including grayscale conversion, smoothing, and whitening, is a standard practice in the field of image analysis, and one skilled in the art would choose the appropriate preprocessing steps to obtain an image better suited to the goal of the main process, such as object detection. 
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Chen and Paul to incorporate the teachings of Paul by including: wherein the instructions further cause the processing resource to prior to partitioning, preprocess the plurality of video frames, and further to specifically include: wherein preprocessing of the plurality of video frames comprises for each video frame of the plurality of video frames: converting Red, Green, Blue (RGB) values to grayscale; performing image smoothing; and performing whitening, in order to improve the image quality or for performance reasons as disclosed by Paul. Pre-processing images is a standard practice and is well-understood by a person of ordinary skill in the art.
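
The three preprocessing steps recited in claim 5 can be illustrated with the short sketch below. It is an assumption-laden example, not a description of the application or of Paul: the grayscale conversion uses the ITU-R BT.601 luma weights, the smoothing is a 3x3 box blur, and "whitening" is read as scaling the frame to zero mean and unit variance, which is only one possible meaning of the term. All function names are hypothetical.

```python
import math

def to_grayscale(rgb_frame):
    """Convert rows of (R, G, B) tuples to luma using ITU-R BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_frame]

def smooth(gray):
    """3x3 box blur; border pixels average over the neighbours that exist."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [gray[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out

def whiten(gray):
    """Scale the frame to zero mean and unit variance (assumed meaning of whitening)."""
    flat = [v for row in gray for v in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat))
    std = std or 1.0  # avoid dividing by zero on a constant frame
    return [[(v - mean) / std for v in row] for row in gray]

def preprocess(rgb_frame):
    """Apply the three steps in the order recited: grayscale, smoothing, whitening."""
    return whiten(smooth(to_grayscale(rgb_frame)))

# A 3x3 black frame with a single white pixel in the top-left corner.
frame = [[(0, 0, 0)] * 3 for _ in range(3)]
frame[0][0] = (255, 255, 255)
result = preprocess(frame)
```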


Regarding claim 8: 
Chen in view of Paul teaches: the limitations of claim 1 as applied above. 
Chen further teaches, in column 1, lines 52-54: “a background subtraction related technique has been a commonly used technique in video surveillance and target recognitions.” A person skilled in the art recognizes that target recognition using surveillance systems may include facial recognition to identify people in the monitored area. 
And, Paul further teaches: ¶ [0111] “Process 600 may include “recognize objects” 662, where objects may be recognized semantically, such as people versus vehicles, and so forth. This application also either may use the RoIs directly or the refined object segments.”
Chen in view of Paul does not specifically teach: wherein the object detection comprises facial recognition.
However, considering that the use of a surveillance system and the identification of people are taught by Chen and Paul, facial recognition would have been obvious to one of ordinary skill in the art. 
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Chen and Paul to incorporate the teachings of Paul by including: wherein the object detection comprises facial recognition, in order to identify the people being tracked by the surveillance system, for example to verify whether a person belongs in the area where the monitoring system is installed or is an intruder.
Regarding claims 10 and 18: the claim limitations are similar to those of claim 1; the claims are therefore rejected in the same manner as applied above. 
Regarding claims 13-14: the claim limitations are similar to those of claims 4-5; the claims are therefore rejected in the same manner as applied above. 
Regarding claim 17: the claim limitations are similar to those of claim 8; the claim is therefore rejected in the same manner as applied above. 

Claims 2-3 and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Chen (US Patent 9123133) in view of Paul (US PG-Pub. 20200005468) and Vitek (US PG-Pub. 20220147751).
Regarding claim 2: 
Chen in view of Paul teaches: the limitations of claim 1 as applied above. 
Chen in view of Paul does not teach: wherein the instructions further cause the processing resource to prior to the object detection, crop each ROI of the number of ROIs.
However, in a related art Vitek teaches: wherein the instructions further cause the processing resource to prior to the object detection, crop each ROI of the number of ROIs (abstract “Overlapping ROIs are then merged to reduce the aggregate size of the ROIs, and merged ROIs are downscaled to a reduced set of pre-defined resolutions…For example, fully-convolutional, high-accuracy object detectors may operate on a subset of the entire image (e.g., cropped images based on ROIs) thus reducing computations otherwise performed over the entire image.”).
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Chen and Paul to incorporate the teachings of Vitek by including: wherein the instructions further cause the processing resource to prior to the object detection, crop each ROI of the number of ROIs in order to reduce computations needed to process the object detection compared to the computations needed when using an entire image.
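
The computational rationale stated above can be seen in a small sketch of the cropping step. The helper name `crop` and the (top, left, bottom, right) box convention with exclusive bottom and right edges are assumptions for illustration and do not come from Vitek or the application.

```python
def crop(frame, roi):
    """Return the sub-image covered by roi = (top, left, bottom, right).

    Coordinates are in pixels; bottom and right are exclusive.
    """
    top, left, bottom, right = roi
    return [row[left:right] for row in frame[top:bottom]]

# A 4x6 frame whose pixel value encodes its position (row * 10 + column).
frame = [[y * 10 + x for x in range(6)] for y in range(4)]
rois = [(0, 0, 2, 2), (1, 3, 4, 6)]
crops = [crop(frame, roi) for roi in rois]

# Only the cropped pixels, not the full frame, would be handed to the detector.
cropped_pixels = sum(len(c) * len(c[0]) for c in crops)
full_pixels = len(frame) * len(frame[0])
```

Here the detector would see 13 pixels across the two crops instead of the 24 pixels of the full frame.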

Regarding claim 3: 
Chen in view of Paul and Vitek teaches: the limitations of claim 1 as applied above. 
Vitek further teaches: wherein the instructions further cause the processing resource to merge overlapping portions, if any, of the number of ROIs (abstract “Overlapping ROIs are then merged to reduce the aggregate size of the ROIs, and merged ROIs are downscaled to a reduced set of pre-defined resolutions…object detectors may operate on a subset of the entire image (e.g., cropped images based on ROIs) thus reducing computations otherwise performed over the entire image.”).
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Chen in view of Paul and Vitek to further incorporate the teachings of Vitek by including: merging overlapping portions, if any, of the number of ROIs, in order to reduce the aggregate size of the ROIs.
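
A minimal sketch of merging overlapping ROIs follows. The merge rule used here, replacing any overlapping pair with its bounding-box union, is an assumption chosen for simplicity; neither Vitek nor the application is asserted to use this exact rule, and the function names are hypothetical.

```python
def overlaps(a, b):
    """True when two (top, left, bottom, right) boxes share at least one pixel.

    bottom and right are exclusive.
    """
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def merge_rois(rois):
    """Repeatedly replace any overlapping pair with its bounding union."""
    merged = list(rois)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                a, b = merged[i], merged[j]
                if overlaps(a, b):
                    merged[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3]))
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan because the list was modified
    return sorted(merged)

def area(roi):
    """Pixel count of a (top, left, bottom, right) box."""
    return (roi[2] - roi[0]) * (roi[3] - roi[1])

rois = [(0, 0, 4, 4), (0, 2, 4, 6), (10, 10, 12, 12)]
merged = merge_rois(rois)
```

In this example the first two boxes overlap and collapse into one, so the aggregate area falls from 36 pixels to 28; a bounding union does not guarantee a reduction for every arrangement of boxes.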

Regarding claims 11-12: the claim limitations are similar to those of claims 2-3; the claims are therefore rejected in the same manner as applied above. 


Claims 6, 15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Chen (US Patent 9123133) in view of Paul (US PG-Pub. 20200005468) and Tojo (US PG-Pub. 20120288153).
Regarding claim 6: 
Chen in view of Paul teaches: the limitations of claim 1 as applied above. 
Chen in view of Paul does not specifically teach: wherein estimation of the background cells comprises determining those of the plurality of cells that are inactive for greater than a predetermined threshold of time or number of frames by comparing corresponding cells of the plurality of cells among the plurality of video frames.
However, in a related art Tojo teaches: wherein estimation of the background cells comprises determining those of the plurality of cells that are inactive for greater than a predetermined threshold of time or number of frames by comparing corresponding cells of the plurality of cells among the plurality of video frames (FIG. 9A, ¶ [0083] “In the present exemplary embodiment, the appearance time is used. The foreground/background determination unit 206 compares the appearance time with the threshold value C. At this time, if the appearance time is the threshold value C or more (YES in step S902), the foreground/background determination unit 206 can consider the current pixel as a background because the pixel has existed for a sufficiently long time, and thus the processing then proceeds to step S903.”).
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Chen and Paul  to incorporate the teachings of Tojo by including: wherein estimation of the background cells comprises determining those of the plurality of cells that are inactive for greater than a predetermined threshold of time or number of frames by comparing corresponding cells of the plurality of cells among the plurality of video frames  in order to distinguish the background from a moving object in the foreground for object detection purposes.
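
The limitation at issue, treating a cell as background once it has been inactive for more than a threshold number of frames, can be sketched as below. This is an illustration under stated assumptions rather than Tojo's method or the applicant's: a cell is represented by its mean intensity, "inactive" means the mean changed by no more than `diff_threshold` between consecutive frames, and the names `estimate_background`, `diff_threshold`, and `min_inactive_frames` are hypothetical.

```python
def estimate_background(cell_history, diff_threshold, min_inactive_frames):
    """Return the grid positions treated as background.

    cell_history: list of dicts, one per frame in time order, each mapping a
    cell position to that cell's mean intensity.
    A cell is background when its current run of inactive frame-to-frame
    comparisons is longer than min_inactive_frames.
    """
    inactive_run = {pos: 0 for pos in cell_history[0]}
    for prev, curr in zip(cell_history, cell_history[1:]):
        for pos in inactive_run:
            if abs(curr[pos] - prev[pos]) <= diff_threshold:
                inactive_run[pos] += 1
            else:
                inactive_run[pos] = 0  # any activity resets the counter
    return {pos for pos, run in inactive_run.items()
            if run > min_inactive_frames}

# Five frames, two cells: cell (0, 0) stays quiet, cell (0, 1) flickers early on.
history = [
    {(0, 0): 10, (0, 1): 10},
    {(0, 0): 11, (0, 1): 90},
    {(0, 0): 10, (0, 1): 10},
    {(0, 0): 10, (0, 1): 12},
    {(0, 0): 11, (0, 1): 11},
]
background = estimate_background(history, diff_threshold=5,
                                 min_inactive_frames=3)
```

Cell (0, 0) has four consecutive quiet comparisons and is classified as background, while cell (0, 1) has only two since its last change and is not.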

Regarding claims 15 and 19: the claim limitations are similar to those of claim 6; therefore, rejected in the same manner as applied above. 


Claims 7, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen (US Patent 9123133) in view of Paul (US PG-Pub. 20200005468) and Peng (US PG-Pub. 20150117703).
Regarding claim 7: 
Chen in view of Paul teaches: the limitations of claim 1 as applied above. 
Chen in view of Paul does not specifically teach: wherein said clustering the active cells involves application of a K-means clustering algorithm and wherein K represents the number of ROIs.
However, in a related art Peng teaches: wherein said clustering the active cells involves application of a K-means clustering algorithm and wherein K represents the number of ROIs. (¶ [0062] “k-means clustering algorithms may be applied using identify-sensitive features (e.g., local binary patterns for face) on all detected objects to form k clusters, where k equals to the number of objects need to be identified in the video. The k-means clustering aims to partition all detected objects into k clusters in which each detected object belongs to the cluster with the nearest mean, serving as a prototype of the cluster. Further, the frames which contain the k centroid objects are selected.”).
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Chen and Paul  to incorporate the teachings of Peng by including: wherein said clustering the active cells involves application of a K-means clustering algorithm and wherein K represents the number of ROIs in order to partition all detected objects into k clusters in which each detected object belongs to the cluster with the nearest mean, serving as a prototype of the cluster.
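
To make the claimed arrangement concrete, the sketch below applies a plain K-means (Lloyd's algorithm) to the grid coordinates of active cells, with K equal to the number of ROIs, and then takes one bounding box per cluster. It is illustrative only and is not drawn from Peng, which applies k-means to detected objects, or from the application. The deterministic seeding with the first K points, the fixed iteration count, and the function names are assumptions made so the example is reproducible.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm on 2-D points with a deterministic start.

    The first k points seed the centroids (an assumption for reproducibility;
    practical code would normally use a smarter initialisation).
    """
    centroids = [tuple(map(float, p)) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k),
                      key=lambda i: (p[0] - centroids[i][0]) ** 2 +
                                    (p[1] - centroids[i][1]) ** 2)
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return labels

def rois_from_labels(points, labels, k):
    """One (top, left, bottom, right) grid box per non-empty cluster, inclusive."""
    rois = []
    for i in range(k):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if members:
            rois.append((min(m[0] for m in members), min(m[1] for m in members),
                         max(m[0] for m in members), max(m[1] for m in members)))
    return sorted(rois)

# Active cells forming two well-separated groups; K = 2 ROIs are requested.
active = [(0, 0), (8, 9), (0, 1), (1, 0), (9, 9), (9, 8)]
labels = kmeans(active, k=2)
rois = rois_from_labels(active, labels, k=2)
```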

Regarding claim 16: the claim limitations are similar to those of claim 7; therefore, rejected in the same manner as applied above. 

Regarding claim 20: 
Chen in view of Paul teaches: the limitations of claim 10 as applied above. 
Chen further teaches, in column 1, lines 52-54: “a background subtraction related technique has been a commonly used technique in video surveillance and target recognitions.” A person skilled in the art recognizes that target recognition using surveillance systems may include facial recognition to identify people in the monitored area. 
And, Paul further teaches: ¶ [0111] “Process 600 may include “recognize objects” 662, where objects may be recognized semantically, such as people versus vehicles, and so forth. This application also either may use the RoIs directly or the refined object segments.”
Chen in view of Paul does not specifically teach: wherein the object detection comprises facial recognition.
However, considering that the use of a surveillance system and the identification of people are taught by Chen and Paul, facial recognition would have been obvious to one of ordinary skill in the art. 
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Chen and Paul to incorporate the teachings of Paul by including: wherein the object detection comprises facial recognition, in order to identify the people being tracked by the surveillance system, for example to verify whether a person belongs in the area where the monitoring system is installed or is an intruder.
Chen in view of Paul does not specifically teach: wherein said clustering the active cells involves application of a K-means clustering algorithm and wherein K represents the number of ROIs.
However, in a related art Peng teaches: wherein said clustering the active cells involves application of a K-means clustering algorithm and wherein K represents the number of ROIs. (¶ [0062] “k-means clustering algorithms may be applied using identify-sensitive features (e.g., local binary patterns for face) on all detected objects to form k clusters, where k equals to the number of objects need to be identified in the video. The k-means clustering aims to partition all detected objects into k clusters in which each detected object belongs to the cluster with the nearest mean, serving as a prototype of the cluster. Further, the frames which contain the k centroid objects are selected.”).
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Chen and Paul  to incorporate the teachings of Peng by including: wherein said clustering the active cells involves application of a K-means clustering algorithm and wherein K represents the number of ROIs in order to partition all detected objects into k clusters in which each detected object belongs to the cluster with the nearest mean, serving as a prototype of the cluster.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Chen (US Patent 9123133) in view of Paul (US PG-Pub. 20200005468) and Ahmed (US PG-Pub. 20170262695).
Regarding claim 9: 
Chen in view of Paul teaches: the limitations of claim 1 as applied above. 
Chen teaches, in column 11, line 1, that the block size N may also be empirically set to 16.
Chen in view of Paul does not specifically teach: wherein X and Y are multiples of 3.
However, in a related art, Ahmed teaches: wherein X and Y are multiples of 3 (¶ [0051] “…Sub-sampling layer 406 divides the image into small rectangular blocks. The size of each block can vary. In one embodiment, deep neural network 400 uses a 3×3 pixel block size.” The current application also discloses in ¶ [0045] “…Based on the size of the video frames at issue, an appropriate cell size may be determined empirically. In one embodiment, X and Y are multiples of three, which may produce non-limiting examples of cells of size 3×6, 15×18, 27×27, 30×60, etc. Empirical evidence suggests a cell size of 30×60 pixels produces good performance.” Therefore, the cell size is merely a design choice made empirically by one of ordinary skill in the art to achieve a desired performance based on the size of the input frames especially in view of Chen’s teaching that also suggests choosing the block size empirically).
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Chen and Paul to incorporate the teachings of Ahmed by including: wherein X and Y are multiples of 3, in order to achieve desired performance and balance speed and quality.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Itokawa (US PG-Pub. 20020080425) discloses: a frame to be processed (FIG. 5) is divided into blocks of a predetermined size as shown in FIG. 7 (S11) and the amount of motion of each block is calculated (S12). The calculation of the amount of motion is a process in which the position in the next frame (FIG. 6) to which each block of the processed frame corresponds is identified by a method generally known as block pattern matching, and the shift of each block from the corresponding position in the frame is obtained as a motion vector. A function for evaluating a match in pattern matching is, for example, addition of the squares of the differences between the blocks or addition of the absolute values of the differences.
Andersson (US PG-Pub. 20200137395) discloses: a motion meta data deriving operation comprises: a dividing function configured to divide a current image frame into a mesh of cells, wherein each cell comprises multiple image pixels, a comparison function configured to determine a metric of change for each cell by comparing pixel data of each cell with pixel data of a correspondingly positioned cell of a previous and/or subsequent image frame, and a storing function configured to store the metric of change for each cell as the motion metadata related to the current image frame. See FIG. 4.
Dharus (US PG-Pub. 20200193609) discloses: a deep learning-based network that can be used by the segmentation engine 104 to segment frames includes the You only look once (YOLO) detector, which is an alternative to the SSD object detection system. FIG. 14A includes an image and FIG. 14B and FIG. 14C include diagrams illustrating how the YOLO detector operates. The YOLO detector can apply a single neural network to a full image. As shown, the YOLO network divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities. For example, as shown in FIG. 14A, the YOLO detector divides up the image into a grid of 13-by-13 cells. Each of the cells is responsible for predicting five bounding boxes.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to WASSIM MAHROUKA whose telephone number is (571)272-2945. The examiner can normally be reached Monday-Thursday 7:00-4:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Edward Urban can be reached on (571)272-7899. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/WASSIM MAHROUKA/Examiner, Art Unit 2665
/EDWARD F URBAN/Supervisory Patent Examiner, Art Unit 2665