DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 04/28/2021 is being considered by the examiner.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 17 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claim does not fall within at least one of the four categories of patent eligible subject matter because a claim directed to a computer readable medium may encompass transitory signals. In an effort to assist the patent community in overcoming a rejection or potential rejection under 35 U.S.C. § 101 in this situation, the USPTO suggests the following approach. A claim drawn to such a computer readable medium that covers both transitory and non-transitory embodiments may be amended to narrow the claim to cover only statutory embodiments to avoid a rejection under 35 U.S.C. § 101 by adding the limitation "non-transitory" to the claim. See http://www.uspto.gov/web/offices/com/sol/og/2010/week08/TOC.htm#ref20
Amending to a “non-transitory computer-readable medium” would resolve this issue.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1-9 and 17 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by YEN et al. (US 20190034734 A1, hereinafter YEN).

Regarding claim 1. YEN discloses an image processing system configured to receive a first stream of images at a first resolution from a first image source (4k) with a field of view of an environment and a second stream of images at a second resolution lower than said first resolution from a second image source (720) with substantially the same field of view as said first image source (0109-0130; Figures 8 and 9; video frames from the 4k and 720 p, frames 914 and 916, “[0114] FIG. 9 is a diagram of the data flow 800 with visual representations of the different steps of the real-time object detection process 910 performed by the video analytics system 600. At the second frame of the video sequence (as indicated by block 904), a first iteration of the real-time process 910 is invoked. The second video frame is referred to as the current video frame, which is the video frame that is currently being processed by the video analytics system 600. As shown, a video frame 914 having a first resolution (4K in the example of FIG. 9) is provided to a video analytics (VA) framework engine 918 along with a video frame 916 having a second resolution (720 p in the example of FIG. 9). In some examples, the video frame 916 can be a downsampled version of the video frame 914, and can be generated using any suitable downsampling technique. In some examples, the video frame 914 can be a separate video frame than the video frame 916, in which case the video frames 914 and 916 capture the same scene at the same instance of time and from the same perspective (the same angle and orientation). In any event, the video frame 914 and the video frame 916 capture the same image of the scene and can thus be considered as being different versions of the same current video frame (one having a lower resolution than the other) that is being processed by the real-time process 910.”), said system comprising: 
a localizer component configured to provide a location for any object of interest independently of class within successive images of said second stream of images (0115, 0110; Figures 8 and 9; Object tracking OT, “[0115] The VA framework engine 918 can process the video frames 914 and 916 to determine which component of the video analytics system 600 will be provided with the video frames 914 and 916. The lower resolution video frame 916 is provided to the object detection and tracking system 922 (OT 922). The OT system 922 can perform object detection to determine one or more foreground blobs for the video frame 916. Object tracking can then be performed to associate (or match) object trackers with the one or more blobs. The blob detection and object tracking processes performed by the OT system 922 can be performed by the blob detection 604 and the object tracking system 606, and are described in further detail above with respect to FIG. 1-FIG. 4.”, “[0110] A first deep learning process (DL-1) is first applied at a given frame and can then be performed again every P number of frames after the given frame, where P is an integer value greater than or equal to 1. As described in more detail herein, the DL-1 process can utilize a first trained network (e.g., a first deep learning classification network) to classify and/or localize one or more objects in one or more of the video frames. The period P at which the DL-1 process is performed can depend on the amount of time the DL-1 process is designed to run on a given video frame. The amount to time can be fixed so that it takes P number of frames for every iteration of the DL-1 process. The value of P will typically be greater than one video frame due to the DL-1 process requiring multiple video frames to classify and localize objects in the regions of interest (ROIs) determined from the bounding boxes provided from object tracking.”); 
a classifier (0124; Figure 9; “[0124] The deep learning network engine 726 applies a deep learning network to the cropped video frame to determine classes for the one or more objects in the cropped frame. If one or more classes are determined for the one or more objects in the cropped frame, the deep learning network engine can output class information 928 for the objects to a storage device (not shown) that maintains metadata 929 for objects classified for the current video frame (corresponding to frames 914 and 916). The class information 928 is used to update the metadata 929 for the objects that have been classified. In some cases, the deep network can also identify the location of one or more of the objects, in which case the metadata 929 is also updated to include the localization information. For example, as noted above, each of the bounding boxes is associated with a tracker ID. Each bounding box that is within a ROI generated by the DL system 926 is monitored to determine if a class (and/or a location) has been determined for the object associated with the bounding box. If a class (and/or a location) is determined for an object associated with a bounding box, the metadata 929 can be updated to indicate that the object has been classified (and/or a localized) by the DL system 926.”) configured to: receive one or more locations selectively provided by said localizer (0116; Figure 9; bounding boxes (objects) 924), identify a corresponding portion of an image (0123; Figure 9; ROI clipping 927) acquired from said first stream at substantially the same time at which an image from said second stream in which an object of interest was identified and return a classification for the type of object within said identified portion of said image from said first stream (0124; Figure 9; object class information 928); and 
a tracker configured to associate said classification with said location through acquisition of successive images in said second stream (0124; Figure 9; “[0124] The deep learning network engine 726 applies a deep learning network to the cropped video frame to determine classes for the one or more objects in the cropped frame. If one or more classes are determined for the one or more objects in the cropped frame, the deep learning network engine can output class information 928 for the objects to a storage device (not shown) that maintains metadata 929 for objects classified for the current video frame (corresponding to frames 914 and 916). The class information 928 is used to update the metadata 929 for the objects that have been classified. In some cases, the deep network can also identify the location of one or more of the objects, in which case the metadata 929 is also updated to include the localization information. For example, as noted above, each of the bounding boxes is associated with a tracker ID. Each bounding box that is within a ROI generated by the DL system 926 is monitored to determine if a class (and/or a location) has been determined for the object associated with the bounding box. If a class (and/or a location) is determined for an object associated with a bounding box, the metadata 929 can be updated to indicate that the object has been classified (and/or a localized) by the DL system 926.”).
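For illustration only, the coordinate mapping implied by the cited passages (bounding boxes 924 located in the lower-resolution frame 916 and used to clip an ROI from the higher-resolution frame 914) can be sketched as follows. The function name scale_box and the frame sizes 1280x720 and 3840x2160 are assumptions made for this sketch and are not taken from YEN.

```python
def scale_box(box, low_size, high_size):
    """Map an (x0, y0, x1, y1) box from low-resolution to high-resolution pixels.

    low_size and high_size are (width, height) tuples. Both frames are assumed
    to cover the same field of view, so a per-axis scale factor is sufficient.
    """
    x0, y0, x1, y1 = box
    sx = high_size[0] / low_size[0]
    sy = high_size[1] / low_size[1]
    return (round(x0 * sx), round(y0 * sy), round(x1 * sx), round(y1 * sy))


# Hypothetical box found in a 720p frame, mapped onto the matching 4K frame.
roi = scale_box((100, 50, 200, 150), (1280, 720), (3840, 2160))
print(roi)
```

The scaled box would then select the portion of the higher-resolution frame passed to the classifier.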

Regarding claim 2. YEN discloses the image processing system of claim 1 wherein said localizer is configured to identify temporal movement of an object over a number of images of said second stream based on generic object spatial features (0061; Figure 1; “the blob detection system 104 can perform background subtraction for a frame, and can then detect foreground pixels in the frame. Foreground blobs are generated from the foreground pixels using morphology operations and spatial analysis”).

Regarding claim 3. YEN discloses the image processing system of claim 1 wherein said tracker is configured either:
to provide a location to said classifier for an object newly located in an image of said second stream (0086; “[0086] The status or state of a blob tracker can include the tracker's identified location (or actual location) in a current frame and its predicted location in the next frame. The location of the foreground blobs are identified by the blob detection system 104. However, as described in more detail below, the location of a blob tracker in a current frame may need to be predicted based on information from a previous frame (e.g., using a location of a blob associated with the blob tracker in the previous frame). After the data association is performed for the current frame, the tracker location in the current frame can be identified as the location of its associated blob(s) in the current frame. The tracker's location can be further used to update the tracker's motion model and predict its location in the next frame. Further, in some cases, there may be trackers that are temporarily lost (e.g., when a blob the tracker was tracking is no longer detected), in which case the locations of such trackers also need to be predicted (e.g., by a Kalman filter). Such trackers are temporarily not shown to the system. Prediction of the bounding box location helps not only to maintain certain level of tracking for lost and/or merged bounding boxes, but also to give more accurate estimation of the initial position of the trackers so that the association of the bounding boxes and trackers can be made more precise”); or 
only to provide a location to said classifier for an object which has been identified to have stopped moving within the field of view of the second image source (0067; “[0067] An equation of the GMM model is shown in equation (1), wherein there are K Gaussian models. Each Guassian model has a distribution with a mean of μ and variance of Σ, and has a weight ω. Here, i is the index to the Gaussian model and t is the time instance. As shown by the equation, the parameters of the GMM change over time after one frame (at time t) is processed. In GMM or any other learning based background subtraction, the current pixel impacts the whole model of the pixel location based on a learning rate, which could be constant or typically at least the same for each pixel location. A background subtraction method based on GMM (or other learning based background subtraction) adapts to local changes for each pixel. Thus, once a moving object stops, for each pixel location of the object, the same pixel value keeps on contributing to its associated background model heavily, and the region associated with the object becomes background.”).
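For illustration only, the behavior described in the cited paragraph 0067 (a stopped object being absorbed into the background model) can be sketched with a deliberately simplified model. The sketch below uses a single running-average background value per pixel rather than the K-Gaussian mixture of YEN's equation (1); the function names, the learning rate of 0.1, and the threshold of 10 are assumptions made for this sketch.

```python
def update_background(background, pixel, learning_rate=0.1):
    """Blend the current pixel value into the background estimate."""
    return (1.0 - learning_rate) * background + learning_rate * pixel


def frames_until_absorbed(background, pixel, threshold=10.0, learning_rate=0.1):
    """Count frames until a constant pixel value is no longer foreground.

    A pixel is treated as foreground while it differs from the background
    estimate by more than `threshold`. learning_rate must be in (0, 1].
    """
    frames = 0
    while abs(pixel - background) > threshold:
        background = update_background(background, pixel, learning_rate)
        frames += 1
    return frames


# A stopped object of intensity 100 over a background of intensity 0.
print(frames_until_absorbed(0.0, 100.0))
```

Because the same pixel value keeps contributing to the model, the difference shrinks every frame until the stopped object is treated as background.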

Regarding claim 4. YEN discloses the image processing system of claim 1 wherein said tracker is responsive to said classifier failing to classify an object, either: 
to provide said object location to said classifier in response to locating said object in a subsequent image of said second stream with a higher level of confidence (0126, 0089; “[0126] In some cases, the deep learning network applied by the deep learning network engine 726 can provide confidence levels when classifying an object. For example, as described in more detail below, a deep learning network can generate a probability vector (or other representation of a set of probabilities) that includes probabilities indicating that an object is a certain class of object (e.g., a person, a dog, a car, or other suitable class), with a probability for each class being included in the vector. A probability that an object is a certain class can be used as a confidence level that the object is part of the class. A threshold confidence level can be defined, which sets a minimum confidence level for considering an object as being classified. In one illustrative example, the threshold can be set to 0.6, indicating that an object must have a probability for a class of at least 60% to be considered as being a part of that class. When a current video frame is being processed by the DL system 926, the metadata 929 for the tracked objects associated with the bounding boxes provided for the frame can be checked to determine if a confidence level for an object exceeds (or is equal to in some cases) the threshold. If the confidence level exceeds the threshold, the bounding box for that object can be disregarded. However, if the confidence level does not exceed the threshold, the bounding box can be considered when generating ROIs for the current video frame. In such cases, the DL-1 process can run the deep learning network on the object again in an attempt to re-classify the object with a higher confidence level.”, “[0089] There may be other state or status information needed for updating the tracker, which may require a state machine for object tracking. Given the information of the associated blob(s) and the tracker's own status history table, the status also needs to be updated. The state machine collects all the necessary information and updates the status accordingly. Various statuses of trackers can be updated. For example, other than a tracker's life status (e.g., new, lost, dead, or other suitable life status), the tracker's association confidence and relationship with other trackers can also be updated. Taking one example of the tracker relationship, when two objects (e.g., persons, vehicles, or other objects of interest) intersect, the two trackers associated with the two objects will be merged together for certain frames, and the merge or occlusion status needs to be recorded … .”); or 
to provide said object location to said classifier in response to locating said object with a larger size in a subsequent image of said second stream.
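For illustration only, the confidence check described in the cited paragraph 0126 can be sketched as follows. The 0.6 threshold is the illustrative value given in YEN; the function name, the dictionary layout keyed by tracker ID, and the use of None for a not-yet-classified object are assumptions made for this sketch.

```python
CONFIDENCE_THRESHOLD = 0.6  # illustrative value from the cited paragraph


def boxes_needing_classification(metadata, threshold=CONFIDENCE_THRESHOLD):
    """Return tracker IDs whose boxes should be sent to the classifier again.

    `metadata` maps a tracker ID to its best class confidence so far, or to
    None if the object has not been classified yet. Boxes whose confidence
    exceeds the threshold are disregarded; all others are kept.
    """
    return [
        tracker_id
        for tracker_id, confidence in metadata.items()
        if confidence is None or confidence <= threshold
    ]


# Hypothetical metadata for three tracked objects.
print(boxes_needing_classification({1: 0.9, 2: 0.4, 3: None}))
```

Only the retained boxes would contribute ROIs for the current frame, so an object is re-classified until its confidence exceeds the threshold.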

Regarding claim 5. YEN discloses the image processing system of claim 1 wherein said classifier is configured to vary the class of object it attempts to identify at a portion of an image acquired from said first stream from time to time.

Regarding claim 6. YEN discloses the image processing system of claim 1 wherein either: the second stream comprises a sub-sampled version of the first stream (0114; Figure 9; ‘video frame 914’ and ‘video frame 916’); the first image source comprises a distinct image source separate from the second image source; or the first image source comprises a visible wavelength camera and the second image source comprises an infra-red image source.

Regarding claim 7. YEN discloses the image processing system of claim 1 wherein the second image source comprises one of: a thermal infra-red camera; a near infra-red camera; a LIDAR transceiver; an ultrasound transceiver; or an event camera (0056; “[0056] In some embodiments, the video analytics system 100 and the video source 130 can be part of the same computing device. In some embodiments, the video analytics system 100 and the video source 130 can be part of separate computing devices. In some examples, the computing device (or devices) can include one or more wireless transceivers for wireless communications. The computing device (or devices) can include an electronic device, such as a camera (e.g., an IP camera or other video camera, a camera phone, a video phone, or other suitable capture device), a mobile or stationary telephone handset (e.g., smartphone, cellular telephone, or the like), a desktop computer, a laptop or notebook computer, a tablet computer, a set-top box, a television, a display device, a digital media player, a video gaming console, a video streaming device, or any other suitable electronic device.”).

Regarding claim 8. YEN discloses a static security camera comprising the image processing system of claim 1, said first image source and said second image source integrated within a common housing (0050).

Regarding claim 9. YEN discloses the image processing system according to claim 1 wherein said localizer is adapted to receive inputs from a plurality of previously acquired images of said second stream when analysing a given image of said second stream (0115, 0110; Figures 8 and 9).

Regarding claim 17. Claim 17 is drawn to a computer-readable storage medium corresponding to the system claimed in claim 1. Therefore, claim 17 corresponds to system claim 1 and is rejected for the same reasons of anticipation as set forth above.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over YEN as applied to claim 9 above, and further in view of Vallespi-Gonzalez et al. (US 20180307921 A1, hereinafter “Vallespi-Gonzalez”).

Regarding claim 10. YEN discloses the image processing system according to claim 9, but fails to disclose wherein an interval (K1-K4) between successive pairs of said plurality of previously acquired images and said given image is relatively short when tracking objects detected as moving quickly across the field of view of the environment, and said interval is relatively longer when tracking objects detected as moving slowly across the field of view of the environment.
However, in the same field of endeavor, before the effective filing date of the invention, Vallespi-Gonzalez shows wherein an interval between successive pairs of said plurality of previously acquired images and said given image is relatively short when tracking objects detected as moving quickly across the field of view of the environment, and said interval is relatively longer when tracking objects detected as moving slowly across the field of view of the environment (0052, 0059, 0060; “[0059] The systems and methods described herein can also provide an additional technical effect and benefit of improving the classification and tracking of objects of interest in a perception system of an autonomous vehicle. For example, performing more accurate segmentation provides for improved tracking by having cleaner segmented objects and provides for improved classification once objects are properly segmented. Such improved object detection accuracy can be particularly advantageous for use in conjunction with vehicle computing systems for autonomous vehicles. Because vehicle computing systems for autonomous vehicles are tasked with repeatedly detecting and analyzing objects in sensor data for tracking and classification of objects of interest (including other vehicles, cyclists, pedestrians, traffic control devices, and the like) and then determining necessary responses to such objects of interest, improved object detection can lead to faster and more accurate object tracking and classification. Improved object tracking and classification can have a direct effect on the provision of safer and smoother automated control of vehicle systems and improved overall performance of autonomous vehicles.”).
It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the invention, to combine the teaching of Vallespi-Gonzalez of allocating different tracking criteria based on the speed of the detected object with the teaching of YEN in order to better utilize computing resources and reduce bandwidth requirements.

Regarding claim 11. Vallespi-Gonzalez shows wherein said interval is of varying length when tracking a plurality of objects detected as moving at varying speeds across the field of view of the environment (0052, 0059, 0060; “[0059] The systems and methods described herein can also provide an additional technical effect and benefit of improving the classification and tracking of objects of interest in a perception system of an autonomous vehicle. For example, performing more accurate segmentation provides for improved tracking by having cleaner segmented objects and provides for improved classification once objects are properly segmented. Such improved object detection accuracy can be particularly advantageous for use in conjunction with vehicle computing systems for autonomous vehicles. Because vehicle computing systems for autonomous vehicles are tasked with repeatedly detecting and analyzing objects in sensor data for tracking and classification of objects of interest (including other vehicles, cyclists, pedestrians, traffic control devices, and the like) and then determining necessary responses to such objects of interest, improved object detection can lead to faster and more accurate object tracking and classification. Improved object tracking and classification can have a direct effect on the provision of safer and smoother automated control of vehicle systems and improved overall performance of autonomous vehicles.”).
It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the invention, to combine the teaching of Vallespi-Gonzalez of allocating different tracking criteria based on the speed of the detected object with the teaching of YEN in order to better utilize computing resources and reduce bandwidth requirements.
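For illustration only, the limitation at issue in claims 10 and 11 (a shorter interval between analysed images for fast-moving objects and a longer interval for slow-moving objects) can be sketched as follows. The sketch is not taken from either reference; the function name, the target displacement of 16 pixels, and the interval bounds of 1 and 8 frames are assumptions made for this sketch.

```python
def frame_interval(speed_px_per_frame, min_interval=1, max_interval=8,
                   target_displacement=16.0):
    """Choose how many frames to skip between analysed images for one object.

    The interval is picked so the object moves roughly `target_displacement`
    pixels between analysed images, then clamped to the allowed range.
    """
    if speed_px_per_frame <= 0:
        return max_interval  # stationary object: use the longest interval
    interval = int(target_displacement / speed_px_per_frame)
    return max(min_interval, min(max_interval, interval))


# Hypothetical objects moving at different speeds get different intervals.
for speed in (32.0, 4.0, 0.5):
    print(speed, frame_interval(speed))
```

Evaluating the function per tracked object yields intervals of varying length when several objects move at varying speeds.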



Claims 12-16 are rejected under 35 U.S.C. 103 as being unpatentable over YEN as applied to claim 9 above, and further in view of BIGIOI et al. (US 20190065410 A1, hereinafter “BIGIOI”).

Regarding claim 12. YEN discloses the image processing system according to claim 9, but fails to disclose wherein said localizer comprises a neural network comprising a plurality of layers configured to produce a map of movement locations for said given image.
However, in the same field of endeavor, before the effective filing date of the invention, BIGIOI shows wherein said localizer comprises a neural network comprising a plurality of layers configured to produce a map of movement locations for said given image (0055-0056, 0075; Figures 6 and 7; “[0056] 4. The required PCNN 30-A . . . 30-D can now be enabled and once enabled, the controller within any enabled PCNN reads input maps from shared system memory 40′ in accordance with its register settings and processes the maps as specified by the neural network program. Intermediate maps may be stored locally within each CNN 30-A in an image cache 31 as disclosed in PCT Publication No. WO 2017/129325 (Ref: FN-481-PCT) or temporarily in system memory 40′ at configurable addresses.”).
It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the invention, to combine the teaching of BIGIOI of a localizer comprising a multi-layer neural network with the teaching of YEN in order to better utilize computing resources and reduce bandwidth requirements by localizing objects regardless of their proximity and variety of features and classifying the objects at a better resolution.

Regarding claim 13. YEN in view of BIGIOI shows the image processing system according to claim 12. YEN further shows wherein a first plurality of said layers comprise convolutional layers and wherein said network comprises a plurality of appendix layers connected to the output of said first plurality of layers and configured to produce an indication of whether movement may be present within said given image, said localizer being responsive to detection of movement to execute a second plurality of layers of said network following said first plurality of layers to produce said map (0147, 0149, 0151, 0152, 0155; Figure 12; “[0147] The first layer of the CNN 1200 is the convolutional hidden layer 1222a. The convolutional hidden layer 1222a analyzes the image data of the input layer 1220. Each node of the convolutional hidden layer 1222a is connected to a region of nodes (pixels) of the input image called a receptive field. The convolutional hidden layer 1222a can be considered as one or more filters (each filter corresponding to a different activation or feature map), with each convolutional iteration of a filter being a node or neuron of the convolutional hidden layer 1222a. …”).

Regarding claim 14. YEN further shows the image processing system according to claim 13 wherein said appendix layers are configured to produce an indication of whether movement may be present within one or more sub-regions of said given image, said localizer being responsive to detection of movement in a sub-region to execute a further plurality of layers of said network on an output of said first plurality of layers to produce said map for said sub-region (0152; Figure 12; “[0152] In some examples, max-pooling can be used by applying a max-pooling filter (e.g., having a size of 2×2) with a step amount (e.g., equal to a dimension of the filter, such as a step amount of 2) to an activation map output from the convolutional hidden layer 1222a. The output from a max-pooling filter includes the maximum number in every sub-region that the filter convolves around. Using a 2×2 filter as an example, each unit in the pooling layer can summarize a region of 2×2 nodes in the previous layer (with each node being a value in the activation map). For example, four values (nodes) in an activation map will be analyzed by a 2×2 max-pooling filter at each iteration of the filter, with the maximum value from the four values being output as the “max” value. If such a max-pooling filter is applied to an activation filter from the convolutional hidden layer 1222a having a dimension of 24×24 nodes, the output from the pooling hidden layer 1222b will be an array of 12×12 nodes.”).
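For illustration only, the 2×2 max-pooling operation described in the cited paragraph 0152 can be sketched as follows. The function name and the sample activation values are assumptions made for this sketch; the 24×24 input size and the step amount of 2 follow the quoted example.

```python
def max_pool_2x2(activation_map):
    """Apply a 2x2 max-pooling filter with a step of 2 to a 2-D list.

    Each output value is the maximum of one non-overlapping 2x2 sub-region,
    so an input of 24x24 values yields an output of 12x12 values.
    """
    rows, cols = len(activation_map), len(activation_map[0])
    return [
        [
            max(activation_map[r][c], activation_map[r][c + 1],
                activation_map[r + 1][c], activation_map[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


# Hypothetical 24x24 activation map whose values increase row by row.
activation = [[r * 24 + c for c in range(24)] for r in range(24)]
pooled = max_pool_2x2(activation)
print(len(pooled), len(pooled[0]))
```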

Regarding claim 15. YEN further shows the image processing system according to claim 14 wherein said further plurality of layers comprises said second plurality of layers operating on a dilated output of said first plurality of layers (0070-0071; Figure 3; “[0071] A dilation operation can be used to enhance the boundary of a foreground object. For example, the morphology engine 314 can apply a dilation function (e.g., FilterDilate3×3) to a 3×3 filter window of a center pixel. The 3×3 dilation window can be applied to each background pixel (as the center pixel) in the foreground mask. One of ordinary skill in the art will appreciate that other window sizes can be used other than a 3×3 window. The dilation function can include a dilation operation that sets a current background pixel in the foreground mask (acting as the center pixel) as a foreground pixel if one or more of its neighboring pixels in the 3×3 window are foreground pixels. The neighboring pixels of the current center pixel include the eight pixels in the 3×3 window, with the ninth pixel being the current center pixel. In some examples, multiple dilation functions can be applied after an erosion function is applied. In one illustrative example, three function calls of dilation of 3×3 window size can be applied to the foreground mask before it is sent to the connected component analysis engine 316. In some examples, an erosion function can be applied first to remove noise pixels, and a series of dilation functions can then be applied to refine the foreground pixels. In one illustrative example, one erosion function with 3×3 window size is called first, and three function calls of dilation of 3×3 window size are applied to the foreground mask before it is sent to the connected component analysis engine 316. Details regarding content-adaptive morphology operations are described below.”).
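For illustration only, the 3×3 dilation described in the cited paragraph 0071 can be sketched as follows. The function name and the 0/1 mask encoding are assumptions made for this sketch; it does not reproduce the FilterDilate3×3 function named in YEN, and pixels outside the mask are simply ignored.

```python
def dilate_3x3(mask):
    """Dilate a binary foreground mask (1 = foreground, 0 = background).

    A background pixel becomes foreground if at least one pixel in its
    3x3 window is foreground in the input mask.
    """
    rows, cols = len(mask), len(mask[0])
    out = [row[:] for row in mask]  # copy so the input mask is not modified
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0:
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols and mask[rr][cc] == 1:
                            out[r][c] = 1
    return out


# Hypothetical 5x5 mask with a single foreground pixel in the centre.
mask = [[0] * 5 for _ in range(5)]
mask[2][2] = 1
for row in dilate_3x3(mask):
    print(row)
```

Applying the function repeatedly corresponds to the multiple dilation calls mentioned in the quoted passage.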

Regarding claim 16. YEN in view of BIGIOI shows the image processing system according to claim 12. YEN further shows wherein a first plurality of said layers comprise convolutional layers and wherein a first convolutional layer of said first plurality of said layers comprises a convolution of said plurality of previously acquired images and wherein the results of said convolution are employed by said localizer for processing a limited number of successive images in said second stream (0148).



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ASMAMAW G TARKO whose telephone number is (571)272-9205. The examiner can normally be reached Monday - Friday, 9:00 AM - 5:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chris Kelley, can be reached at (571) 272-7331. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


ASMAMAW G. TARKO
Examiner, Art Unit 2482



/NASIM N NIRJHAR/
Primary Examiner, Art Unit 2482