DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on February 22, 2021 has been entered.

Applicant(s) Response to Official Action
Presented arguments filed on February 22, 2021 in response to the Final Office Action mailed on September 21, 2020 have been made of record.  Claims 1 - 25 are currently pending in the application. Claims 1, 9, 17 and 22 have been amended.

Response to Arguments
Applicant’s arguments (see pages 18 – 19) with respect to the rejection of Claims 1, 4, 5, 7-9, 12, 13, 15-17, 21 and 22 under 35 U.S.C. 102(a)(1) as being anticipated by Manmatha et al. (US 10,528,819 B1) have been fully considered but are not persuasive.
Applicant argues on page 17 that “The cited portion of Manmatha merely discloses that a P-frame contains motion vectors-it does not disclose performing object detection using a derivative 
Accordingly, the rejection is maintained.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 4, 5, 7 – 9, 12, 13, 15 – 17, 21 and 22 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Manmatha et al., (US 10,528,819 B1) referred to as Manmatha hereinafter.
Regarding Claim 1, Manmatha discloses an apparatus (Fig. 5, Col. 10:11, content provider environment 506), comprising: 
a communication interface (Col. 10:31-32, an interface layer 508); and 
processing circuitry (Fig. 6, Col. 11:62-64, the device 600 includes at least one processor 610 for executing instructions that can be stored in a memory device or element 620) to: 
receive, via the communication interface (Fig. 5), a compressed video stream (Col. 2:47-51, the present methods and systems directly process compressed videos and use a feature map and optionally a motion vector from the compressed video to perform specific tasks such as object detection) captured by a camera (Col. 12:18-23, a video or image capture element 640 (i.e. camera) can include any appropriate technology, such as a charge-coupled device (CCD) video or image capture element 640 having a sufficient resolution, focal range, and viewable area, to capture video or an image when the user is operating the device 600), wherein the compressed video stream comprises a plurality of compressed video frames (Col. 12:26-29, it should be understood that video or image capture can be performed using a single image, multiple images (i.e. plurality of compressed video frames), periodic imaging, continuous image capturing, image streaming, etc.), wherein the plurality of compressed video frames comprises: 
a first compressed frame (Fig. 3, Col. 4:2-3, a GoP must have at least one I-frame 300); and 
a second compressed frame (Fig. 3, Col. 4:3, one or more P-frames 310), wherein the second compressed frame is compressed based at least in part on the first compressed frame (Col. 2:15-20, the next frame is a P-frame (Predictive picture). P-frames are obtained by computing the block motion field between the previous frame and the current frame and then storing both the block motion field and the residual between the prediction and the actual frame in the P-frame), and wherein the second compressed frame comprises a plurality of motion vectors (Col. 4:19-21, a P-frame 310 stores motion vectors and residuals, not the actual image, based on how the block or box has moved when compared to the prior I-frame 300); 
decompress the first compressed frame into a first decompressed frame (Fig. 3, Col. 5: an I-frame 300 contains all of the information needed to reconstruct the frame); 
perform pixel-domain object detection (Col. 7:2-4, pixel movements between two adjacent frames and propagates the visual feature (i.e. object detection) following pixel movements) to detect an object at a first position in the first decompressed frame (Col. 5:32-34, extract content information in the form of a feature from an I-frame 300 and then create a feature map 340 used for location-based object detection); and 
perform compressed-domain object detection to detect the object at a second position in the second compressed frame (Col. 5:67-Col.6:1, use here is the Region-based Fully Convolutional Networks (RFCN) object detector to iteratively find the objects in the partial P-frame 310), wherein the object is detected at the second position in the second compressed frame based on: 
the first position of the object in the first decompressed frame (Col. 4:15-16, a P-frame (Predictive picture) 310 holds only the changes in the image (i.e. second position) from the previous frame (i.e. first frame)); and 
a derivative motion vector computed based on interpolation of a subset of motion vectors in the second compressed frame (Fig. 3, Col. 5:49-57, the feature map 340 is estimated by bi-linearly interpolating the I-frame-derived feature vector using a motion vector which is already encoded in that I-frame's respective portion of the compressed video file (i.e. based on interpolation in a compressed frame). The feature map for the next P-frame 310 is then derived by again interpolating the previous feature map 340 using the motion vector 350 from the second P-frame 310), wherein the subset of motion vectors is selected from the plurality of motion vectors in the second compressed frame based on the first position of the object in the first decompressed frame (Col. 4:19-21, a P-frame 310 stores motion vectors (i.e. plurality of motion vectors) and residuals, not the actual image, based on how the block or box has moved when compared to the prior I-frame 300).
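
For illustration only, the final limitation of Claim 1 (selecting a subset of motion vectors based on the first position of the object and interpolating that subset into a single derivative motion vector) can be sketched as follows. This is a minimal sketch under stated assumptions and is neither the method of Manmatha nor Applicant's implementation: the function names, the square block grid, the choice of bilinear interpolation at the bounding-box centre, and the convention that each vector gives the displacement from the first frame to the second frame are all hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, float]                # (dx, dy) in pixels
Box = Tuple[float, float, float, float]  # (x, y, width, height) in pixels


def derivative_motion_vector(mv_field: List[List[Vec]], block: int, box: Box) -> Vec:
    """Bilinearly interpolate a block motion-vector field at the box centre.

    Assumptions (hypothetical): mv_field[r][c] is the vector of the block whose
    top-left pixel is (c * block, r * block), the field is non-empty, and each
    vector is the displacement from the first frame to the second frame.
    """
    x, y, w, h = box
    rows, cols = len(mv_field), len(mv_field[0])
    # Box centre in block-grid coordinates (block centres sit at integer indices).
    gx = (x + w / 2) / block - 0.5
    gy = (y + h / 2) / block - 0.5
    # Clamp so the four neighbours always lie inside the grid.
    gx = min(max(gx, 0.0), cols - 1)
    gy = min(max(gy, 0.0), rows - 1)
    c0, r0 = int(gx), int(gy)
    c1, r1 = min(c0 + 1, cols - 1), min(r0 + 1, rows - 1)
    fx, fy = gx - c0, gy - r0

    def lerp(a: float, b: float, t: float) -> float:
        return a + (b - a) * t

    # The subset of motion vectors is the (up to) four blocks nearest the centre.
    out = []
    for k in (0, 1):
        top = lerp(mv_field[r0][c0][k], mv_field[r0][c1][k], fx)
        bottom = lerp(mv_field[r1][c0][k], mv_field[r1][c1][k], fx)
        out.append(lerp(top, bottom, fy))
    return (out[0], out[1])


def propagate_box(box: Box, mv: Vec) -> Box:
    """Shift the first-frame box by the derivative motion vector."""
    return (box[0] + mv[0], box[1] + mv[1], box[2], box[3])
```

In this sketch the second position is obtained without decompressing the second frame: only its motion vectors and the first position are used.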

Regarding Claim 4, Manmatha discloses Claim 1. Manmatha further discloses wherein: the plurality of compressed video frames further comprises a third compressed frame (Fig. 3, Col. 4:4, zero or more B-frames), wherein the third compressed frame is compressed based at least in part on the first compressed frame, and wherein the third compressed frame comprises a second plurality of motion vectors (Col. 2:20-21, A third type of compressed frame, a B-frame (Bidirectional picture) contains the residual motion vectors (i.e. second plurality of motion vectors) from both preceding and following I (i.e. in part on the first compressed frame) or P frames); and the processing circuitry is further to perform compressed-domain object detection to detect the object at a third position in the third compressed frame (Col. 5:67-Col.6:1, use here is the Region-based Fully Convolutional Networks (RFCN) object detector to iteratively find the objects in the partial P-frame 310, this illustrative approach additionally can be applied to B-frames 310 (i.e. third compressed frame)), wherein the object is detected at the third position in the third compressed frame based on: the first position of the object in the first compressed frame (Col. 4:48-51, B-frames (Bidirectional pictures) 310 use differences between the current frame and both the preceding and following frames to specify their content (i.e. first position of the object in first compressed frame)); and the second plurality of motion vectors from the third compressed frame (Col. 4:56-58, A B-frame 310 may be viewed as a special type of P-frame 310, where motion vectors (i.e. second plurality of motion vectors) are computed bi-directionally between the B-frame 310 and a P-frame 310 or I-frame 300).

Regarding Claim 5, Manmatha discloses Claim 1. Manmatha further discloses wherein: the plurality of compressed video frames further comprises a third compressed frame (Fig. 3, Col. 4:4, zero or more B-frames), wherein the third compressed frame is compressed based at least in part on the second compressed frame, and wherein the third compressed frame comprises a second plurality of motion vectors (Col. 2:20-21, A third type of compressed frame, a B-frame (Bidirectional picture) contains the residual motion vectors (i.e. second plurality of motion vectors) from both preceding and following I or P frames (i.e. in part on the second compressed frame)); and the processing circuitry is further to perform compressed-domain object detection to detect the object at a third position in the third compressed frame (Col. 5:67-Col.6:1, use here is the Region-based Fully Convolutional Networks (RFCN) object detector to iteratively find the objects in the partial P-frame 310, this illustrative approach additionally can be applied to B-frames 310 (i.e. third compressed frame)), wherein the object is detected at the third position in the third compressed frame based on: the second position of the object in the second compressed frame (Col. 4:48-51, B-frames (Bidirectional pictures) 310 use differences between the current frame and both the preceding and following frames to specify their content (i.e. second position of the object in second compressed frame)); and the second plurality of motion vectors from the third compressed frame (Col. 4:56-58, A B-frame 310 may be viewed as a special type of P-frame 310, where motion vectors (i.e. second plurality of motion vectors) are computed bi-directionally between the B-frame 310 and a P-frame 310 or I-frame 300).

Regarding Claim 7, Manmatha discloses Claim 1. Manmatha further discloses wherein the processing circuitry to perform pixel-domain object detection (Col. 7:2-4, pixel movements between two adjacent frames and propagates the visual feature following pixel movements) to detect the object at the first position in the first decompressed frame (Col. 5:32-34, extract content information in the form of a feature from an I-frame 300 and then create a feature map 340 used for location-based object detection) is further to: process the first decompressed frame using a convolutional neural network (Fig. 3, Step 330, Convolutional Neural Network (CNN)), wherein the convolutional neural network is trained to perform object detection in a pixel domain (Col. 5:29-34, A CNN 330, which is a heavy-duty, deeply-layered network (such as a Resnet-50) standard in the video recognition and analysis field, may be built and executed to extract content information in the form of a feature from an I-frame 300 and then create a feature map 340 (i.e. pixel domain) used for location-based object detection).

Regarding Claim 8, Manmatha discloses Claim 1. Manmatha further discloses wherein the second compressed frame (Col. 4:19-21, a P-frame 310 (i.e. second compressed frame) stores motion vectors and residuals, not the actual image, based on how the block or box has moved when compared to the prior I-frame 300) comprises a motion-predicted frame (Col. 4:8, the motion predicted P-frame 310), wherein: the motion-predicted frame is compressed based at least in part on the first compressed frame (Col. 4:15-16, a P-frame (Predictive picture) 310 holds only the changes in the image from the previous frame (i.e. first compressed frame)); the motion-predicted frame comprises the plurality of motion vectors (Col. 4:19-21, a P-frame 310 stores motion vectors (i.e. plurality) based on how the block or box has moved when compared to the prior I-frame 300); and the motion-predicted frame does not comprise a prediction residual indicating a difference between the second compressed frame and the first compressed frame (Col. 5:67-Col.6:1, the Region-based Fully Convolutional Networks (RFCN) object detector to iteratively find the objects in the partial P-frame 310 (i.e. partial P-frame would not comprise a prediction residual)).

Claims 9, 12, 13, 15 and 16 are drawn to the corresponding apparatus claimed in Claims 1, 4, 5, 7 and 8, respectively.  Therefore Claims 9, 12, 13, 15 and 16 correspond to apparatus Claims 1, 4, 5, 7 and 8, respectively, and are rejected for the same reasons of anticipation as used above.

Method Claims 17 and 21 are drawn to the method of using the corresponding apparatus claimed in Claims 1 and 8, respectively.  Therefore method Claims 17 and 21 correspond to apparatus Claims 1 and 8, respectively, and are rejected for the same reasons of anticipation as used above.

Claim 22 has limitations similar to Claim 1 treated in the above rejection(s), which are met by the reference as discussed above.  Claim 22, however, also recites a system (See Manmatha, Fig. 4, a system 400), comprising: a camera (See Manmatha, Col. 12:18-23, a video or image capture element 640 (i.e. camera) can include any appropriate technology, such as a charge-coupled device (CCD) video or image capture element 640 having a sufficient resolution, focal range, and viewable area, to capture video or an image when the user is operating the device 600 (also referenced as 410 in Fig. 4)); and an edge processing device (See Manmatha, Col. 9:36-39, server 440, any necessary files for performing the functions attributed to the computers, servers, or other network devices may be stored locally and/or remotely, as appropriate (i.e. edge processing device, a computing device connected to the same local network as the camera as described in Par.[0022] of the originally filed specification)).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 6, 14, 20 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Manmatha (US 10,528,819 B1) in view of Pereira et al., (US 2019/0138813 A1) referred to as Pereira hereinafter.
Regarding Claim 6, Manmatha discloses Claim 1. Manmatha discloses that the first compressed frame (Fig. 3, Col. 4:2-3, a GoP must have at least one I-frame 300) is compressed (Col. 2:47-51, the present methods and systems directly process compressed videos and use a feature map and optionally a motion vector from the compressed video to perform specific tasks such as object detection); the processing circuitry to decompress the first compressed frame into the first decompressed frame (Fig. 3, Col. 5: an I-frame 300 contains all of the information needed to reconstruct the frame); and the processing circuitry to perform pixel-domain object detection (Col. 7:2-4, pixel movements between two adjacent frames and propagates the visual feature following pixel movements) to detect the object at the first position in the first decompressed frame (Col. 5:32-34, extract content information in the form of a feature from an I-frame 300 and then create a feature map 340 used for location-based object detection).
Manmatha does not specifically disclose the first compressed frame is based at least in part on a background frame. Therefore, Manmatha fails to explicitly disclose the first compressed frame based at least in part on the background frame; and determine a frame difference between the first decompressed frame and the background frame; and detect the object at the first position in the first decompressed frame based on the frame difference.
However, Pereira teaches the first compressed frame based at least in part on the background frame (Par. [0049], at step 406, background information is extracted by identifying pixels not moving for a long time); and determine a frame difference between the first decompressed frame and the background frame (Par. [0049], The background image together with the frame difference image are used to determine a formation of pixels, the foreground pixels as further described below, moving against the background); and detect the object at the first position in the first decompressed frame based on the frame difference (Par. [0046], FIG. 4 illustrates a generic motion segmentation process 400 for object boundary extraction and tracking).
References Manmatha and Pereira are considered to be analogous art because they relate to object detection in video compression. Therefore, it would have been obvious to one possessing ordinary skill in the art before the effective filing date of the claimed invention to specify a video frame based in part on the background, as suggested by Pereira, in the invention of Manmatha. This modification would allow comparing foreground images with background images to distinguish whether the motion segment represents boundaries of a slightly moving object or a trajectory of an object that moves significantly (See Pereira, Par. [0039]).
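
For illustration only, the frame-difference limitation of Claim 6 (determining a difference between the first decompressed frame and a background frame, then detecting the object from that difference) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the procedure of Pereira or of Applicant: the function name, the grayscale list-of-rows frame representation, the fixed threshold of 25, and the single bounding box around all foreground pixels are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]          # grayscale pixel rows, values 0..255
Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def detect_by_frame_difference(frame: Frame, background: Frame,
                               threshold: int = 25) -> Optional[Box]:
    """Return the bounding box of pixels that differ from the background.

    Assumptions (hypothetical): both frames have the same size and the scene
    contains at most one moving object, so one box covers all foreground pixels.
    """
    xs, ys = [], []
    for r, (frame_row, bg_row) in enumerate(zip(frame, background)):
        for c, (f, b) in enumerate(zip(frame_row, bg_row)):
            # A pixel is foreground when its absolute difference exceeds the threshold.
            if abs(f - b) > threshold:
                xs.append(c)
                ys.append(r)
    if not xs:
        return None  # no foreground pixels, so no object detected
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
```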

Claim 14 is drawn to the corresponding apparatus claimed in Claim 6.  Therefore Claim 14 corresponds to apparatus Claim 6 and is rejected for the same reasons of obviousness as used above.

Method Claim 20 is drawn to the method of using the corresponding apparatus claimed in Claim 6.  Therefore method Claim 20 corresponds to apparatus Claim 6 and is rejected for the same reasons of obviousness as used above.

System Claim 25 is drawn to the corresponding apparatus claimed in Claim 6.  Therefore system Claim 25 corresponds to apparatus Claim 6 and is rejected for the same reasons of obviousness as used above.



Allowable Subject Matter
Claims 2, 3, 10, 11, 18, 19, 23 and 24 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The following is a statement of reasons for the indication of allowable subject matter: Claims 2, 10, 18 and 23 specifically define computing an average motion vector based on the subset of motion vectors and estimating the second position of the object in the second compressed frame based on the first position of the object in the first decompressed frame and the average motion vector, which is not readily taught or suggested by the prior art uncovered during search or made of record. Claims 3, 11, 19 and 24 specifically define the compressed domain input for an artificial neural network in relation to three channels with different information, which is not readily taught or suggested by the prior art uncovered during search or made of record.
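
For illustration only, the average-motion-vector limitation identified above for Claims 2, 10, 18 and 23 can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce Applicant's specification: the function names, the square block grid, the selection of every block overlapped by the first-position bounding box, and the convention that each vector gives the displacement from the first frame to the second frame are hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, float]                # (dx, dy) in pixels
Box = Tuple[float, float, float, float]  # (x, y, width, height) in pixels


def average_motion_vector(mv_field: List[List[Vec]], block: int, box: Box) -> Vec:
    """Average the motion vectors of all blocks overlapped by the box.

    Assumption (hypothetical): mv_field[r][c] belongs to the block whose
    top-left pixel is (c * block, r * block).
    """
    x, y, w, h = box
    rows, cols = len(mv_field), len(mv_field[0])
    # Range of block indices covered by the box, clamped to the grid.
    c0, c1 = max(int(x // block), 0), min(int((x + w - 1) // block), cols - 1)
    r0, r1 = max(int(y // block), 0), min(int((y + h - 1) // block), rows - 1)
    subset = [mv_field[r][c] for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
    if not subset:
        return (0.0, 0.0)  # box lies outside the grid: assume no motion
    n = len(subset)
    return (sum(v[0] for v in subset) / n, sum(v[1] for v in subset) / n)


def estimate_second_position(box: Box, mv_field: List[List[Vec]], block: int) -> Box:
    """Shift the first-position box by the average motion vector."""
    dx, dy = average_motion_vector(mv_field, block, box)
    return (box[0] + dx, box[1] + dy, box[2], box[3])
```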

Conclusion
Any inquiry concerning this communication should be directed to SUSAN E HODGES whose telephone number is (571) 270-0498.  The Examiner can normally be reached on Monday - Friday from 8:00 am (EST) to 4:00 pm (EST).
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner's supervisor, Brian T. Pendleton, can be reached on (571) 272-7527. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished 

/Susan E. Hodges/Primary Examiner, Art Unit 2425