Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This Office Action is in response to Application No. 16/981,837 filed 09/17/2020.  Claims 1-15 are pending and have been examined.
The information disclosure statement (IDS) submitted on 09/17/2020 was considered by the examiner.
CLAIM INTERPRETATION

The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

Use of the word “means” (or “step for”) in a claim with functional language creates a rebuttable presumption that the claim element is to be treated in accordance with 35 U.S.C. § 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph). The presumption that § 112(f) (pre-AIA § 112, sixth paragraph) is invoked is rebutted when the function is recited with sufficient structure, material, or acts within the claim itself to entirely perform the recited function.
Absence of the word “means” (or “step for”) in a claim creates a rebuttable presumption that the claim element is not to be treated in accordance with 35 U.S.C. § 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph).
Claim elements in this application that use the word “means” (or “step for”) are presumed to invoke § 112(f) except as otherwise indicated in an Office action.  Similarly, claim elements that do not use the word “means” (or “step for”) are presumed not to invoke § 112(f) except as otherwise indicated in an Office action. 
Claim limitations:
a first video frame intercepting module, which is configured to continuously intercept;
a You Only Look Once (YOLO) detection module, which is configured to detect;
a Kernel Correlation Filter (KCF) tracking module, which is configured to detect;
a second video frame intercepting module, which is configured to keep on continuously intercepting;
a video frame obtaining module, which is configured to continuously intercept;
a first to-be-identified image determination module, which is configured to determine;
a first determination module, which is configured to determine;
a first fusion image obtaining module, which is configured to fuse;
a second to-be-identified image determination module, which is configured to determine;
a second determination module, which is configured to determine;
a second fusion image obtaining module, which is configured to fuse;
a video splicing module, which is configured to splice;
a video stream representing module, which is configured to represent;
have been interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because they use the generic placeholder “module” together with “configured to” coupled with functional language without reciting sufficient structure to achieve the function. Furthermore, the generic placeholder is not preceded by a structural modifier.
Since the claim limitations invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, claims 10 and 11 have been interpreted to cover the corresponding structure described in the specification that achieves the claimed function, and equivalents thereof.
A review of the specification shows that the following appears to be the corresponding structure described in the specification for the 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph limitations: Figure 9 and paragraphs [0202]-[0205].
If applicant wishes to provide further explanation or dispute the examiner’s interpretation of the corresponding structure, applicant must identify the corresponding structure with reference to the specification by page and line number, and to the drawing, if any, by reference characters in response to this Office action. 
If applicant does not intend to have the claim limitation(s) treated under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may amend the claim(s) so that it/they will clearly not invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, or present a sufficient showing that the claim recites/recite sufficient structure, material, or acts for performing the claimed function to preclude application of 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
For more information, see MPEP § 2173 et seq. and Supplementary Examination Guidelines for Determining Compliance With 35 U.S.C. 112 and for Treatment of Related Issues in Patent Applications, 76 FR 7162, 7167 (Feb. 9, 2011).
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 3 and 7-15 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Sundaresan et al. (US 2019/0114804), herein Sundaresan.
Consider claim 1, Sundaresan clearly teaches an object positioning method, (Fig. 9) comprising:

continuously intercepting a preset number of video frames in a video stream to be detected; (A subset of captured frames are sent to neural network detection system 104, [0058], [0137].)

detecting a first frame image in the intercepted video frames by a You Only Look Once (YOLO) object detection method to obtain a first positioning result corresponding to the first frame image; (Neural network detection system 104 includes a YOLO detector to detect objects in the subset of frames, [0052], [0055], [0135]-[0136].)

detecting each of other frame images in the intercepted video frames by a Kernel Correlation Filter (KCF) object tracking method according to a positioning result corresponding to a frame image prior to the each of the other frame images to obtain respective second positioning results corresponding to the other frame images; (Object positions are sent to lightweight object tracker 108, which uses KCF object tracking to determine the positions of the objects for the remaining frames in the video stream, [0059], [0061], [0065], [0076], [0140].) and

keeping on continuously intercepting the preset number of video frames in the video stream to be detected and obtaining corresponding positioning results until the video stream to be detected is finished. (Fig. 4A: Process 400A ends when there is no on-going task in step 402, [0079].)
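For orientation only, the detect-then-track flow recited in claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's or Sundaresan's implementation: `position_objects`, `detect`, and `track` are hypothetical names, and the two callables merely stand in for a YOLO detector and a KCF tracker.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator


def position_objects(
    frames: Iterable[Any],
    preset_number: int,
    detect: Callable[[Any], Any],
    track: Callable[[Any, Any], Any],
) -> Iterator[Any]:
    """Yield one positioning result per frame.

    detect(frame) stands in for a YOLO-style detector (hypothetical callable).
    track(frame, previous_result) stands in for a KCF-style tracker
    (hypothetical callable) seeded with the prior frame's result.
    """
    if preset_number < 1:
        raise ValueError("preset_number must be at least 1")
    stream = iter(frames)
    while True:
        # Intercept the next group of up to `preset_number` frames.
        batch = list(islice(stream, preset_number))
        if not batch:
            return  # the video stream is finished
        # The first frame of each group goes to the detector.
        result = detect(batch[0])
        yield result
        # Every other frame is tracked from the previous frame's result.
        for frame in batch[1:]:
            result = track(frame, result)
            yield result
```

With a stream of seven frames and a preset number of three, the detector stand-in would run on frames 0, 3, and 6, and the tracker stand-in on the rest.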

Consider claim 3, Sundaresan clearly teaches before continuously intercepting the preset number of video frames in the video stream to be detected, the method further comprises: determining the preset number according to an image frame rate of the video stream to be detected and a data processing performance parameter of a current device. (The frames selected from the input stream depend on the number of frames per second in the input stream and the processing power of the system, e.g. if the input stream is 30 fps and the neural network detection system 104 can process an image in 0.5 seconds, then frames 1, 16, etc. will be selected, [0056], [0069].)
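The arithmetic in the example cited for claim 3 can be checked with a short sketch. It assumes the spacing between selected frames equals the number of frames that arrive while one image is being processed; the function names are hypothetical and are not taken from the reference.

```python
import math
from typing import List


def detection_stride(frames_per_second: float, seconds_per_image: float) -> int:
    """Number of frames that arrive while one image is processed (at least 1)."""
    return max(1, math.ceil(frames_per_second * seconds_per_image))


def selected_frames(
    frames_per_second: float, seconds_per_image: float, count: int
) -> List[int]:
    """First `count` frame numbers (1-based) that would go to the detector."""
    stride = detection_stride(frames_per_second, seconds_per_image)
    return [1 + k * stride for k in range(count)]
```

For a 30 fps stream and 0.5 seconds per image, the stride is 15, so the selected frames are 1, 16, 31, and so on, which matches the example in the parenthetical.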

Consider claim 7, Sundaresan clearly teaches continuously intercepting the preset number of video frames in the video stream to be detected comprises: continuously intercepting the preset number of video frames in the video stream to be detected from the first frame image of the video stream to be detected. (A subset of captured frames beginning with frame 1 are sent to neural network detection system 104, [0058], [0059], [0137].)

Consider claim 8, Sundaresan clearly teaches a neural network used by the YOLO object detection method is a 5-layer neural network. (Fig. 6: The neural network can include any number of layers n where n is an integer greater than or equal to 1, [0110], [0120], [0121], [0135].)

Consider claim 9, Sundaresan clearly teaches a video display method, (Fig. 9) comprising:

continuously intercepting a preset number of video frames in a video stream to be detected; (A subset of captured frames are sent to neural network detection system 104, [0058], [0137].)

determining, according to a first positioning result corresponding to a first frame image in the intercepted video frames obtained from an object positioning method, a first image to be identified corresponding to the first frame image; (Neural network detection system 104 includes a YOLO detector to detect objects in the subset of frames, [0052], [0055], [0135]-[0136].)

determining whether the first image to be identified is a modifiable image; (Neural network detection system 104 determines if there are any recognized objects in the frame, [0059], [0070].)

in response to determining that the first image to be identified is the modifiable image, fusing a modified image matched with the first image to be identified and the first frame image to form a first fusion image; (Fig. 8C: A bounding box with an object class label is combined with the frame, [0136].)

determining, according to respective second positioning results corresponding to other frame images obtained from the object positioning method, respective second images to be identified corresponding to the other frame images in the intercepted video frames; (Object positions are sent to lightweight object tracker 108, which uses KCF object tracking to determine the positions of the objects for the remaining frames in the video stream, [0059], [0061], [0065], [0076], [0140].)

determining whether the second images to be identified are modifiable images; (Lightweight object tracker 108 performs tracking reliability evaluation on frames not evaluated by the neural network detection system 104 and determines if the frame contains objects, [0083].)

in response to determining that the second images to be identified are modifiable images, fusing modified images matched with the second images to be identified and the other frame images corresponding to the second images to be identified to form second fusion images; (The bounding boxes are shifted by the lightweight object tracker 108 for each frame, [0090]-[0092].)

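splicing the first fusion image, the second fusion images, and video frames which are not fused in the preset number of video frames according to the respective generation time of the preset number of video frames to generate a new video stream; and representing a video corresponding to the new video stream to a user; (The bounding boxes are output for display with the video frames, [0049], [0078], [0094].)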

wherein the object positioning method comprises:

continuously intercepting a preset number of video frames in a video stream to be detected; (A subset of captured frames are sent to neural network detection system 104, [0058], [0137].)

detecting a first frame image in the intercepted video frames by a You Only Look Once (YOLO) object detection method to obtain a first positioning result corresponding to the first frame image; (Neural network detection system 104 includes a YOLO detector to detect objects in the subset of frames, [0052], [0055], [0135]-[0136].)

detecting each of other frame images in the intercepted video frames by a Kernel Correlation Filter (KCF) object tracking method according to a positioning result corresponding to a frame image prior to the each of the other frame images to obtain respective second positioning results corresponding to the other frame images; (Object positions are sent to lightweight object tracker 108, which uses KCF object tracking to determine the positions of the objects for the remaining frames in the video stream, [0059], [0061], [0065], [0076], [0140].) and

keeping on continuously intercepting the preset number of video frames in the video stream to be detected and obtaining corresponding positioning results until the video stream to be detected is finished. (Fig. 4A: Process 400A ends when there is no on-going task in step 402, [0079].)
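The splicing step recited in claim 9 orders fused and unfused frames by generation time. A minimal sketch of that ordering follows; the `Frame` type and its fields are hypothetical, and nothing here is drawn from the application or from Sundaresan.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Frame:
    generated_at: float  # generation time in seconds (hypothetical field)
    label: str           # e.g. "fused" or "plain" (hypothetical field)


def splice(fused: Iterable[Frame], unfused: Iterable[Frame]) -> List[Frame]:
    """Merge fused and unfused frames into one sequence ordered by generation time."""
    return sorted([*fused, *unfused], key=lambda frame: frame.generated_at)
```

Because Python's `sorted` is stable, frames that share a generation time keep their input order, with fused frames placed first.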

Consider claim 10, Sundaresan clearly teaches an object positioning apparatus, (Fig. 1) comprising:

a first video frame intercepting module, which is configured to continuously intercept a preset number of video frames in a video stream to be detected; (A subset of captured frames are sent to neural network detection system 104, [0058], [0137].)

a You Only Look Once (YOLO) detection module, which is configured to detect a first frame image in the intercepted video frames by a YOLO object detection method to obtain a first positioning result corresponding to the first frame image; (Neural network detection system 104 includes a YOLO detector to detect objects in the subset of frames, [0052], [0055], [0135]-[0136].)

a Kernel Correlation Filter (KCF) tracking module, which is configured to detect each of other frame images in the intercepted video frames by a KCF object tracking method according to a positioning result corresponding to a frame image prior to the each of the other frame images to obtain respective second positioning results corresponding to the other frame images; (Object positions are sent to lightweight object tracker 108, which uses KCF object tracking to determine the positions of the objects for the remaining frames in the video stream, [0059], [0061], [0065], [0076], [0140].) and

a second video frame intercepting module, which is configured to keep on continuously intercepting the preset number of video frames in the video stream to be detected and obtain corresponding positioning results until the video stream to be detected is finished. (Fig. 4A: Process 400A ends when there is no on-going task in step 402, [0079].)

Consider claim 11, Sundaresan clearly teaches a video display apparatus, (Fig. 1) comprising:

a video frame obtaining module, which is configured to continuously intercept a preset number of video frames in a video stream to be detected; (A subset of captured frames are sent to neural network detection system 104, [0058], [0137].)

a first to-be-identified image determination module, which is configured to determine a first image to be identified corresponding to the first frame image according to the first positioning result corresponding to the first frame image in the intercepted video frames obtained from the object positioning method of claim 1; (Neural network detection system 104 includes a YOLO detector to detect objects in the subset of frames, [0052], [0055], [0135]-[0136].)

a first determination module, which is configured to determine whether the first image to be identified is a modifiable image; (Neural network detection system 104 determines if there are any recognized objects in the frame, [0059], [0070].)

a first fusion image obtaining module, which is configured to fuse a modified image matched with the first image to be identified and the first frame image to form a first fusion image according to the first positioning result in response to determining that the first image to be identified is the modifiable image; (Fig. 8C: A bounding box with an object class label is combined with the frame, [0136].)

a second to-be-identified image determination module, which is configured to determine respective second images to be identified corresponding to the other frame images according to respective second positioning results corresponding to other frame images in the intercepted video frames obtained from the object positioning method; (Object positions are sent to lightweight object tracker 108, which uses KCF object tracking to determine the positions of the objects for the remaining frames in the video stream, [0059], [0061], [0065], [0076], [0140].)

a second determination module, which is configured to determine whether the second images to be identified are the modifiable images; (Lightweight object tracker 108 performs tracking reliability evaluation on frames not evaluated by the neural network detection system 104 and determines if the frame contains objects, [0083].)

a second fusion image obtaining module, which is configured to fuse modified images matched with the second images to be identified and the other frame images corresponding to the second images to be identified to form second fusion images according to the second positioning results in response to determining that the second images to be identified are modifiable images; (The bounding boxes are shifted by the lightweight object tracker 108 for each frame, [0090]-[0092].)

a video splicing module, which is configured to splice the first fusion image, the second fusion images, and the video frames which are not fused in the preset number of video frames according to the respective generation time of the preset number of video frames to generate a new video stream; and a video stream representing module, which is configured to represent a video corresponding to the new video stream to a user. (The bounding boxes are output for display with the video frames, [0049], [0078], [0094].)

Consider claim 12, Sundaresan clearly teaches a device, comprising: at least one processor; and a storage apparatus, which is configured to store at least one program; wherein when executed by the processor, the program causes the processor to implement the object positioning method of claim 1. ([0146], [0147])

Consider claim 13, Sundaresan clearly teaches a non-transitory computer-readable storage medium, comprising computer executable instructions for performing the object positioning method of claim 1 when executed by a computer processor. ([0146], [0147])

Consider claim 14, Sundaresan clearly teaches a device, comprising: at least one processor; and a storage apparatus, which is configured to store at least one program; wherein when executed by the processor, the program causes the processor to implement the video display method of claim 9. ([0146], [0147])

Consider claim 15, Sundaresan clearly teaches a non-transitory computer-readable storage medium, comprising computer executable instructions for performing the video display method of claim 9 when executed by a computer processor. ([0146], [0147])

Allowable Subject Matter
Claims 2 and 4-6 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
In the case of amending the claimed invention, applicant is respectfully requested to indicate the portion(s) of the specification which dictate(s) the structure relied on for proper interpretation and also to verify and ascertain the metes and bounds of the claimed invention.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN R SCHNURR whose telephone number is (571)270-1458.  The examiner can normally be reached on M-F 6a-4p.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JOHN R SCHNURR/           Primary Examiner, Art Unit 2425