DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 10/24/2022 has been entered. 

Claim Interpretations
Claim 17 recites a computer readable storage medium, which is explicitly defined in the applicant’s specification as not including signals per se. Therefore, claim 17 is not rejected under 35 U.S.C. 101.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 2, 7, 8, and 19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
	Regarding claim 2, the limitation “the correcting the predicted result is generated from a network to analyze tracking results and provide for the correction of the predicted results” is unclear and confusing. It is unclear how the corrected predicted result is able to provide for the correction that already happened. Are there multiple corrections taking place? Please explain.
	Claim 7 recites “the estimating including using initial estimates of the sensed physical entities.” It is unclear which estimating “the estimating” refers to: “estimating a number of skipping frames” of claim 1, or “estimating sensed physical entities” of claim 6.
	Claim 8 recites the limitation “the plurality of sensors”. There is insufficient antecedent basis for this limitation in the claim.
	Claim 8 also recites “the estimating includes estimating sensed physical entities.” It is unclear which estimating “the estimating” refers to: “estimating a number of skipping frames” of claim 1, or “estimating object states jointly over time” of claim 8.
	Claim 19 recites “the estimating including assuming initial estimates.” It is unclear which estimating “the estimating” refers to: “estimating a number of skipping frames” of claim 17, or “estimating sensed physical entities” of claim 19.
	Claim 19 also recites “the corrected number of skipping frames”. There is insufficient antecedent basis for this limitation in the claim, because claim 17 rather recites a correction to results predicted based on estimation of a number of skipping frames, and does not recite any correction to the number of skipping frames itself.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 2, 4-8, 11-15, and 17-19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Seo et al. (“Effective and efficient human action recognition using dynamic frame skipping and trajectory rejection”).
Regarding claim 1, Seo discloses:
obtaining input data (see fig 1, input video clip of an object);
estimating a number of skipping frames of the input data based on information from the input data (see section 3.1 and fig 1, determining a number of frames to skip (i.e., 0 ≤ i ≤ 5) based on the video clip);
predicting results based on the estimating of the number of skipping frames (see sections 3.1, 3.2, and fig 1, predicting motion of the object by excluding skipped frames, wherein the skipped frames are excluded according to the determined number of frames to skip); and
correcting the predicted results (see sections 3.1-3.2, correcting the predicted motion of the object by further determining motion of the object in the skipped frames through linear interpolation, depicted in “Dynamic frame skipping & motion interpolation” of fig 1)
to perform multi-object tracking (see section 3.2, the corrected predicted motion is for tracking multiple interest points, each of the interest points depicted as a square in “Dense trajectory detection & rejection” of fig 1).
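For illustration only, the mapping above (track the un-skipped frames, then fill in the skipped frames by linear interpolation) can be sketched as follows. This is a minimal sketch and not Seo's implementation: the function name interpolate_skipped, the (x, y) point representation, and the sample coordinates are hypothetical, and the sketch assumes a single interest point whose positions in the un-skipped frames are already known.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def interpolate_skipped(tracked: Dict[int, Point]) -> List[Point]:
    """Return one position per frame, from the first to the last tracked frame.

    `tracked` maps frame index -> (x, y) for un-skipped frames only
    (hypothetical representation). Positions in skipped frames are
    filled in by linear interpolation between the neighbouring
    tracked frames.
    """
    frames = sorted(tracked)
    trajectory: List[Point] = []
    for start, end in zip(frames, frames[1:]):
        (x0, y0), (x1, y1) = tracked[start], tracked[end]
        span = end - start
        for frame in range(start, end):
            t = (frame - start) / span  # 0.0 at `start`, approaches 1.0 at `end`
            trajectory.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    trajectory.append(tracked[frames[-1]])  # last tracked frame is kept as-is
    return trajectory


# Hypothetical sample: frames 1 and 2 were skipped between frames 0 and 3.
tracked = {0: (0.0, 0.0), 3: (6.0, 3.0), 4: (8.0, 3.0)}
trajectory = interpolate_skipped(tracked)
```

In the sample, two frames are skipped between tracked frames 0 and 3, so the resulting trajectory gains two interpolated positions. Tracking multiple interest points would apply the same step to each point independently.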

Regarding claim 2, Seo further discloses:
the correcting the predicted results further comprises correcting the predicted results by linear interpolation (see rejection of claim 1, linear interpolation),
wherein the estimating of the number of skipping frames includes using tracklets based on co-occurrence of multiple parts of each of the plurality of objects (see section 3.1 and fig 2, using motion detected by multiple portions of multiple objects (e.g., motion derived by multiple portions of a car and human)), and
wherein the correcting the predicted result is generated from a network to analyze tracking results and provide for the correction of the predicted results (see abstract, the linear interpolation and tracking of the multiple interest points are applied by a computer, which inherently forms a network of memory and processor).

Regarding claim 4, Seo further discloses:
wherein given an initialization in a first frame, extracting features in previous states and using the features to propose candidate object locations and identifies in later frames (see section 3.1 and fig 2, extracting object features in frame Ft for proposing and identifying object motion in frame Ft+1), and
wherein the input data includes trajectories based on co-occurrence of multiple parts of each of the plurality of objects (see section 3.1 and fig 2, the input video clip depicts motion resulting from co-occurring multiple parts of multiple objects).

Regarding claim 5, Seo further discloses:
wherein the input data is derived from sensors by recording images and performing measurements on objects (see section 5.1, video clips of objects recorded by cameras by making light measurements by the cameras), and
wherein the correcting the predicted result is via a correction network used to analyze tracking results (see abstract, the linear interpolation and tracking of the multiple interest points are applied by a computer, which inherently forms a network of memory and processor) and
provide guidance for the correction of the predicted results (see sections 3.1 and 3.2, the computer also guides itself while correcting the predicted motion automatically).

Regarding claim 6, Seo further discloses wherein the estimating includes estimating sensed physical entities from a plurality of measurements sensed from a plurality of sensors (see section 3.1 and fig 2, determining the number of frames to skip includes estimating motion of captured physical objects in video clips generated by cameras sensing light amounts).

Regarding claim 7, Seo further discloses:
the estimating including using initial estimates of the sensed physical entities from at least two measurements from the plurality of sensors (see section 4, estimating object motion and camera motion; and see section 5.1, based on input video clip generated by light measurements by multiple cameras), and
correcting the entity and sensor status from previous results (see section 4, refining the object motion and camera motion for reducing redundancy).

Regarding claim 8, Seo further discloses:
wherein the estimating includes estimating object states jointly over time (see fig 1 and 2, “Dense trajectory detection & rejection”, multiple co-occurring elements of a plurality of objects in the frames (e.g., multiple portions of a car and human) are tracked simultaneously over a period of time), and
wherein the estimating includes estimating sensed physical entities based on outputs of the plurality of sensors (see section 3.1 and fig 2, determining the number of frames to skip includes estimating motion of captured physical objects in video clips generated by cameras sensing light amounts).

Regarding claim 11, Seo discloses:
a network including: a memory storing computer instructions; and a processor configured to execute the computer instructions (see abstract, a computer) to:
obtaining input data (see fig 1, input video clip of an object);
estimating a number of skipping frames of the input data based on information from the input data (see section 3.1 and fig 1, determining a number of frames to skip (i.e., 0 ≤ i ≤ 5) based on the video clip);
predicting results based on the estimating of the number of skipping frames (see sections 3.1, 3.2, and fig 1, predicting motion of the object by excluding skipped frames, wherein the skipped frames are excluded according to the determined number of frames to skip); and
correcting the predicted results by interpolation (see sections 3.1-3.2, correcting the predicted motion of the object by further determining motion of the object in the skipped frames through linear interpolation, depicted in “Dynamic frame skipping & motion interpolation” of fig 1) to perform object tracking (see section 3.2, the corrected predicted motion is for tracking multiple interest points, each of the interest points depicted as a square in “Dense trajectory detection & rejection” of fig 1).

Regarding claim 12, Seo further discloses:
linearly interpolate the predicted results (see rejection of claim 11, linear interpolation),
wherein the correcting includes correcting the predicted results that are linearly interpolated (see section 3.2, applying a filter after the linear interpolation),
wherein the estimating of the number of skipping frames includes using tracklets based on co-occurrence of multiple parts of each of the plurality of objects (see section 3.1 and fig 2, using motion detected by multiple portions of multiple objects (e.g., motion derived by multiple portions of a car and human)).

Regarding claim 13, Seo further discloses:
wherein given an initialization in a first frame, extracting features in previous states and using the features to propose candidate object locations and identifies in later frames (see section 3.1 and fig 2, extracting object features in frame Ft for proposing and identifying object motion in frame Ft+1), and
wherein the input data includes trajectories based on co-occurrence of multiple parts of each of the plurality of objects (see section 3.1 and fig 2, the input video clip depicts motion resulting from co-occurring multiple parts of multiple objects).

Regarding claim 14, Seo further discloses:
wherein the input data is derived from a plurality of sensors by recording images and performing measurements on objects (see section 5.1, video clips of objects recorded by cameras by making light measurements by the cameras),
wherein the estimating includes estimating sensed physical entities from a plurality of measurements sensed from the plurality of sensors (see section 3.1 and fig 2, determining the number of frames to skip includes estimating motion of captured physical objects in video clips generated by cameras sensing light amounts).

Regarding claim 15, Seo further discloses:
wherein the estimating includes estimating object states jointly over time (see fig 1 and 2, “Dense trajectory detection & rejection”, multiple co-occurring elements of a plurality of objects in the frames (e.g., multiple portions of a car and human) are tracked simultaneously over a period of time), and
wherein the correcting the predicted result is via a correction network used to analyze tracking results (see abstract, the linear interpolation and tracking of the multiple interest points are applied by a computer, which inherently forms a network of memory and processor) and
provide guidance for the correction of the predicted results (see sections 3.1 and 3.2, the computer also guides itself while correcting the predicted motion automatically).

Regarding claim 17, Seo discloses everything claimed as applied above (see rejection of claim 1).

Regarding claim 18, Seo further discloses:
linearly interpolate the predicted results (see rejection of claim 11, linear interpolation),
wherein the correcting includes correcting the predicted results that are linearly interpolated (see section 3.2, applying a filter after the linear interpolation),
wherein the estimating of the number of skipping frames includes using tracklets based on co-occurrence of multiple parts of each of the plurality of objects (see section 3.1 and fig 2, using motion detected by multiple portions of multiple objects (e.g., motion derived by multiple portions of a car and human)).

Regarding claim 19, Seo further discloses:
wherein the input data is derived from sensors by recording images and performing measurements on objects (see section 5.1, video clips of objects recorded by cameras by making light measurements by the cameras),
wherein the estimating includes estimating sensed physical entities from a plurality of measurements sensed from a plurality of sensors (see section 3.1 and fig 2, determining the number of frames to skip includes estimating motion of captured physical objects in video clips generated by cameras sensing light amounts),
wherein the estimating including assuming initial estimates of the sensed physical entities from the plurality of sensors (see section 4, estimating object motion and camera motion; and see section 5.1, based on input video clip generated by light measurements by multiple cameras), and
correcting the sensed physical entities and sensor status from previous results (see section 4, refining the object motion and camera motion for reducing redundancy), and
wherein the input data includes trajectories based on co-occurrence of multiple parts of each of the plurality of objects (see section 3.1 and fig 2, the input video clip depicts motion resulting from co-occurring multiple parts of multiple objects), and
the output includes the corrected number of skipping frames (see sections 3.1 and 3.2, the number of frames to skip is inherently outputted for succeeding algorithms).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Seo in view of Wu et al. (USPAPN 2011/0193978).
Regarding claim 10, Seo discloses everything claimed as applied above (see rejection of claim 1), but does not disclose being cloud implemented.
In a similar field of endeavor of video processing, Wu discloses being cloud implemented (see para [70], video storage in cloud computing network).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Seo with Wu, and further provide a cloud computing network for storage of video clips to be processed, as disclosed by Wu, for the purpose of easy storage and sharing (see Wu para [70]-[71]).

Allowable Subject Matter
Claims 3, 9, 16, and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. Seo does not disclose an inference model learning how many frames to skip.

Response to Arguments
Applicant's arguments filed 10/24/2022 have been fully considered but they are not persuasive.
Rejection regarding 35 U.S.C. 112(b)
In view of the applicant’s arguments, all previous rejections under 112(b) have been withdrawn. However, upon further consideration of the claim language, claims 2, 7, 8, and 19 are rejected under 112(b).

Rejection regarding prior art
Regarding claim 1, the applicant argues that Seo fails to disclose the subject matter of the claim, specifically because:
i) Seo determines a number of frames to skip but does not use such skipped frames, and therefore does not disclose the claimed “based on the estimating of the number of skipping frames”.
The examiner respectfully disagrees. The claim does not recite that the results are predicted based on the skipping frames, but rather recites that the results are predicted based on the estimated number of skipping frames. Therefore, Seo’s utilization of the determined number of frames to exclude the skipped frames reads on this limitation.
ii) Seo then does not disclose using said skipping frames “to perform multi-object tracking”.
The examiner respectfully disagrees. The object motion, initially only detected in the un-skipped frames, is then further corrected via linear interpolation to include object motion in the skipped frames, thereby providing a complete object motion through the un-skipped frames and skipped frames. Fig 1 and 2 further describe that such tracking is for multiple objects.
Such generation of a complete object motion of multiple objects is considered equivalent to the claimed “multi-object tracking”, as the applicant’s claim is not specific enough to be distinguished from Seo’s disclosure.

	Regarding claim 2, the applicant argues that Seo does not disclose “wherein the estimating of the number of skipping frames includes using tracklets based on co-occurrence of multiple parts of each of the plurality of objects”, specifically because:
	iii) There is no utilization of “tracklets” as defined by the applicant in the specification, as elements enabling tracking of objects even when they are not fully in view of the camera.
	The examiner respectfully disagrees. The specific example given by the applicant is not recited in the claim. None of the applicant’s claims recites any tracking of occluded objects.
iv) The correction is not generated “from a network to analyze tracking results and provide for the correction of the predicted results”, as amended in the claim.
The examiner respectfully disagrees. Seo discloses a computer, which is inherently a network formed of a processor and a memory. The applicant’s claim 11 also utilizes such a “network” comprising a memory and processor. Seo’s computer is used for tracking the multiple objects and guides itself through multiple algorithms, and therefore reads on the claimed limitation.

	Regarding claim 4, the applicant argues:
	v) Seo does not disclose any multi-object tracking based on “co-occurrence of multiple parts of each of the plurality of objects”, and any tracking in Seo is not based on “trajectories”.
The examiner respectfully disagrees. Fig 1 and 2 clearly describe multiple portions of multiple objects co-occurring in each frame of the video clip (e.g., multiple portions of a car and human occurring in the same frame).
Furthermore, the applicant’s claim merely recites that “the input data includes trajectories based on co-occurrence of multiple parts of each of the plurality of objects”. Fig 1 and 2 clearly depict trajectories of multiple portions of multiple objects co-occurring. The applicant’s claim does not recite anywhere that the claimed trajectories have been actively extracted/detected.

Regarding claim 8, the applicant argues:
vi) Seo does not disclose “estimating sensed physical entities based on outputs of the plurality of sensors”.
	The examiner respectfully disagrees. Seo discloses estimating motion of physical objects captured in video clips, wherein the video clips are generated by cameras. The applicant’s claim does not specify the claimed limitation “estimating sensed physical entities” in enough detail to distinguish it from Seo’s estimation of the motion of physical objects.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SOO JIN PARK whose telephone number is (571)270-3569. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VU LE, can be reached at (571)272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Soo Jin Park/
Primary Examiner, Art Unit 2668