Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 20 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. Claim 20 recites the following limitation: “a computer-readable recording medium storing instructions executable by a processor, wherein when executed, the instructions cause a processor of an electronic device to perform operations of”. On the other hand, the specification states: “… machine-readable storage medium (e.g., compact disc read-only memory (CD-ROM)), or may be distributed (e.g., downloaded or uploaded) via an application store (e.g., Play Store™), directly between two user devices (e.g., smartphones), or online” (Applicant’s specification, Par. 0073). As shown in Par. 0073 of the specification, the information may be exchanged by transmission over a network, such as the Internet. The machine-readable storage medium is therefore not limited to physical devices (ROM, RAM, etc.) and includes a signal or carrier wave, and a “signal”, “carrier wave”, or “transmission medium” is deemed non-statutory. The Examiner suggests amending claim 20 to recite, for example: “a non-transitory computer-readable recording medium storing instructions executable by a processor, wherein when executed, the instructions cause a processor of an electronic device to perform operations of”.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 12, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Mehrseresht (US-PGPUB 2018/0349704).

Regarding claim 1, Mehrseresht discloses an electronic device (see at least: Fig. 13A), comprising:
a memory (1306 in Fig. 13A) including at least one instruction; and a processor (1305 in Fig. 13A), wherein by executing the at least one instruction, the processor is configured to:

an appearance-related feature value and a motion-related feature value from the video (see at least: Fig. 1, steps 105-120, and Par. 0088-0089, identifying a plurality of people in the video sequence and extracting video features from the input video clips 105 of the video sequence, [i.e., implicitly checking video features, “feature information”, corresponding to a video]. Further, Par. 0094 discloses extracting appearance features such as a color histogram, a histogram of gradients, and scale invariant feature transform (SIFT) features, [i.e., the extracted video features, “feature information”, implicitly include appearance features of the players, “at least one of an appearance-related feature value and a motion-related feature value from the video”]);
calculate at least one of a starting score related to a starting point of an action instance, an ending score related to an ending point of an action instance, and a relatedness score between action instances on the basis of the feature information corresponding to the video, the action instances being included in the video (see at least: Fig. 9 and Par. 0128, the action moment classification scores are determined for action moments defined by ground truth from the training set; specifically, the temporal information (start and end frames) for each action moment is obtained from the ground truth. That is, the action moment classification scores are implicitly determined for action moments based on the temporal information (start and end frames) of each ground-truth action moment, [i.e., implicitly determining, “calculating”, based on the temporal information, at least one of a starting score related to a starting point of an action moment, “action instance”, an ending score related to an ending point of an action moment, “action instance”, and a relatedness score between action instances on the 
generate an action proposal included in the video on the basis of the at least one score, (see at least: Fig. 10, and Par. 0136, obtaining an interaction feature from equation (6), by concatenating the moment classification scores 730 from start and end action moments of an interaction, [i.e., generating the interaction feature, “action proposal”, in the video on the basis of the moment classification scores 730 from start and end action moments of an interaction, “at least one score”]).

Regarding claim 12, claim 12 recites substantially similar limitations as set forth in claim 1. As such, claim 12 is rejected for at least similar rationale.
The Examiner further acknowledges the following additional limitation: “a method of an electronic device generating an action proposal”. However, Mehrseresht discloses the “method of an electronic device generating an action proposal” (Mehrseresht, see at least: Fig. 2, Par. 0007).

Regarding claim 20, claim 20 recites substantially similar limitations as set forth in claim 1. As such, claim 20 is rejected for at least similar rationale.
The Examiner further acknowledges the following additional limitation: “a computer-readable recording medium storing instructions executable by a processor, wherein when executed, the instructions cause a processor of an electronic device to perform operations”. However, Mehrseresht discloses the “computer-readable recording medium storing instructions executable by a processor, wherein when executed, the .

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Mehrseresht (US-PGPUB 2018/0349704) in view of Deever (US-PGPUB 2011/0293250).

Regarding claim 2, Mehrseresht discloses the limitations of claim 1.
Mehrseresht does not expressly disclose wherein by executing the at least one instruction, the processor is configured to: group a plurality of frames constituting the 
However, Deever discloses grouping a plurality of frames constituting the video on a specified frame basis to generate a plurality of snippets (see at least: Par. 0094, a form key video snippets step 270 forms key video snippets 275 corresponding to the highest-ranked key video frames 265, and a form video summary step 280 assembles the key video snippets 275 to form a video summary 285, [i.e., implicitly grouping a plurality of frames constituting the video on a specified frame basis to generate a plurality of snippets]); and checking the at least one feature value corresponding to each of the plurality of snippets (see at least: Par. 0122, the total time duration for the video summary is automatically determined, [i.e., the time duration is implicitly relative to local motion, “feature value”]; and Par. 0115, the form key video snippets step 270 forms the key video snippets 275 according to the total time duration and the minimum time duration for the video summary and a criterion that specifies the minimum time duration for each of the key video snippets, [i.e., implicitly checking the time duration relative to the local motion value, “feature value”, corresponding to each of the plurality of snippets]. See also Par. 0079).
Mehrseresht and Deever are combinable because they are both concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art to modify Mehrseresht to include steps 270-280, as taught by Deever, in order to form key video snippets corresponding to the highest-ranked key video frames and to assemble the key video snippets into a video summary 285 (Deever, Par. 0094).

Regarding claim 13, claim 13 recites substantially similar limitations as set forth in claim 2. As such, claim 13 is rejected for at least similar rationale.

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Mehrseresht (US-PGPUB 2018/0349704) in view of Abdelhak et al. (US-PGPUB 2019/0050996).

Regarding claim 3, Mehrseresht discloses the limitations of claim 1.
Mehrseresht does not expressly disclose wherein by executing the at least one instruction, the processor is configured to: sample at least some of the plurality of frames constituting the video; generate an RGB frame and a FLOW frame using the sampled frames; and perform at least one of an operation of determining the appearance-related feature value from the RGB frame and an operation of determining the motion-related feature value from the FLOW frame.
However, Abdelhak discloses sampling at least some of the plurality of frames constituting the video and generating an RGB frame and a FLOW frame using the sampled frames (see at least: Par. 0016, the TSN approach sparsely samples short snippets from a given video, [i.e., sampling at least some of the plurality of frames constituting the video], and RGB frames and stacked optical flow frames are extracted from the sampled snippets, [i.e., implicitly generating an RGB frame and a FLOW frame using the sampled frames]). Abdelhak further discloses performing at least one of an operation of determining the appearance-related feature value from the RGB frame and an operation of determining the motion-related feature value from the FLOW frame (see at least: Par. 0042, encoding high-level motion dynamics that capture the granular local motion patterns as they occur over an extended time interval; see also the example motion representations shown in FIG. 7 and FIG. 8, [i.e., implicitly performing at least one of an operation of determining the appearance-related feature value from the RGB frame and an operation of determining the motion-related feature value from the FLOW frame, “high-level motion dynamics”]).
Mehrseresht and Abdelhak are combinable because they are both concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art to modify Mehrseresht to include the EoT generator 102, as taught by Abdelhak, in order to determine the high-level motion dynamics (Abdelhak, Par. 0042).

Claims 4-6 and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Mehrseresht (US-PGPUB 2018/0349704) in view of Deever (US-PGPUB 2011/0293250), and further in view of Lan et al. (US-PGPUB 2019/0080176).

Regarding claim 4, Mehrseresht discloses the limitations of claim 1.
Mehrseresht does not expressly disclose wherein by executing the at least one instruction, the processor sets a relatedness score, which is of a location of one of snippet, a preceding snippet, and a following snippet in which a probability of including an action instance is greater than or equal to a first threshold probability, to be relatively high on the basis of the feature information corresponding to the video.
However, Deever discloses wherein by executing the at least one instruction, the processor sets a relatedness score, which is of a location of one of snippet, a preceding snippet, and a following snippet, to be relatively high on the basis of the feature information corresponding to the video (see at least: Fig. 7 and Par. 0079, a selection score is assigned as a function of the global motion and the local motion (feature information corresponding to the video), and the video frame with the highest selection score within a time interval 235 can be selected as the key video frame 245 for that time interval 235, [i.e., setting a relatedness score, which is of a location of one of snippet, a preceding snippet, and a following snippet, to be relatively high on the basis of the feature information corresponding to the video. Note that the time interval between snippets is implicitly relative to the location of one of snippet, a preceding snippet, and a following snippet, as shown in Fig. 7]).
Mehrseresht and Deever are combinable because they are both concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art to modify Mehrseresht to assign a selection score to a plurality of video frames within each time interval, as taught by Deever, in order to select the video frame with the highest selection score within a time interval 235 as the key video frame 245 for that time interval 235 (Deever, Par. 0079).
The combined teaching of Mehrseresht and Deever does not expressly disclose that the probability of including an action instance is greater than or equal to a first threshold probability.
However, Lan et al. discloses that the probability of including an action instance is greater than or equal to a first threshold probability (see at least: Par. 0054, comparing the probabilities of the elements in the label vector with a threshold, [i.e., probability threshold], and the frame v_t may be determined to be associated with one or more action labels with respective probabilities in the label vector higher than the threshold, [i.e., labeling vectors 
Mehrseresht, Deever, and Lan et al. are combinable because they are all concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Mehrseresht and Deever to use the SoftMax layer 324, as taught by Lan et al., in order to label vectors indicating an action in the frame with the highest probability (Lan et al., Par. 0054).

Regarding claim 5, Mehrseresht discloses the limitations of claim 1.
Mehrseresht does not expressly disclose wherein by executing the at least one instruction, the processor is configured to set a starting score, which is of a location of one of snippet and a preceding snippet in which a probability of being a starting point of an action instance is greater than or equal to a second threshold probability on the basis of the feature information corresponding to the video.
However, Deever discloses wherein by executing the at least one instruction, the processor is configured to set a starting score, which is of a location of one of snippet and a preceding snippet, on the basis of the feature information corresponding to the video (see at least: Par. 0117, the digital video sequence can be analyzed to determine an importance value as a function of time, and the start and end times for a key video snippet can be determined, [i.e., setting a starting score, which is of a location of one of snippet and a preceding snippet, on the basis of the feature information corresponding to the video]).
Mehrseresht and Deever are combinable because they are both concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art to modify Mehrseresht to analyze the digital video sequence, as taught by Deever, in order to determine the start and end times for a key video snippet (Deever, Par. 0117).
The combined teaching of Mehrseresht and Deever does not expressly disclose a probability of being a starting point of an action instance being greater than or equal to a second threshold probability on the basis of the feature information corresponding to the video.
However, Lan et al. discloses that the probability of being a starting point of an action instance is greater than or equal to a second threshold probability on the basis of the feature information corresponding to the video (see at least: Par. 0087, the feature processing element may be caused to process the features based on the probability, and the forecast element with third predetermined parameters may be caused to generate a confidence of the frame being a special frame in which the action starts or ends based on the processed features; in response to the confidence exceeding a threshold, a forecast for the special frame may be determined, [i.e., the probability of a frame being a starting point of an action instance is greater than or equal to a second threshold probability on the basis of the feature information corresponding to the video]).
Mehrseresht, Deever, and Lan et al. are combinable because they are all concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Mehrseresht and Deever to use the FIC layer 336, as taught by Lan et al., in order to determine that an action will take place or terminate soon based on comparing the first or the second confidence to the threshold (Lan et al., Par. 0072).


action instance is greater than or equal to a second threshold probability on the basis of the feature information corresponding to the video (see at least: Fig. 5, Par. 0032, at process 520 of method 500, the classification module 131 performs per-frame class labeling on video frame I_t at time t and generates action score distributions 205 [i.e., setting a score]; at process 530 of method 500, the localization module 132 generates an action-agnostic start probability indicating a likelihood that video frame I_t may contain any action start; at process 540 of method 500, the fusion module 145 generates an action-specific start probability 210 corresponding to each action class; … and at process 550, the two-stage ODAS system 200 determines whether an action start of a specific action class is contained in the video frame, “based implicitly on the probability of being a starting point of an action instance is greater than or equal to a second threshold probability on the basis of the feature information corresponding to the video”).

Regarding claim 6, Mehrseresht discloses the limitations of claim 1.
Mehrseresht does not expressly disclose wherein by executing the at least one instruction, the processor is configured to set an ending score, which is of a location of one of a snippet and a following snippet in which a probability of being an ending point of an action instance is greater than or equal to a third threshold probability on the basis of the feature information corresponding to the video.
However, Deever discloses wherein by executing the at least one instruction, the processor is configured to set an ending score, which is of a location of one of a snippet and a following snippet, on the basis of the feature information corresponding to the video (see at least: Par. 0117, the digital video sequence can be analyzed to determine an importance value as a function of time, and the start and end times for a key video snippet can be determined, [i.e., setting an ending score, which is of a location of one of a snippet and a following snippet]).
Mehrseresht and Deever are combinable because they are both concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art to modify Mehrseresht to analyze the digital video sequence, as taught by Deever, in order to determine the start and end times for a key video snippet (Deever, Par. 0117).
The combined teaching of Mehrseresht and Deever does not expressly disclose a probability of being an ending point of an action instance being greater than or equal to a third threshold probability on the basis of the feature information corresponding to the video.
However, Lan et al. discloses that a probability of being an ending point of an action instance is greater than or equal to a third threshold probability on the basis of the feature information corresponding to the video (see at least: Par. 0061, the regression sub-network 330 may be designed to automatically provide the confidence(s) for a frame being the start and/or end points of an action based on the features of this frame learned by the feature learning sub-network 310, [i.e., implicitly setting scores for a frame being the start and/or end points of an action]; and Par. 0071-0072, determining a first confidence for the current frame being the frame in which an action starts and/or a second confidence for the current frame being the frame in which an action 
Mehrseresht, Deever, and Lan et al. are combinable because they are all concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Mehrseresht and Deever to use the FIC layer 336, as taught by Lan et al., in order to determine that an action will take place or terminate soon based on comparing the first or the second confidence to the threshold (Lan et al., Par. 0072).

Regarding claim 14, claim 14 recites substantially similar limitations as set forth in claim 4. As such, claim 14 is rejected for at least similar rationale.

Regarding claim 15, claim 15 recites substantially similar limitations as set forth in claim 5. As such, claim 15 is rejected for at least similar rationale.

Regarding claim 16, claim 16 recites substantially similar limitations as set forth in claim 6. As such, claim 16 is rejected for at least similar rationale.

Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Mehrseresht (US-PGPUB 2018/0349704) in view of Lan et al. (US-PGPUB 2019/0080176).

Regarding claim 7, Mehrseresht discloses the limitations of claim 1.
Furthermore, Mehrseresht discloses determining at least one time section including a starting point and an ending point of each action instance on the basis of the at least one score (see at least: Fig. 9 and Par. 0128, the action moment classification scores are determined for action moments defined by ground truth from the training set; specifically, the temporal information (start and end frames) for each action moment is obtained from the ground truth, [i.e., implicitly determining at least one time section including a starting point and an ending point of each action instance on the basis of the at least one score]).
Mehrseresht does not expressly disclose determining at least one time section including a starting point and an ending point of each action instance on the basis of the at least one score, and cutting the at least one time section out of the video to generate an action proposal included in the video.
However, Lan et al. discloses determining at least one time section including a starting point and an ending point of each action instance on the basis of the at least one score, and cutting the at least one time section out of the video to generate an action proposal included in the video (see at least: Par. 0072, determining a first confidence for the current frame being the frame in which an action starts and/or a second confidence for the current frame being the frame in which an action ends, [i.e., determining at least one-
Mehrseresht and Lan et al. are combinable because they are both concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art to modify Mehrseresht to cut the negative frames, as taught by Lan et al., in order to detect an action (Lan et al., Par. 0074).

Regarding claim 17, claim 17 recites substantially similar limitations as set forth in claim 7. As such, claim 17 is rejected for at least similar rationale.

Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Mehrseresht and Lan et al., as applied to claim 7 above, and further in view of Marvaniya et al. (US-PGPUB 2020/0257763).

Regarding claim 8, the combined teaching of Mehrseresht and Lan et al. discloses the limitations of claim 1.
The combined teaching of Mehrseresht and Lan et al. does not expressly disclose wherein the at least one time section may include at least one of a first time section corresponding to a combination of snippets in which a sum of the starting score and the ending score is relatively high, a second time section corresponding to a combination of 
However, Marvaniya et al. discloses the second time section corresponding to a combination of snippets in which the relatedness score is greater than or equal to a threshold value (see at least: Par. 0027, an information gain label may be assigned to each snippet, where the information gain of a snippet may be determined by comparing the importance score of the snippet to a predetermined threshold value or threshold range, and where the information gain of a snippet is less than the threshold value (209), the snippet may be merged with sequential snippets, either before or after the target snippet, if the information gain of the combined sequential snippets is nominal or less (210), [i.e., implicitly the second time section corresponding to a combination of snippets in which the relatedness score is greater than or equal to a threshold value]).
Mehrseresht, Lan et al., and Marvaniya et al. are combinable because they are all concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Mehrseresht and Lan et al. to compare the importance score of snippets to a predetermined threshold value, as taught by Marvaniya et al., in order to combine the snippets based on the information gain of each snippet (Marvaniya et al., Par. 0027).

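Regarding claim 18, claim 18 recites substantially similar limitations as set forth in claim 8. As such, claim 18 is rejected for at least similar rationale.
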
Claims 9-11 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Mehrseresht and Lan et al., as applied to claim 7 above, and further in view of Yuzawa et al. (US-PGPUB 2013/0096860).

Regarding claim 9, the combined teaching of Mehrseresht and Lan et al. discloses the limitations of claim 1.
The combined teaching of Mehrseresht and Lan et al. does not expressly disclose selecting feature information corresponding to the at least one time section from the feature information corresponding to the video; determining a correction offset of the action proposal on the basis of the selected feature information; and correcting the action proposal on the basis of the determined correction offset.
However, Yuzawa discloses selecting feature information corresponding to the at least one time section from the feature information corresponding to the video (see at least: Fig. 11, Par. 0120, the user A (100A) has moved from a position 1150 to a measured position 1154A through a measured position 1152A to see a user B (100B), [i.e., implicitly selecting the user movement corresponding to the time section between measured positions 1152A and 1154A, corresponding to the video]); and determining a correction offset of the action proposal on the basis of the selected feature information, and correcting the action proposal on the basis of the determined correction offset (see at least: Fig. 11, Par. 0124, the correction module 150 determines whether or not the preceding time (i.e., 00:00:00, which is specified in the "Time" column 1210 of the measurement data table 1200) is an action starting point, [i.e., determining a correction offset of the action proposal on the basis of the selected feature information]; and If the 
Mehrseresht, Lan et al., and Yuzawa et al. are combinable because they are all concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Mehrseresht and Lan et al. to use the correction module 150, as taught by Yuzawa et al., in order to perform correction using the position of the action starting point by linearly shifting the position from the action starting point (Yuzawa et al., Par. 0124).

Regarding claim 10, the combined teaching of Mehrseresht, Lan et al., and Yuzawa et al. discloses the limitations of claim 9.
Furthermore, Mehrseresht discloses determining a proposal score corresponding to reliability of the action proposal on the basis of the selected feature information, (Mehrseresht, see at least: Fig. 10, and Par. 0136, obtaining an interaction feature from equation (6), by concatenating the moment classification scores 730 from start and end action moments of an interaction, [i.e., generating the interaction feature, “action proposal”, in the video on the basis of the moment classification scores 730 from start and end action moments of an interaction, “at least one score”]); and 
Further, Yuzawa et al. discloses determining the correction offset for an action proposal in which the proposal score is greater than or equal to a specified score (Yuzawa et al., see at least: Par. 0122-0124, performing a correction on the basis of a 

Regarding claim 11, the combined teaching of Mehrseresht and Lan et al. discloses the limitations of claim 1.
The combined teaching of Mehrseresht and Lan et al. does not expressly disclose wherein by executing the at least one instruction, the processor is configured to: expand the at least one time section to include at least a portion of a specified preceding time section or a specified following time section; and select feature information corresponding to the expanded time section.
However, Yuzawa discloses expanding the at least one time section to include at least a portion of a specified preceding time section or a specified following time section, and selecting feature information corresponding to the expanded time section (see at least: Fig. 11 and Par. 0122-0124, the correction module 150 adds -0.5 to the x coordinate of the position 1156 (-1.5, -2.0), which is the action starting point of the user B (100B), [i.e., implicitly expanding the at least one time section to include at least a portion of a specified preceding time section or a specified following time section]; and determining the moving destination of the user A or B using the position of either the action starting point or the action ending point, [i.e., selecting feature information corresponding to the expanded time section]).


Regarding claim 19, claim 19 recites substantially similar limitations as set forth in claim 9. As such, claim 19 is rejected for at least similar rationale.

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMARA ABDI whose telephone number is (571)270-1670. The examiner can normally be reached 9:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le can be reached on (571)272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.





/AMARA ABDI/
Primary Examiner, Art Unit 2668
03/04/2022