Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Applicant’s response to the last Office action, filed December 9, 2021, has been entered and made of record. Claims 1, 7-10, 13, 19-22, 25, and 31 have been amended, and claim 33 has been added. By this amendment, claims 1-33 are pending in this application.

Response to Arguments
Applicant’s arguments with respect to claims 1-32 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3 are rejected under 35 U.S.C. 103 as being unpatentable over McMahan (US-PGPUB 2009/0232417) in view of Lokshin et al (US-PGPUB 2017/0118539).

In regards to claim 1, McMahan discloses a method for managing video captured by an imaging device, comprising: 
capturing a video in response to a capture command received at the imaging device, (see at least: Abstract, and Par. 0001, and Par. 0025, the digital camera 10 captures images, implicitly in response to a command by a user at the imaging device); and 
classifying the captured video based on feature(s) extracted from the captured video, (see at least: Par. 0014, and 0025, analyzing a digitally captured image to identify one or more recognizable objects in the image automatically, using the information, [i.e., feature(s)], from sensors associated with the digital camera 10 that is used to identify the objects. Further, Par. 0026, discloses that when analyzing an image, the present invention classifies the different subjects 42, 44, 46 as being either a "static" object or a "dynamic" object, [i.e., classifying the captured video based on feature(s) extracted from the captured video]);  

McMahan does not expressly disclose capturing a video of an event; identifying frames in the captured video that represent the event based on the classification; and generating a media item from a subset of the captured video according to the identifying.
However, Lokshin discloses capturing a video of an event, (Par. 0025, a plurality of cameras may be utilized to capture timestamped video of events such as sporting events); classifying the captured video based on feature(s) extracted from the captured video, (see at least: step 202 in Fig. 2, and Par. 0040, the machine learning algorithm or predictive model may then use these inputs as a training set to automatically classify video frames based on the sensor record data, [i.e., the sensor record data represent the feature(s) extracted from the captured video]); identifying frames in the captured video that represent the event based on the classification, (see at least: step 208 in Fig. 2, and Par. 0043, selecting frames where a selected feature or event is present, [i.e., identifying frames in the captured video that represent the event based implicitly on the classification at step 202]); and generating a media item from a subset of the captured video according to the identifying, (see at least: step 216 in Fig. 2, and Par. 0048, calculating a three-dimensional position of the sensor recording device for each frame in the subset of video frames; and determining a set of image areas for each frame, “media item”, in the subset of video frames, [i.e., generating a media item from a subset of the captured video implicitly according to the identifying at step 208]).
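McMahan and Lokshin are combinable because they are both concerned with image recognition. Therefore, it would have been obvious to a person of ordinary skill in the art to modify McMahan to use steps 202-216, as taught by Lokshin, in order to select frames where a selected feature or event is present, (Lokshin, Par. 0043), to thereby improve video recognition, (Lokshin, Par. 0012).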

In regards to claim 2, the combined teaching of McMahan and Lokshin as a whole discloses the limitations of claim 1. 
Furthermore, McMahan discloses wherein: when the classifying identifies the captured video as representing a static event, the generated media item is a still image; and when the classifying identifies the captured video as representing a dynamic event, the generated media item is a video, (see at least: Par. 0027, based on its classification, the present invention selects an appropriate recognition algorithm to identify the object, using any known technique to recognize a given static or dynamic object, [i.e., the classifying identifies the captured video as representing a static event or dynamic event]. Further, Par. 0020, discloses that the display 24 displays an image or video for a user almost immediately after the user captures the image, [accordingly, the display 24 displays an image, “i.e., the generated media item is a still image”, when the classification identifies a static object, “i.e., object being implicitly in static state not performing any activity or event”, while the display 24 displays a video, “i.e., the generated media item is a video”, when the classification identifies a dynamic object, “i.e., object performing activity such as movement”]).

In regards to claim 3, the combined teaching of McMahan and Lokshin as a whole discloses the limitations of claim 1. 
Furthermore, McMahan discloses wherein the feature(s) extracted are derived from object detection analysis, (McMahan, see at least: Par. 0025, analyzing a digitally captured image to identify one or more recognizable objects in the image automatically, using the information, “feature(s)”, from sensors associated with the digital camera 10 that is used to identify the objects, [i.e., sensor information such as location provided by GPS is implicitly derived from the object detection analysis]), and the classifying is based on detection of a predetermined object type from the captured video, (McMahan, Par. 0026, when analyzing an image, the present invention classifies the different subjects 42, 44, 46 as being either a "static" object or a "dynamic" object, [i.e., classifying the different subjects is based on detection of the subjects as being either a "static" object or a "dynamic" object, “predetermined object type”]).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over McMahan and Lokshin, as applied to claim 1; and further in view of Newell et al (US-PGPUB 2008/0306995).
The combined teaching of McMahan and Lokshin as a whole discloses the limitations of claim 1.
The combined teaching of McMahan and Lokshin as a whole does not expressly disclose wherein the feature(s) extracted are derived from scene recognition analysis and the classifying is based on recognition of a predetermined scene type from the captured video.
However, Newell et al discloses algorithms that can include scene classifiers which identify or classify a scene into one or more scene types (i.e., beach, indoor, etc.), [i.e., classifying is based on recognition of a predetermined scene type from the captured video], or one or more activities (i.e., running, etc.), [i.e., the one or more activities, “e.g., running”, represent the extracted feature(s), which are implicitly derived from scene recognition analysis], (see at least: Par. 0024).
McMahan, Lokshin, and Newell et al are combinable because they are all concerned with object(s) recognition. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of McMahan and Lokshin to use scene classifiers, as taught by Newell et al, in order to identify or classify a scene into one or more scene types, (Newell, Par. 0024).

The following prior art of record, Dareddy et al (US-PGPUB 2020/0186897), is pertinent to claim 4, as it also discloses the limitations: “wherein the feature(s) extracted are derived from scene recognition analysis”, (Par. 0098, a second module 504b may be configured to detect events, “feature(s)”, occurring within the video game), and “the classifying is based on recognition of a predetermined scene type from the captured video”, (Par. 0098, the first module 504a may be trained to classify the video frames into different scene-types).

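Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over McMahan and Lokshin, as applied to claim 1; and further in view of Kehtarnavaz et al (US-PGPUB 2016/0292497).
The combined teaching of McMahan and Lokshin as a whole discloses the limitations of claim 1.
The combined teaching of McMahan and Lokshin as a whole does not expressly disclose wherein the feature(s) extracted are derived from motion recognition analysis and the classifying is based on recognition of a predetermined motion type from the captured video.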
Kehtarnavaz discloses wherein the feature(s) extracted are derived from motion recognition analysis, (see at least: Par. 0021-0024, a movement recognition system 100 utilizing an inertial sensor 106 and a depth sensor 108, where the inertial sensor 106 may measure information corresponding to an object's inertial movement, and the depth sensor 108 may measure a three dimensional shape of object 104, [i.e., the “information corresponding to an object's inertial movement”, and “the three dimensional shape of object”, correspond to the extracted feature(s) derived from motion recognition analysis]), and the classifying is based on recognition of a predetermined motion type from the captured video, (Par. 0030, the single HMM classification logic 302 may be configured to determine a type of movement of the object (i.e., classify a movement) utilizing the signals from both the inertial sensor 106 and the depth sensor 108 by utilizing an HMM classifier, [i.e., the classifying is implicitly based on recognition of a predetermined motion type from the captured video]).
McMahan, Lokshin, and Kehtarnavaz et al are combinable because they are all concerned with feature(s) recognition. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of McMahan and Lokshin to use the movement recognition system 100 and the HMM classification logic 302, as taught by Kehtarnavaz, in order to measure information corresponding to an object's 

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over McMahan and Lokshin, as applied to claim 1; and further in view of Abramson et al (US-PGPUB 2017/0294210).
The combined teaching of McMahan and Lokshin as a whole discloses the limitations of claim 1.
Furthermore, McMahan discloses wherein the feature(s) extracted are derived from object detection analyses, (see at least: Par. 0006, the recognition algorithm may operate to identify a person's face).
The combined teaching of McMahan and Lokshin as a whole does not expressly disclose wherein the feature(s) extracted are derived from motion recognition and wherein, when a detected object is recognized to have motion that is greater than a threshold amount, the classifying identifies the captured video as representing a dynamic event, and when the detected object is recognized to have motion that is lower than a threshold amount, the classifying identifies the captured video as representing a static event.
However, Abramson discloses wherein the feature(s) extracted are derived from motion recognition, (see at least: Par. 0035, movements in the video are segmented, such as a controller may segment moving areas and non-moving areas in the series of frames based on color, [i.e., accordingly, the color, “feature(s)”, may be extracted from the moving areas and non-moving areas based on motion recognition]), and wherein, when a detected static while portions of the images with movement may be referred to as dynamic, such as an area of no movement is an area where object motion does not exceed a distance threshold, [i.e., when the detected object is within the area where object motion does not exceed a distance threshold, implicitly identifying the captured video as representing a static event, while when the detected object is within the area where object motion exceeds a distance threshold, identifying the captured video as representing a dynamic event]).
McMahan, Lokshin, and Abramson et al are combinable because they are all concerned with object classification. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of McMahan and Lokshin to compare the object motion area to a distance threshold, as taught by Abramson et al, in order to determine whether the portions of the images are static or dynamic based on the distance threshold, (Abramson, Par. 0037).

Claims 7-10 are rejected under 35 U.S.C. 103 as being unpatentable over McMahan and Lokshin, as applied to claim 2; and further in view of Vijayanarasimhan et al (US-PGPUB 2019/0114487).
In regards to claim 7, the combined teaching of McMahan and Lokshin as a whole discloses the limitations of claim 2.
Furthermore, Lokshin discloses capturing timestamped video of events such as sporting events, and determining a time T between the start and end of an event, (Par. 0025, 0030-0031).
However, the combined teaching of McMahan and Lokshin as a whole does not expressly disclose that, when the classifying identifies the captured video as representing a dynamic event, the identifying frames in the captured video based on the classification comprises identifying a beginning or an end of the event.
Vijayanarasimhan discloses when the classifying identifies the captured video as representing a dynamic event, the identifying frames in the captured video based on the classification comprises identifying a beginning or an end of the event, (see at least: Fig. 6, and Par. 0101, at block 606, a video segment is identified from the video that includes frames between the start time and the end time for the action in the video, implicitly based on classification at block 604, [i.e., the identifying frames in the captured video based on the classification implicitly comprises identifying a beginning or an end of the event when the classifying identifies the captured video as representing a dynamic event, “event”]).
McMahan, Lokshin, and Vijayanarasimhan et al are combinable because they are all concerned with object classification. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of McMahan and Lokshin to use blocks 604 and 606, as taught by Vijayanarasimhan, in order to identify the video segment that includes frames between the start time and the end time for the action in the video, (Vijayanarasimhan, Par. 0101).
In regards to claim 8, the combined teaching of McMahan, Lokshin et al, and Vijayanarasimhan et al as a whole discloses the limitations of claim 7.
Furthermore, Vijayanarasimhan et al discloses wherein the identifying a beginning or an end of the event is based on appearance or disappearance of detected object(s) in the captured video, (Vijayanarasimhan, see at least: Par. 0019, a motion classifier may be used to identify motions in the video that correspond to actions. A start time and an end time of each action may be identified based on application of the motion classifier, which may identify motion associated with the person jumping into the pool, [i.e., the start time and the end time of the action are implicitly based on appearance or disappearance of the person jumping into the pool]).

In regards to claim 9, the combined teaching of McMahan, Lokshin et al, and Vijayanarasimhan et al as a whole discloses the limitations of claim 7.
Furthermore, Vijayanarasimhan et al discloses wherein the identifying a beginning or an end of the event is based on an act associated with a recognized predefined action type in the captured image, (see at least: Par. 0019, a start time and an end time of each action may be identified based on application of the motion classifier, which may identify motion associated with the person jumping into the pool, [i.e., the start time and the end time of the action are implicitly based on an act associated with a recognized predefined action type, “person jumping into the pool”, in the captured image]).

In regards to claim 10, the combined teaching of McMahan, Lokshin et al, and Vijayanarasimhan et al as a whole discloses the limitations of claim 7.

Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over McMahan and Lokshin, as applied to claim 2; and further in view of Sakaida et al (US-PGPUB 2014/0268247).

In regards to claim 11, the combined teaching of McMahan and Lokshin as a whole discloses the limitations of claim 2.
Furthermore, McMahan discloses that when the classifying identifies the captured video as representing a static event, the generated media item is a still image; (McMahan, see at least: Par. 0027, based on its classification, the present invention selects an appropriate recognition algorithm to identify the object, using any known technique to recognize a given static or dynamic object, [i.e., the classifying identifies the captured video as representing a static event or dynamic event]).

However, Feng et al discloses determining quality levels of frame(s) from the captured video; and if a determined quality level of a frame of the frame(s) is higher than the determined quality level of the still image, the system selects a frame with the acceptable level of quality, (see at least: Fig. 3A, steps 300-308, and Par. 0057-0059, obtaining a sequence of video frames of a target object, wherein the target object is a face (operation 302); and evaluating the sequence of video frames to determine a set of frames with an acceptable level of quality based on scores for features associated with a respective frame (decision 306), [i.e., determining quality levels of frame(s) from the captured video]; if the result of the evaluation is not greater than a predetermined threshold for the acceptable level of quality (decision 306), the system prompts the user to improve the quality of the captured target object, (e.g., capture text, image, audio, or video data); and if the result of the evaluation is greater than the predetermined threshold for the acceptable level of quality (decision 306), the system selects a frame (e.g., from the determined set of frames with the acceptable level of quality) with the acceptable level of quality (operation 308), [i.e., if a determined quality level of a frame of the frame(s) is higher than the predetermined threshold, the system selects a frame with the acceptable level of quality]).

However, the combined teaching of McMahan, Lokshin, and Feng et al as a whole does not expressly disclose determining a quality level of the still image; prompting a user of the imaging device; and if authorized by the user, replacing the still image with the higher quality frame.
Sakaida et al discloses determining a quality level of the still image, (see at least: Par. 0080, generating preview images 234 by the digital image sensor, computing a first quality metric, and comparing the first quality metric to a first quality threshold, [i.e., implicitly determining a quality level of the still image]); and if a determined quality level of a frame of the frame(s) is higher than the determined quality level of the still image, prompting a user of the imaging device; and if authorized by the user, replacing the still image with the higher quality frame, (see at least: Par. 0081-0082, the captured image is evaluated immediately after capture to verify that the captured image 236 is of sufficient quality before going on to the next page of the document, by evaluating the captured still image 236 according to a second quality metric, and notifying the user that a still image has been captured when the second quality metric for the captured image is at or above the second quality threshold, [i.e., if a determined quality level of a frame of the frame(s) is higher than the determined quality level of the still image, prompting a user of the imaging device]. Further, Par. 0093, in another embodiment, the scanning application prompts the user of the imaging device that the determined quality level of a frame of the frame(s) is higher than the determined quality level of the still image; and if authorized by the user, replaces the still image with the higher quality frame).
McMahan, Lokshin, Feng et al, and Sakaida et al are combinable because they are all concerned with object detection. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of McMahan, Lokshin, and Feng et al to evaluate the captured image 236 according to a quality metric, as taught by Sakaida et al, in order to replace the selected captured still images with higher quality captured images, (Sakaida, Par. 0093).

In regards to claim 12, the combined teaching of McMahan, Lokshin, Feng et al, and Sakaida et al as a whole discloses the limitations of claim 11.
Furthermore, Feng et al discloses wherein a quality level of a frame is determined based on recognizing an object's state in the frame, wherein a state of an object comprises a pose, an orientation, or an appearance, (Feng, see at least: Par. 0032, evaluating the quality of an image and selects facial images with a quality which exceeds a certain threshold, [i.e., quality level of a frame is determined based on recognizing an object's state, such as a face, “object’s appearance”]).

Claims 13-15, 19-22, 25-27, and 31 are rejected under 35 U.S.C. 103 as being unpatentable over McMahan (US-PGPUB 2009/0232417) in view of Lokshin et al (US-PGPUB 2017/0118539); and further in view of Vijayanarasimhan et al (US-PGPUB 2019/0114487).

In regards to claim 13, McMahan discloses a computer system, comprising: 
at least one processor, (16 in Fig. 2), associated with an imaging device, (14 in Fig. 2); at least one memory, (22 in Fig. 2), comprising instructions configured to be executed by the at least one processor to perform a method comprising:
capturing a video in response to a capture command received at the imaging device, (see at least: Abstract, and Par. 0001, and Par. 0025, the digital camera 10 captures images, implicitly in response to a command by a user at the imaging device); and 
classifying the captured video based on feature(s) extracted from the captured video, (see at least: Par. 0014, and 0025, analyzing a digitally captured image to identify one or more recognizable objects in the image automatically, using the information, [i.e., feature(s)], from sensors associated with the digital camera 10 that is used to identify the objects. Further, Par. 0026, discloses that when analyzing an image, the present invention classifies the different subjects 42, 44, 46 as being either a "static" object or a "dynamic" object, [i.e., classifying the captured video based on feature(s) extracted from the captured video]);  
McMahan does not expressly disclose capturing a video of an event; identifying frames in the captured video that represent the event based on the classification; and 
However, Lokshin discloses capturing a video of an event, (Par. 0025, a plurality of cameras may be utilized to capture timestamped video of events such as sporting events); classifying the captured video based on feature(s) extracted from the captured video, (see at least: step 202 in Fig. 2, and Par. 0040, the machine learning algorithm or predictive model may then use these inputs as a training set to automatically classify video frames based on the sensor record data, [i.e., the sensor record data represent the feature(s) extracted from the captured video]); identifying frames in the captured video that represent the event based on the classification, (see at least: step 208 in Fig. 2, and Par. 0043, selecting frames where a selected feature or event is present, [i.e., identifying frames in the captured video that represent the event based implicitly on the classification at step 202]); and generating a media item from a subset of the captured video according to the identifying, (see at least: step 216 in Fig. 2, and Par. 0048, calculating a three-dimensional position of the sensor recording device for each frame in the subset of video frames; and determining a set of image areas for each frame, “media item”, in the subset of video frames, [i.e., generating a media item from a subset of the captured video implicitly according to the identifying at step 208]).
McMahan and Lokshin are combinable because they are both concerned with image recognition. Therefore, it would have been obvious to a person of ordinary skill in the art to modify McMahan to use steps 202-216, as taught by Lokshin, in order to select frames where a selected feature or event is present, (Lokshin, Par. 0043), to thereby improve video recognition, (Lokshin, Par. 0012).

Vijayanarasimhan discloses generating a summary media item from a subset of the captured video according to the identifying, (see at least: Fig. 6, and Par. 0101, identifying a video segment from the video that includes frames between the start time and the end time for the action in the video, (606 in Fig. 6); and generating video clip, (summary media item), that includes the video segment, (610 in Fig. 6), [i.e., generating a summary media item from the video that includes frames between the start time and the end time, for the action in the video, “subset of the captured video according to the identifying”]. Note that the video clip is a summary of the action that occurred during the live video, (see at least: Par. 0006)).
McMahan, Lokshin, and Vijayanarasimhan are combinable because they are all concerned with image recognition. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of McMahan and Lokshin to include blocks 606-610, as taught by Vijayanarasimhan, in order to identify video segments that include actions in them, (Vijayanarasimhan, see at least: Par. 0008).

In regards to claim 14, the combined teaching of McMahan, Lokshin, and Vijayanarasimhan as a whole discloses the limitations of claim 13. 
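Furthermore, McMahan discloses wherein: when the classifying identifies the captured video as representing a static event, the generated media item is a still image; and when the classifying identifies the captured video as representing a dynamic event, the generated media item is a video, (see at least: Par. 0027, based on its classification, the present invention selects an appropriate recognition algorithm to identify the object, using any known technique to recognize a given static or dynamic object, [i.e., the classifying identifies the captured video as representing a static event or dynamic event]. Further, Par. 0020, discloses that the display 24 displays an image or video for a user almost immediately after the user captures the image, [accordingly, the display 24 displays an image, “i.e., the generated media item is a still image”, when the classification identifies a static object, “i.e., object being implicitly in static state not performing any activity or event”, while the display 24 displays a video, “i.e., the generated media item is a video”, when the classification identifies a dynamic object, “i.e., object performing activity such as movement”]).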

In regards to claim 15, the combined teaching of McMahan, Lokshin, and Vijayanarasimhan as a whole discloses the limitations of claim 13. 
Furthermore, McMahan discloses wherein the feature(s) extracted are derived from object detection analysis, (McMahan, see at least: Par. 0025, analyzing a digitally captured image to identify one or more recognizable objects in the image automatically, using the information, “feature(s)”, from sensors associated with the digital camera 10 that is used to identify the objects, [i.e., sensor information such as location provided by GPS is implicitly derived from the object detection analysis]), and the classifying is based on detection of a predetermined object type from the captured video, (McMahan, Par. 0026, when analyzing an image, the present invention classifies the different subjects 42, 44, 46 as being either a "static" object or a "dynamic" object, [i.e., classifying the different subjects is based on detection of the subjects as being either a "static" object or a "dynamic" object, “predetermined object type”]).

In regards to claim 19, the combined teaching of McMahan, Lokshin, and Vijayanarasimhan as a whole discloses the limitations of claim 13.
Furthermore, Vijayanarasimhan discloses when the classifying identifies the captured video as representing a dynamic event, the identifying frames in the captured video based on the classification comprises identifying a beginning or an end of the event, (Vijayanarasimhan, see at least: Fig. 6, and Par. 0101, at block 606, a video segment is identified from the video that includes frames between the start time and the end time for the action in the video, implicitly based on classification at block 604, [i.e., the identifying frames in the captured video based on the classification implicitly comprises identifying a beginning or an end of the event when the classifying identifies the captured video as representing a dynamic event, “event”]).

In regards to claim 20, the combined teaching of McMahan, Lokshin, and Vijayanarasimhan as a whole discloses the limitations of claim 19.
Furthermore, Vijayanarasimhan et al discloses wherein the identifying a beginning or an end of the event is based on appearance or disappearance of detected object(s) in the captured video, (Vijayanarasimhan, see at least: Par. 0019, a motion classifier may be used to identify motions in the video that correspond to actions. A start time and an end time of each action may be identified based on application of the motion classifier, which may identify motion associated with the person jumping into the pool, [i.e., the start time and the end time of the action are implicitly based on appearance or disappearance of the person jumping into the pool]).

In regards to claim 21, the combined teaching of McMahan, Lokshin, and Vijayanarasimhan as a whole discloses the limitations of claim 19.
Furthermore, Vijayanarasimhan et al discloses wherein the identifying a beginning or an end of the event is based on an act associated with a recognized predefined action type in the captured image, (see at least: Par. 0019, a start time and an end time of each action may be identified based on application of the motion classifier, which may identify motion associated with the person jumping into the pool, [i.e., the start time and the end time of the action are implicitly based on an act associated with a recognized predefined action type, “person jumping into the pool”, in the captured image]).

In regards to claim 22, the combined teaching of McMahan, Lokshin, and Vijayanarasimhan as a whole discloses the limitations of claim 19.
Furthermore, Vijayanarasimhan et al discloses wherein the identifying a beginning or an end of the event is based on a location in the captured video temporally related to a receiving time of the captured command, (Vijayanarasimhan, see at least: Par. 0021, the video application may generate graphical data for displaying a user interface that shows time locations within the video where the actions from the subset of the video segments occurred, [i.e., the beginning or an end of the event is implicitly based on a location in the captured video temporally related to a receiving time of the captured 

Regarding claim 25, claim 25 recites substantially similar limitations as set forth in claim 13. As such, claim 25 is rejected for at least similar rationale.
The Examiner further acknowledges the following additional limitation(s): “a non-transitory computer-readable medium comprising instructions executable by at least one processor associated with an imaging device”. However, McMahan discloses the “non-transitory computer-readable medium comprising instructions executable by at least one processor associated with an imaging device”, (see at least: Par. 0019, computer program instructions and data required for operation are stored in non-volatile memory).

Regarding claim 26, claim 26 recites substantially similar limitations as set forth in claim 14. As such, claim 26 is rejected for at least similar rationale.

Regarding claim 27, claim 27 recites substantially similar limitations as set forth in claim 15. As such, claim 27 is rejected for at least similar rationale.

Regarding claim 31, claim 31 recites substantially similar limitations as set forth in claim 19. As such, claim 31 is rejected for at least similar rationale.

Claims 16 and 28 are rejected under 35 U.S.C. 103 as being unpatentable over McMahan, Lokshin, and Vijayanarasimhan, as applied to claim 13; and further in view of Newell et al (US-PGPUB 2008/0306995).

In regards to claim 16, the combined teaching of McMahan, Lokshin, and Vijayanarasimhan as a whole discloses the limitations of claim 13.
The combined teaching of McMahan, Lokshin, and Vijayanarasimhan as a whole does not expressly disclose wherein the feature(s) extracted are derived from scene recognition analysis and the classifying is based on recognition of a predetermined scene type from the captured video.
However, Newell et al discloses algorithms that can include scene classifiers which identify or classify a scene into one or more scene types (i.e., beach, indoor, etc.), [i.e., classifying is based on recognition of a predetermined scene type from the captured video], or one or more activities (i.e., running, etc.), [i.e., the one or more activities, “e.g., running”, represent the extracted feature(s), which are implicitly derived from scene recognition analysis], (see at least: Par. 0024).
McMahan, Lokshin, Vijayanarasimhan, and Newell et al are combinable because they are all concerned with object(s) recognition. Therefore, it would have been obvious to a person of ordinary skill in the art, to modify the combined teaching of McMahan, Lokshin, and Vijayanarasimhan, to use scene classifiers, as taught by Newell et al, in order to identify or classify a scene into one or more scene types, (Newell, Par. 0024).

Regarding claim 28, claim 28 recites substantially similar limitations as set forth in claim 16. As such, claim 28 is rejected for at least similar rationale.

Claims 17 and 29 are rejected under 35 U.S.C. 103 as being unpatentable over McMahan, Lokshin, and Vijayanarasimhan, as applied to claim 13; and further in view of Kehtarnavaz et al, (US-PGPUB 2016/0292497)

In regards to claim 17, the combined teaching of McMahan, Lokshin, and Vijayanarasimhan as a whole discloses the limitations of claim 13.
The combined teaching of McMahan, Lokshin, and Vijayanarasimhan as a whole does not expressly disclose wherein the feature(s) extracted are derived from motion recognition analysis and the classifying is based on recognition of a predetermined motion type from the captured video.
However, Kehtarnavaz discloses wherein the feature(s) extracted are derived from motion recognition analysis, (see at least: Par. 0021-0024, a movement recognition system 100 utilizing an inertial sensor 106 and a depth sensor 108, where the inertial sensor 106 may measure information corresponding to an object's inertial movement, and depth sensor 108 may measure a three dimensional shape of object 104, [i.e., the “information corresponding to an object's inertial movement”, and “the three dimensional shape of object”, correspond to the extracted feature(s) derived from motion recognition analysis]), and the classifying is based on recognition of a predetermined motion type from the captured video, (Par. 0030, the single HMM classification logic 302 may be configured to determine a type of movement of object (i.e., classify a movement) utilizing the signals, [i.e., the classifying is implicitly based on recognition of a predetermined motion type from the captured video]).
McMahan, Lokshin, Vijayanarasimhan, and Kehtarnavaz et al are combinable because they are all concerned with feature(s) recognition. Therefore, it would have been obvious to a person of ordinary skill in the art, to modify the combined teaching of McMahan, Lokshin, and Vijayanarasimhan, to use the movement recognition system 100, and HMM classification logic 302, as taught by Kehtarnavaz, in order to measure information corresponding to an object's inertial movement, and a shape of the object (Par. 0021-0024), and further determine a type of movement of the object, (Kehtarnavaz, Par. 0030).

Regarding claim 29, claim 29 recites substantially similar limitations as set forth in claim 17. As such, claim 29 is rejected for at least similar rationale.

Claims 18 and 30 are rejected under 35 U.S.C. 103 as being unpatentable over McMahan, Lokshin, and Vijayanarasimhan, as applied to claims 13 and 25 above; and further in view of Abramson et al, (US-PGPUB 2017/0294210)

In regards to claim 18, the combined teaching of McMahan, Lokshin, and Vijayanarasimhan as a whole discloses the limitations of claim 13.

The combined teaching of McMahan, Lokshin, and Vijayanarasimhan as a whole does not expressly disclose wherein the feature(s) extracted are derived from motion recognition and wherein, when a detected object is recognized to have motion that is greater than a threshold amount, the classifying identifies the captured video as representing a dynamic event, and when the detected object is recognized to have motion that is lower than a threshold amount, the classifying identifies the captured video as representing a static event.
However, Abramson discloses wherein the feature(s) extracted are derived from motion recognition, (see at least: Par. 0035, movements in the video are segmented, such as a controller may segment moving areas and non-moving areas in the series of frames based on color, [i.e., accordingly, the color, “feature(s)”, may be extracted from the moving areas and non-moving areas based on motion recognition]), and wherein, when a detected object is recognized to have motion that is greater than a threshold amount, the classifying identifies the captured video as representing a dynamic event, and when the detected object is recognized to have motion that is lower than a threshold amount, the classifying identifies the captured video as representing a static event, (see at least: blocks 202, 204, in Fig. 2, and Par. 0035-0037, the segmentation may refer to distinguishing dynamic objects from static objects in the series of images or each frame of the video; and that portions of the images with no movement may be referred to as static while portions of the images with movement may be referred to as dynamic, such as an area of no 
McMahan, Lokshin, Vijayanarasimhan, and Abramson et al are combinable because they are all concerned with object classification. Therefore, it would have been obvious to a person of ordinary skill in the art, to modify the combined teaching of McMahan, Lokshin, and Vijayanarasimhan, to compare the object motion area to a distance threshold, as taught by Abramson et al, in order to determine whether the portions of the images are static or dynamic based on the distance threshold, (Abramson, Par. 0037).
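For illustration only, the threshold comparison recited in claim 18 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Abramson, McMahan, or the application: the function names, the motion measure (mean absolute per-pixel difference between consecutive frames), and the default threshold value are all hypothetical.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image given as rows of pixel intensities.
Frame = Sequence[Sequence[float]]


def motion_amount(prev: Frame, curr: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    total = 0.0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def classify_event(frames: List[Frame], threshold: float = 10.0) -> str:
    """Return "dynamic" if the peak motion exceeds the threshold, else "static".

    The threshold value is a hypothetical placeholder, not a value from the record.
    """
    peak = max(
        (motion_amount(p, c) for p, c in zip(frames, frames[1:])),
        default=0.0,  # fewer than two frames: no measurable motion
    )
    return "dynamic" if peak > threshold else "static"
```

Under these assumptions, a sequence of identical frames is classified as "static", while a sequence in which pixel intensities change substantially between consecutive frames is classified as "dynamic".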

Regarding claim 30, claim 30 recites substantially similar limitations as set forth in claim 18. As such, claim 30 is rejected for at least similar rationale.

Claims 23-24, and 32 are rejected under 35 U.S.C. 103 as being unpatentable over McMahan, Lokshin, and Vijayanarasimhan, as applied to claims 13 and 26; and further in view of Sakaida et al, (US-PGPUB 2014/0268247)

In regards to claim 23, the combined teaching of McMahan, Lokshin, and Vijayanarasimhan as a whole discloses the limitations of claim 13.
Furthermore, McMahan discloses that when the classifying identifies the captured video as representing a static event, the generated media item is a still image; (McMahan, 
The combined teaching of McMahan, Lokshin, and Vijayanarasimhan as a whole does not expressly disclose determining a quality level of the still image; determining quality levels of frame(s) from the captured video; if a determined quality level of a frame of the frame(s) is higher than the determined quality level of the still image, prompting a user of the imaging device; and if authorized by the user, replacing the still image with the higher quality frame.
However, Feng et al discloses determining quality levels of frame(s) from the captured video; and if a determined quality level of a frame of the frame(s) is higher than the determined quality level of the still image, the system selects a frame with the acceptable level of quality, (see at least: Fig. 3A, steps 300-308, and Par. 0057-0059, obtaining a sequence of video frames of a target object, wherein the target object is a face (operation 302); and evaluating the sequence of video frames to determine a set of frames with an acceptable level of quality based on scores for features associated with a respective frame (decision 306), [i.e., determining quality levels of frame(s) from the captured video]; if the result of the evaluation is not greater than a predetermined threshold for the acceptable level of quality (decision 306), the system prompts the user to improve the quality of the captured target object, (e.g., capture text, image, audio, or video data); and if the result of the evaluation is greater than the predetermined threshold for the acceptable level of quality (decision 306), the system selects a frame, [i.e., the system selects a frame with the acceptable level of quality]).
McMahan, Lokshin, Vijayanarasimhan, and Feng et al are combinable because they are all concerned with object classification. Therefore, it would have been obvious to a person of ordinary skill in the art, to modify the combined teaching of McMahan, Lokshin, and Vijayanarasimhan, to include steps 302-308, as taught by Feng et al, in order to select a frame with the acceptable level of quality, (Feng, Par. 0059).
However, the combined teaching of McMahan, Lokshin, Vijayanarasimhan, and Feng et al as a whole does not expressly disclose determining a quality level of the still image; and prompting a user of the imaging device; and if authorized by the user, replacing the still image with the higher quality frame.
Sakaida et al discloses determining a quality level of the still image, (see at least: Par. 0080, generating preview images 234 by the digital image sensor, computing a first quality metric, and comparing the first quality metric to a first quality threshold, [i.e., implicitly determining a quality level of the still image]); and if a determined quality level of a frame of the frame(s) is higher than the determined quality level of the still image, prompting a user of the imaging device; and if authorized by the user, replacing the still image with the higher quality frame, (see at least: Par. 0081-0082, the captured image is evaluated immediately after capture to verify that the captured image 236 is of sufficient quality before going on to the next page of the document, by evaluating the captured still image 236 according to a second quality metric, and notifying the user that a still image, [i.e., if a determined quality level of a frame of the frame(s) is higher than the determined quality level of the still image, prompting a user of the imaging device]. Further, Par. 0093, in another embodiment, the scanning application prompts the user to review the captured still images in the sorted order, (Fig. 9B), where the scanning application repeats the monitoring and capturing operations for images selected by the user, thereby replacing selected captured still images with higher quality captured images, [i.e., prompting the user of the imaging device that the determined quality level of a frame of the frame(s) is higher than the determined quality level of the still image; and if authorized by the user, replacing the still image with the higher quality frame]).
McMahan, Lokshin, Vijayanarasimhan, Feng et al, and Sakaida et al are combinable because they are all concerned with object detection. Therefore, it would have been obvious to a person of ordinary skill in the art, to modify the combined teaching of McMahan, Lokshin, Vijayanarasimhan, and Feng et al, to evaluate the captured image 236 according to a quality metric, as taught by Sakaida et al, in order to replace the selected captured still images with higher quality captured images, (Sakaida, Par. 0093).
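For illustration only, the sequence of limitations recited in claim 23 (compare quality levels, prompt the user, and replace the still image only if authorized) can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce the method of Feng, Sakaida, or the application: the function names, the numeric quality scores, and the `ask_user` callback are hypothetical.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")  # any image representation; hypothetical placeholder type


def best_better_frame(
    still_quality: float, frame_qualities: Sequence[float]
) -> Optional[int]:
    """Index of the highest quality frame that beats the still image, else None."""
    best_index: Optional[int] = None
    best_quality = still_quality
    for index, quality in enumerate(frame_qualities):
        if quality > best_quality:
            best_index, best_quality = index, quality
    return best_index


def select_media_item(
    still: T,
    still_quality: float,
    frames: Sequence[T],
    frame_qualities: Sequence[float],
    ask_user: Callable[[int], bool],
) -> T:
    """Replace the still image with a higher quality frame only if the user agrees.

    ask_user is an assumed callback that prompts the user about the candidate
    frame index and returns True when the replacement is authorized.
    """
    index = best_better_frame(still_quality, frame_qualities)
    if index is not None and ask_user(index):
        return frames[index]
    return still
```

In this sketch the user is prompted only when some frame scores strictly higher than the still image, and the still image is retained whenever no such frame exists or the user declines.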

In regards to claim 24, the combined teaching of McMahan, Lokshin, Vijayanarasimhan, Feng et al, and Sakaida et al as a whole discloses the limitations of claim 23.
Furthermore, Feng et al discloses wherein a quality level of a frame is determined based on recognizing an object's state in the frame, wherein a state of an object 

Regarding claim 32, claim 32 recites substantially similar limitations as set forth in claim 23. As such, claim 32 is rejected for at least similar rationale.

Claim 33 is rejected under 35 U.S.C. 103 as being unpatentable over McMahan, (US-PGPUB 2009/0232417) in view of Vijayanarasimhan et al, (US-PGPUB 2019/0114487)
In regards to claim 33, McMahan discloses a method for managing video captured by an imaging device, comprising:
capturing a video at the imaging device, (see at least: Abstract, and Par. 0001, and Par. 0025, the digital camera 10 captures images and/or video),
receiving, at a time after the beginning of the captured video, a command to capture an event in the captured video;
classifying the captured video based on feature(s) extracted from the captured video, (see at least: Par. 0014, and 0025, analyzing a digitally captured image to identify one or more recognizable objects in the image automatically, using the information, [i.e., feature(s)], from sensors associated with the digital camera 10 that is used to identify the objects. Further, Par. 0026, discloses that when analyzing an image, the present invention classifies the different subjects 42, 44, 46 as being either a "static" object or a 
marking the captured video based on the classification, (see at least: Par. 0027, once the object(s) are recognized, the digital camera 10 may use the information as metadata to annotate the image 40, [i.e., marking the captured video based on the classification, since the object recognition is performed based on the classification]); and
generating a media item summarizing the event from the captured video according to the marking, (see at least: Par. 0020, display 24 allows the user to view images and video captured by digital camera 10, [i.e., implicitly generating images and video, “media item”, captured by digital camera 10, “from the captured video”], where the metadata used to annotate captured images may be displayed on display 24 along with the images, [i.e., the images and video, “media item”, captured by digital camera 10, “from the captured video”, are generated according to metadata used to annotate, “marking”, captured images]. See also Fig. 4, and Par. 0032, the controller 20 could then display the captured image along with the window overlay 50 containing the metadata, [i.e., generating a media item from the captured video according to the marking]).
McMahan does not expressly disclose receiving, at a time after the beginning of the captured video, a command to capture an event in the captured video; and generating the media item summarizing the event.
Vijayanarasimhan discloses receiving, at a time after the beginning of the captured video, a command to capture an event in the captured video, (see at least: Fig. 5, and Par. 0093, the animation module 206 instructs the user interface module 208 to generate graphical data that shows the location within the video where the people jump advances the video to the portion that shows people jumping into the pool, [i.e., implicitly receiving, at a time after the beginning of the captured video, from the animation module 206, an instruction, “command”, to capture an event in the captured video, “people jump into the pool”. Note that time “t” of pool jumping on the time line of Fig. 5 corresponds to the time after the beginning of the captured video]). Vijayanarasimhan further discloses the generating of the media item summarizing the event, (see at least: Fig. 6, and Par. 0101, identifying a video segment from the video that includes frames between the start time and the end time for the action in the video, (606 in Fig. 6); and generating a video clip, (media item summarizing the event), that includes the video segment, (610 in Fig. 6), [i.e., generating media item summarizing the event]. Note that the video clip corresponds to a summary of the action that occurred during the live video, (see at least: Par. 0006)).
McMahan and Vijayanarasimhan are combinable because they are both concerned with object(s) recognition. Therefore, it would have been obvious to a person of ordinary skill in the art, to modify McMahan, to use animation module 206, as taught by Vijayanarasimhan, in order to instruct the user to generate time and location within the video where the event of interest, (e.g., people jump into the pool), occurs, (Vijayanarasimhan, Par. 0093).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  


Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMARA ABDI whose telephone number is (571)270-1670. The examiner can normally be reached 9:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le can be reached on (571)272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, 




/AMARA ABDI/
Primary Examiner, Art Unit 2668
01/05/2021