Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Applicant’s response to the last office action, filed July 6, 2022, has been entered and made of record. Claims 1-2, 13, 25, and 33 have been amended; claims 35-37 have been added. Claims 1-37 are pending in this application.

Response to Arguments
Applicant’s arguments with respect to claims 1-37 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.


The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 13, 25, 33-34, and 37 are rejected under 35 U.S.C. 103 as being unpatentable over Lokshin et al. (US-PGPUB 2017/0118539) in view of Andrizzi et al. (US Patent 10,332,564), and further in view of Oien et al. (US-PGPUB 2017/0180676).

In regard to claim 1, Lokshin discloses a method for managing video captured by an imaging device, comprising:
capturing a video of an event, (see at least: Par. 0025, plurality of cameras may be utilized to capture timestamped video of event such as sporting events); and 
classifying the captured video based on feature(s) extracted from the captured video, (see at least: step 202 in Fig. 2, and Par. 0040, the machine learning algorithm or predictive model may then use these inputs as a training set to automatically classify video frames based on the sensor record data, [i.e., the sensor record data represent the feature(s) extracted from the captured video]). 
Lokshin does not expressly disclose receiving, at a command time during the capturing and after the beginning of the capturing, a command to capture the event in the captured video; classifying the captured video based on the command time after the beginning of the event and on feature(s) extracted from frame(s) of the captured video proximate to the command time; searching in the captured video for frames captured both before and after the command time that represent the event based on the classification; and generating a media item from a subset of the captured video identified in the searching.
However, Andrizzi et al. discloses receiving, at a command time during the capturing and after the beginning of the capturing, a command to capture the event in the captured video, (see at least: Fig. 1A, col. 2, lines 25-28, during or after the moment of interest, the device may receive a command to select video data that was just captured and stored in the buffer, as that video data includes the moment of interest, [i.e., implicitly receiving a command to capture the moment of interest, at a command time during the capturing and after the beginning of the capturing]). Andrizzi et al. further discloses that the classifying of the captured video is based on feature(s) extracted from frame(s) of the captured video proximate to the command time, (see at least: Fig. 3, col. 7, line 11, through col. 8, line 31, associating tags 108 with specific moments, “features”, within the video data 106, including a forward tag 108-10, a backward tag 108-12, a begin tag 108-14, an end tag 108-16 and a window tag 108-18, based on associating timestamps, “proximate to the command time”, with the different tags, [i.e., implicitly distinguishing the video based on the tagged specific moments, “features”, within the video data 106 proximate to the command time, where the specific moments correspond to the video clip data within the video data 106, “feature”, and are implicitly extracted from frame(s) of the captured video]); searching in the captured video for frames captured both before and after the command time that represent the event based on the classification, (see at least: col. 2, lines 25-27, and col. 4, lines 15-53, after the moment of interest, the device may receive a command to select video data that was just captured and stored in the buffer, as that video data includes the moment of interest, and the device may determine a beginning of the moment of interest, identify video data in the circular buffer from the beginning of the moment of interest, and may upload the video data and a tag to a server, [i.e., the device may receive a command for searching in the captured video for frames captured after the command time that represent the event based on the tagged moments of interest, “based on classification”]. Andrizzi et al. further discloses at col. 3, lines 31-38, that the device 102 may capture video data during a standby mode and may store the captured video data into a buffer, and upon receiving a command from the user 10, such as a backward tag command, the device 102 may upload at least a portion of the video data stored in the buffer to the server 112, [i.e., receiving a command from the user for searching in the captured video for frames captured before the command time that represent the event based on the tags associated with the specific events, “based on classification”]); and generating a media item from a subset of the captured video identified in the searching, (see at least: col. 2, lines 32-34, the server may then use the tag and the video data in later operations, such as creating a video summary of an event, “generating media item”, where the event includes the moment of interest, implicitly identified in the searching by the device).
Lokshin and Andrizzi are combinable because they are both concerned with video imaging recognition. Therefore, it would have been obvious to a person of ordinary skill in the art to modify Lokshin to associate tags with specific moments within the video data, as taught by Andrizzi, in order to generate the edited video by the server, (Andrizzi, col. 4, lines 17-26).
However, the combined teaching of Lokshin and Andrizzi as a whole does not expressly disclose classifying the captured video based on the command time after the beginning of the event.
Oien et al. discloses classifying the captured video based on the command time after the beginning of the event, (see at least: Par. 0019-0020, and 0034-0045, the event detector 208 may detect different classes of events, comprising a start event and a stop event, “classifying the captured video”, for example, a collision start event may be determined and 30 seconds of data may be flagged for write protection and five minutes of data after the collision start event may be marked for write protection, “command time after the beginning of the event”, [i.e., classifying an event of the captured video, “classifying the captured video”, based on the flagged data marked for write protection, for a second period of time after the occurrence of the start event, “command time after the beginning of the event”]).
Lokshin, Andrizzi, and Oien et al. are combinable because they are all concerned with video imaging recognition. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Lokshin and Andrizzi to use the event detector 208, as taught by Oien et al., in order to classify the detected event based on data for a second period of time after the occurrence of the start event, (Oien, Par. 0020).

Regarding claim 13, claim 13 recites substantially similar limitations as set forth in claim 1. As such, claim 13 is rejected for at least similar rationale.
The Examiner further acknowledged the following additional limitation(s): “a computer system, comprising at least one processor associated with an imaging device; at least one memory comprising instructions configured to be executed by the at least one processor to perform a method”. However, Lokshin discloses the “computer system, comprising at least one processor associated with an imaging device; at least one memory comprising instructions configured to be executed by the at least one processor to perform a method”, (Lokshin, see at least: Fig. 4, CPU 402, “processor”, and memory 404).

Regarding claim 25, claim 25 recites substantially similar limitations as set forth in claim 1. As such, claim 25 is rejected for at least similar rationale.
The Examiner further acknowledged the following additional limitation(s): “a non-transitory computer-readable medium comprising instructions executable by at least one processor associated with an imaging device to perform a method”. However, Lokshin discloses the “non-transitory computer-readable medium comprising instructions executable by at least one processor associated with an imaging device to perform a method”, (see at least: Par. 0079, “computer readable medium”).

In regard to claim 33, Lokshin discloses a method for managing video captured by an imaging device, comprising:
capturing a video of an event, (see at least: Par. 0025, plurality of cameras may be utilized to capture timestamped video of event such as sporting events); and 
identify a type of event based on feature(s) extracted from the captured video, (see at least: step 202 in Fig. 2, and Par. 0040, the machine learning algorithm or predictive model may then use these inputs as a training set to automatically classify video frames based on the sensor record data, [i.e., the sensor record data represent the feature(s) extracted from the captured video]). 
Lokshin does not expressly disclose receiving, during the capturing and at a command time after the beginning of the capturing, a command to capture the event in the captured video; identify a type of event based on the command time after the beginning of the capturing; marking the captured video based on the identified type of the event and the command time, wherein the marking identifies a span of frames in the captured video with capture times surrounding the command time; and generating a media item summarizing the event from the captured video according to the marking.
Andrizzi et al. discloses the receiving, during the capturing and at a command time after the beginning of the capturing, a command to capture the event in the captured video, (see at least: Fig. 1A, col. 2, lines 25-28, during or after the moment of interest, the device may receive a command to select video data that was just captured and stored in the buffer, as that video data includes the moment of interest, [i.e., implicitly receiving a command to capture the moment of interest, at a command time during the capturing and after the beginning of the capturing]); identify a type of event based on the command time, (see at least: Fig. 3, col. 7, line 11, through col. 8, line 31, associating tags 108 with specific moments, “features”, within the video data 106, including a forward tag 108-10, a backward tag 108-12, a begin tag 108-14, an end tag 108-16 and a window tag 108-18, based on associating timestamps, “based on the command time”, with the different tags, [i.e., implicitly distinguishing the video based on the tagged specific moments, “features”, within the video data 106 based on the command time, where the specific moments correspond to the video clip data within the video data 106, “feature”, and are implicitly extracted from frame(s) of the captured video]); marking the captured video based on the identified type of the event and the command time, wherein the marking identifies a span of frames in the captured video with capture times surrounding the command time, (see at least: col. 9, lines 7-19, the server 112 may also recognize certain movements (e.g., particular hand motions, head gestures, etc.), create a tag based on such recognitions, and use the tags to determine a begin-point and an endpoint for the individual video clips as described in greater detail above with regard to FIG. 3. Further, col. 15, lines 51-53, discloses that the location may be marked relative to a timestamp associated with the video data including the start and stop points, [i.e., marking the captured video based on the tagged, “identified type”, of the event and the command time, “timestamps”, wherein the marking identifies a span of frames in the captured video with capture times surrounding the command time as shown in Fig. 3]); and generating a media item summarizing the event from the captured video according to the marking, (see at least: col. 2, lines 32-34, as shown in Fig. 3, the server 112 may then use the tag and the video data in later operations, such as creating a video summary of an event, where the event includes the moment of interest).
Lokshin and Andrizzi are combinable because they are both concerned with video imaging recognition. Therefore, it would have been obvious to a person of ordinary skill in the art to modify Lokshin to associate tags with specific moments within the video data, as taught by Andrizzi, in order to generate the edited video by the server, (Andrizzi, col. 4, lines 17-26).
However, the combined teaching of Lokshin and Andrizzi as a whole does not expressly disclose identify a type of event based on the command time after the beginning of the event.
Oien et al. discloses classifying the captured video based on the command time after the beginning of the event, (see at least: Par. 0019-0020, and 0034-0045, the event detector 208 may detect different classes of events, comprising a start event and a stop event, “classifying the captured video”, for example, a collision start event may be determined and 30 seconds of data may be flagged for write protection and five minutes of data after the collision start event may be marked for write protection, “command time after the beginning of the event”, [i.e., classifying an event of the captured video, “classifying the captured video”, based on the flagged data marked for write protection, for a second period of time after the occurrence of the start event, “command time after the beginning of the event”]).
Lokshin, Andrizzi, and Oien et al. are combinable because they are all concerned with video imaging recognition. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Lokshin and Andrizzi to use the event detector 208, as taught by Oien et al., in order to classify the detected event based on data for a second period of time after the occurrence of the start event, (Oien, Par. 0020).


In regard to claim 34, the combined teaching of Lokshin, Andrizzi, and Oien as a whole discloses the limitations of claim 1.
Furthermore, Andrizzi discloses wherein the features extracted from the captured video are extracted from a frame having a frame time at the command time, (Andrizzi, see at least: Fig. 3, col. 7, lines 9-36, the specific moments of interest, “video clip data 302” are extracted from the video data 106, using the timestamps associated with the plurality of tags 108 based on user specific command time).  

In regard to claim 35, the combined teaching of Lokshin, Andrizzi, and Oien as a whole discloses the limitations of claim 1.
Furthermore, Oien et al. discloses wherein: the classifying identifies an event type of the event based on the command time after the beginning of the event, (see at least: Par. 0019-0020, and 0034-0045, the event detector 208 may detect different classes of events, comprising a start event and a stop event, “event type”, for example, a collision start event may be determined and 30 seconds of data may be flagged for write protection and five minutes of data after the collision start event may be marked for write protection, “command time after the beginning of the event”, [i.e., the classifying identifies the event type, “start and stop event”, based on the flagged data marked for write protection, for a second period of time after the occurrence of the start event, “command time after the beginning of the event”]).
On the other hand, Andrizzi discloses that the searching is based on the event type, (see at least: col. 2, lines 25-27, and col. 4, lines 15-53, after the moment of interest, the device may receive a command to select video data that was just captured and stored in the buffer, as that video data includes the moment of interest, and the device may determine a beginning of the moment of interest, identify video data in the circular buffer from the beginning of the moment of interest, and may upload the video data and a tag to a server, [i.e., the device may receive a command for searching in the captured video for frames captured after the command time that represent the event based on the tagged moments of interest, “based on the event type”]).

In regard to claim 37, Lokshin discloses a method for managing video captured by an imaging device, comprising:
capturing a video of an event, (see at least: Par. 0025, plurality of cameras may be utilized to capture timestamped video of event such as sporting events); and 
extracting features from the captured video at a temporal location, (see at least: Par. 0039, using a sensor reading device for recording data regarding the performer such as the performer's acceleration, velocity, [i.e., implicitly extracting features from the captured video]. Further, Par. 0042, and 0052, associating metadata with the video data, including information such as the geographic location of the video, the date and/or time the video was taken, “temporal location”, [i.e., the features such as the performer's acceleration, velocity, are implicitly extracted based on the temporal location, “geographic location of the video, the date and/or time the video was taken”]).
Lokshin does not expressly disclose receiving, at a command time during the capturing and after the beginning of the capturing, a command to capture the event in the captured video; that the extracted features correspond to the command time after the beginning of the capturing; searching in the captured video for frames captured both before and after the command time that represent the event based on the extracted features; and generating a media item from a subset of the captured video identified in the searching.
However, Andrizzi et al. discloses receiving, at a command time during the capturing and after the beginning of the capturing, a command to capture the event in the captured video, (see at least: Fig. 1A, col. 2, lines 25-28, during or after the moment of interest, the device may receive a command to select video data that was just captured and stored in the buffer, as that video data includes the moment of interest, [i.e., implicitly receiving a command to capture the moment of interest, at a command time during the capturing and after the beginning of the capturing]). Andrizzi et al. further discloses searching in the captured video for frames captured both before and after the command time that represent the event based on the extracted features, (see at least: col. 2, lines 25-27, and col. 4, lines 15-53, after the moment of interest, the device may receive a command to select video data that was just captured and stored in the buffer, as that video data includes the moment of interest, and the device may determine a beginning of the moment of interest, identify video data in the circular buffer from the beginning of the moment of interest, and may upload the video data and a tag to a server. Further, col. 4, lines 37-39, the device may capture video data throughout the party, but the user 10 may generate tags for specific moments or specific guests at the party, in the video clip, [i.e., implicitly extracting features of interest, “specific moments or specific guests”], where the video clips may be tagged and associated with a particular time/timestamp, date, and/or location, [i.e., extracting features of interest at a temporal location, “the extracted events of interest at a temporal location”]. Andrizzi et al. further discloses at col. 3, lines 31-38, that the device 102 may capture video data during a standby mode and may store the captured video data into a buffer, and upon receiving a command from the user 10, such as a backward tag command, the device 102 may upload at least a portion of the video data stored in the buffer to the server 112, [i.e., receiving a command from the user for searching in the captured video for frames captured before the command time that represent the event based on the tags associated with the specific events, “based on the extracted features of interest at a temporal location”]); and generating a media item from a subset of the captured video identified in the searching, (see at least: col. 2, lines 32-34, the server may then use the tag and the video data in later operations, such as creating a video summary of an event, “generating media item”, where the event includes the moment of interest, implicitly identified in the searching by the device).
Lokshin and Andrizzi are combinable because they are both concerned with video imaging recognition. Therefore, it would have been obvious to a person of ordinary skill in the art to modify Lokshin to associate tags with specific moments within the video data, as taught by Andrizzi, in order to generate the edited video by the server, (Andrizzi, col. 4, lines 17-26).
However, the combined teaching of Lokshin and Andrizzi as a whole does not expressly disclose that the extracted features correspond to the command time after the beginning of the capturing, nor classifying the captured video based on the command time after the beginning of the event.
Oien et al. discloses extracting the features based on the command time after the beginning of the event, (see at least: Par. 0019-0020, and 0034-0045, the event detector 208 may detect different classes of events, comprising a start event and a stop event, for example, a collision start event may be determined, “extracting features”, and 30 seconds of data may be flagged for write protection and five minutes of data after the collision start event may be marked for write protection, “command time after the beginning of the event”, [i.e., extracting features, “collision start event and/or collision stop event”, based on the flagged data marked for write protection, for a second period of time after the occurrence of the start event, “based on the command time after the beginning of the event”]).
Lokshin, Andrizzi, and Oien et al. are combinable because they are all concerned with video imaging recognition. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Lokshin and Andrizzi to use the event detector 208, as taught by Oien et al., in order to detect the collision based on data for a second period of time after the occurrence of the start event, (Oien, Par. 0020).

Claims 2-3, 7-10, 14-15, 19-22, 26-27, and 31 are rejected under 35 U.S.C. 103 as being unpatentable over Lokshin, Andrizzi, and Oien, as applied to claim 1; and further in view of McMahan (US-PGPUB 2009/0232417).

In regard to claim 2, the combined teaching of Lokshin, Andrizzi, and Oien as a whole discloses the limitations of claim 1.
The combined teaching of Lokshin, Andrizzi, and Oien as a whole does not expressly disclose wherein: when the classifying identifies the captured video as representing a static event, the generated media item is a still image; and when the classifying identifies the captured video as representing a dynamic event, the generated media item is a video.
However, McMahan discloses wherein: when the classifying identifies the captured video as representing a static event, the generated media item is a still image; and when the classifying identifies the captured video as representing a dynamic event, the generated media item is a video, (see at least: Par. 0027, based on its classification, the present invention selects an appropriate recognition algorithm to identify the object, using any known technique to recognize a given static or dynamic object, [i.e., the classifying identifies the captured video as representing a static event or dynamic event]. Further, Par. 0020 discloses that the display 24 displays an image or video for a user almost immediately after the user captures the image, [accordingly, the display 24 displays an image, “i.e., the generated media item is a still image”, when the classification identifies a static object, “i.e., object being implicitly in static state not performing any activity or event”, while the display 24 displays a video, “i.e., the generated media item is a video”, when the classification identifies a dynamic object, “i.e., object performing activity such as movement”]).
Lokshin, Andrizzi, Oien, and McMahan are combinable because they are all concerned with video imaging recognition. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Lokshin, Andrizzi, and Oien to use the recognition algorithm, as taught by McMahan, in order to recognize a given static or dynamic object(s), (McMahan, Par. 0027).

In regard to claim 3, the combined teaching of Lokshin, Andrizzi, and Oien as a whole discloses the limitations of claim 1.
The combined teaching of Lokshin, Andrizzi, and Oien as a whole does not expressly disclose wherein the feature(s) extracted are derived from object detection analysis and the classifying is based on detection of a predetermined object type from the captured video.
However, McMahan discloses wherein the feature(s) extracted are derived from object detection analysis, (see at least: Par. 0025, analyzing a digitally captured image to identify one or more recognizable objects in the image automatically, using the information, “feature(s)”, from sensors associated with the digital camera 10 that is used to identify the objects, [i.e., sensor information such as location provided by GPS is implicitly derived from the object detection analysis]), and the classifying is based on detection of a predetermined object type from the captured video, (Par. 0026, when analyzing an image, the present invention classifies the different subjects 42, 44, 46 as being either a "static" object or a "dynamic" object, [i.e., classifying the different subjects is based on detection of the subjects as being either a "static" object or a "dynamic" object, “predetermined object type”]).
Lokshin, Andrizzi, Oien, and McMahan are combinable because they are all concerned with video imaging recognition. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Lokshin, Andrizzi, and Oien to analyze digitally captured images using the information from the sensors associated with the camera, as taught by McMahan, in order to identify one or more recognizable objects, (McMahan, Par. 0025).

In regard to claim 7, the combined teaching of Lokshin, Andrizzi, Oien, and McMahan as a whole discloses the limitations of claim 2.
Furthermore, Andrizzi discloses wherein: when the classifying identifies the captured video as representing a dynamic event, the identifying frames in the captured video based on the classification comprises identifying a beginning or an end of the event, (see at least: col. 9, lines 7-19, the server 112 may also recognize certain movements (e.g., particular hand motions, head gestures, etc.), and create a tag based on such recognitions, [i.e., when the classifying identifies the captured video as representing a dynamic event], such that the server 112 may use the tags to determine a begin-point and an endpoint for the individual video clips as described in greater detail above with regard to FIG. 3, [i.e., identifying a beginning or an end of the event]).

In regard to claim 8, the combined teaching of Lokshin, Andrizzi, Oien, and McMahan as a whole discloses the limitations of claim 7.
Furthermore, Andrizzi discloses wherein the identifying a beginning or an end of the event is based on appearance or disappearance of detected object(s) in the captured video, (see at least: col. 9, lines 7-19, where the recognition of certain movements (e.g., particular hand motions, head gestures, etc.), and using the tags to determine a begin-point and an endpoint for the individual video clips, implies the appearance or disappearance of detected object(s)).

In regard to claim 9, the combined teaching of Lokshin, Andrizzi, Oien, and McMahan as a whole discloses the limitations of claim 7.
Furthermore, Andrizzi discloses wherein the identifying a beginning or an end of the event is based on an act associated with a recognized predefined action type in the captured image, (see at least: col. 9, lines 7-19, the server 112 may also recognize certain movements (e.g., particular hand motions, head gestures, etc.), [i.e., recognizing a predefined action type in the captured image], and create a tag based on such recognitions, such that the server 112 may use the tags to determine a begin-point and an endpoint for the individual video clips as described in greater detail above with regard to FIG. 3, [i.e., identifying a beginning or an end of the event is based on recognized certain movements, “an act associated with a recognized predefined action type in the captured image”]).

In regard to claim 10, the combined teaching of Lokshin, Andrizzi, Oien, and McMahan as a whole discloses the limitations of claim 7.
Furthermore, Andrizzi discloses wherein the identifying a beginning or an end of the event is based on a location in the captured video temporally related to a receiving time of the captured command, (see at least: col. 15, lines 55-59, the location may be marked relative to a timestamp associated with the video data, according to a location in the buffer of the pause 916, or in some other manner relative to the video data in the buffer and/or the video data). 

Regarding claim 14, claim 14 recites substantially similar limitations as set forth in claim 2. As such, claim 14 is rejected for at least similar rationale.

Regarding claim 15, claim 15 recites substantially similar limitations as set forth in claim 3. As such, claim 15 is rejected for at least similar rationale.

Regarding claim 19, claim 19 recites substantially similar limitations as set forth in claim 7. As such, claim 19 is rejected for at least similar rationale.

Regarding claim 20, claim 20 recites substantially similar limitations as set forth in claim 8. As such, claim 20 is rejected for at least similar rationale.

Regarding claim 21, claim 21 recites substantially similar limitations as set forth in claim 9. As such, claim 21 is rejected for at least similar rationale.

Regarding claim 22, claim 22 recites substantially similar limitations as set forth in claim 10. As such, claim 22 is rejected for at least similar rationale.

Regarding claim 26, claim 26 recites substantially similar limitations as set forth in claim 2. As such, claim 26 is rejected for at least similar rationale.

Regarding claim 27, claim 27 recites substantially similar limitations as set forth in claim 3. As such, claim 27 is rejected for at least similar rationale.

In regards to claim 31, the combined teaching of Lokshin, Andrizzi, Oien, and McMahan as a whole discloses the limitations of claim 26.
Furthermore, Andrizzi discloses wherein: when the classifying identifies the captured video as representing a dynamic event, the identifying frames in the captured video based on the classification comprises marking a beginning or an end of the event, (see at least: col. 9, lines 7-19, the server 112 may also recognize certain movements (e.g., particular hand motions, head gestures, etc.), and to create a tag based on such recognitions, [i.e. when the classifying identifies the captured video as representing a dynamic event], such that the server 112 may use the tags to determine a begin-point and an endpoint for the individual video clips as described in greater detail above with regard to FIG. 3, [i.e., implicitly marking a beginning or an end of the event using tags]. See also col 15, lines 51-53, location may be marked relative to a timestamp associated with the video data including the start and stop points).

Claims 4, 16, and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Lokshin, Andrizzi, and Oien, as applied to claims 1, 13, and 25 above; and further in view of Newell et al, (US-PGPUB 2008/0306995)

In regards to claim 4, the combined teaching of Lokshin, Andrizzi, and Oien as a whole discloses the limitations of claim 1.
The combined teaching of Lokshin, Andrizzi, and Oien as a whole does not expressly disclose wherein the feature(s) extracted are derived from scene recognition analysis and the classifying is based on recognition of a predetermined scene type from the captured video.
However, Newell et al discloses algorithms that can include scene classifiers which identify or classify a scene into one or more scene types (i.e., beach, indoor, etc.), [i.e., classifying is based on recognition of a predetermined scene type from the captured video], or one or more activities (i.e., running, etc.), [i.e., the one or more activities, “e.g., running”, represent the extracted feature(s), which are implicitly derived from scene recognition analysis], (see at least: Par. 0024).
Lokshin, Andrizzi, Oien, and Newell et al are combinable because they are all concerned with object(s) recognition. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Lokshin, Andrizzi, and Oien to use scene classifiers, as taught by Newell et al, in order to identify or classify a scene into one or more scene types, (Newell, Par. 0024).

The following prior art of record, Dareddy et al, (US-PGPUB 2020/0186897), is pertinent to claim 4, as it also discloses the limitations: “wherein the feature(s) extracted are derived from scene recognition analysis”, (Par. 0098, a second module 504b may be configured to detect events, “feature(s)”, occurring within the video game), and the classifying is based on recognition of a predetermined scene type from the captured video, (Par. 0098, the first module 504a may be trained to classify the video frames into different scene-types).

Regarding claim 16, claim 16 recites substantially similar limitations as set forth in claim 4. As such, claim 16 is rejected for at least similar rationale.

Regarding claim 28, claim 28 recites substantially similar limitations as set forth in claim 4. As such, claim 28 is rejected for at least similar rationale.

Claims 5, 17, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Lokshin, Andrizzi, and Oien, as applied to claims 1, 13, and 25 above; and further in view of Kehtarnavaz et al, (US-PGPUB 2016/0292497)
In regards to claim 5, the combined teaching of Lokshin, Andrizzi, and Oien as a whole discloses the limitations of claim 3.
The combined teaching of Lokshin, Andrizzi, and Oien as a whole does not expressly disclose wherein the feature(s) extracted are derived from motion recognition analysis and the classifying is based on recognition of a predetermined motion type from the captured video.
However, Kehtarnavaz discloses wherein the feature(s) extracted are derived from motion recognition analysis, (see at least: Par. 0021-0024, a movement recognition system 100 utilizing an inertial sensor 106 and a depth sensor 108, where the inertial sensor 106 may measure information corresponding to an object's inertial movement, and depth sensor 108 may measure a three dimensional shape of object 104, [i.e., the “information corresponding to an object's inertial movement”, and “the three dimensional shape of object”, correspond to the extracted feature(s) derived from motion recognition analysis]), and the classifying is based on recognition of a predetermined motion type from the captured video, (Par. 0030, the single HMM classification logic 302 may be configured to determine a type of movement of object (i.e., classify a movement) utilizing the signals from both the inertial sensor 106 and the depth sensor 108 by utilizing a HMM classifier, [i.e., the classifying is implicitly based on recognition of a predetermined motion type from the captured video]).
Lokshin, Andrizzi, Oien, and Kehtarnavaz et al are combinable because they are all concerned with feature(s) recognition. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Lokshin, Andrizzi, and Oien to use the movement recognition system 100 and the HMM classification logic 302, as taught by Kehtarnavaz, in order to measure information corresponding to an object's inertial movement and a shape of the object (Par. 0021-0024), and further to determine a type of movement of the object, (Kehtarnavaz, Par. 0030).

Regarding claim 17, claim 17 recites substantially similar limitations as set forth in claim 5. As such, claim 17 is rejected for at least similar rationale.

Regarding claim 29, claim 29 recites substantially similar limitations as set forth in claim 5. As such, claim 29 is rejected for at least similar rationale.

Claims 6, 18, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Lokshin, Andrizzi, and Oien, as applied to claims 1, 13, and 25 above; and further in view of Abramson et al, (US-PGPUB 2017/0294210)

In regards to claim 6, the combined teaching of Lokshin, Andrizzi, and Oien as a whole discloses the limitations of claim 1.
The combined teaching of Lokshin, Andrizzi, and Oien as a whole does not expressly disclose wherein the feature(s) extracted are derived from motion recognition and wherein, when a detected object is recognized to have motion that is greater than a threshold amount, the classifying identifies the captured video as representing a dynamic event, and when the detected object is recognized to have motion that is lower than a threshold amount, the classifying identifies the captured video as representing a static event.
However, Abramson discloses wherein the feature(s) extracted are derived from motion recognition, (see at least: Par. 0035, movements in the video are segmented, such as a controller may segment moving areas and non-moving areas in the series of frames based on color, [i.e., accordingly, the color, “feature(s)”, may be extracted from the moving areas and non-moving areas based on motion recognition]), and wherein, when a detected object is recognized to have motion that is greater than a threshold amount, the classifying identifies the captured video as representing a dynamic event, and when the detected object is recognized to have motion that is lower than a threshold amount, the classifying identifies the captured video as representing a static event, (see at least: blocks 202 and 204 in Fig. 2, and Par. 0035-0037, the segmentation may refer to distinguishing dynamic objects from static objects in the series of images or each frame of the video; and portions of the images with no movement may be referred to as static while portions of the images with movement may be referred to as dynamic, such as an area of no movement is an area where object motion does not exceed a distance threshold, [i.e., when the detected object is within an area where object motion does not exceed the distance threshold, the captured video is implicitly identified as representing a static event, and when the detected object is within an area where object motion exceeds the distance threshold, the captured video is explicitly identified as representing a dynamic event]).
Lokshin, Andrizzi, Oien, and Abramson et al are combinable because they are all concerned with object classification. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Lokshin, Andrizzi, and Oien to compare the object motion area to a distance threshold, as taught by Abramson et al, in order to determine whether the portions of the images are static or dynamic based on the distance threshold, (Abramson, Par. 0037).
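For illustration only, the threshold comparison discussed above with respect to claim 6 can be sketched as follows. This is a minimal sketch and is not taken from Abramson or from the application: the function names, the use of displacement from the first observed position as the motion measure, and the example threshold are all assumptions.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]


def max_displacement(positions: List[Point]) -> float:
    """Largest distance of the object from its first observed position.

    Hypothetical motion measure; any other measure could be substituted.
    """
    x0, y0 = positions[0]
    return max(hypot(x - x0, y - y0) for x, y in positions)


def classify_event(positions: List[Point], distance_threshold: float) -> str:
    """Label the video 'dynamic' if motion exceeds the threshold, else 'static'.

    Motion that does not exceed the threshold (including motion exactly
    equal to it) is treated as static.
    """
    if max_displacement(positions) > distance_threshold:
        return "dynamic"
    return "static"


# Example: the object drifts by at most 1 unit, then by 10 units.
print(classify_event([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], 5.0))
print(classify_event([(0.0, 0.0), (10.0, 0.0)], 5.0))
```

In this sketch a single threshold separates the two classes; whether the equality case counts as static is a design choice made here only to mirror the "does not exceed" wording quoted above.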

Regarding claim 18, claim 18 recites substantially similar limitations as set forth in claim 6. As such, claim 18 is rejected for at least similar rationale.

Regarding claim 30, claim 30 recites substantially similar limitations as set forth in claim 6. As such, claim 30 is rejected for at least similar rationale.

Claims 11-12, 23-24, and 32 are rejected under 35 U.S.C. 103 as being unpatentable over Lokshin, Andrizzi, Oien, and McMahan, as applied to claim 2; and further in view of Sakaida et al, (US-PGPUB 2014/0268247)

In regards to claim 11, the combined teaching of Lokshin, Andrizzi, Oien, and McMahan as a whole discloses the limitations of claim 2.
Furthermore, McMahan discloses that when the classifying identifies the captured video as representing a static event, the generated media item is a still image; (McMahan, see at least: Par. 0027, based on its classification, the present invention selects an appropriate recognition algorithm to identify the object, using any known technique to recognize a given static or dynamic object, [i.e., the classifying identifies the captured video as representing a static event or dynamic event]).
The combined teaching of Lokshin, Andrizzi, Oien, and McMahan as a whole does not expressly disclose determining a quality level of the still image; determining quality levels of frame(s) from the captured video; if a determined quality level of a frame of the frame(s) is higher than the determined quality level of the still image, prompting a user of the imaging device; and if authorized by the user, replacing the still image with the higher quality frame.
However, Feng et al discloses determining quality levels of frame(s) from the captured video; and if a determined quality level of a frame of the frame(s) is higher than the determined quality level of the still image, the system selects a frame with the acceptable level of quality, (see at least: Fig. 3A, steps 300-308, and Par. 0057-0059, obtaining a sequence of video frames of a target object, wherein the target object is a face (operation 302); and evaluating the sequence of video frames to determine a set of frames with an acceptable level of quality based on scores for features associated with a respective frame (decision 306), [i.e., determining quality levels of frame(s) from the captured video]; if the result of the evaluation is not greater than a predetermined threshold for the acceptable level of quality (decision 306), the system prompts the user to improve the quality of the captured target object, (e.g., capture text, image, audio, or video data); and if the result of the evaluation is greater than the predetermined threshold for the acceptable level of quality (decision 306), the system selects a frame (e.g., from the determined set of frames with the acceptable level of quality) with the acceptable level of quality (operation 308), [i.e., if a determined quality level of a frame of the frame(s) is higher than the predetermined threshold, the system selects a frame with the acceptable level of quality]).
Lokshin, Andrizzi, Oien, McMahan, and Feng et al are combinable because they are all concerned with object classification. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Lokshin, Andrizzi, Oien, and McMahan to include steps 302-308, as taught by Feng et al, in order to select a frame with the acceptable level of quality, (Feng, Par. 0059).
However, the combined teaching of Lokshin, Andrizzi, Oien, McMahan, and Feng et al as a whole does not expressly disclose determining a quality level of the still image; and prompting a user of the imaging device; and if authorized by the user, replacing the still image with the higher quality frame.
Sakaida et al discloses determining a quality level of the still image, (see at least: Par. 0080, generating preview images 234 by the digital image sensor, computing a first quality metric, and comparing the first quality metric to a first quality threshold, [i.e., implicitly determining a quality level of the still image]); and if a determined quality level of a frame of the frame(s) is higher than the determined quality level of the still image, prompting a user of the imaging device; and if authorized by the user, replacing the still image with the higher quality frame, (see at least: Par. 0081-0082, the captured image is evaluated immediately after capture to verify that the captured image 236 is of sufficient quality before going on to the next page of the document, by evaluating the captured still image 236 according to a second quality metric, and notifying the user that a still image has been captured when the second quality metric for the captured image is at or above the second quality threshold, [i.e., if a determined quality level of a frame of the frame(s) is higher than the determined quality level of the still image, prompting a user of the imaging device]. Further, Par. 0093, in another embodiment, the scanning application prompts the user to review the captured still images in the sorted order, (Fig. 9B), where the scanning application repeats the monitoring and capturing operations for images selected by the user, thereby replacing selected captured still images with higher quality captured images, [i.e., prompting the user of the imaging device that the determined quality level of a frame of the frame(s) is higher than the determined quality level of the still image; and if authorized by the user, replacing the still image with the higher quality frame]).
Lokshin, Andrizzi, Oien, McMahan, Feng et al, and Sakaida et al are combinable because they are all concerned with object detection. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Lokshin, Andrizzi, Oien, McMahan, and Feng et al to evaluate the captured image 236 according to a quality metric, as taught by Sakaida et al, in order to replace the selected captured still images with higher quality captured images, (Sakaida, Par. 0093).
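For illustration only, the sequence of steps recited in claim 11 (scoring the still image, scoring the video frames, prompting the user, and replacing the still image if authorized) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Feng, Sakaida, or the application: the numeric quality scores, the function name, and the `authorize` callback standing in for the user prompt are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def maybe_replace_still(
    still_score: float,
    frame_scores: Sequence[float],
    authorize: Callable[[int], bool],
) -> Optional[int]:
    """Return the index of the frame that should replace the still image.

    Returns None when no frame scores higher than the still image or when
    the user declines. `authorize` is a hypothetical stand-in for the user
    prompt and is only called when a higher quality frame exists.
    """
    if not frame_scores:
        return None
    # Pick the highest scoring frame (first one wins on ties).
    best = max(range(len(frame_scores)), key=lambda i: frame_scores[i])
    if frame_scores[best] > still_score and authorize(best):
        return best
    return None


# Example: frame 1 scores higher than the still image and the user agrees.
print(maybe_replace_still(0.6, [0.5, 0.9, 0.7], lambda i: True))
```

The callback keeps the prompt separate from the comparison, so the same comparison logic could sit behind any user interface.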

In regards to claim 12, the combined teaching of Lokshin, Andrizzi, Oien, McMahan, Feng et al, and Sakaida as a whole discloses the limitations of claim 11.
Furthermore, Feng et al discloses wherein a quality level of a frame is determined based on recognizing an object's state in the frame, wherein a state of an object comprises a pose, an orientation, or an appearance, (Feng, see at least: Par. 0032, evaluating the quality of an image and selecting facial images with a quality which exceeds a certain threshold, [i.e., quality level of a frame is determined based on recognizing an object's state, such as a face, “object’s appearance”]).

Regarding claim 23, claim 23 recites substantially similar limitations as set forth in claim 11. As such, claim 23 is rejected for at least similar rationale.

Regarding claim 24, claim 24 recites substantially similar limitations as set forth in claim 12. As such, claim 24 is rejected for at least similar rationale.

Regarding claim 32, claim 32 recites substantially similar limitations as set forth in claim 11. As such, claim 32 is rejected for at least similar rationale.

Claim 36 is rejected under 35 U.S.C. 103 as being unpatentable over Lokshin, Andrizzi, and Oien, as applied to claim 1 above; and further in view of dePaz et al, (US-PGPUB 2019/035460)
In regards to claim 36, the combined teaching of Lokshin, Andrizzi, and Oien as a whole discloses the limitations of claim 1.
Furthermore, Oien et al discloses wherein: the classifying identifies a scene type at the command time after the beginning of the event, (see at least: Par. 0019-0020, and 0034-0045, the event detector 208 may detect different classes of events, comprising start event, and stop event, “event type”, for example, a collision start event may be determined and 30 seconds of data may be flagged for write protection and five minutes of data after the collision start event may be marked for write protection, “command time after the beginning of the event”, [i.e., classifying identifies the scene type, “a collision start event”, based on the flagged data marked for write protection, for a second period of time after the occurrence of the start event, “command time after the beginning of the event”]).
The combined teaching of Lokshin, Andrizzi, and Oien as a whole does not expressly disclose that the searching is based on the scene type.
However, dePaz discloses the searching being based on the scene type, (see at least: Par. 0004, a first scene command to search for scenes in the video content of a scene type, [i.e., searching video content based on the scene type]).
Lokshin, Andrizzi, Oien, and dePaz et al are combinable because they are all concerned with object detection. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Lokshin, Andrizzi, and Oien to receive, from the user, a first scene command, as taught by dePaz et al, in order to search for scenes in the video content of a scene type, (dePaz, Par. 0004).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMARA ABDI whose telephone number is (571)272-0273. The examiner can normally be reached 9:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le can be reached on (571) 272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/AMARA ABDI/
Primary Examiner, Art Unit 2668
10/22/2022