DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 05/09/2022 has been entered.
 

Response to Arguments
Applicant’s arguments with respect to claims 1-3, 5-10, 12-14 and 15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 5-10, 12-14 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Shetty et al. (US 2016/0070962) in view of Yamaguchi et al. (US 2011/0169952).

Regarding claim 1, Shetty teaches:
1. An electronic device, comprising: a memory storing one or more instructions; and a processor executing the one or more instructions stored in the memory, wherein the processor is configured to execute the one or more instructions to: 
Shetty, Fig. 1 and 0024-0028

obtain a plurality of image sequences divided from original image data; determine a predetermined number of image sequences among the plurality of image sequences as an input image group; 
FIG. 2 illustrates the segmentation of a video and selection of a representative frame, according to one embodiment. The segmentation and selection of a representative frame is performed as described above by the components of the video hosting service 100. Video 200 is segmented into a set of segments 210 by the video segmentation module 120. Each of the segments includes a chronological set of frames 220, shown here as frames F.sub.1-F.sub.7. Each of the frames is associated with a set of semantic features identified by the feature extraction module 120. In this example, the illustrated segment is a segment showing a lion chasing a gazelle. In the segment, initially the frames depict a lion, then at frame F.sub.3 and F.sub.4 a gazelle is shown, and a lion begins chasing the gazelle at F.sub.5 and are both in-frame and identified in F.sub.6, and the lion alone is identified in F.sub.7. As described above, these semantic features in one embodiment identify a likelihood of a semantic concept being present in a frame, and while displayed here as "present," the semantic concepts may only indicate that a particular concept, e.g., "lion" is likely or highly likely present in a frame or may include a floating point likelihood or probability of the concept occurring in the frame. After scoring the semantic concepts in the frame, the frame selection module 124 selects frame F.sub.6 as the representative frame in this segment. When scoring the frames, the frame selection module 124 identifies that the semantic concepts associated with the segment are "lion" and "gazelle." Frame F.sub.6, as including both lion and gazelle, receives a score for each concept and a total semantic score accounting for each. After optionally generating a combined score accounting for an aesthetic score, Frame F.sub.6 is selected as the representative frame 230. In practice, multiple frames are likely to include the concepts "lion" and "gazelle." 
Incorporating the aesthetic score may assist in identifying which of these frames is aesthetically most pleasing to a user.
Shetty 0050, emphasis added.
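
As an illustration of the frame scoring described in the quoted passage, the following is a minimal sketch and not Shetty's implementation. The function name `select_representative_frame`, the per-frame likelihood values, and the additive combination with an optional aesthetic weight are all assumptions chosen for the example.

```python
from typing import Dict, Optional, Sequence


def select_representative_frame(
    frame_concepts: Sequence[Dict[str, float]],
    segment_concepts: Sequence[str],
    aesthetic_scores: Optional[Sequence[float]] = None,
    aesthetic_weight: float = 0.0,
) -> int:
    """Return the index of the highest-scoring frame in a segment.

    frame_concepts[i] maps a concept name to its likelihood in frame i.
    The additive scoring and the aesthetic weight are assumptions.
    """
    best_index, best_score = 0, float("-inf")
    for i, concepts in enumerate(frame_concepts):
        # Total semantic score: sum of likelihoods of the segment's concepts.
        score = sum(concepts.get(name, 0.0) for name in segment_concepts)
        if aesthetic_scores is not None:
            # Optional combined score that also accounts for aesthetics.
            score += aesthetic_weight * aesthetic_scores[i]
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# Hypothetical likelihoods for frames F1-F7 of the lion/gazelle segment.
frames = [
    {"lion": 0.9}, {"lion": 0.9}, {"gazelle": 0.8}, {"gazelle": 0.8},
    {"lion": 0.4, "gazelle": 0.3}, {"lion": 0.9, "gazelle": 0.9}, {"lion": 0.9},
]
chosen = select_representative_frame(frames, ["lion", "gazelle"])
print(f"representative frame: F{chosen + 1}")
```

With these assumed likelihoods, the frame containing both concepts receives the highest total semantic score, which mirrors the selection of frame F6 in the quoted example.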


select a first image sequence among the image sequences included in the first input image group and add the selected first image sequence to a highlight image group based on a correlation with one or more image sequences pre-classified as a highlight image group, 
FIG. 2 illustrates the segmentation of a video and selection of a representative frame, according to one embodiment. The segmentation and selection of a representative frame is performed as described above by the components of the video hosting service 100. Video 200 is segmented into a set of segments 210 by the video segmentation module 120. Each of the segments includes a chronological set of frames 220, shown here as frames F.sub.1-F.sub.7. Each of the frames is associated with a set of semantic features identified by the feature extraction module 120. In this example, the illustrated segment is a segment showing a lion chasing a gazelle. In the segment, initially the frames depict a lion, then at frame F.sub.3 and F.sub.4 a gazelle is shown, and a lion begins chasing the gazelle at F.sub.5 and are both in-frame and identified in F.sub.6, and the lion alone is identified in F.sub.7. As described above, these semantic features in one embodiment identify a likelihood of a semantic concept being present in a frame, and while displayed here as "present," the semantic concepts may only indicate that a particular concept, e.g., "lion" is likely or highly likely present in a frame or may include a floating point likelihood or probability of the concept occurring in the frame. After scoring the semantic concepts in the frame, the frame selection module 124 selects frame F.sub.6 as the representative frame in this segment. When scoring the frames, the frame selection module 124 identifies that the semantic concepts associated with the segment are "lion" and "gazelle." Frame F.sub.6, as including both lion and gazelle, receives a score for each concept and a total semantic score accounting for each. After optionally generating a combined score accounting for an aesthetic score, Frame F.sub.6 is selected as the representative frame 230. In practice, multiple frames are likely to include the concepts "lion" and "gazelle." 
Incorporating the aesthetic score may assist in identifying which of these frames is aesthetically most pleasing to a user.
Shetty 0050, emphasis added.


by using a trained model trained using an artificial intelligence algorithm; 
The features extracted using the feature extraction module 120 in one embodiment are visual low-level frame-based features. For example, one embodiment uses a color histogram, histogram of oriented gradients, color-differencing with adjacent frames, motion features, and feature tracking, though other frame-based features can be used. The features extracted are collected on a per-frame basis and could comprise other frame-based features such as an identified number of faces or a histogram of oriented optical flow, and may comprise a combination of extracted features. Further features are extracted in other embodiments, such as a Laplacian-of-Gaussian (LoG) or Scale Invariant Feature Transform (SIFT) feature extractor, a color histogram computed using hue and saturation in HSV color space, motion rigidity features, texture features, filter responses (e.g. derived from Gabor wavelets), including 3D filter responses, edge features using edges detected by a Canny edge detector, gradiant location and orientation histogram (GLOH), local energy-based shape histogram (LESH), or speeded-up robust features (SURF). Additional audio features can also be used, such as volume, an audio spectrogram, speech-no-speech indicators, or a stabilized auditory image. The features may also include intermediate layer outputs of a deep neural network trained for a variety of image and video recognition, classification, or ranking tasks. Optionally, in order to reduce the dimensionality of these features while maintaining the discriminating aspects, the features are reduced. The feature reduction is performed in one embodiment using a learned linear projection using principal component analysis to reduce the dimensionality of the feature vectors to 50, or some other suitable number less than 100. Other embodiments can use additional techniques to reduce the number of dimensions in the feature vectors when desired.
Shetty 0036, 0038, emphasis added.

However, Shetty fails to explicitly teach, but Yamaguchi teaches:
determine the predetermined number of image sequences from a next image sequence arranged adjacent to the selected first image sequence in a reproduction time order as a second input image group; 
When the encode unit 212 encodes the input digital video data with an I-frame interval producing a ratio of one I-frame to every 15 frames, one GOP (Group of Pictures) then consists of frames in the order I-B-B-P-B-B-P-B-B-P-B-B-P-B-B. This is shown in the top row of Fig. 4 labeled "Large I-frame Interval". When the I-frame interval produces a ratio of one I-frame to every three frames, the encode unit 212 encodes the data so that one GOP consists of frames in the order I-P-P. This is shown in the bottom row of Fig. 4, labeled "Small I-frame Interval". 
Yamaguchi 0065 and Figs. 4-5, emphasis added.
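
The two frame orders recited in the quoted passage can be checked with a short sketch. The helper name `i_frame_ratio` is hypothetical, and the two GOP strings are copied from the quoted text rather than produced by any encoder API.

```python
from fractions import Fraction

# GOP frame orders as recited in the quoted passage.
LARGE_INTERVAL_GOP = "I-B-B-P-B-B-P-B-B-P-B-B-P-B-B".split("-")
SMALL_INTERVAL_GOP = "I-P-P".split("-")


def i_frame_ratio(gop):
    """Return the fraction of frames in one GOP that are I-frames."""
    return Fraction(gop.count("I"), len(gop))


print(i_frame_ratio(LARGE_INTERVAL_GOP))  # large I-frame interval
print(i_frame_ratio(SMALL_INTERVAL_GOP))  # small I-frame interval
```

The sketch only confirms the stated ratios of one I-frame per 15 frames and one I-frame per 3 frames. It does not model how an encoder chooses between the two intervals.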

select a second image sequence among the image sequences included in the second input image group based on a correlation with the image sequences including the first image sequence pre-classified as the highlight image group;
When the encode unit 212 encodes the input digital video data with an I-frame interval producing a ratio of one I-frame to every 15 frames, one GOP (Group of Pictures) then consists of frames in the order I-B-B-P-B-B-P-B-B-P-B-B-P-B-B. This is shown in the top row of Fig. 4 labeled "Large I-frame Interval". When the I-frame interval produces a ratio of one I-frame to every three frames, the encode unit 212 encodes the data so that one GOP consists of frames in the order I-P-P. This is shown in the bottom row of Fig. 4, labeled "Small I-frame Interval". 
Yamaguchi 0065 and Figs. 4-5, emphasis added.

Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Yamaguchi with the system of Shetty in order to determine the predetermined number of image sequences from a next image sequence arranged adjacent to the selected first image sequence in a reproduction time order as a second input image group, so that I-frames are created based on the size of the video data.
Further, Shetty teaches: 
add the selected second image sequence to the highlight image group; and generate summary image data extracted from the original image data, by using image sequence included in the highlight image group. 
FIG. 3 illustrates the generation of a segment table indicating representative frames for video segments of a video according to one embodiment. In this example, a video 300 includes a variety of animals. The video is analyzed by the video segmentation module 122 using several methods of identifying video segments, which yields identified video segment sets 310A-C. For each video segment in the set, a representative frame 315 is identified by the frame selection module 124 as described above. Since the various methods of segmentation may identify different boundaries within the video 300, different representative frames may be selected for the various segments, as shown. The segments and representative frames are stored in a segment table 320, which identifies the segments, a representative frame for each segment, and a set of semantic concepts associated with the representative frame.
Shetty 0051-0054 and Fig. 5, emphasis added.
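
The limitations of claim 1 mapped above can be read as an iterative selection loop. The following is a minimal sketch of one possible reading, and it is not the applicant's implementation or that of either reference. The names `build_highlight_group`, `score`, and `toy_score`, the group size, and the stopping count are assumptions; `score` stands in for the trained model that evaluates correlation with the highlight image group.

```python
from typing import Callable, List, Sequence


def build_highlight_group(
    sequences: Sequence[str],
    group_size: int,
    score: Callable[[str, List[str]], float],
    max_selected: int,
) -> List[str]:
    """Pick one sequence per input group, sliding forward in time order."""
    highlight: List[str] = []
    start = 0
    while start < len(sequences) and len(highlight) < max_selected:
        # Input image group: a predetermined number of adjacent sequences.
        end = min(start + group_size, len(sequences))
        # Select the sequence that scores best against the current highlights.
        best = max(range(start, end), key=lambda i: score(sequences[i], highlight))
        highlight.append(sequences[best])
        # The next input group begins at the sequence after the selected one.
        start = best + 1
    return highlight


# Placeholder scorer (assumption): prefer a higher trailing digit in the name.
def toy_score(seq: str, highlight: List[str]) -> float:
    return float(seq[-1])


clips = ["c1", "c5", "c2", "c3", "c9", "c4", "c7"]
summary = build_highlight_group(clips, group_size=3, score=toy_score, max_selected=3)
print(summary)
```

In this reading, the summary image data would be assembled from the sequences collected in `summary`, in reproduction time order.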


Regarding claim 2, Shetty and Yamaguchi teach:
2. The electronic device of claim 1, further, Shetty teaches wherein the processor is configured to execute the one or more instructions to: select one of image sequences included in the input image data group based on section information corresponding to each of the image sequences included in the input image data group, wherein the section information comprises information about a section to which each image sequence belongs among a plurality of sections into which the original image data is divided.
FIG. 3 illustrates the generation of a segment table indicating representative frames for video segments of a video according to one embodiment. In this example, a video 300 includes a variety of animals. The video is analyzed by the video segmentation module 122 using several methods of identifying video segments, which yields identified video segment sets 310A-C. For each video segment in the set, a representative frame 315 is identified by the frame selection module 124 as described above. Since the various methods of segmentation may identify different boundaries within the video 300, different representative frames may be selected for the various segments, as shown. The segments and representative frames are stored in a segment table 320, which identifies the segments, a representative frame for each segment, and a set of semantic concepts associated with the representative frame.
Shetty 0050-0051, emphasis added.
 
Regarding claim 3, Shetty and Yamaguchi teach:
3. The electronic device of claim 1, further, Shetty teaches wherein the processor is configured to execute the one or more instructions to: divide the original image data into the plurality of image sequences based on a predetermined time unit; 
FIG. 3 illustrates the generation of a segment table indicating representative frames for video segments of a video according to one embodiment. In this example, a video 300 includes a variety of animals. The video is analyzed by the video segmentation module 122 using several methods of identifying video segments, which yields identified video segment sets 310A-C. For each video segment in the set, a representative frame 315 is identified by the frame selection module 124 as described above. Since the various methods of segmentation may identify different boundaries within the video 300, different representative frames may be selected for the various segments, as shown. The segments and representative frames are stored in a segment table 320, which identifies the segments, a representative frame for each segment, and a set of semantic concepts associated with the representative frame.
Shetty 0050-0051 and Fig. 3 segment table, emphasis added.

and determine the predetermined number of image sequences arranged adjacent to each other in a reproduction time order among the plurality of divided image sequences as the input image group. 
FIG. 3 illustrates the generation of a segment table indicating representative frames for video segments of a video according to one embodiment. In this example, a video 300 includes a variety of animals. The video is analyzed by the video segmentation module 122 using several methods of identifying video segments, which yields identified video segment sets 310A-C. For each video segment in the set, a representative frame 315 is identified by the frame selection module 124 as described above. Since the various methods of segmentation may identify different boundaries within the video 300, different representative frames may be selected for the various segments, as shown. The segments and representative frames are stored in a segment table 320, which identifies the segments, a representative frame for each segment, and a set of semantic concepts associated with the representative frame.
Shetty 0050-0051 and Fig. 3 segment table, emphasis added.

Regarding claim 5, Shetty and Yamaguchi teach:
5. The electronic device of claim 1, further, Shetty teaches wherein the processor is configured to execute the one or more instructions to determine a target time of the summary image data based on a user input.
Shetty 0051-0054 and Fig. 5.
 
Regarding claim 6, Shetty and Yamaguchi teach:
6. The electronic device of claim 5, further, Shetty teaches wherein the processor is configured to execute the instructions to select an image sequence to be added to the highlight image group multiple times based on the target time. 
Shetty 0051-0054 and Fig. 5.

Regarding claim 7, Shetty and Yamaguchi teach:
7. The electronic device of claim 1, further, Shetty teaches wherein the processor is configured to execute the one or more instructions to control a display to display the generated summary image data as a thumbnail image. 
In one embodiment, the video summary module also determines whether to replace a default thumbnail for a video based on the selected representative frame. Each video may be associated with a default thumbnail, which may be designated by a user uploading the video or may be selected based on semantic of aesthetic features of the video. The video summary module 126 determines whether to replace the default thumbnail in some embodiments by comparing a relevance score of the selected representative frames to a relevance score calculated with respect to the default thumbnail. The relevance scores may be calculated with respect to the video metadata, search query, or user interests as described above. When the representative frame relevance score is higher than the default thumbnail by a threshold value, the representative frame is selected as a replacement thumbnail for display.
Shetty 0058, 0064, emphasis added.
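
The thumbnail replacement test in the quoted passage reduces to a threshold comparison of two relevance scores. The following is a minimal sketch under stated assumptions: the function name `should_replace_thumbnail`, the sample scores, and the choice of an inclusive comparison are illustrative and do not come from Shetty.

```python
def should_replace_thumbnail(
    representative_score: float,
    default_score: float,
    threshold: float,
) -> bool:
    """Return True when the representative frame's relevance score exceeds
    the default thumbnail's score by at least `threshold` (inclusive
    comparison is an assumption)."""
    return representative_score - default_score >= threshold


# Hypothetical relevance scores.
print(should_replace_thumbnail(0.82, 0.60, threshold=0.10))
print(should_replace_thumbnail(0.65, 0.60, threshold=0.10))
```

How the relevance scores themselves are computed (from video metadata, a search query, or user interests) is outside the scope of this sketch.
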
Claims 8-10 and 12-14 recite elements similar to those of claims 1-3 and 5-7, but in method form rather than device form.  Therefore, the supporting rationale of the rejection of claims 1-3 and 5-7 applies equally to claims 8-10 and 12-14.
Claim 15 recites elements similar to those of claim 1 or 8, but in non-transitory recording medium form rather than device or method form.  Therefore, the supporting rationale of the rejection of claim 1 or 8 applies equally to claim 15.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL T TEKLE whose telephone number is (571)270-1117.  The examiner can normally be reached on Monday-Friday 8:00-4:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, William Vaughn, can be reached at 571-272-3922.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/DANIEL T TEKLE/Examiner, Art Unit 2481