DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged. 

Claim Objections
Claims 16-17 are objected to because of the following informalities: The claims appear to contain a typographical error. The limitation “configured to executed” should read “configured to execute.”  Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim(s) 9-10 and 16-20 is/are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 9 recites “[t]he non-transitory program storage device of claim 10.” There is no antecedent basis for claim 10. For the purpose of further examination, claim 9 has been interpreted as dependent on claim 8.

Claim 10 depends from claim 9 and therefore inherits all of the deficiencies of claim 9 discussed above.

Claims 16-20 depend from claim 15 and recite the limitation “the device.” The limitation renders the claims indefinite because it is not clear whether the device corresponds to the electronic device or the one or more image capture devices recited earlier in claim 15. In addition, claims 8-14 further recite “program storage device.” For the purpose of further examination, “the device” recited in claims 16-20 has been interpreted as “the electronic device.”

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1, 7, 8, and 14 is/are rejected under 35 U.S.C. 102(a)(1) and 102(a)(2) as being anticipated by Suri et al. (US 2015/0363635 A1), hereinafter referred to as Suri.
Regarding claim 1, Suri teaches a computer-implemented method for image selection, the method comprising:
obtaining a sequence of images (Suri Abstract: “The video file may be decoded to obtain video frames and audio data associated with the video frames. Feature scores for each video frame may be obtained by analyzing features of the video frame or the audio data associated with the video frame based on a local rule, a global rule, or both”);
detecting a first face in one or more images of the sequence of images (Suri ¶0013: “High-level features may include features such as the quantities, positions, and/or facial features of human faces that are detected in the video frames”; Suri ¶0044: “The high-level analysis module 214 may analyze each decoded video frame for high-level features. In at least one embodiment, the high-level feature analyses may include face detection, face tracking, face recognition, saliency analysis, audio power analysis, audio classification analysis, speech analysis, and motion analysis”);
determining a first location for the detected first face in each of the one or more images, of the sequence of images, having the detected first face (Suri ¶0018: “The multiple high-level features 112 may include features such as the quantities, positions, and facial features of human faces that are detected in the video frames”; Suri ¶0045: “the high-level analysis module 214 may generate a list of detected faces with their positions in the video frame, the area of the video frame covered by each face, and a detection confidence score for each face that indicate a confidence in the detection”);
generating a heat map based on the first location of the detected first face in each of the images of the sequence of images (Suri ¶0051: “Based on the saliency analysis, the high-level analysis module 214 may apply a local rule to generate a heat map that displays a saliency score of every pixel in the video frame. A heat map is a graphical representation of data that is arranged in a matrix in which individual values in the matrix are represented using colors”);
determining a face quality score for the detected first face for each of the one or more images, of the sequence of images, having the detected first face (Suri ¶0027: “The low-level analysis module 212 may analyze each decoded video frame for low-level features to produce feature scores. In various embodiments, the low-level features may include exposure quality, saturation quality, hue variety, shakiness, average brightness, color entropy, and/or histogram differences between adjacent video frames”; Suri ¶0045: “the high-level analysis module 214 may generate a list of detected faces with their positions in the video frame, the area of the video frame covered by each face, and a detection confidence score for each face that indicate a confidence in the detection”);
determining a peak face quality score for the detected first face based at least in part on the face quality scores and the generated heat map (Suri ¶0027, ¶0045 & ¶0051 discussed above; Suri ¶0014: “a consumer may select a set of video files with the highest importance scores for sharing on a website”; Suri ¶0045: “the high-level analysis module 214 may generate a list of detected faces with their positions in the video frame, the area of the video frame covered by each face, and a detection confidence score for each face that indicate a confidence in the detection”; further note that Suri ¶0027-¶0031 teaches determining the highest scores for different evaluation categories, e.g., exposure rating score, saturation score, hue score, shakiness score, brightness score, etc.); and
selecting a first image of the sequence of images, corresponding with the peak face quality score for the detected first face (Suri ¶0014 discussed above; Suri ¶0069: “the video segmentation module 220 may select a video frame with a highest window-mass as the center of the t-second long important video section”; Suri ¶0075: “For example, such a video editing application may enable a user to select video sections with section importance values that exceeds a particular score threshold to be digitally combined together to create a highlight video file”).

Regarding claim 8, Suri further teaches a non-transitory program storage device comprising instructions stored thereon to cause the one or more processors to perform the method described in claim 1 (Suri ¶0021: “The computing devices 104 may include one or more processors 202, interfaces 204, and memory 206. Each of the processors 202 may be a single-core processor or a multi-core processor”; Suri ¶0023: “Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, 

Regarding claims 7 and 14, Suri teaches the method and non-transitory program storage device of claims 1 and 8, wherein selecting the first image of the sequence of images comprises storing the first image (Suri ¶0074: “the data store 226 may store video files 228, ranked video files 230, ranked video sections 232, and/or metadata 234 associated with the ranked video files 230 and the ranked video sections 232”).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 4-5, 11-12, 15, and 17-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Suri et al. (US 2015/0363635 A1), in view of El-Khamy et al. (US 2017/0344808 A1), hereinafter referred to as Suri and El-Khamy, respectively.
Regarding claims 4 and 11, Suri teaches the method and non-transitory program storage device of claims 1 and 8, wherein the peak quality score is determined based on a probability value (Suri ¶0059: “The RANSAC algorithm may repeat this procedure a number of times until the probability of finding a good set of transformation parameters reaches a predetermined probability threshold given the data mismatch rate”; Suri ¶0060: “This score may include two parts: (1) a prior probability score which depends on the parameters and how far away the parameters are from commonly expected values, and (2) a probability score based on a robust function of the re-projection distance of the feature point matches. Such a score favors feature points which project to the correct locations, but allows outliers to coexist”).
However, Suri does not appear to explicitly teach that the value is output by a machine learning model.
Pertaining to the same field of endeavor, El-Khamy teaches using a machine learning model (El-Khamy ¶0015: “The subject matter disclosed herein relates to a system and a method for a unified deep-learning machine that can learn and perform multiple tasks that are conventionally performed in series for face and/or object recognition”; El-Khamy ¶0049: “the system 300 includes a unified architecture, multi-task deep-learning machine”; El-Khamy ¶0051: “the classification score generator 323 outputs a confidence score for each bounding box to determine whether the bounding box contains a face or not”).
Suri and El-Khamy are considered to be analogous art because they are directed to image processing for detecting objects. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the rule-based facial image analysis (as taught by Suri) to use machine learning (as taught by El-Khamy) because the combination allows multiple single-task machines to share resources with other different single-task deep-learning machines configured as a multi-task deep-learning machine and therefore enables computational resources to be shared (El-Khamy ¶0016).

Regarding claims 5 and 12, Suri, in view of El-Khamy, teaches the method and non-transitory program storage device of claims 4 and 11, wherein the machine learning model is configured to detect a picture worthiness of a face rather than facial objects (The claims have been interpreted as determining a picture worthiness/quality based on the face as a whole, e.g., global analysis, rather than individual facial features such as eyes, nose, etc. Suri Abstract: “Feature scores for each video frame may be obtained by analyzing features of the video frame or the audio data associated with the video frame based on a local rule, a global rule, or both”; Suri ¶0013: “The local rules may be applied during the generation of feature analysis results for a video frame, and the global rules may be applied to during the generation of feature analysis results for an entire video file”; Suri ¶0069: “the importance calculation module 218 may generate a section importance value for each video section in a similar manner as with respect to entire video files”).

Regarding claim 15, Suri teaches an electronic device, comprising:
a memory (Suri ¶0021: “The computing devices 104 may include one or more processors 202, interfaces 204, and memory 206. Each of the processors 202 may be a single-core processor or a multi-core processor”);
one or more image capture devices (Suri ¶0045: “A detected face may be facing a camera that captured the video frame or sideways with respect to the camera”); and
one or more processors operatively coupled to the memory, wherein the one or more processors are configured to execute instructions causing the one or more processors to (Suri ¶0021 discussed above; also see Suri ¶0076: “In the context of software, the operations represent computer-executable instructions that, when executed by one or more processors, cause one or more processors to perform the recited operations”):
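obtain a sequence of images (Suri Abstract: “The video file may be decoded to obtain video frames and audio data associated with the video frames. Feature scores for each video frame may be obtained by analyzing features of the video frame or the audio data associated with the video frame based on a local rule, a global rule, or both”);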
detect a first face in one or more images of the sequence of images (Suri ¶0013: “High-level features may include features such as the quantities, positions, and/or facial features of human faces that are detected in the video frames”; Suri ¶0044: “The high-level analysis module 214 may analyze each decoded video frame for high-level features. In at least one embodiment, the high-level feature analyses may include face detection, face tracking, face recognition, saliency analysis, audio power analysis, audio classification analysis, speech analysis, and motion analysis”);
determine a first location for the detected first face in each of the one or more images, of the sequence of images, having the detected first face (Suri ¶0018: “The multiple high-level features 112 may include features such as the quantities, positions, and facial features of human faces that are detected in the video frames”; Suri ¶0045: “the high-level analysis module 214 may generate a list of detected faces with their positions in the video frame, the area of the video frame covered by each face, and a detection confidence score for each face that indicate a confidence in the detection”);
generate a heat map based on the first location of the detected first face in each of the images of the sequence of images (Suri ¶0051: “Based on the saliency analysis, the high-level analysis module 214 may apply a local rule to generate a heat map that displays a saliency score of every pixel in the video frame. A heat map is a graphical representation of data that is arranged in a matrix in which individual values in the matrix are represented using colors”);
determine a face quality score for the detected first face for each of the one or more images having the detected first face, wherein the face quality score is determined based on a holistic assessment of face quality for the detected first face (Suri Abstract: “Feature scores for each video frame may be obtained by analyzing features of the video frame or the audio data associated with the video frame based on a local rule, a global rule, or both”; Suri ¶0013: “The local rules may be applied during the generation of feature analysis results for a video frame, and the global rules may be applied to during the generation of feature analysis results for an entire video file”; Suri ¶0027: “The low-level analysis module 212 may analyze each decoded video frame for low-level features to produce feature scores. In various embodiments, the low-level features may include exposure quality, saturation quality, hue variety, shakiness, average brightness, color entropy, and/or histogram differences between adjacent video frames”; Suri ¶0045: “the high-level analysis module 214 may generate a list of detected faces with their positions in the video frame, the area of the video frame covered by each face, and a detection confidence score for each face that indicate a confidence in the detection”); and
select a first image of the sequence of images, based on the determined face quality scores and the generated heat map for the detected face (Suri ¶0014: “a consumer may select a set of video files with the highest importance scores for sharing on a website”; Suri ¶0069: “the video segmentation module 220 may select a video frame with a highest window-mass as the center of the t-second long important video section”; Suri ¶0075: “For example, such a video editing application may enable a user to select video sections with section importance values that exceeds a particular score threshold to be digitally combined together to create a highlight video file”).
Suri teaches that a machine learning model is used to perform audio classification analysis (Suri ¶0053). However, Suri does not appear to explicitly teach that a machine learning model is used to determine a face quality score for the detected first face.
Pertaining to the same field of endeavor, El-Khamy teaches using a machine learning model to determine a face quality score for the detected first face (El-Khamy ¶0015: “The subject matter disclosed herein relates to a system and a method for a unified deep-learning machine that can learn and perform multiple tasks that are conventionally performed in series for face and/or object recognition”; El-Khamy ¶0049: “the system 300 includes a unified architecture, multi-task deep-learning machine”; El-Khamy ¶0051: “the classification score generator 323 outputs a confidence score for each bounding box to determine whether the bounding box contains a face or not”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the rule-based facial image analysis (as taught by Suri) to use machine learning (as taught by El-Khamy) because the combination allows multiple single-task machines to share resources with other different single-task deep-learning machines configured as a multi-task deep-learning machine and therefore enables computational resources to be shared (El-Khamy ¶0016).

Regarding claim 17, Suri, in view of El-Khamy, teaches the device of claim 15, wherein the heat map includes heat map values corresponding to the second locations of the detected second face in the sequence of images (Suri ¶0051 discussed above teaches that the heat map is generated for each pixel in the image), and 
wherein the one or more processors are configured to executed instructions that further cause the one or more processors to compare the heat map values corresponding to the second locations to a threshold heat map value (Suri ¶0046: “the high-level analysis module 214 may be configured to calculate a face importance score if the size of the detect face is between a minimum size threshold and a maximum size threshold. Conversely, faces whose size are smaller than the minimum size threshold or greater than a maximum size threshold may be considered invalid for face importance score calculation by the high-level analysis module 214, or a negative score bias may be assigned to the corresponding video frame for such occurrences”; Suri ¶0071: “These values may also include a quality density, which may reflect a percentage of frames in a video file or a video section with negative or positive features that exceed a corresponding threshold”; Suri ¶0075: “such a video editing application may enable a user to select video sections with section importance values that exceeds a particular score threshold to be digitally combined together to create a highlight video file”).

Claim 18 is rejected using the same rationale as applied to claims 4 and 11 discussed above.

Claim 19 is rejected using the same rationale as applied to claims 5 and 12 discussed above.

Allowable Subject Matter
Claim(s) 2-3, 6, and 13 is/are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim(s) 9-10, 16, and 20 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter:
Regarding claims 2, 9, and 16, the prior art of record teaches the method, non-transitory program storage device, and device of claims 1, [8], and 15, wherein the instructions to determine the set of selected images further cause the one or more processors to:
detect a second face in one or more images of the sequence of images; determine a second location for the detected second face in each of the one or more images, of the sequence of images, having the detected second face, wherein the heat map is further based on the second location of the detected second face, and filter the detected second face based on the heat map (Suri ¶0065: “the high-level analysis module 214 may perform the face detection, the face tracking, and/or the face recognition for one or more faces in each video frame using a monochrome and down sampled version of the video frame”; El-Khamy ¶0037: 
However, the prior art, alone or in combination, does not appear to teach or suggest that the heat map indicates that the second location of the detected second face changes more than the first location of the detected first face.

Regarding claims 6, 13 and 20, the prior art of record teaches the method, non-transitory program storage device, and device of claims 1, 8 and 15, but does not appear to explicitly teach or suggest determining whether the face quality score peaks within a sliding window.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SOO J SHIN whose telephone number is (571)272-9753. The examiner can normally be reached M-F; 10-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached at (571)272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more 

/Soo Shin/
Primary Examiner, Art Unit 2667