DETAILED ACTION
Allowable Subject Matter
Claims 1, 4-8, 11-15, 18-22, 28, 34, and 40-42 are allowed.
The following is an examiner’s statement of reasons for allowance: the closest prior art, Sato (U.S. Patent Application Publication 2010/0182501), neither teaches nor suggests that, in accordance with the recognition unit being not capable of recognizing a voice extracted by the extraction unit as a word or a sentence, the combination unit combines, with the selected part or the frame, a character string that is different from a character string obtained by converting a word or a sentence recognized by the recognition unit and is specified based on a feature derived from image analysis of one frame included in the selected part, as in the claimed invention (see the remarks filed on 02/18/2022, page 13, line 3 through page 14, line 2). Sato teaches only a combination unit that combines a character string prepared in advance, which is different from a character string based on an extracted voice, with a selected part of the frame.
As indicated by the above statements, the closest prior art discussed above, either singly or in combination, fails to anticipate or render obvious the above-discussed features and limitations. Additionally, applicant’s arguments have been found persuasive in light of the claim limitations as well as the enabling portions of the specification.
Furthermore, the prior art, alone or in combination, fails to teach or fairly suggest the combination of elements as described below:
An image processing apparatus comprising: a selection unit configured to select, from a moving image including a plurality of frames, a part of the moving image; an extraction unit configured to extract a voice during a predetermined time corresponding to the selected part in the moving image; a recognition unit configured to recognize a voice extracted by the extraction unit as a word or a sentence, wherein (i) in a case where a voice extracted by the extraction unit is a mixed voice, the recognition unit is not capable of recognizing the voice extracted by the extraction unit as a word or a sentence, and (ii) in a case where a voice extracted by the extraction unit is not a mixed voice, the recognition unit is capable of recognizing the voice extracted by the extraction unit as a word or a sentence; a conversion unit configured to convert a word or a sentence recognized by the recognition unit into a character string; and a combination unit configured to combine a character string with the part of the moving image selected by the selection unit or a frame among frames corresponding to the part, wherein, in accordance with the recognition unit being not capable of recognizing a voice extracted by the extraction unit as a word or a sentence, the combination unit combines, with the selected part or the frame, a character string that is different from a character string obtained by converting a word or a sentence recognized by the recognition unit and is specified based on a feature derived from image analysis of one frame included in the selected part, and wherein, in accordance with the recognition unit being capable of recognizing a voice extracted by the extraction unit as a word or a sentence, the combination unit combines, with the selected part or the frame, a character string that is obtained by converting a word or a sentence recognized by the recognition unit and is specified not based on a feature derived from image analysis of one frame included in the selected part (Independent claim 1; claims 4-7, 22, and 40 depend from claim 1).
An image processing method comprising: a selection step of selecting, from a moving image including a plurality of frames, a part of the moving image; an extraction step of extracting a voice during a predetermined time corresponding to the selected part in the moving image; a recognition step of recognizing a voice extracted in the extraction step as a word or a sentence, wherein (i) in a case where a voice extracted in the extraction step is a mixed voice, the voice extracted at the extraction step is not recognizable as a word or a sentence in the recognition step, and (ii) in a case where a voice extracted in the extraction step is not a mixed voice, the voice extracted at the extraction step is recognizable as a word or a sentence in the recognition step; a conversion step of converting a word or a sentence recognized at the recognition step into a character string; and a combination step of combining a character string with the part of the moving image selected at the selection step or a frame among frames corresponding to the part, wherein, in accordance with a voice extracted in the extraction step being not recognizable as a word or a sentence in the recognition step, a character string that is different from a character string obtained by converting a word or a sentence recognized in the recognition step and is specified based on a feature derived from image analysis of one frame included in the selected part is combined with the selected part or the frame in the combination step, and wherein, in accordance with a voice extracted at the extraction step being recognizable as a word or a sentence in the recognition step, a character string that is obtained by converting a word or a sentence recognized in the recognition step and is specified not based on a feature derived from image analysis of one frame included in the selected part is combined with the selected part or the frame in the combination step (Independent claim 8; claims 11-14, 28, and 41 depend from claim 8).
A non-transitory computer-readable storage medium storing a program for causing a computer to perform an image processing method comprising: a selection step of selecting, from a moving image including a plurality of frames, a part of the moving image; an extraction step of extracting a voice during a predetermined time corresponding to the selected part in the moving image; a recognition step of recognizing a voice extracted in the extraction step as a word or a sentence, wherein (i) in a case where a voice extracted in the extraction step is a mixed voice, the voice extracted at the extraction step is not recognizable as a word or a sentence in the recognition step, and (ii) in a case where a voice extracted in the extraction step is not a mixed voice, the voice extracted at the extraction step is recognizable as a word or a sentence in the recognition step; a conversion step of converting a word or a sentence recognized at the recognition step into a character string; and a combination step of combining a character string with the part of the moving image selected at the selection step or a frame among frames corresponding to the part, wherein, in accordance with a voice extracted in the extraction step being not recognizable as a word or a sentence in the recognition step, a character string that is different from a character string obtained by converting a word or a sentence recognized in the recognition step and is specified based on a feature derived from image analysis of one frame included in the selected part is combined with the selected part or the frame in the combination step, and wherein, in accordance with a voice extracted at the extraction step being recognizable as a word or a sentence in the recognition step, a character string that is obtained by converting a word or a sentence recognized in the recognition step and is specified not based on a feature derived from image analysis of one frame included in the selected part is combined with the selected part or the frame in the combination step (Independent claim 15; claims 18-21, 34, and 42 depend from claim 15).
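For illustration only, the branching recited in the independent claims above can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the names select_caption, stub_recognize, and stub_describe_frame are hypothetical, the recognizer is a stub that treats any input tagged "mixed" as unrecognizable, and the image analysis is a stub that looks up a precomputed feature.

```python
from typing import Callable, Optional


def select_caption(
    voice: bytes,
    frame: dict,
    recognize: Callable[[bytes], Optional[str]],
    describe_frame: Callable[[dict], str],
) -> str:
    """Return the character string to combine with the selected part or frame.

    If the voice is recognized as a word or sentence, the converted text is
    used. Otherwise a string derived from image analysis of a frame is used.
    """
    recognized = recognize(voice)
    if recognized is not None:
        # Recognition succeeded: use the converted character string.
        return recognized
    # Recognition failed (e.g. mixed voice): fall back to an image feature.
    return describe_frame(frame)


def stub_recognize(voice: bytes) -> Optional[str]:
    # Hypothetical stand-in for a recognizer: inputs tagged "mixed" fail.
    if voice.startswith(b"mixed"):
        return None
    return voice.decode("utf-8")


def stub_describe_frame(frame: dict) -> str:
    # Hypothetical stand-in for image analysis of one frame.
    return "smiling face" if frame.get("smile") else "scene"
```

Under these assumptions, a recognizable voice yields its own text, while a mixed voice yields the string chosen from the frame feature.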
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HEATHER R JONES whose telephone number is (571)272-7368. The examiner can normally be reached Mon. - Fri.: 9:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, William Vaughn, can be reached at (571)272-3922. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/HEATHER R JONES/ Primary Examiner, Art Unit 2481
March 9, 2022