DETAILED ACTION
This office action is in response to Applicant’s submission filed on 7/19/2019. Claims 1-20 are pending in the application. As such, claims 1-20 have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d) based on Japanese Application No. JP2018-140118, filed on 7/26/2018. The certified copy of the priority document has been filed.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 7/19/2019 has been considered by the examiner.

Drawings
The drawings filed on 7/19/2019 have been accepted and considered by the examiner.

Claim Objections
Claims 8 and 9 are objected to because of the following informalities:
In line 4 of each claim, “a image” should read “an image”.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
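(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.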
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are:
Claim 1:
“an acquisition unit that acquires voice data and image data,” in line 3
“a display control unit that performs control to display the image data acquired by the acquisition unit in synchronization with the voice data;” in line 5
“a reception unit that receives a display element to be added for display to a specific character in the image data displayed by the display control unit;” in line 8
“a setting unit that sets a playback period in which the specific character in the voice data is played back, as a display period of the display element received by the reception unit in the image data” in line 11
Claim 2:
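“an image recognition unit that performs image recognition on a specific character in the image data, and converts the specific character into text”
“wherein the display control unit performs control to display the specific character converted into text by the image recognition unit.” in line 6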
Claim 3:
“a correction unit that corrects the specific character converted into text by the image recognition unit” in line 3
Claim 4:
“an addition unit that adds candidates for a read representation that is possibly included in the voice data, as the specific character.” in line 3
Claim 5:
“an addition unit that adds candidates for a read representation that is possibly included in the voice data, as the specific character” in line 3
Claim 6:
“a suggestion unit that suggests candidates for a read representation to be added by the addition unit” in line 3
Claim 7:
“a suggestion unit that suggests candidates for a read representation to be added by the addition unit” in line 3
Claim 8:
“wherein the specific character is a character string disposed in a preset area in a image data indicated by the display element received by the reception unit” in line 5
Claim 9:
“wherein the specific character is a character string disposed in a preset area in a image data indicated by the display element received by the reception unit” in line 5
Claim 10:
“a voice recognition unit that recognizes the voice data as voice and converts the voice into text.” in line 3
Claim 11:
“wherein the display control unit performs control to display a character string converted into text by the voice recognition unit” in line 3
Claim 12:
“a correction unit that corrects the character string converted into text by the voice recognition unit” in line 3
Claim 13:
“wherein the display control unit performs control to display a list of the character strings converted into texts by the voice recognition unit” in line 3
Claim 14:
“wherein the display control unit performs control to display a list of the character strings converted into texts by the voice recognition unit and a playback period of the voice data of each character string” in line 3
Claim 15:
“wherein in a case where a plurality of the specific characters are included in the voice data, the display control unit performs control to display a character string corresponding to the specific characters as a candidate” in line 5
Claim 16:
“wherein the display control unit performs control to display an entire text of the voice data converted into text by the voice recognition unit and” in line 3
Claim 17:
“wherein the display control unit performs control to display a candidate for a character string corresponding to the specific character together with contexts before and after the character string” in line 3
Claim 18:
“a playback unit that plays back a candidate for a character string corresponding to the specific character together with contexts before and after the character string” in line 3
Claim 19:
“wherein the display control unit performs control to display the voice data possibly corresponding to the specific character by converting the voice data into text by the voice recognition unit” in line 3
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

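The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –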

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 8-14, and 19-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Faisman et al. (US20080177786A1) (hereinafter “Faisman”).

Regarding claim 1, Faisman teaches an information processing apparatus comprising: an acquisition unit that acquires voice data and image data, respectively; (Faisman, Par. 0006: “… the method further comprising the steps of acquiring a multimedia data stream, segmenting the multimedia stream into a video data and an audio data stream”).
a display control unit that performs control to display the image data acquired by the acquisition unit in synchronization with the voice data; (Faisman, Par. 0006: “… wherein the playback times of the video and audio data streams are synchronized, associating playback time annotation indicators with the time synchronized video and audio data streams”).
a reception unit that receives a display element to be added for display to a specific character in the image data displayed by the display control unit; and (Faisman, Par. 0007: “The method further comprises the steps of associating the discrete playback time annotation indicators of the audio data stream words, or phrases that are reproduced within the audio data stream with respective textual representations of the words, or phrases that are comprised within the transcript, editing the transcript of the audio data stream, and outputting the transcript, the video data and audio data streams in a predetermined data format.”).
a setting unit that sets a playback period in which the specific character in the voice data is played back, as a display period of the display element received by the reception unit in the image data. (Faisman, Par. 0025: “Yet further aspects of the present invention allow for the provision of feedback to a system user based upon the timing information that is associated with the edited pronunciation of transcription text, which in its turn can also be used to improve the annotation editing process. FIG. 2 shows a screenshot of an editable transcription stream that is synchronized with a media data stream file. The screenshot shows a GUI 200, wherein the GUI 200 is used to display and edit time-aligned transcriptions. The left-side display 205, displays the text of a transcription. All of the textual character data of the transcription is associated with annotated timing information. The right-side display 210 is configured to playback a multimedia data file. The right-side display 210 further comprises multimedia controls, thus allowing for the control of the listening/viewing aspects of a multimedia data.”).

Regarding claim 8, Faisman teaches the information processing apparatus according to claim 1, wherein the specific character is a character string disposed in a preset area in a image data indicated by the display element received by the reception unit. (Faisman, Par. 0025: “Yet further aspects of the present invention allow for the provision of feedback to a system user based upon the timing information that is associated with the edited pronunciation of transcription text, which in its turn can also be used to improve the annotation editing process. FIG. 2 shows a screenshot of an editable transcription stream that is synchronized with a media data stream file. The screenshot shows a GUI 200, wherein the GUI 200 is used to display and edit time-aligned transcriptions. The left-side display 205, displays the text of a transcription. All of the textual character data of the transcription is associated with annotated timing information. The right-side display 210 is configured to playback a multimedia data file. The right-side display 210 further comprises multimedia controls, thus allowing for the control of the listening/viewing aspects of a multimedia data.”).

Regarding claim 9, Faisman teaches the information processing apparatus according to claim 2, wherein the specific character is a character string disposed in a preset area in a image data indicated by the display element received by the reception unit. (Faisman, Par. 0025: “Yet further aspects of the present invention allow for the provision of feedback to a system user based upon the timing information that is associated with the edited pronunciation of transcription text, which in its turn can also be used to improve the annotation editing process. FIG. 2 shows a screenshot of an editable transcription stream that is synchronized with a media data stream file. The screenshot shows a GUI 200, wherein the GUI 200 is used to display and edit time-aligned transcriptions. The left-side display 205, displays the text of a transcription. All of the textual character data of the transcription is associated with annotated timing information. The right-side display 210 is configured to playback a multimedia data file. The right-side display 210 further comprises multimedia controls, thus allowing for the control of the listening/viewing aspects of a multimedia data.”).

Regarding claim 10, Faisman teaches the information processing apparatus according to claim 1, further comprising: a voice recognition unit that recognizes the voice data as voice and converts the voice into text. (Faisman, Par. 0020: “At 110, a transcription of the audio data stream file is created from the audio data stream; wherein the transcription can be configured as a standard transcription of the audio data stream, a translation of the audio data stream, a listing of annotations that are associated with the audio data stream, or a summarization of the audio data stream. The transcription can be created using any conventionally available ASR conversion tool.”).

Regarding claim 11, Faisman teaches the information processing apparatus according to claim 10, wherein the display control unit performs control to display a character string converted into text by the voice recognition unit. (Faisman, Par. 0025: “The left-side display 205, displays the text of a transcription. All of the textual character data of the transcription is associated with annotated timing information.”, and Par. 0020: “The transcription can be created using any conventionally available ASR conversion tool.”).

Regarding claim 12, Faisman teaches the information processing apparatus according to claim 11, further comprising: a correction unit that corrects the character string converted into text by the voice recognition unit. (Faisman, Par. 0016: “Currently, many situations occur when it is necessary to create, and synchronize a transcription of a multimedia file [i.e., files containing audio and video data components] with the original multimedia file [e.g., transcripts or translations of video files, media databases, captions of television programs, etc . . .]. ASR and automatic translation tools can be used to create initial draft transcriptions of a multimedia file. However, the transcription drafts that are generated by these tools more so than not will require the further editing of the transcription in order to provide the correct textual representation of the content that has been derived from the original multimedia file.”).

Regarding claim 13, Faisman teaches the information processing apparatus according to claim 10, wherein the display control unit performs control to display a list of the character strings converted into texts by the voice recognition unit. (Faisman, Par. 0020: “At 110, a transcription of the audio data stream file is created from the audio data stream; wherein the transcription can be configured as a standard transcription of the audio data stream, a translation of the audio data stream, a listing of annotations that are associated with the audio data stream, or a summarization of the audio data stream. The transcription can be created using any conventionally available ASR conversion tool. The transcription comprises synchronization information that relates the textual elements of the transcription with the original multimedia data stream file from which the transcription was derived.”, and Par. 0025: “The left-side display 205, displays the text of a transcription. All of the textual character data of the transcription is associated with annotated timing information. The right-side display 210 is configured to playback a multimedia data file. The right-side display 210 further comprises multimedia controls, thus allowing for the control of the listening/viewing aspects of a multimedia data.”).

Regarding claim 14, Faisman teaches the information processing apparatus according to claim 10, wherein the display control unit performs control to display a list of the character strings converted into texts by the voice recognition unit and a playback period of the voice data of each character string. (Faisman, Par. 0026: “The present application further allows for the editing of a transcript in conjunction with the simultaneous listing and viewing of a multimedia data stream file. The timing information that is embedded into the transcription allows for the navigation from the edited text of the transcription to a relational playback position of the media file, and from the playback position of the media file to the text of the transcription during the editing process. A system user has only to select a character, word, or phrase in the text, and the multimedia file will travel to the corresponding synchronized point within the multimedia playback. Conversely, a user can select a multimedia data playback position, and the text that is synchronized with the playback position of the multimedia file will accordingly be highlighted.”, and Par. 0020: “The transcription can be created using any conventionally available ASR conversion tool.”).


Regarding claim 19, Faisman teaches the information processing apparatus according to claim 10, wherein the display control unit performs control to display the voice data possibly corresponding to the specific character by converting the voice data into text by the voice recognition unit. (Faisman, Par. 0016: “Currently, many situations occur when it is necessary to create, and synchronize a transcription of a multimedia file [i.e., files containing audio and video data components] with the original multimedia file [e.g., transcripts or translations of video files, media databases, captions of television programs, etc . . .]. ASR and automatic translation tools can be used to create initial draft transcriptions of a multimedia file.”, and Par. 0025: “The left-side display 205, displays the text of a transcription. All of the textual character data of the transcription is associated with annotated timing information.”).

computer program products] having, for instance, computer usable media. The media has embodied therein, for instance, computer readable program code means for providing and facilitating the capabilities of the present invention.”).
acquiring voice data and image data, respectively; (Faisman, Par. 0006: “… the method further comprising the steps of acquiring a multimedia data stream, segmenting the multimedia stream into a video data and an audio data stream”).
controlling to display the image data acquired in the acquiring in synchronization with the voice data; (Faisman, Par. 0006: “… wherein the playback times of the video and audio data streams are synchronized, associating playback time annotation indicators with the time synchronized video and audio data streams”).
receiving a display element to be added for display to a specific character in the image data displayed in the controlling to display; and (Faisman, Par. 0007: “The method further comprises the steps of associating the discrete playback time annotation indicators of the audio data stream words, or phrases that are reproduced within the audio data stream with respective corresponding textual representations of the words, or phrases that are comprised within the transcript, editing the transcript of the audio data stream, and outputting the transcript, the video data and audio data streams in a predetermined data format.”).
setting a playback period in which the specific character in the voice data is played back, as a display period of the display element received in the receiving in the image data. (Faisman, Par. 0025: “Yet further aspects of the present invention allow for the provision of feedback to a system user based upon the timing information that is associated with the edited pronunciation of transcription text, which in its turn can also be used to improve the annotation editing process. FIG. 2 shows a screenshot of an editable transcription stream that is synchronized with a media data stream file. The screenshot shows a GUI 200, wherein the GUI 200 is used to display and edit time-aligned transcriptions. The left-side display 205, displays the text of a transcription. All of the textual character data of the transcription is associated with annotated timing information. The right-side display 210 is configured to playback a multimedia data file. The right-side display 210 further comprises multimedia controls, thus allowing for the control of the listening/viewing aspects of a multimedia data.”).


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Faisman (US20080177786A1) in view of Rutschman et al. (US20190028721A1) (hereinafter “Rutschman”).

Regarding claim 2, Faisman does not teach the information processing apparatus according to claim 1, further comprising: an image recognition unit that performs image recognition on a specific character in the image data, and converts the specific character into text, wherein the display control unit performs control to display the specific character converted into text by the image recognition unit.
Rutschman teaches an image recognition unit that performs image recognition on a specific character in the image data, and converts the specific character into text (Rutschman, Par. 0104: “In one instance, the hub processor 412 generates the data request for a specified character or text from at least one of the first image processor and/or the second image processor based on the client request at 2002. The client 418 can transmit a request for the text or symbols within an image and the hub processor 412 can request character recognition be performed by the image processor 408 with respect to image data to determine the text and/or symbols within the image.”);
wherein the display control unit performs control to display the specific character converted into text by the image recognition unit. (Rutschman, Par. 0104: “For example, in a context of a home environment, a visually impaired person can utilize the imaging device 400 to assist with reading of text, such as newspaper text, magazine text, tablet display text, labels, food item nutritional information, medication prescription directions, internet of things display text, or any other information within or proximate a home. For instance, a person can hold up a medication container and speak a request for reading the medication label indicia. The hub processor 412 can receive the audio request and generate a request to the image processor 408 to perform character recognition on the image data associated with the medication label. The hub processor 412 can then obtain the text result and output the text to a larger display or convert the text to speech for reading aloud.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Faisman’s “teaching of editing multimedia stream” with Rutschman’s “teaching of providing low latency communication” to perform image recognition on a specific character in the image data and convert the specific character into text, in order to automatically determine the resolution based on a device associated with the client, as evidenced by Rutschman (see Par. 0078).

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Faisman (US20080177786A1) in view of Rutschman (US20190028721A1), and further in view of Seino et al. (US20070058868A1) (hereinafter “Seino”).

Regarding claim 3, Faisman does not teach the information processing apparatus according to claim 2, further comprising: a correction unit that corrects the specific character converted into text by the image recognition unit.
Seino teaches a correction unit that corrects the specific character converted into text by the image recognition unit. (Seino, claim 3: “a character recognition part that outputs text data resulting from character recognition that is performed by using the character image data; and a correction processing part that displays a window on which the text data outputted from said character recognition part and the image data are displayed so as to be visually confirmation or correction of the text data which is the character recognition result.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Faisman’s “teaching of editing multimedia stream” and Rutschman’s “teaching of providing low latency communication” with Seino’s “teaching of character reader” to correct the specific character converted into text by the image recognition unit, in order to efficiently perform a confirmation work or a correction work of a character recognition result, as evidenced by Seino (see Par. 0014).

Claims 4 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Faisman (US20080177786A1) in view of Rutschman (US20190028721A1) and Seino (US20070058868A1), and further in view of Funakura, Hiroyuki (US20050027525A1) (hereinafter “Funakura”).

Regarding claim 4, Faisman does not teach the information processing apparatus according to claim 2, further comprising: an addition unit that adds candidates for a read representation that is possibly included in the voice data, as the specific character.
Funakura teaches an addition unit that adds candidates for a read representation that is possibly included in the voice data, as the specific character. (Funakura, Par. 0011: “Further, a specific word is converted into image data corresponding to the character data converted by the voice discriminator. An image represented by the image data is shown on the display.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Faisman’s “teaching of editing multimedia stream”, Rutschman’s “teaching of voice discriminating tag for making voice distinguishable”, and Seino’s “teaching of character reader” with Funakura’s “teaching of character reader” to add candidates for a read representation that is possibly included in the voice data, as the specific character, in order to improve the expression of the character information obtained by the voice recognition, as evidenced by Funakura (see Par. 0014).

Regarding claim 5, Faisman does not teach the information processing apparatus according to claim 3, further comprising: an addition unit that adds candidates for a read representation that is possibly included in the voice data, as the specific character.
Funakura teaches an addition unit that adds candidates for a read representation that is possibly included in the voice data, as the specific character. (Funakura, Par. 0011: “Further, a specific word is converted into image data corresponding to the character data converted by voice discriminator. An image represented by the image data is shown on the display.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Faisman’s “teaching of editing multimedia stream”, Rutschman’s “teaching of voice discriminating tag for making voice distinguishable”, and Seino’s “teaching of character reader” with Funakura’s “teaching of character reader” to add candidates for a read representation that is possibly included in the voice data, as the specific character, in order to improve the expression of the character information obtained by the voice recognition, as evidenced by Funakura (see Par. 0014).


Claims 6 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Faisman (US20080177786A1) in view of Rutschman (US20190028721A1), Seino (US20070058868A1), and Funakura (US20050027525A1), and further in view of Yamamichi et al. (US20030206647A1) (hereinafter “Yamamichi”).

Regarding claim 6, Faisman does not teach the information processing apparatus according to claim 4, further comprising: a suggestion unit that suggests candidates for a read representation to be added by the addition unit.
Yamamichi teaches a suggestion unit that suggests candidates for a read representation to be added by the addition unit. (Yamamichi, Par. 0013: “Tokkaihei suggest] a letter recognition apparatus and an image inputting/outputting system using said letter recognition apparatus which sounds a warning signal to its operator when it encounters letters prone to misrecognition during its letter recognition process, requesting verification or correction of the letter detected during letter recognition.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Faisman’s “teaching of editing multimedia stream”, Rutschman’s “teaching of voice discriminating tag for making voice distinguishable”, Seino’s “teaching of character reader”, and Funakura’s “teaching of character reader” with Yamamichi’s “teaching of distinguishing ID information from images” to suggest candidates for a read representation to be added by the addition unit, in order to allow efficient image data filing, as well as efficient image data searching and reproduction, as evidenced by Yamamichi (see Par. 0017).

Regarding claim 7, Faisman does not teach the information processing apparatus according to claim 5, further comprising: a suggestion unit that suggests candidates for a read representation to be added by the addition unit.
Yamamichi teaches a suggestion unit that suggests candidates for a read representation to be added by the addition unit. (Yamamichi, Par. 0013: “Tokkaihei suggest] a letter recognition apparatus and an image inputting/outputting system using said letter recognition apparatus which sounds a warning signal to its operator when it encounters letters prone to misrecognition during its letter recognition process, requesting verification or correction of the letter detected during letter recognition.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Faisman’s “teaching of editing multimedia stream,” Rutschman’s “teaching of voice discriminating tag for making voice distinguishable,” Seino’s “teaching of character reader,” and Funakura’s “teaching of character reader” with Yamamichi’s “teaching of distinguishing ID information from images” to suggest candidates for a read representation to be added by the addition unit, in order to allow efficient image data filing, as well as efficient image data searching and reproduction, as evidenced by Yamamichi (see Par. 0017).

Claims 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Faisman (US20080177786A1) in view of Na et al. (US20140208209A1) (hereinafter “Na”).

Regarding claim 15, Faisman does not teach the information processing apparatus according to claim 10, wherein in a case where a plurality of the specific characters are included in the voice data, the display control unit performs control to display a character string corresponding to the specific characters as a candidate.
Na teaches wherein in a case where a plurality of the specific characters are included in the voice data, the display control unit performs control to display a character string corresponding to the specific characters as a candidate. (Na, Par. 0140: “Upon receiving an input [e.g., a voice input "1"] for selecting any one of the recommended candidate words, the controller 180 can insert and display that specific one of the plurality of candidate words at the particular position, as shown in FIG. 17c.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Faisman’s “teaching of editing multimedia stream” with Na’s “teaching of editing a voice recognition result” to perform control to display a character string corresponding to the specific characters as a candidate, in order to edit a voice recognition result in a more convenient way, as evidenced by Na (see Par. 0007).

Regarding claim 16, Faisman teaches the information processing apparatus according to claim 15, wherein the display control unit performs control to display an entire text of the voice data converted into text by the voice recognition unit and display a character string corresponding to the specific character as a candidate by changing display from other character strings. (Faisman, Par. 0016: “Currently, many situations occur when it is necessary to create, and synchronize a transcription of a multimedia file [i.e., files containing audio and video data components] with the original multimedia file [e.g., transcripts or translations of video files, media databases, captions of television programs, etc . . .]. ASR and automatic translation tools can be used to create initial draft transcriptions of a multimedia file.”; Par. 0025: “The left-side display 205, displays the text of a transcription. All of the textual character data of the transcription is associated with annotated timing information. The right-side display 210 is configured to playback a multimedia data file. The right-side display 210 further comprises multimedia controls, thus allowing for the control of the listening/viewing aspects of a multimedia data.”; and Par. 0025: “FIG. 2 shows a screenshot of an editable transcription stream that is synchronized with a media data stream file.”).

Regarding claim 17, Faisman teaches the information processing apparatus according to claim 15, wherein the display control unit performs control to display a candidate for a character string corresponding to the specific character together with contexts before and after the character string. (Faisman, Par. 0007: “The method further comprises the steps of associating the discrete playback time annotation indicators of the audio data stream words, or phrases that are reproduced within the audio data stream with respective corresponding textual representations of the words, or phrases that are comprised within the transcript, editing the transcript of the audio data stream, and outputting the transcript, the video data and audio data streams in a predetermined data format.”). Note: since the audio data stream words or phrases are associated with their corresponding textual representations, each textual representation will be encompassed by the contexts before and after it within the given stream.

Regarding claim 18, Faisman teaches the information processing apparatus according to claim 15, further comprising: a playback unit that plays back a candidate for a character string corresponding to the specific character together with contexts before and after the character string. (Faisman, Par. 0007: “The method further comprises the steps of associating the discrete playback time annotation indicators of the audio data stream words, or phrases that are reproduced within the audio data stream with respective corresponding textual representations of the words, or phrases that are comprised within the transcript, editing the transcript of the audio data stream, and outputting the transcript, the video data and audio data streams in a predetermined data format.”). Note: since the audio data stream words or phrases are associated with their corresponding textual representations, each textual representation will be encompassed by the contexts before and after it within the given stream.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Kuwabara et al. (U.S. Patent Application Publication US20060092487A1) teaches, at Par. 0012: “a video content creating apparatus comprising a photograph data input means for inputting photographic image data, a meta-information adding
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARIOUSH AGAHI whose telephone number is (408)918-7689. The examiner can normally be reached Monday - Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit 

/DARIOUSH AGAHI/             Examiner, Art Unit 2656
/EDGAR X GUERRA-ERAZO/             Primary Examiner, Art Unit 2656