Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-10 are pending. Claims 1, 7, and 9-10 are independent.
This Application was published as U.S. 2021/0327446.
Apparent priority:  10 March 2020.

Applicant and the assignee of this application have a duty of disclosure under 37 CFR 1.56.  Material and references that have come to the attention of the Applicant in foreign prosecution proceedings of any counterpart application must be submitted in an Information Disclosure Statement when considered material to patentability.
37 C.F.R. 1.56   Duty to disclose information material to patentability.
[Editor Note: Para. (c)(3) below is applicable only to patent applications filed under 35 U.S.C. 111(a) or 363 on or after September 16, 2012.]
(a) A patent by its very nature is affected with a public interest. The public interest is best served, and the most effective patent examination occurs when, at the time an application is being examined, the Office is aware of and evaluates the teachings of all information material to patentability. Each individual associated with the filing and prosecution of a patent application has a duty of candor and good faith in dealing with the Office, which includes a duty to disclose to the Office all information known to that individual to be material to patentability as defined in this section. … The Office encourages applicants to carefully examine:
(1) Prior art cited in search reports of a foreign patent office in a counterpart application, and
…

Drawings are NOT objected to.  However, the use of solid black lines for the drawings is strongly encouraged.  Color, shading, or dotted lines do not appear well or copy well in the black and white drawings of patents.
This Application is directed to a particular method of “diarization,” which is a process by which parts of a conversation are transcribed and attributed to the different participants in the conversation.
Claim Objections
Claims 2 and 8 are objected to because of informalities that may be addressed with the following suggested amendments: 
2. The method of claim 1, wherein acquiring the speaker-specific voice recognition data includes: acquiring a first speaker-specific recognition result generated on an EPD (End Point Detection) basis from the voice conversation and a second speaker-specific recognition result generated every preset time from the voice conversation; and collecting the first speaker-specific recognition result and the second speaker-specific recognition result without overlap and redundancy therebetween to generate the speaker-specific voice recognition data. 
Claim 8 is similar to Claim 2.
Appropriate correction is required.

Claim 5 is objected to because of informalities that may be addressed with the following suggested amendments: 

5. The method of claim 1, wherein the merging includes determining the continuous utterance from the same speaker based on a silence period shorter than or equal to a predetermined time duration or a syntax feature related to a previous block. 

Appropriate correction is required.
35 U.S.C. 112(f) Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 
The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. 
Such claim limitation(s) is/are: “input unit” in Claim 7. These limitations are generic in the context of the art and don’t refer to any specific structure and only serve as placeholders for the structure that performs the associated function(s) without providing any information about what that structure is. MPEP 2181 I A says:
For a term to be considered a substitute for "means," and lack sufficient structure for performing the function, it must serve as a generic placeholder and thus not limit the scope of the claim to any specific manner or structure for performing the claimed function. It is important to remember that there are no absolutes in the determination of terms used as a substitute for "means" that serve as generic placeholders. The examiner must carefully consider the term in light of the specification and the commonly accepted meaning in the technological art. Every application will turn on its own facts.
Based on the ordinary skill in the art and description of functions of these components in the Specification, input unit refers to a microphone.
PLEASE NOTE: This is NOT a rejection. Please don’t address it as a rejection. If the Applicant does not agree with the INTERPRETATION, the Applicant may argue or amend to replace the terms interpreted under 112(f) with structural terms such as “microphone” as appropriately supported by the Specification. In the alternative, the Applicant may let the interpretation stand if the intent was to include a means plus function limitation in the Claim.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 9 and 10 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim(s) does/do not fall within at least one of the four categories of patent eligible subject matter.
Claim 9 is directed to “9. A computer-readable recording medium ….”
The phrase “computer-readable recording medium” appears in the Specification as follows: “[0040] … computer-readable recording medium such as a magnetic medium such as a hard disk, a floppy disk, and magnetic tape, optical media such as a CD-ROM or DVD, a magneto-optical medium such as a floptical disk, and a hardware apparatus specially configured to store and execute program instructions such as a flash memory.” 
The examples are hardware and statutory.  But the list is open-ended by virtue of the introductory phrase “such as.”
No definition is provided for “computer-readable recording medium.” Accordingly, the phrase “computer-readable recording medium” is interpreted under its broadest reasonable interpretation and as such includes transitory wave media which are computer readable and yet non-statutory. The broadest reasonable interpretation of the Claim would then include non-statutory embodiments and the Claim as a whole is directed to non-statutory subject matter.
To overcome the rejection, see suggested amendment: “9. A non-transitory computer-readable recording medium ….”

Claim 10 is directed to:  “10. A computer program stored in a computer-readable recording medium ….”
“Computer program” per se is non-statutory.  A U.S. Claim cannot be directed to “a computer program.”  Further, as provided in the rejection of Claim 9, the “computer-readable medium” portion also includes non-statutory subject matter.
To overcome the rejection, it is suggested that Claim 10 be canceled.  Claim 9, if amended to overcome the rejection, already includes the scope of Claim 10.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2 and 8 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 2 includes:
2. The method of claim 1, wherein acquiring the speaker-specific voice recognition data includes: acquiring a first speaker-specific recognition result generated on an EPD (End Point Detection) basis from the voice conversation and a second speaker-specific recognition result generated every preset time from the voice conversation; and collecting the first speaker-specific recognition result and the second speaker-specific recognition result without overlap and redundance therebetween to generate the speaker-specific voice recognition data. 
Claim 8 has similar language.
The phrase “without overlap and redundance therebetween” is a relative term which renders the claim indefinite. The phrase is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. 

The Specification repeats the same language of the Claim and does not elaborate on the meaning of the phrase.

Suggestion:  This phrase is not necessary for ascertaining the meaning of the Claim and may be an artifact of translation from the original language.  You can amend to remove the phrase without a change in scope:
2. The method of claim 1, wherein acquiring the speaker-specific voice recognition data includes: acquiring a first speaker-specific recognition result generated on an EPD (End Point Detection) basis from the voice conversation and a second speaker-specific recognition result generated every preset time from the voice conversation; and collecting the first speaker-specific recognition result and the second speaker-specific recognition result to generate the speaker-specific voice recognition data. 
However, if a specific meaning is intended by this phrase, then elaborate inside the Claim.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

 (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-3 and 5-10 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Jung (U.S. 20190392837).
Regarding Claim 1, Jung teaches:
1. A voice conversation reconstruction method performed by a voice conversation reconstruction apparatus, [Jung, “Examples described herein improve the way in which a transcript is generated and displayed so that the context of a conversation taking place during a meeting or another type of collaboration event can be understood by a person that reviews the transcript (e.g., reads or browses through the transcript)….”  Abstract.]
the method comprising: 
acquiring a plurality of speaker-specific voice recognition data corresponding to a plurality of speakers about voice conversation; [Jung, Figure 4, “Voice Recognition Profiles 412” teach the “speaker-specific voice recognition data” of the Claim. Figure 1, “Voice Recognition Module 126” is a speaker identification module.   “[0035] The voice recognition module 126 is configured to receive the meeting speech data 120 from the image capture device 116 and to recognize a voice that speaks an utterance. Thus, the voice recognition module 126 matches a voice with a voice recognition profile to identify a user that spoke. …”  “[0017] FIG. 4 is a diagram illustrating components of an example device configured to receive speech data, match a voice with a voice recognition profile, convert the speech data to text, and segment the text to generate a transcript that captures the context of a conversation.”]
dividing each of the plurality of the speaker-specific voice recognition data into a plurality of blocks using a boundary between tokens depending upon a predefined division criterion; [Jung, Figure 1, “Transcript Viewing Application 136” and Figure 2 showing the “sequence of text segments 138” separated by “User IDs 140” of the users (Lisa, Joe, Beth, Lisa again …) who spoke each segment.  “[0043] As shown, the graphical user interface 200 provides separation between individual text segments so that a viewer can better associate the text segment with a user that spoke the words. Furthermore, the user identifiers can include one or more graphical elements useable to enable the viewer to identify a user and/or gather information about the user. A graphical element can include a user name, a user alias, a user avatar, a user photo, a title, a user location, and so forth…”  The reference divides the recognized text/“recognition data” of the Claim into “text segments 204, 206, etc.”/“plurality of blocks.”  The “predefined division criterion” of the Claim is taught by the change of speaker and voice.]
arranging the plurality of blocks of each of the plurality of the speaker-specific voice recognition data in chronological order irrespective of a speaker; [Jung, Figure 2 shows the arrangement of the text blocks in chronological order as the corresponding speech was output.  “Sequence of text segments 138.”  “[0037] … As shown via the second area 140 of the graphical user interface, the identifier <UserA> is graphically level with the first <text segment> listed in the first area 138 and thus a viewer of the transcript can deduce that UserA spoke the first text segment listed in the first area 138, <UserB> is graphically level with the second <text segment> listed in the first area 138 and thus the viewer can deduce that UserB spoke the second text segment listed, <UserC> is graphically level with the third <text segment> listed in the first area 138 and thus the viewer can deduce that UserC spoke the third text segment listed, <UserD> is level with the fourth <text segment> listed in the first area 138 and thus the viewer can deduce that UserD spoke the fourth text segment listed, and so forth.”]
merging blocks from continuous utterance of the same speaker among the arranged plurality of blocks; and [Jung, Figures 5B and 7 and flowcharts of Figures 8-9 show the “re-positioning”/reorganizing of the text segments according to speaker or subject/topic.  Figure 5B and Figure 8 teach this limitation.  In Figure 5B only the text segments spoken by Lisa R. are shown in a sequence.  Figure 8, 808.  “[0086] At operation 808, a transcript of the conversation or meeting is generated using the text. As described above, the transcript includes a sequence of text segments and an individual text segment in the sequence of text segments includes an utterance spoken by a single user of the multiple users.” ]
reconstructing the plurality of blocks subjected to the merging in a conversation format in chronological order and based on a speaker. [Jung, Figure 5B. “[0071] Upon receiving further user input that specifies a user identifier and/or keyword(s), the transcript generation module 428 is configured to search for and identify text segments in the sales meeting transcript 202 that include the user identifier and/or the keyword(s) specified by the user input. As shown in the example graphical user interface 508 of FIG. 5B, the sales meeting transcript 202 is filtered so that the identified text segments are displayed. In this example, a person reviewing the sales meeting transcript enters "Lisa R." into the text entry window 504 or selects the user identifier corresponding to Lisa R. in the user identifier selection area 506. In response, the text segments that capture utterances spoken by Lisa R. are configured for display and/or to be scrolled through. These text segments include text segments 204 and 210 from FIG. 2, but also text segments 510, 512, 514, and 516. Note that text segments 510, 512, 204, 210, 514, and 516 are displayed in an order in which the utterances are spoken by Lisa R.”]
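For illustration only, and not as a construction of the Claim or a characterization of Jung, the four steps mapped above (dividing, arranging, merging, reconstructing) may be sketched as follows. This is a minimal sketch under stated assumptions: the names Token, Block, divide, reconstruct, and render are hypothetical, token timestamps are assumed to be available, and a silence gap is assumed as the division criterion.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Token:
    text: str
    start: float  # seconds from the start of the conversation (assumed available)
    end: float


@dataclass
class Block:
    speaker: str
    start: float
    end: float
    text: str


def divide(speaker: str, tokens: List[Token], max_gap: float) -> List[Block]:
    """Split one speaker's tokens into blocks at token boundaries.

    Assumed criterion: a new block starts when the silence since the
    previous token is at least max_gap seconds.
    """
    blocks: List[Block] = []
    for tok in tokens:
        if blocks and tok.start - blocks[-1].end < max_gap:
            last = blocks[-1]
            blocks[-1] = Block(speaker, last.start, tok.end, last.text + " " + tok.text)
        else:
            blocks.append(Block(speaker, tok.start, tok.end, tok.text))
    return blocks


def reconstruct(data: Dict[str, List[Token]], max_gap: float = 0.5) -> List[Block]:
    # Step 1: divide each speaker's recognition data into blocks.
    blocks = [b for spk, toks in data.items() for b in divide(spk, toks, max_gap)]
    # Step 2: arrange all blocks chronologically, irrespective of speaker.
    blocks.sort(key=lambda b: b.start)
    # Step 3: merge neighbouring blocks that come from the same speaker.
    merged: List[Block] = []
    for b in blocks:
        if merged and merged[-1].speaker == b.speaker:
            last = merged[-1]
            merged[-1] = Block(b.speaker, last.start, max(last.end, b.end),
                               last.text + " " + b.text)
        else:
            merged.append(b)
    return merged


def render(blocks: List[Block]) -> str:
    # Step 4: output in a conversation format, one line per speaker turn.
    return "\n".join(f"{b.speaker}: {b.text}" for b in blocks)
```

In this sketch, blocks of one speaker that end up adjacent after chronological ordering are merged, while an intervening block from another speaker keeps the earlier and later blocks separate.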



Regarding Claim 2, Jung teaches:
2. The method of claim 1, wherein acquiring the speaker-specific voice recognition data includes: 
acquiring a first speaker-specific recognition result generated on an EPD (End Point Detection) basis from the voice conversation and a second speaker-specific recognition result generated every preset time from the voice conversation; and [Jung, the End Point Detection basis in Jung is change of speaker, which is detected by the speech recognizer based on speech profiles of the speakers.  Figure 3 shows the timeline on the left hand side of the drawing and shows how the “interrupting utterance 306” / “second speaker-specific recognition result” overlaps utterance 304 in time.  “[0047] …  Consequently, the transcript generation module 130 determines that utterance 306 is an interruption with regard to utterance 304 because two voices of two different people are detected and recognized during a same period of time, the time between t2 and t3.”  “[0048] Rather than generate a single flow text in which words of utterance 306 are interspersed with words of utterance 304 in a strictly time-based manner, the transcript generation module 130 separately identifies the words that comprise utterance 304 and the words that comprise utterance 306 using voice recognition profiles, and groups them into separate text segments to be displayed in the transcript. …”  (Note the description of “preset time” from the instant Application that follows.  According to the Specification of the instant Application, the “preset time” just means a time after the EPD of the first speaker:  “[0034] …For example, the speaker-specific data processor 121 may generate a first speaker-specific recognition result about the voice conversation on an EPD (End Point Detection) basis, and generate a second speaker-specific recognition result at each preset time. For example, the second speaker-specific recognition result may be generated after a last EPD at which the first speaker-specific recognition result is generated occurs….”  “[0045] …For example, the speaker-specific data processor 121 may generate a first speaker-specific recognition result about the voice conversation on an EPD (End Point Detection) basis, and generate a second speaker-specific recognition result at each preset time. For example, the second speaker-specific recognition result may be generated after a last EPD at which the first speaker-specific recognition result is generated occurs….”)]
collecting the first speaker-specific recognition result and the second speaker-specific recognition result without overlap and redundance therebetween to generate the speaker-specific voice recognition data. [Jung, Figure 1, 136 and Figure 2, 200 show the segmentation of the speech of the various speakers (first, second, third, etc.) and allocation of each segment to the corresponding speaker who spoke the segment.  If two speakers speak simultaneously such that their speech overlaps, the system of Jung separates them such that the end result is “without overlap and redundancy”:  “[0005] … In one example, the techniques combine a first set of words and a second set of words (e.g., a set can include one or more words), that are part of an utterance spoken by a user, into a single text segment. The techniques distinguish between the first set of words and the second set of words due to a detected interruption (e.g., the first set of words and the second set of words are separated by an interruption). For instance, the interruption can include a set of words spoken by another user. In one example, the interruption can be associated with an interjection of words that causes the user to pause for a short period of time (e.g., a few seconds) after the first set of words are spoken and before speaking the second set of words. The user may pause to listen to the words being spoken by the other user. In another example, the interruption can be associated with the other user beginning to speak his or her words at the same time the user is speaking the second set of words. 
Stated another way, the other user begins speaking before the user finishes speaking thereby resulting in an overlapping time period in which multiple people are speaking.”  “[0006] Consequently, the techniques described herein are configured to combine the first and second sets of words spoken by a single user into a single text segment even though there are intervening or overlapping words spoken by the other user. To this end, the first and second sets of words comprise an utterance spoken by the user and the single text segment can be placed in the sequence of text segments of the transcript before a subsequent text segment that captures the set of words spoken by the other user.”]



Regarding Claim 3, Jung teaches:
3. The method of claim 2, wherein the second speaker-specific recognition result is generated after a last EPD occurs. [Jung, Figure 2, the EPD is change of speaker and after the first speaker Lisa R the EPD/change of speaker occurs and then the second speaker Joe S. is recognized.]

Regarding Claim 5, Jung teaches:
5. The method of claim 1, wherein the merging include determining the continuous utterance from the same speaker based on a silence period shorter than or equal to a predetermined time duration or [Jung determines that the utterance is from a particular speaker based on the “voice recognition profile 412” of the user that is stored for each speaker.  Jung also teaches that if the pause/silence between two portions of speech by the same speaker is shorter than a predefined period of time (Figure 3, 308) the two portions are considered to be continuous.  “[0049] In various examples, the first and second sets of words being spoken within a predefined period of time 308 (e.g., the time between t1 and t3 is less than the predefined period of time 308) may be a condition that must be satisfied to combine the first and second sets of words into a single text segment given a situation where there is an interruption caused by another user speaking an utterance. For example, the predefined period of time 308 can be ten seconds, fifteen seconds, twenty seconds, thirty seconds, one minute, and so forth….”  The role of “predefined period of time 308” also applies to the converse situation:  “[0051] Note that the minimum threshold number of words condition applies in situations where the user continues to speak. Consequently, if a user says a small number of words without continuing to speak within the predefined period of time 308 (e.g., the user says "yes" or "no" in response to a question or the user says, "I agree" and stops speaking), then the user's word(s) can amount to an utterance and a corresponding text segment using the techniques described herein.”] a syntax feature related to a previous block. [Jung teaches both alternatives because it also includes the situation where the “linguistic unit” criterion is used to determine that the portions should be continuous/merged: “15. The method of claim 12, further comprising determining that the first set of words and the second set of words are part of a same linguistic unit, wherein the combining of the first set of words and the second set of words spoken by the first user into the corresponding utterance for the single text segment occurs based on the determining that the first set of words and the second set of words are part of the same linguistic unit.”] [Use of OR makes only one of the conditions limiting.  A reference that teaches one of the alternatives teaches the Claim.]
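For illustration only, the two alternatives recited in Claim 5 may be sketched as a single test. This is a sketch under assumptions and is not the method of the instant Application or of Jung: the function name is_continuous, the one-second default, and the use of missing terminal punctuation as the “syntax feature” are all hypothetical.

```python
def is_continuous(prev_text: str, prev_end: float, next_start: float,
                  max_silence: float = 1.0) -> bool:
    """Decide whether the next block continues the previous block of the same speaker.

    Alternative 1: the silence between the blocks is shorter than or equal
    to max_silence (an assumed threshold, in seconds).
    Alternative 2 (assumed syntax feature): the previous block does not end
    with sentence-final punctuation, so the sentence is treated as unfinished.
    """
    short_silence = (next_start - prev_end) <= max_silence
    unfinished_sentence = not prev_text.rstrip().endswith((".", "?", "!"))
    # The alternatives are joined by OR, so either one suffices.
    return short_silence or unfinished_sentence
```

Because the alternatives are joined by OR, a long pause after an unfinished sentence still counts as continuous in this sketch, as does a short pause after a finished one.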

Regarding Claim 6, Jung teaches:
6. The method of claim 2, 
wherein the method further comprises outputting the voice recognition data reconstructed in the conversation format on a screen, [Jung, Figure 1, 138 or Figure 2 showing the screen with speech recognition results.  “… The techniques described herein further configure a graphical user interface layout, in which the transcript can be displayed….”  Abstract.]
wherein when the screen is updated, the speaker-specific voice recognition data is collectively updated or is updated based on the first speaker-specific recognition result. [Jung, Figure 2, the arrows at the top and bottom indicate a scroll feature that updates the screen as more speech comes in.  “[0043] …As the user scrolls through the sequence of text segments, the user identifiers will also scroll to maintain the graphical association between a user identifier and a text segment.”]

Claim 7 is a system claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.  Additionally:
7. A voice conversation reconstruction apparatus comprising: 
an input unit configured to receive voice conversation input; and [Jung, Figure 1, “speech capture device 116.”  Figure 4, “device 400.” ]
a processor configured to process voice recognition of the voice conversation received through the input unit, [Jung, Figure 4, “device 400.” “[0059] Device 400 includes one or more processing unit(s) 402, computer-readable media 404, input/output (I/O) interfaces 406 that enable the use of I/O devices, and communication interface(s) 408….”] wherein the processor is configured to: 
…

Claim 8 is a system claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.

Claim 9 is a computer-readable medium claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.  Additionally:
9. A computer-readable recording medium storing therein a computer program, wherein the computer program includes instructions for enabling, when the instructions are executed by a processor, the processor to: [Jung, Figure 4, “device 400.” “[0059] Device 400 includes one or more processing unit(s) 402, computer-readable media 404, input/output (I/O) interfaces 406 that enable the use of I/O devices, and communication interface(s) 408….”]
…

Claim 10 is a claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.  Additionally:
10. A computer program stored in a computer-readable recording medium, wherein the computer program includes instructions for enabling, when the instructions are executed by a processor, the processor to: [Jung, Figure 4, “device 400.” “[0059] Device 400 includes one or more processing unit(s) 402, computer-readable media 404, input/output (I/O) interfaces 406 that enable the use of I/O devices, and communication interface(s) 408….”  “19. A system comprising: one or more processing units; and a computer-readable medium having encoded thereon computer-executable instructions ….”]
…
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Jung in view of Kahn (U.S. 20060149558).
Regarding Claim 4, Jung teaches and therefore suggests:
4. The method of claim 1, wherein the predefined division criterion includes a silence period longer than or equal to a predetermined time duration or a morpheme feature related to a previous token. [Jung uses the change of speaker, as determined by the speaker-recognition based on speaker voice profiles, as its EPD and line of demarcation and is not looking for silence which is the more usual method of end-point detection in speech.  But Jung teaches that a “linguistic unit condition” may also be used to determine the interruption in the speech.  The “linguistic unit condition” teaches or at the least suggests the “morpheme feature related to a previous token” of the Claim.  See Figure 3, “[0054] In various examples, a determination that a first set of words and a second set of words are part of a same linguistic unit can be used as a condition when creating text segments, so words spoken by a single user in a short period of time (e.g., five seconds, ten seconds, etc.) are grouped together in a single text segment rather than being chopped up into multiple different text segments, given a situation where there is an interruption caused by another user speaking an utterance. A linguistic unit can comprise a phrase, a clause, a sentence, a paragraph, or another type of linguistic unit that can be understood on its own from a grammar perspective. A type of linguistic unit (e.g., a sentence) can be predefined for a text segment.”]
Jung teaches the use of “linguistic units” to determine that separate parts of the utterance pertain to the same segment.  This teaching suggests the use of “morphemes” which are parts of a word as Claimed.
Kahn teaches:
wherein the predefined division criterion includes a silence period longer than or equal to a predetermined time duration or a morpheme feature related to a previous token. [Kahn, Figure 6, “segmentation parameters 635” and “silence detection 601.”  Kahn teaches that detection of “long silence” between utterances leads to segmentation and “long silence” teaches or suggests a silence longer than a threshold to be objective and usable by the machine.  “[0108] FIG. 6 is a flow diagram illustrating an overview of an exemplary embodiment of end point silence detection for segmentation of utterance for a speech segmentation module.”  “[0609] In one approach, as illustrated in FIG. 6, speech input segmentation is based upon detection of long silence between utterances…  Other data 630 may also be considered, such as average long silence length seen in other speakers.”  “[0612] By requiring longer pauses (silence) between words to define an utterance, the segmentation parameters 635 will result in fewer utterance segments. ….”  “[0093] Techniques are disclosed for user-dependent data generation of segmentation modules for speech that may be used for manual or automatic processing. In one approach, speech analysis may be used to determine typical silence length separating a speaker's utterances….”  Kahn, Figure 6 also includes a “speech user profile (Fig. 3).”  Figure 3, “Speech user profile 312” is used for segmentation of speech according to speaker.]
Jung and Kahn pertain to segmentation of speech according to speaker and it would have been obvious to use the periods of silence/pause as indicators of segmentation from Kahn with the system of Jung that primarily uses speaker identification to determine segmentation and also relies on duration of pauses as auxiliary methods of determining segmentation.  This combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Regarding Claim 5, Kahn teaches:
5. The method of claim 1, wherein the merging include determining the continuous utterance from the same speaker based on a silence period shorter than or equal to a predetermined time duration or a syntax feature related to a previous block. [Kahn teaches that the voice model of a particular speaker is developed partly by determining the duration of silence that is characteristic of the particular speaker’s speech and is not considered an end-point or a boundary between one sentence and another.  “[0093] Techniques are disclosed for user-dependent data generation of segmentation modules for speech that may be used for manual or automatic processing. In one approach, speech analysis may be used to determine typical silence length separating a speaker's utterances. ….”]

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499. The examiner can normally be reached on 9 to 5, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir can be reached on 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Fariba Sirjani/
Primary Examiner, Art Unit 2659