DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 24-44 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 11024316,  and over claims 1 and 17-18 of USPN 10978073.  Although the claims at issue are not identical, they are not patentably distinct from each other because they are obvious variants of the same invention.  Furthermore, the claims of the patent anticipate the claims of the application.

Claims of USPN 11024316				Claims of application
1. A computer-implemented method for receiving and processing a plurality of moment-associating elements, the method comprising: receiving the plurality of moment-associating elements, the plurality of moment-associating elements including a plurality of audio elements; transforming the plurality of moment-associating elements into one or more pieces of moment-associating information; and transmitting at least one piece of the one or more pieces of moment-associating information; wherein the transforming the plurality of moment-associating elements into one or more pieces of moment-associating information includes: transcribing the plurality of moment-associating elements into a plurality of transcribed elements by at least: transcribing the plurality of audio elements into a plurality of text elements; and transcribing two or more audio elements of the plurality of audio elements in conjunction with each other to extrapolate one or more tones corresponding to the plurality of text elements; segmenting the plurality of moment-associating elements into a plurality of moment-associating segments by at least segmenting the plurality of audio elements into a plurality of audio segments where any change in a text-corresponding tone occurs; assigning a segment speaker for each segment of the plurality of moment-associating segments by at least assigning a segment speaker for each segment of the plurality of audio segments; and generating the one or more pieces of moment-associating information based on at least the plurality of moment-associating segments and the segment speaker assigned for the each segment of the plurality of moment-associating segments.
2. The computer-implemented method of claim 1 wherein the receiving the plurality of moment-associating elements includes assigning a timestamp associated with each element of the plurality of moment-associating elements.
3. The computer-implemented method of claim 1 wherein the receiving the plurality of moment-associating elements includes receiving one or more visual elements or receiving one or more environmental elements.
4. The computer-implemented method of claim 1 wherein the plurality of audio elements includes at least one selected from a group consisting of one or more voice elements of one or more voice-generating sources and one or more ambient sound elements.
5. The computer-implemented method of claim 3 wherein the receiving one or more visual elements includes at least one selected from a group consisting of receiving one or more pictures, receiving one or more images, receiving one or more screenshots, receiving one or more video frames, receiving one or more projections, and receiving one or more holograms.
6. The computer-implemented method of claim 3 wherein the receiving one or more environmental elements includes at least one selected from a group consisting of receiving one or more global positions, receiving one or more location types, and receiving one or more moment conditions.
7. The computer-implemented method of claim 3 wherein the receiving one or more environmental elements includes at least one selected from a group consisting of receiving a longitude, receiving a latitude, receiving an altitude, receiving a country, receiving a city, receiving a street, receiving a location type, receiving a temperature, receiving a humidity, receiving a movement, receiving a velocity of a movement, receiving a direction of a movement, receiving an ambient noise level, and receiving one or more echo properties.
8. The computer-implemented method of claim 1, and further comprising: receiving one or more voice elements of one or more voice-generating sources; and receiving one or more voiceprints corresponding to the one or more voice-generating sources respectively.

9. The computer-implemented method of claim 8 wherein the transforming the plurality of moment-associating elements into one or more pieces of moment-associating information includes at least one selected from a group consisting of: transcribing the plurality of moment-associating elements into the plurality of transcribed elements based on at least the one or more voiceprints; segmenting the plurality of moment-associating elements into the plurality of moment-associating segments based on at least the one or more voiceprints; and assigning the segment speaker for the each segment of the plurality of moment-associating segments based on at least the one or more voiceprints.

10. The computer-implemented method of claim 8 wherein the receiving one or more voiceprints corresponding to the one or more voice-generating sources respectively includes at least one selected from a group consisting of: receiving one or more acoustic models corresponding to the one or more voice-generating sources respectively; and receiving one or more language models corresponding to the one or more voice-generating sources respectively.
11. The computer-implemented method of claim 1, wherein the transcribing the plurality of moment-associating elements into a plurality of transcribed elements includes: transcribing a first element of the plurality of moment-associating elements into a first transcribed element of the plurality of transcribed elements; transcribing a second element of the plurality of moment-associating elements into a second transcribed element of the plurality of transcribed elements; and correcting the first transcribed element based on at least the second transcribed element.
12. The computer-implemented method of claim 1 wherein the transcribing the plurality of moment-associating elements into a plurality of transcribed elements further includes at least one selected from a group consisting of: determining one or more speaker-change timestamps, each timestamp of the one or more speaker-change timestamps corresponding to a timestamp when a speaker change occurs; determining one or more sentence-change timestamps, each timestamp of the one or more sentence-change timestamps corresponding to a timestamp when a sentence change occurs; and determining one or more topic-change timestamps, each timestamp of the one or more topic-change timestamps corresponding to a timestamp when a topic change occurs.
13. The computer-implemented method of claim 12 wherein the segmenting the plurality of moment-associating elements into a plurality of moment-associating segments is performed based on at least one selected from a group consisting of: the one or more speaker-change timestamps; the one or more sentence-change timestamps; and the one or more topic-change timestamps.

14. The computer-implemented method of claim 1, and further comprising: establishing one or more anchor points based on at least the plurality of moment-associating elements; wherein: the one or more anchor points correspond to one or more timestamps respectively; and each anchor point of the one or more anchor points is navigable, searchable, or both navigable and searchable.
15. The computer-implemented method of claim 14, and further comprising: using the one or more anchor points to navigate the one or more pieces of moment-associating information based on at least the one or more timestamps.
16. The computer-implemented method of claim 14, wherein the one or more anchor points include at least one selected from a group consisting of a word, a phrase, a photo, and a screenshot.
17. The computer-implemented method of claim 1, and further comprising: obtaining one or more moment-associating photos, the one or more moment-associating photos being one or more parts of the plurality of moment-associating elements; wherein: the transforming the plurality of moment-associating elements into one or more pieces of moment-associating information includes transforming the one or more moment-associating photos into one or more anchor photos; and the one or more anchor photos correspond to one or more timestamps respectively.
18. The computer-implemented method of claim 17 wherein each anchor photo of the one or more anchor photos is navigable, searchable, or both navigable and searchable.
19. The computer-implemented method of claim 18, and further comprising: using the one or more anchor photos to navigate the one or more pieces of moment-associating information based on at least the one or more timestamps.

Claims 20-21 are similar to the above claims.
24. (New) A computer-implemented method for receiving and processing a plurality of moment-associating elements, the method comprising: receiving the plurality of moment-associating elements, the plurality of moment-associating elements including a plurality of audio elements; transforming the plurality of moment-associating elements into one or more pieces of moment-associating information by: segmenting the plurality of audio elements into a plurality of audio segments; assigning a segment speaker for each audio segment of the plurality of audio segments; transcribing the plurality of audio segments into a plurality of text elements; and generating the one or more pieces of moment-associating information based on at least the plurality of audio segments, the segment speaker assigned for the each audio segment of the plurality of audio segments, and the plurality of text elements; and transmitting at least one piece of the one or more pieces of moment-associating information.  
25. (New) The computer-implemented method of claim 24, wherein the receiving the plurality of moment-associating elements includes assigning a timestamp associated with each element of the plurality of moment-associating elements.  
26. (New) The computer-implemented method of claim 24, wherein the receiving the plurality of moment-associating elements includes receiving one or more visual elements or receiving one or more environmental elements.  

27. (New) The computer-implemented method of claim 24, wherein the plurality of audio elements includes at least one selected from a group consisting of one or more voice elements of one or more voice-generating sources.  


28. (New) The computer-implemented method of claim 26, wherein the receiving one or more visual elements includes receiving one or more of at least one selected from a group consisting of pictures, images, screenshots, video frames, one or more projections, and/or holograms.  



29. (New) The computer-implemented method of claim 26, wherein the receiving one or more environmental elements includes receiving one or more of at least one selected from a group consisting of global positions, location types, and conditions associated with the one or more environmental element.  
30. (New) The computer-implemented method of claim 26, wherein the receiving one or more environmental elements includes receiving one or more of at least one selected from a group consisting of a longitude, a latitude, an altitude, a country, a city, a street, a location type, a temperature, a humidity, a movement, a velocity of a movement, a direction of a movement, an ambient noise level, and one or more echo properties.  




31. (New) The computer-implemented method of claim 24, and further comprising: receiving one or more voice elements of one or more voice-generating sources; and receiving one or more voiceprints corresponding to the one or more voice-generating sources respectively. 
 

32. (New) The computer-implemented method of claim 31, segmenting the plurality of audio elements into a plurality of audio segments includes segmenting the plurality of audio elements into the plurality of audio segments based on at least the one or more voiceprints; wherein the assigning a segment speaker for each audio segment of the plurality of audio segments includes assigning the segment speaker for the each segment of the plurality of audio segments based on at least the one or more voiceprints; and wherein the transcribing the plurality of audio segments into a plurality of text elements includes transcribing the plurality of audio segments into the plurality of text elements based on at least the one or more voiceprints.  


33. (New) The computer-implemented method of claim 31, wherein the receiving one or more voiceprints corresponding to the one or more voice-generating sources respectively includes receiving one or more acoustic models and/or language models corresponding to the one or more voice-generating sources respectively.  





34. (New) The computer-implemented method of claim 24, wherein the transcribing the plurality of audio segments into a plurality of text elements includes: transcribing a first element of the plurality of audio segments into a first text element of the plurality of text elements; transcribing a second element of the plurality of audio segments into a second text element of the plurality of text elements; and correcting the first text element based on at least the second text element.  



35. (New) The computer-implemented method of claim 24, wherein the transcribing the plurality of audio segments into a plurality of text elements includes: determining one or more speaker-change timestamps, wherein each timestamp of the one or more speaker-change timestamps corresponds to a timestamp when a speaker change occurs, a sentence change occurs, and/or a topic change occurs.  
36. (New) The computer-implemented method of claim 35, wherein the segmenting the plurality of audio elements into a plurality of audio segments includes segmenting the plurality of audio elements into a plurality of audio segments based on at least one selected from a group consisting of the one or more speaker-change timestamps, the one or more sentence-change timestamps, and the one or more topic-change timestamps.

37. (New) The computer-implemented method of claim 24, and further comprising: establishing one or more anchor points based on at least the plurality of moment-associating elements; wherein: the one or more anchor points correspond to one or more timestamps respectively; and each anchor point of the one or more anchor points is navigable, searchable, or both navigable and searchable.  
38. (New) The computer-implemented method of claim 37, and further comprising using the one or more anchor points to navigate the one or more pieces of moment-associating information based on at least the one or more timestamps. 
 
39. (New) The computer-implemented method of claim 37, wherein the one or more anchor points include at least one selected from a group consisting of a word, a phrase, a photo, and a screenshot.  

40. (New) The computer-implemented method of claim 24, and further comprising: obtaining one or more moment-associating photos, the one or more moment-associating photos being one or more parts of the plurality of moment-associating elements, and transforming the one or more moment-associating photos into one or more anchor photos, wherein the one or more anchor photos correspond to one or more timestamps respectively.  




41. (New) The computer-implemented method of claim 40 wherein each anchor photo of the one or more anchor photos is navigable, searchable, or both navigable and searchable. 
 
42. (New) The computer-implemented method of claim 41, and further comprising: using the one or more anchor photos to navigate the one or more pieces of moment-associating information based on at least the one or more timestamps.


Claims 43-44 are similar to the above claims.


Claims of USPN 10978073				Claims of application
1. A system for processing and presenting a conversation, the system comprising: a sensor configured to capture an audio-form conversation; a controller configured to switch the sensor between a capturing state and an idling state; an interface configured to receive a user instruction to instruct the controller to switch the sensor between the capturing state and the idling state; a processor configured to: automatically transform the audio-form conversation into a transformed conversation, the transformed conversation including a synchronized text, the synchronized text being synchronized with the audio-form conversation; automatically generate one or more segments of the audio-form conversation and one or more segments of the synchronized text by at least automatically segmenting the audio-form conversation and the synchronized text when a speaker change occurs or a natural pause occurs such that each segment of the one or more segments of the audio-form conversation is spoken by only one speaker in audio form and is synchronized with only one segment of the one or more segments of the synchronized text; and automatically assign only one speaker label to each segment of the one or more segments of the synchronized text, each one speaker label representing one speaker; and a presenter configured to present the transformed conversation including the synchronized text and the audio-form conversation.
Claims 17-18 are similar to claim 1 above.
24. (New) A computer-implemented method for receiving and processing a plurality of moment-associating elements, the method comprising: receiving the plurality of moment-associating elements, the plurality of moment-associating elements including a plurality of audio elements; transforming the plurality of moment-associating elements into one or more pieces of moment-associating information by: segmenting the plurality of audio elements into a plurality of audio segments; assigning a segment speaker for each audio segment of the plurality of audio segments; transcribing the plurality of audio segments into a plurality of text elements; and generating the one or more pieces of moment-associating information based on at least the plurality of audio segments, the segment speaker assigned for the each audio segment of the plurality of audio segments, and the plurality of text elements; and transmitting at least one piece of the one or more pieces of moment-associating information.  
Claims 43-44 are similar to claim 24 above.


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 24-27, 29-30, 35-37, 39, and 43-44 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kanevsky et al. (USPN 6434520, hereinafter referred to as Kanevsky).

Regarding claims 24 and 43-44, Kanevsky discloses a computer-implemented method, system, and non-transitory computer-readable medium for receiving and processing a plurality of moment-associating elements, the method, system, and non-transitory computer-readable medium comprising: 
receiving the plurality of moment-associating elements, the plurality of moment-associating elements including a plurality of audio elements (figure 1, receiving audio data at step 101; the audio data includes speech of multiple speakers); 
transforming the plurality of moment-associating elements into one or more pieces of moment-associating information by: (process in figures 2A-B, audio segments are assigned with speaker IDs and stored)
segmenting the plurality of audio elements into a plurality of audio segments (figure 1, segmentation step 103; also referring to col. 3, lines 19-26); 
assigning a segment speaker for each audio segment of the plurality of audio segments (figure 2B, step 206, assigning speaker ID tag to each segment);
transcribing the plurality of audio segments into a plurality of text elements (figure 2B, step 212, transcribe each segment); and
generating the one or more pieces of moment-associating information based on at least the plurality of audio segments, the segment speaker assigned for the each audio segment of the plurality of audio segments, and the plurality of text elements (figure 2B, step 213, generating and storing segments of audio with selected indexing information; also see col. 8, lines 5-20); and 
transmitting at least one piece of the one or more pieces of moment-associating information (process in figures 2A-B, audio segments are assigned with speaker IDs and are sent/transmitted to a storage).  
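
For illustration only, the order of the steps mapped above (segmenting, assigning a segment speaker, transcribing, and generating the resulting information) can be sketched as follows. The sketch is not taken from Kanevsky or from the application. Every name in it (`AudioElement`, `voice_id`, `known_speakers`, `process`) is hypothetical, and pre-labeled stand-in fields replace real acoustic analysis and speech recognition.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AudioElement:
    start: float   # seconds
    end: float     # seconds
    voice_id: str  # assumption: stand-in for features that identify a voice
    words: str     # assumption: stand-in for the spoken content


@dataclass
class Segment:
    start: float
    end: float
    speaker: str
    text: str


def segment(elements: List[AudioElement]) -> List[List[AudioElement]]:
    """Group consecutive elements that share a voice_id into one audio segment."""
    groups: List[List[AudioElement]] = []
    for el in elements:
        if groups and groups[-1][-1].voice_id == el.voice_id:
            groups[-1].append(el)
        else:
            groups.append([el])
    return groups


def assign_speaker(group: List[AudioElement], known_speakers: Dict[str, str]) -> str:
    """Look up a speaker label for the segment; fall back to 'unknown'."""
    return known_speakers.get(group[0].voice_id, "unknown")


def transcribe(group: List[AudioElement]) -> str:
    """Stand-in transcription: join the pre-labeled words of the segment."""
    return " ".join(el.words for el in group)


def process(elements: List[AudioElement], known_speakers: Dict[str, str]) -> List[Segment]:
    """Segment, assign a speaker, transcribe, and build one record per segment."""
    return [
        Segment(g[0].start, g[-1].end, assign_speaker(g, known_speakers), transcribe(g))
        for g in segment(elements)
    ]
```

Each returned `Segment` carries the time span, the speaker label, and the text together, which corresponds to the "generating" step being based on the audio segments, the assigned speaker, and the text elements.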

Regarding claims 25-27, Kanevsky further discloses the computer-implemented method of claim 24 wherein the receiving the plurality of moment-associating elements includes assigning a timestamp associated with each element of the plurality of moment-associating elements (col. 3, lines 19-29, “time stamping”); wherein the receiving the plurality of moment-associating elements includes receiving one or more visual elements or receiving one or more environmental elements (figure 1, steps 105, 107, and 109, from audio elements and environment elements; also referring to col. 2, line 63 to col. 3, line 10); wherein the receiving the plurality of audio elements includes at least one selected from a group consisting of receiving one or more voice elements of one or more voice-generating sources (figure 1, steps 105, 107, and 109, from audio elements and environment elements; also referring to col. 2, line 63 to col. 3, line 10).  

Regarding claims 29-30, Kanevsky further discloses the computer-implemented method of claim 26 wherein the receiving one or more environmental elements includes receiving one or more of at least one selected from a group consisting of global positions, location types, and conditions associated with the one or more environmental element (figure 1, steps 105, 107, and 109, from audio elements and environment elements; also referring to col. 2, line 63 to col. 3, line 10, various noise conditions; also col. 3, lines 52-67, stationary noise condition); wherein the receiving one or more environmental elements includes receiving one or more of at least one selected from a group consisting of a longitude, a latitude, an altitude, a country, a city, a street, a location type, a temperature, a humidity, a movement, a velocity of a movement, a direction of a movement, an ambient noise level, and one or more echo properties (col. 3, lines 1-10, various noise sources implying location of the noise).

Regarding claims 35-37 and 39, Kanevsky further discloses the computer-implemented method of claim 24 wherein the transcribing the plurality of audio segments into a plurality of text elements includes: determining one or more speaker-change timestamps (col. 3, lines 19-29, “time stamping”), wherein each timestamp of the one or more speaker-change timestamps corresponds to a timestamp when a speaker change occurs (col. 3, lines 19-29, “time stamping”); wherein the segmenting the plurality of audio elements into a plurality of audio segments includes segmenting the plurality of audio elements into a plurality of audio segments based on at least one selected from a group consisting of the one or more speaker-change timestamps (col. 3, lines 19-29, “time stamping”), the one or more sentence-change timestamps, and the one or more topic-change timestamps (col. 6, line 62 to col. 7, line 20; and col. 8, lines 21-36; and col. 9, line 47 to col. 10, line 9; determining topic for each segment; timestamps are already determined for each segment); and further comprising: establishing one or more anchor points based on at least the plurality of moment-associating elements (col. 3, lines 22-29, “start time, end time” are considered anchor points); wherein: the one or more anchor points correspond to one or more timestamps respectively (col. 3, lines 22-29, “start time, end time” are considered anchor points and are associated with the timestamp); and each anchor point of the one or more anchor points is navigable, searchable, or both navigable and searchable (col. 3, lines 11-34, col. 8, line 49 to col. 9, line 29, searching and retrieving audio segments belonging to a particular speaker); and wherein the one or more anchor points include at least one selected from a group consisting of a word, a phrase, a photo, and a screenshot (col. 8, lines 58-66, “keywords/content”).
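
As a purely illustrative aid, the notion of anchor points that correspond to timestamps and are searchable can be sketched as a keyword index over time-stamped segments. This is an assumption-laden sketch, not the indexing scheme of Kanevsky or of the application; the tuple layout and the names `build_anchor_index` and `seek` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed segment layout: (start_time, end_time, speaker, text)
SegmentTuple = Tuple[float, float, str, str]


def build_anchor_index(segments: List[SegmentTuple]) -> Dict[str, List[float]]:
    """Map each lower-cased word to the start times of the segments containing it."""
    index: Dict[str, List[float]] = defaultdict(list)
    for start, _end, _speaker, text in segments:
        # set() ensures a word repeated inside one segment is indexed once
        for word in set(text.lower().split()):
            index[word].append(start)
    return index


def seek(index: Dict[str, List[float]], keyword: str) -> List[float]:
    """Return the sorted start times at which the keyword occurs, if any."""
    return sorted(index.get(keyword.lower(), []))
```

Here each indexed word acts as an anchor point, and the start times returned by `seek` are the timestamps a player could jump to.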

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 34 is rejected under 35 U.S.C. 103 as being unpatentable over Kanevsky in view of Wise et al. (USPG 2007/0118374, hereinafter referred to as Wise).

Regarding claim 34, Kanevsky further discloses the computer-implemented method of claim 24, wherein the transcribing the plurality of audio segments into a plurality of text elements includes: transcribing a first element of the plurality of audio segments into a first text element of the plurality of text elements (figure 1, step 109 and/or col. 6, lines 39-60; transcribing all segments); transcribing a second element of the plurality of audio segments into a second text element of the plurality of text elements (figure 1, step 109 and/or col. 6, lines 39-60; transcribing all segments).
Kanevsky fails to explicitly disclose, but Wise teaches, correcting the first text element based on at least the second text element (figure 3 and/or paragraph 35; correcting error in a transcript produced by a speech segment based on context determined from the transcripts of a plurality of segments).
Because Kanevsky and Wise are analogous art from the same field of endeavor, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to use the known technique of correcting a transcript produced from a particular speech segment based on a context determined from a plurality of segments. One of ordinary skill in the art would have recognized that the results of the combination were predictable since the use of that known technique provides the rationale to arrive at a conclusion of obviousness. See KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385 (U.S. 2007).
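
To make the limitation concrete, correcting a first text element based on a second text element can be sketched as a context lookup. This is only a hypothetical illustration and does not reproduce the technique of Wise or of the application; the function name `correct_with_context` and the hand-written `confusions` table are assumptions.

```python
from typing import Dict, Set


def correct_with_context(
    first: str,
    second: str,
    confusions: Dict[str, Dict[str, Set[str]]],
) -> str:
    """Replace a confusable word in `first` when cue words appear in `second`.

    `confusions` maps a possibly misrecognized word to candidate replacements,
    each paired with the cue words that support that replacement.
    """
    context = set(second.lower().split())
    corrected = []
    for word in first.split():
        replacement = word
        for alternative, cues in confusions.get(word.lower(), {}).items():
            if cues & context:  # at least one cue word occurs in the second element
                replacement = alternative
                break
        corrected.append(replacement)
    return " ".join(corrected)
```

For example, with `confusions = {"week": {"weak": {"strength", "low"}}}`, the first element "the patient has a week pulse" is corrected to "the patient has a weak pulse" when the second element is "pulse strength is low", and is left unchanged when the second element contains no cue word.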

Claims 28, 38 and 40-42 are rejected under 35 U.S.C. 103 as being unpatentable over Kanevsky in view of Chou et al. (USPG 2013/0300939, hereinafter referred to as Chou).

Regarding claim 28, Kanevsky fails to explicitly disclose, but Chou teaches, the computer-implemented method of claim 26 wherein the receiving one or more visual elements includes receiving one or more of at least one selected from a group consisting of pictures, images, screenshots, video frames, projections, and holograms (process in figure 4 and/or referring to paragraph 19).
Because Kanevsky and Chou are analogous art from the same field of endeavor, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to use the known technique of using “speaker segmentation algorithms to improve accuracy of scene segmentation algorithms and vice versa to enable efficient and accurate identification of various scenes and speakers” (paragraph 19). One of ordinary skill in the art would have recognized that the results of the combination were predictable since the use of that known technique provides the rationale to arrive at a conclusion of obviousness. See KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385 (U.S. 2007).

Regarding claim 38, Kanevsky fails to explicitly disclose, but Chou teaches, the computer-implemented method of claim 37, and further comprising: using the one or more anchor points to navigate the one or more pieces of moment-associating information based on at least the one or more timestamps (process in figure 4 and/or referring to paragraphs 15, 19, 22, and 69).
Because Kanevsky and Chou are analogous art from the same field of endeavor, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to use the known technique of using “speaker segmentation algorithms to improve accuracy of scene segmentation algorithms and vice versa to enable efficient and accurate identification of various scenes and speakers” (paragraph 19). One of ordinary skill in the art would have recognized that the results of the combination were predictable since the use of that known technique provides the rationale to arrive at a conclusion of obviousness. See KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385 (U.S. 2007).

Regarding claims 40-42, Kanevsky fails to explicitly disclose, however, Chou teaches the computer-implemented method of claim 24, and further comprising: obtaining one or more moment-associating photos, the one or more moment-associating photos being one or more parts of the plurality of moment-associating elements (process in figure 4 and/or referring to paragraph 19), and transforming the one or more moment-associating photos into one or more anchor photos, wherein the one or more anchor photos correspond to one or more timestamps respectively (process in figure 4 and/or referring to paragraphs 15, 19 and 69); wherein each anchor photo of the one or more anchor photos is navigable, searchable, or both navigable and searchable (col. 3, lines 11-34, col. 8, line 49 to col. 9, line 29, searching and retrieving audio segments belonging to a particular speaker); and further comprising: using the one or more anchor photos to navigate the one or more pieces of moment-associating information based on at least the one or more timestamps (paragraphs 15, 19, and 22).
Since Kanevsky and Chou are analogous in the art because they are from the same field of endeavor, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to use the known technique of using “speaker segmentation algorithms to improve accuracy of scene segmentation algorithms and vice versa to enable efficient and accurate identification of various scenes and speakers” (paragraph 19).  One of ordinary skill in the art would have recognized that the results of the combination were predictable since the use of that known technique provides the rationale to arrive at a conclusion of obviousness. See KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385 (U.S. 2007).

Claims 31-33 are rejected under 35 U.S.C. 103 as being unpatentable over Kanevsky in view of Maes (USPN 6088669).

Regarding claim 31, Kanevsky further discloses the computer-implemented method of claim 24, and further comprising: receiving one or more voice elements of one or more voice-generating sources (figure 1, audio data 101 includes speech of multiple speakers).  Kanevsky fails to explicitly disclose, however, Maes teaches receiving one or more voiceprints corresponding to the one or more voice-generating sources respectively (referring to figure 1; after the speaker is identified by the speaker identification module 410, speaker-dependent HMM models 440 and speaker-dependent language models and vocabularies 450 are loaded into the speech recognizer 120).  
Since Kanevsky and Maes are analogous in the art because they are from the same field of endeavor, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to use the known technique of using speaker-dependent HMM models and language models to recognize speech in order to improve speech recognition accuracy.  One of ordinary skill in the art would have recognized that the results of the combination were predictable since the use of that known technique provides the rationale to arrive at a conclusion of obviousness. See KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385 (U.S. 2007).

Regarding claims 32-33, Kanevsky fails to explicitly disclose, however, Maes teaches segmenting the plurality of audio elements into a plurality of audio segments includes segmenting the plurality of audio elements into the plurality of audio segments based on at least the one or more voiceprints (figure 1, segmentation step 103; also referring to col. 3, lines 19-26); wherein the assigning a segment speaker for each audio segment of the plurality of audio segments includes assigning the segment speaker for the each segment of the plurality of audio segments based on at least the one or more voiceprints (figure 2B, step 206-207, assigning speaker ID tag to each segment); and wherein the transcribing the plurality of audio segments into a plurality of text elements includes transcribing the plurality of audio segments into the plurality of text elements based on at least the one or more voiceprints (figure 2B, step 212, transcribe each segment); wherein the receiving one or more voiceprints corresponding to the one or more voice-generating sources respectively includes receiving one or more acoustic models and/or language models corresponding to the one or more voice-generating sources respectively (figure 1, step 106 and/or figure 2B, steps 207-208).  

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Bradley et al. (USPG 2017/0294184) teach a method for segmenting speech utterance and assigning speaker ID that is considered pertinent to the claimed invention.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUYEN X VO whose telephone number is (571)272-7631. The examiner can normally be reached M-F, 8-4.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/HUYEN X VO/Primary Examiner, Art Unit 2656