Status of Claims
Claims 1-22 are pending.
This communication is in response to the communication filed 9/26/2019.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 03/19/2020, which was before the mailing of a first Office action on the merits.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1 and 16 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Hirschberg (US 8600745 B2). 
As per claim 1, Hirschberg discloses a method of processing audio content performed by a content processing system comprising a processor and a non-transitory computer-readable storage medium storing instructions that, when executed, cause the content processing system to perform the method (see Hirschberg, US 8600745 B2, claim 7 [not recited in spec or shown in figures], which notes a system comprising: a processor; and a computer-readable storage medium having instructions stored which, when executed by the processor, perform operations comprising: transcribing a voicemail message), the method comprising: 
receiving a first audio content file (see Hirschberg, col. 4, lines 8-10 (16), which notes in step 310 of FIG. 5, one or more speech file(s) are received); 
generating, based on the first audio content file, a first text file comprising transcribed text corresponding to the first audio content file (see Hirschberg, col. 4, lines 10-12 (16), which notes in step 320 of FIG. 5, automatic speech recognition is performed on such speech file(s), such as via an automatic speech recognition component discussed earlier herein; and see Hirschberg, col. 4, lines 34-37 (17), which notes in the present invention, automatic speech recognition is used to analyze the speech signals contained in a speech file, such as a voicemail message to produce a textual transcript/first text file of the speech signals in the voicemail message); 
extracting one or more words from the first text file (see Hirschberg, col. 4, lines 13-14 (16), which notes in step 330 of FIG. 5, the speech file(s) are indexed, such as shown in FIG. 3; and see Hirschberg, col. 3, lines 37-41 (14), which notes the transcripts are then indexed by message indexing component 106 to produce a transcript index, such as shown in FIG. 3, wherein each word in the transcript is indexed relative to the occurrence of the word in the speech file); 
identifying a plurality of segments in the first text file based on the one or more words (see Hirschberg, col. 4, lines 13-14 (16), which notes in step 330 of FIG. 5, the speech file(s) are indexed, such as shown in FIG. 3; see Hirschberg, col. 4, lines 15-16 (16), which notes in step 340, a transcript—such as shown in FIG. 4—of the indexed speech file(s) is provided to a user; see Hirschberg, col. 3, line 67—col. 4 line 5, which notes from within message transcript section 220 as shown in FIG. 4, text of the message selected within message summary section 210 is provided, where a portion or portions of the transcript text may be selected within message transcript section 220, such as shown by selected non-contiguous portions 240, 244 and 248;  and see Hirschberg, FIG. 3, which shows segmenting of the transcript by addresses/time stamps); and 
generating a second audio content file, the second audio content file comprising audio content from the first audio content file corresponding to at least a subset of the plurality of segments (see Hirschberg, col. 4, lines 14-29 (16), which notes a transcript of the indexed speech file(s) is provided to a user, such as shown in FIG. 4, step 340. The user's selection of one or more portion(s) of the speech file(s) transcript is received, such as also shown previously in FIG. 4, step 350. The selected portion(s) of speech file transcript is provided to one or more entities or parties specified by user, step 360, such as via selection delivery component, discussed earlier herein. In one embodiment, the entities or parties may simply be electronic mail addresses or user names specified by the user to which the selected portion or portions/segments of the transcript will be provided to. The specified recipients of the transcript portion or portions may receive the portions in both a textual and an audible format/second audio content file. For example, the portion or portions selected may be provided as text within an electronic mail message with an attachment of an audio file which corresponds to the selected portion or portions/segments).  
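For illustration only, the mapping above (a word-level transcript index, a selection of non-contiguous transcript portions, and an audio file corresponding to the selection) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Hirschberg or of the claimed method: the names `IndexedWord`, `spans_for_selection`, and `cut_audio` are hypothetical, and the audio is modeled as a plain list of samples at a fixed sample rate.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class IndexedWord:
    """One transcript word with its position in the audio (hypothetical type)."""
    text: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio


def spans_for_selection(
    index: Sequence[IndexedWord],
    selections: Sequence[Tuple[int, int]],
) -> List[Tuple[float, float]]:
    """Map selected word ranges (inclusive indices) to time spans in seconds."""
    return [(index[first].start, index[last].end) for first, last in selections]


def cut_audio(
    samples: Sequence[int],
    rate: int,
    spans: Sequence[Tuple[float, float]],
) -> List[int]:
    """Concatenate the samples that fall inside each selected time span."""
    out: List[int] = []
    for start, end in spans:
        out.extend(samples[int(round(start * rate)):int(round(end * rate))])
    return out


# Assumed toy data: four one-second words, 10 samples per second.
index = [
    IndexedWord("please", 0.0, 1.0),
    IndexedWord("call", 1.0, 2.0),
    IndexedWord("me", 2.0, 3.0),
    IndexedWord("back", 3.0, 4.0),
]
samples = list(range(40))

# Two non-contiguous selections: the first word, and the last two words.
spans = spans_for_selection(index, [(0, 0), (2, 3)])
clip = cut_audio(samples, 10, spans)
```

In this sketch the second audio content is built purely from time offsets stored in the index, which is why each transcript word must carry its position in the source audio.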

As per claim 16, Hirschberg teaches all of the limitations of claim 1 above.
Hirschberg further teaches wherein the method further comprises generating a second text file, the second text file comprising transcribed text corresponding to the second audio content file (see Hirschberg, col. 4, lines 6-19, which notes one or more speech file(s)/first and third audio content files are received in step 310, ASR is performed on such speech file(s) in step 320, the speech file(s) are indexed in step 330, a transcript of the indexed speech file(s) is provided to a user in step 340, the user's selection of one or more portion(s) of the speech file(s)/first and third audio content files’ transcript is received in step 350, the respective selected portion(s) of speech file transcript is provided to one or more entities or parties specified by user in step 360, where the specified recipients of the transcript portion or portions of the first and third audio content files may receive the respective portions in both a textual and an audible format, and where the portion or portions selected may be provided as respective text within an electronic mail message with respective attachment of an audio file which corresponds to the respective selected portion or portions).  

Claims 2-3 are rejected under 35 U.S.C. 103 as being unpatentable over Hirschberg (US 8600745 B2) in view of Meteer (US 20190180175 A1).
As per claim 2, Hirschberg teaches all of the limitations of claim 1 above.
Hirschberg fails to specifically teach wherein the method further comprises: extracting one or more audio features from the first audio content file; wherein identifying the plurality of segments in the text file is further based on the extracted one or more audio features.
However, Meteer does teach wherein the method further comprises: 
extracting one or more audio features from the first audio content file (see Meteer, US 20190180175 A1, [0039], which notes the speech recognition engine 306 can translate audio captured from telephone calls and video conferences between contact center agents (e.g., IVRs or CSRs) and customers, voicemails from customers, instant messages attaching audio, and other electronic communications including audio or video data. In some embodiments, the speech recognition engine 306 can annotate text translated from audio data to identify users speaking at corresponding portions of the text, confidence levels of the speech-to-text translation of each word or phrase (or denote translations below a confidence threshold), prosodic features of utterances (e.g., pitch, stress, volume, etc.), temporal features (e.g., durations of segments of speech, pauses or other idle time, etc.), and other metadata); 
wherein identifying the plurality of segments in the text file is further based on the extracted one or more audio features (see Meteer, US 20190180175 A1, [0041], which notes the text feature extractor 310 can annotate the words and phrases of a communication with their characteristics or features relevant to segmentation and other processes further down in the pipeline. In some embodiments, the segmentation engine 312 can segment a communication into sentences based on temporal features and lexical features of the words and phrases of the communication. The text feature extractor 310 can parse a communication, identify the feature values for the words and phrases of the communication, and generate a representation of the communication (e.g., a feature vector or matrix)).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Hirschberg with the features data store as taught by Meteer in order to use unsupervised learning to discover latent features or otherwise improve segmentation, clustering, and/or classification (see Meteer [0075], which notes the features data store 336 can store the features extracted by the text feature extractor 310, the segment feature extractor 314, and/or the cluster feature extractor 318 so that the features may be used for different stages of the communication data processing pipeline, for data mining, for unsupervised learning to discover latent features or otherwise improve segmentation, clustering, and/or classification, for historical reporting, or other suitable purpose).
The combination of Hirschberg and Meteer includes predictable results, such as historical reporting of segment features.
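For illustration only, segmentation of a transcript based on a temporal audio feature such as pauses can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Meteer or from the application: the function name `segment_by_pause`, the `(text, start, end)` word tuples, and the 0.5-second threshold are all hypothetical.

```python
from typing import List, Sequence, Tuple

Word = Tuple[str, float, float]  # (text, start seconds, end seconds)


def segment_by_pause(words: Sequence[Word], min_pause: float = 0.5) -> List[List[str]]:
    """Start a new segment whenever the silence before a word is at least min_pause."""
    segments: List[List[str]] = []
    current: List[str] = []
    prev_end = 0.0
    for text, start, end in words:
        # A long enough gap since the previous word closes the current segment.
        if current and start - prev_end >= min_pause:
            segments.append(current)
            current = []
        current.append(text)
        prev_end = end
    if current:
        segments.append(current)
    return segments


# Assumed toy data: a 1.1-second pause separates the second and third words.
words = [
    ("hello", 0.0, 0.4),
    ("there", 0.5, 0.9),
    ("next", 2.0, 2.3),
    ("topic", 2.4, 2.8),
]
segments = segment_by_pause(words)
```

The same loop could use other annotated features (for example pitch or volume changes) as the boundary test; pauses are used here only because they need no signal processing to demonstrate.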

As per claim 3, Hirschberg in view of Meteer teaches all of the limitations of claim 2 above.
Hirschberg fails to specifically teach wherein the one or more audio features comprise one or more of a pause, vocal pitch, vocal timber, speech rate, vocal emotion, vocal volume, vocal emphasis, vocal patterns, and instrumental interludes.
However, Meteer does teach wherein the one or more audio features comprise one or more of a pause, vocal pitch, vocal timber, speech rate, vocal emotion, vocal volume, vocal emphasis, vocal patterns, and instrumental interludes (see Meteer, US 20190180175 A1, [0039], which notes the speech recognition engine 306 can annotate text translated from audio data to identify users speaking at corresponding portions of the text, confidence levels of the speech-to-text translation of each word or phrase (or denote translations below a confidence threshold), prosodic features of utterances (e.g., pitch, stress, volume, etc.), temporal features (e.g., durations of segments of speech, pauses or other idle time, etc.), and other metadata).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Hirschberg with the features data store as taught by Meteer in order to use unsupervised learning to discover latent features or otherwise improve segmentation, clustering, and/or classification (see Meteer [0075], which notes the features data store 336 can store the features extracted by the text feature extractor 310, the segment feature extractor 314, and/or the cluster feature extractor 318 so that the features may be used for different stages of the communication data processing pipeline, for data mining, for unsupervised learning to discover latent features or otherwise improve segmentation, clustering, and/or classification, for historical reporting, or other suitable purpose).
The combination of Hirschberg and Meteer includes predictable results, such as historical reporting of segment features.

Claims 4-9 are rejected under 35 U.S.C. 103 as being unpatentable over Hirschberg (US 8600745 B2) in view of Srivastava (US 20040006748 A1).  

As per claim 4, Hirschberg teaches all of the limitations of claim 1 above.
Hirschberg fails to specifically teach wherein identifying the plurality of segments in the text file comprises identifying, based on the extracted one or more words, one or more topics, wherein each segment of the at least a subset of the plurality of segments is associated with at least one topic of the identified one or more topics.
However, Srivastava does teach wherein identifying the plurality of segments in the text file comprises identifying, based on the extracted one or more words, one or more topics, wherein each segment of the at least a subset of the plurality of segments is associated with at least one topic of the identified one or more topics (see Srivastava FIG. 3, which notes segmenting in 310, receiving the segments from a previous step as in 320, and assigning topics to the segments in at least 360; see Srivastava [0042], which notes audio classification logic 310 may group speech segments from the same speaker and send the segments to speech recognition logic 320; see Srivastava [0043], which notes speech recognition logic 320 may perform continuous speech recognition to recognize the words spoken in the segments that it receives from audio classification logic 310; and see Srivastava [0045], which notes topic classification logic 360 may assign topics to the transcription (e.g., to transcription 400, as noted in Srivastava [0043])).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the user selection of transcribed text of the systems and methods as taught by Hirschberg with the relevancy determination based on synonyms as taught by Srivastava so that text of a document may be determined relevant even if it does not have the same words of a particular text source (see Srivastava [0065], which notes the alert logic 130 may also determine the relevance of the documents to the event defined in the user profile (act 730). For example, alert logic 130 may use the statistical language model to find similarities between words in the documents and the words in the example documents. Because the statistical language model looks for similarities based on specific words or word synonyms, a document may be determined relevant even if it does not have the same words as the example documents).
The combination of Hirschberg and Srivastava includes predictable results, such as a generating of an email alert based on a determination of relevancy of a synonym for a word in a speech segment.

As per claim 5, Hirschberg in view of Srivastava teaches all of the limitations of claim 4 above.
Hirschberg fails to specifically teach wherein identifying the one or more topics further comprises identifying one or more secondary words based on the extracted one or more words and a taxonomy library, wherein identifying the one or more topics is further based on the one or more secondary words.  
However, Srivastava does teach wherein identifying the one or more topics further comprises identifying one or more secondary words based on the extracted one or more words (see Srivastava [0065], which notes the alert logic 130 may also determine the relevance of the documents to the event defined in the user profile (act 730). For example, alert logic 130 may use the statistical language model to find similarities between words in the documents and the words in the example documents. Because the statistical language model looks for similarities based on specific words or word synonyms, a document may be determined relevant even if it does not have the same words as the example documents) and a taxonomy library (see Srivastava [0045], which notes topic classification logic 360 may assign topics to the transcription, where each of the words in the transcription may contribute differently to each of the topics assigned to the transcription. Topic classification logic 360 may generate a rank-ordered list/taxonomy of all possible topics and corresponding scores for the transcription), wherein identifying the one or more topics is further based on the one or more secondary words (see Srivastava [0065], which notes the alert logic 130 may also determine the relevance of the documents to the event defined in the user profile (act 730). For example, alert logic 130 may use the statistical language model to find similarities between words in the documents and the words in the example documents. Because the statistical language model looks for similarities based on specific words or word synonyms, a document may be determined relevant even if it does not have the same words as the example documents).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the user selection of transcribed text of the systems and methods as taught by Hirschberg with the relevancy determination based on synonyms as taught by Srivastava so that text of a document may be determined relevant even if it does not have the same words of a particular text source  (see Srivastava [0065], which notes the alert logic 130 may also determine the relevance of the documents to the event defined in the user profile (act 730). For example, alert logic 130 may use the statistical language model to find similarities between words in the documents and the words in the example documents. Because the statistical language model looks for similarities based on specific words or word synonyms, a document may be determined relevant even if it does not have the same words as the example documents).
The combination of Hirschberg and Srivastava includes predictable results, such as a generating of an email alert based on a determination of relevancy of a synonym for a word in a speech segment.

As per claim 6, Hirschberg teaches all of the limitations of claim 1 above.
Hirschberg fails to specifically teach wherein the method further comprises: 
analyzing the first audio content file to identify one or more speakers; and 
labeling portions of the text file based on the identified one or more speakers; wherein identifying the plurality of the segments in the text file is further based on the labeled portions of the text file.
However, Srivastava does teach wherein the method further comprises:
analyzing the first audio content file to identify one or more speakers (see Srivastava [0038], which notes audio indexer 122 may receive an input audio stream from audio sources 112 and generate metadata therefrom. For example, indexer 122 may segment the input stream by speaker, cluster audio segments from the same speaker, identify speakers by name or gender, and transcribe the spoken words. Indexer 122 may also segment the input stream based on topic and locate the names of people, places, and organizations. Indexer 122 may further analyze the input stream to identify when each word was spoken (possibly based on a time value). Indexer 122 may include any or all of this information in the metadata relating to the input audio stream); and 
labeling portions of the text file based on the identified one or more speakers; wherein identifying the plurality of the segments in the text file is further based on the labeled portions of the text file (see Srivastava [0041], which notes FIG. 3 is an exemplary diagram of indexer 122; and see Srivastava [0044], which notes with reference to FIG. 3, speaker clustering logic 330 may identify all of the segments from the same speaker in a single document (i.e., a body of media that is contiguous in time (from beginning to end or from time A to time B)) and group them into speaker clusters. Speaker clustering logic 330 may then assign each of the speaker clusters a unique label. Speaker identification logic 340 may identify the speaker in each speaker cluster by name or gender; and see Srivastava [0046], which notes FIG. 5 is a diagram of exemplary text 500 that includes representations of metadata that may be output from story segmentation logic 370. The metadata text/text file may also include other information not shown in FIG. 5, such as data that identifies the type of media involved, data that identifies the source of the input stream, data that identifies relevant topics, data that identifies speaker name or gender, data that identifies names of people, places, or organizations, and data that identifies the start and duration of each word spoken).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the user selection of transcribed text of the systems and methods as taught by Hirschberg with the relevancy determination based on synonyms as taught by Srivastava so that text of a document may be determined relevant even if it does not have the same words of a particular text source (see Srivastava [0065], which notes the alert logic 130 may also determine the relevance of the documents to the event defined in the user profile (act 730). For example, alert logic 130 may use the statistical language model to find similarities between words in the documents and the words in the example documents. Because the statistical language model looks for similarities based on specific words or word synonyms, a document may be determined relevant even if it does not have the same words as the example documents).
The combination of Hirschberg and Srivastava includes predictable results, such as a generating of an email alert based on a determination of relevancy of a synonym for a word in a speech segment.

As per claim 7, Hirschberg in view of Srivastava teaches all of the limitations of claim 6 above.
Hirschberg fails to specifically teach wherein the identified one or more speakers comprise one or more uniquely identified speakers.
However, Srivastava does teach wherein the identified one or more speakers comprise one or more uniquely identified speakers (see Srivastava [0038], which notes audio indexer 122 may receive an input audio stream from audio sources 112 and uniquely identify speakers by name and transcribe the spoken words; and see Srivastava [0044], which notes speaker identification logic 340 may identify the speaker in each speaker cluster by name or gender).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the user selection of transcribed text of the systems and methods as taught by Hirschberg with the relevancy determination based on synonyms as taught by Srivastava so that text of a document may be determined relevant even if it does not have the same words of a particular text source  (see Srivastava [0065], which notes the alert logic 130 may also determine the relevance of the documents to the event defined in the user profile (act 730). For example, alert logic 130 may use the statistical language model to find similarities between words in the documents and the words in the example documents. Because the statistical language model looks for similarities based on specific words or word synonyms, a document may be determined relevant even if it does not have the same words as the example documents).
The combination of Hirschberg and Srivastava includes predictable results, such as a generating of an email alert based on a determination of relevancy of a synonym for a word in a speech segment.

As per claim 8, Hirschberg in view of Srivastava teaches all of the limitations of claim 6 above.
Hirschberg fails to specifically teach wherein the identified one or more speakers comprise one or more generically identified speakers.
However, Srivastava does teach wherein the identified one or more speakers comprise one or more generically identified speakers (see Srivastava [0038], which notes audio indexer 122 may receive an input audio stream from audio sources 112 and identify speakers by gender, and transcribe the spoken words; and see Srivastava [0044], which notes speaker identification logic 340 may identify the speaker in each speaker cluster by name or gender).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the user selection of transcribed text of the systems and methods as taught by Hirschberg with the relevancy determination based on synonyms as taught by Srivastava so that text of a document may be determined relevant even if it does not have the same words of a particular text source  (see Srivastava [0065], which notes the alert logic 130 may also determine the relevance of the documents to the event defined in the user profile (act 730). For example, alert logic 130 may use the statistical language model to find similarities between words in the documents and the words in the example documents. Because the statistical language model looks for similarities based on specific words or word synonyms, a document may be determined relevant even if it does not have the same words as the example documents).
The combination of Hirschberg and Srivastava includes predictable results, such as a generating of an email alert based on a determination of relevancy of a synonym for a word in a speech segment.
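
As per claim 9, Hirschberg teaches all of the limitations of claim 1 above.
Hirschberg fails to specifically teach wherein identifying the plurality of segments in the text file is further based on one or more of a lexical feature, a grammatical feature, and a syntactic feature of the text file.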
However, Srivastava does teach wherein identifying the plurality of segments in the text file is further based on one or more of a lexical feature, a grammatical feature, and a syntactic feature of the text file (see Srivastava [0046], which notes story segmentation logic 370 may change the continuous stream of words in the transcription into document-like units with coherent sets of topic labels and other document features generated or identified by the components of indexer 122. This information may constitute metadata corresponding to the input audio stream. FIG. 5 is a diagram of exemplary text 500 that includes representations of metadata that may be output from story segmentation logic 370. Text 500 may include linguistic/grammatical data, such as punctuation and capitalization. The metadata text may also include other information not shown in FIG. 5, such as data that identifies the type of media involved, data that identifies the source of the input stream, data that identifies relevant topics, data that identifies speaker name or gender, data that identifies names of people, places, or organizations, and data that identifies the start and duration of each word spoken. Story segmentation logic 370 may output the metadata in the form of documents to alert logic 130, where a document corresponds to a body of media that is contiguous in time (from beginning to end or from time A to time B)).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the user selection of transcribed text of the systems and methods as taught by Hirschberg with the relevancy determination based on synonyms as taught by Srivastava so that text of a document may be determined relevant even if it does not have the same words of a particular text source (see Srivastava [0065], which notes the alert logic 130 may also determine the relevance of the documents to the event defined in the user profile (act 730). For example, alert logic 130 may use the statistical language model to find similarities between words in the documents and the words in the example documents. Because the statistical language model looks for similarities based on specific words or word synonyms, a document may be determined relevant even if it does not have the same words as the example documents).
The combination of Hirschberg and Srivastava includes predictable results, such as a generating of an email alert based on a determination of relevancy of a synonym for a word in a speech segment.

Claims 10 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Hirschberg (US 8600745 B2) in view of Sherwani (US 20080177536 A1).  
As per claim 10, Hirschberg teaches all of the limitations of claim 1 above.
Hirschberg fails to specifically teach wherein identifying the plurality of segments in the text file comprises identifying at least one filtered segment, the at least one filtered segment comprising at least one of an introduction segment, an advertisement segment, and a conclusion segment, a lower quality audio, a less informative audio segment, and instrumental interludes.  
However, Sherwani does teach wherein identifying the plurality of segments in the text file comprises identifying at least one filtered segment (see Sherwani [0020], which notes, at step 304, words from the speech are recognized by speech recognizer 110 to form a transcript of the audio segment. The words in the transcript are aligned with the speech audio segment at step 306. During alignment, word boundaries within the A/V content 114 are identified. At least a portion of the words are then displayed at step 308 in a user interface, such as user interface 106), the at least one filtered segment comprising at least one of an introduction segment, an advertisement segment, and a conclusion segment, a lower quality audio, a less informative audio segment, and instrumental interludes (see Sherwani [0021], which notes, if desired, the user interface 106 can perform various tasks that allow a user to view, navigate and edit A/V content. For example, the user interface can indicate keywords and a summary at step 310, indicate undesirable audio at step 312, allow editing and navigating through the transcript at step 314 and display A/V content associated with the words at step 316. Undesirable audio can include various audio such as long pauses, vocalized noise, filled pauses such as um, ahh, uh, etc., repeats ("I think uh I think that"), false starts (e.g., "podcas-podcasting"), noise and/or profanity).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Hirschberg with the indication of undesirable audio as taught by Sherwani in order to automatically flag and/or delete undesirable audio (see Sherwani [0021], which notes the user interface 106 can perform various tasks that allow a user to view, navigate and edit A/V content. For example, the user interface can indicate keywords and a summary at step 310, indicate undesirable audio at step 312, allow editing and navigating through the transcript at step 314 and display A/V content associated with the words at step 316. Undesirable audio can include various audio such as long pauses, vocalized noise, filled pauses such as um, ahh, uh, etc., repeats ("I think uh I think that"), false starts (e.g., "podcas-podcasting"), noise and/or profanity. Speech recognizer 110 can be used to flag and/or automatically delete this undesirable audio).
The combination of Hirschberg and Sherwani includes predictable results, such as automatic editing of A/V content to remove undesirable audio.

As per claim 11, Hirschberg in view of Sherwani teaches all of the limitations of claim 10 above.
Hirschberg fails to specifically teach wherein the at least one filtered segment is not included in the at least a subset of the plurality of segments.  
However, Sherwani does teach wherein the at least one filtered segment is not included in the at least a subset of the plurality of segments (see Sherwani [0021], which notes speech recognizer 110 can be used to flag and/or automatically delete this undesirable audio).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Hirschberg with the indication of undesirable audio as taught by Sherwani in order to automatically flag and/or delete undesirable audio (see Sherwani [0021], which notes the user interface 106 can perform various tasks that allow a user to view, navigate and edit A/V content. For example, the user interface can indicate keywords and a summary at step 310, indicate undesirable audio at step 312, allow editing and navigating through the transcript at step 314 and display A/V content associated with the words at step 316. Undesirable audio can include various audio such as long pauses, vocalized noise, filled pauses such as um, ahh, uh, etc., repeats ("I think uh I think that"), false starts (e.g., "podcas-podcasting"), noise and/or profanity. Speech recognizer 110 can be used to flag and/or automatically delete this undesirable audio).
The combination of Hirschberg and Sherwani includes predictable results, such as automatic editing of A/V content to remove undesirable audio.

Claims 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Hirschberg (US 8600745 B2) in view of Chik (US 20180308519 A1).  
As per claim 12, Hirschberg teaches all of the limitations of claim 1 above.
Hirschberg fails to specifically teach wherein the method further comprises scoring each segment of the plurality of segments based on at least one of a determined associated relevance relative to the first audio content file, a determined segment cohesiveness, a determined diversity relative to other segments of the plurality of segments, and a determined representativeness relative to the first audio content file.  
However, Chik does teach wherein the method further comprises scoring (see Chik [0004], which notes one aspect of the present disclosure relates to determining highlight segment sets, where the highlight segment sets may be determined based on diversity scores, quality scores, and/or other information. One or more of the segments included in a given highlight segment set may be determined based on a quality score. Other segments included in the given highlight segment set may be determined based on a diversity score and/or other information. As such, the content segments included in a highlight segment set may be selected to be diverse and/or have a given level of quality) each segment of the plurality of segments (see Chik [0021], which notes by way of another non-limiting example, the highlight segment set may include one or more video segments and/or audio segments (e.g., content segments) from one or more sets of video segments, audio segments, and/or images) based on at least one of a determined associated relevance relative to the first audio content file, a determined segment cohesiveness, a determined diversity relative to other segments of the plurality of segments, and a determined representativeness relative to the first audio content file (see Chik [0021], which notes  highlight set component 108 may be configured to determine a first highlight segment set of content segments included in the first content segment set. Determining the first highlight segment set of content segments may include iterating one or more steps for multiple iterations. 
The one or more steps iterated for multiple iterations may include (a) selecting an individual content segment based on one or more selection criteria, (b) determining one or more diversity scores for content segments in a content segment set that are not yet selected, (c) disqualifying one or more of the content segments in the content segment set based on one or more diversity scores, and/or other steps and/or operations).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Hirschberg with the iterations of diversity scoring and quality scoring as taught by Chik in order to determine content segments having high scores for both the diversity and quality criteria (see Chik [0004], which notes highlight segment sets may be determined based on one or more of diversity scores, quality scores, and/or other information. One or more of the segments included in a given highlight segment set may be determined based on a quality score. Other segments included in the given highlight segment set may be determined based on a diversity score and/or other information. As such, the content segments included in a highlight segment set may be selected to be diverse and/or have a given level of quality. Manual user selection of content segments from content segment sets that should be included in a highlight segment set may be cumbersome, time consuming, and inefficient).
The combination of Hirschberg and Chik includes predictable results, such as automatically presenting segments for review to a user, where the presented segments have high scores based on at least one criterion.

As per claim 13, Hirschberg in view of Chik teaches all of the limitations of claim 12 above.  
Hirschberg fails to specifically teach wherein the at least a subset of the plurality of segments comprises segments of the plurality of segments associated with scores meeting a threshold.  
However, Chik does teach wherein the at least a subset of the plurality of segments comprises segments of the plurality of segments associated with scores meeting a threshold (see Chik [0046], which notes in some implementations, the threshold diversity score may indicate a minimum level of diversity, compared to the selected content segment, a given content segment is required to have not to be disqualified for inclusion in the first highlight segment set).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Hirschberg with the iterations of diversity scoring and quality scoring as taught by Chik in order to determine content segments having high scores for both the diversity and quality criteria (see Chik [0004], which notes highlight segment sets may be determined based on one or more of diversity scores, quality scores, and/or other information. One or more of the segments included in a given highlight segment set may be determined based on a quality score. Other segments included in the given highlight segment set may be determined based on a diversity score and/or other information. As such, the content segments included in a highlight segment set may be selected to be diverse and/or have a given level of quality. Manual user selection of content segments from content segment sets that should be included in a highlight segment set may be cumbersome, time consuming, and inefficient).
The combination of Hirschberg and Chik includes predictable results, such as automatically presenting segments for review to a user, where the presented segments have high scores based on at least one criterion.

Claims 14-15, 17-19, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Hirschberg (US 8600745 B2) in view of Arngren (US 20130226930 A1).  

As per claim 14, Hirschberg fails to specifically teach wherein the method further comprises: receiving a request from a querying system for a shorter-form content file associated with the first audio content file; and transmitting a response to the querying system comprising the second audio content file.
However, Arngren does teach wherein the method further comprises: 
receiving a request from a querying system (see Arngren [0030], which notes an application of 110 may include an interface for receiving a query (e.g., keywords), where application 110 may include a graphical user interface configured to receive a query from a user or where the interface may be a programmatic interface configured to receive a query from another application/system (e.g., locally or from across a network); see Arngren [0077], which notes the query method 600 begins in 605 when the query logic receives the query. As described above, the query may comprise one or more keywords and/or multimedia content. For example, a user may query the database for videos that are similar to a given query video; see Arngren [0078], which notes in 610, the query logic identifies one or more keywords from the query. If the query is provided in the form of text, then identifying the keywords in 610 may simply involve identifying the keywords indicated by the text. If the query is provided as a multimedia file (e.g., another movie, a picture, etc.), then identifying the keywords in 610 may comprise extracting features from the query multimedia and identifying keywords associated with the extracted features, where feature extraction may be performed similarly to the feature extraction used to create the index (e.g., 220-230 of FIG. 2)) for a shorter-form content file associated with the first audio content file (see Arngren [0057], which notes the system may first decompose the video [first audio content file] into segments [shorter-form content files] according to various criteria and create a segment data structure (e.g., 410) for each segment, where the system is pre-configured to segment video into five-second segments (although any static or dynamic criteria may be used as discussed above) and therefore segments the ten-second video into two five-second segments, so that the system therefore creates a separate segment data structure 410 for each segment); and 
transmitting a response to the querying system comprising the second audio content file (see Arngren [0079], which notes in 615 the query logic uses the index to identify files and/or segments that are most relevant to the identified query keywords, and which notes in 620 the query logic returns an indication of the most relevant files and/or segments identified in 615. In some embodiments, the query logic may return the files and/or segments themselves, where each most-relevant segment is a shorter-form content file).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the secondary indicators of content of the systems and methods as taught by Hirschberg with the search indexes based on primary content of multimedia files as taught by Arngren to facilitate effective search and retrieval of multimedia content (see Arngren [0022], which notes traditional search engines are most naturally suited to finding textual content that is relevant to textual keywords. However, as the amount and importance of multimedia content increases, new techniques must be developed to facilitate search and retrieval of multimedia content. Traditional techniques have attempted to correlate keywords with textual metadata associated with a multimedia file (e.g., filename), but such approaches are inherently limited because they search only secondary indicators and ignore the primary content of those multimedia files; and see Arngren [0023], which notes a search engine may be configured to facilitate effective search and retrieval of multimedia content by building a search index based on primary content of multimedia files. In some embodiments, an indexer may segment each multimedia file into a plurality of segments (e.g., time segments of a movie), and for each segment, identify various "features" present in the segment. The different features may be of different media types, such as a visual feature (e.g., image of a bird), an audio feature (e.g., call of the bird), a textual feature (e.g., image text describing the bird), a voice feature (e.g., narration describing the bird), or others. For each identified feature, the indexer may employ various algorithmic techniques to identify one or more respective keywords that correspond to that feature. 
For each keyword identified for a given segment, the indexer may increase the relevance of the keyword (with relation to the segment) according to various parameters, such as a predefined weight given to features of that type, a determined degree of prominence of that feature in the segment, and/or other considerations).
The combination of Hirschberg and Arngren includes predictable results, such as generating a rank-ordered list, based on segment media types, of all possible topics and corresponding scores.

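As per claim 15, Hirschberg teaches all of the limitations of claim 1 above.
Hirschberg fails to specifically teach wherein the second audio content file further comprises audio content corresponding to at least a subset of a plurality of segments identified in a third audio content file.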
However, Arngren does teach wherein the second audio content file (see Arngren [0079], which notes identified files) further comprises audio content (see Arngren [0079], which notes identified segments) corresponding to at least a subset of a plurality of segments (see Arngren [0078], which notes identifying the keywords in 610 may comprise extracting features from the query multimedia, where feature extraction may be performed similarly to the feature extraction used to create the index (e.g., 220-230 of FIG. 2); see Arngren [0038], which notes for each segment of the file, the method identifies relevant keywords by executing 220 to 230) identified in a third audio content file (and see Arngren [0078], which notes a query is provided as a multimedia file).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the secondary indicators of content of the systems and methods as taught by Hirschberg with the search indexes based on primary content of multimedia files as taught by Arngren to facilitate effective search and retrieval of multimedia content (see Arngren [0022], which notes traditional search engines are most naturally suited to finding textual content that is relevant to textual keywords. However, as the amount and importance of multimedia content increases, new techniques must be developed to facilitate search and retrieval of multimedia content. Traditional techniques have attempted to correlate keywords with textual metadata associated with a multimedia file (e.g., filename), but such approaches are inherently limited because they search only secondary indicators and ignore the primary content of those multimedia files; and see Arngren [0023], which notes a search engine may be configured to facilitate effective search and retrieval of multimedia content by building a search index based on primary content of multimedia files. In some embodiments, an indexer may segment each multimedia file into a plurality of segments (e.g., time segments of a movie), and for each segment, identify various "features" present in the segment. The different features may be of different media types, such as a visual feature (e.g., image of a bird), an audio feature (e.g., call of the bird), a textual feature (e.g., image text describing the bird), a voice feature (e.g., narration describing the bird), or others. For each identified feature, the indexer may employ various algorithmic techniques to identify one or more respective keywords that correspond to that feature. 
For each keyword identified for a given segment, the indexer may increase the relevance of the keyword (with relation to the segment) according to various parameters, such as a predefined weight given to features of that type, a determined degree of prominence of that feature in the segment, and/or other considerations).
The combination of Hirschberg and Arngren includes predictable results, such as generating a rank-ordered list, based on segment media types, of all possible topics and corresponding scores.

As per claim 17, Hirschberg teaches all of the limitations of claim 16 above.
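Hirschberg fails to specifically teach wherein the method further comprises: receiving a search request from a querying system comprising one or more query terms; and determining a relevance of at least one of the one or more query terms relative to at least one segment of the at least a subset of the plurality of segments.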
However, Arngren does teach wherein the method further comprises: 
receiving a search request from a querying system comprising one or more query terms (see Arngren [0030], which notes an application of 110 may include an interface for receiving a query (e.g., keywords), where application 110 may include a graphical user interface configured to receive a query from a user or where the interface may be a programmatic interface configured to receive a query from another application/system (e.g., locally or from across a network); see Arngren [0077], which notes the query method 600 begins in 605 when the query logic receives the query. As described above, the query may comprise one or more keywords and/or multimedia content. For example, a user may query the database for videos that are similar to a given query video; see Arngren [0078], which notes in 610, the query logic identifies one or more keywords from the query. If the query is provided in the form of text, then identifying the keywords in 610 may simply involve identifying the keywords indicated by the text. If the query is provided as a multimedia file (e.g., another movie, a picture, etc.), then identifying the keywords in 610 may comprise extracting features from the query multimedia and identifying keywords associated with the extracted features, where feature extraction may be performed similarly to the feature extraction used to create the index (e.g., 220-230 of FIG. 2)); and 
determining a relevance of at least one of the one or more query terms relative to at least one segment of the at least a subset of the plurality of segments (see Arngren [0079], which notes in 615 the query logic uses the index to identify files and/or segments that are most relevant to the identified query keywords, and which notes in 620 the query logic returns an indication of the most relevant files and/or segments identified in 615. In some embodiments, the query logic may return the files and/or segments themselves, where the returned segments are a shorter-form content file).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the secondary indicators of content of the systems and methods as taught by Hirschberg with the search indexes based on primary content of multimedia files as taught by Arngren to facilitate effective search and retrieval of multimedia content (see Arngren [0022], which notes traditional search engines are most naturally suited to finding textual content that is relevant to textual keywords. However, as the amount and importance of multimedia content increases, new techniques must be developed to facilitate search and retrieval of multimedia content. Traditional techniques have attempted to correlate keywords with textual metadata associated with a multimedia file (e.g., filename), but such approaches are inherently limited because they search only secondary indicators and ignore the primary content of those multimedia files; and see Arngren [0023], which notes a search engine may be configured to facilitate effective search and retrieval of multimedia content by building a search index based on primary content of multimedia files. In some embodiments, an indexer may segment each multimedia file into a plurality of segments (e.g., time segments of a movie), and for each segment, identify various "features" present in the segment. The different features may be of different media types, such as a visual feature (e.g., image of a bird), an audio feature (e.g., call of the bird), a textual feature (e.g., image text describing the bird), a voice feature (e.g., narration describing the bird), or others. For each identified feature, the indexer may employ various algorithmic techniques to identify one or more respective keywords that correspond to that feature. 
For each keyword identified for a given segment, the indexer may increase the relevance of the keyword (with relation to the segment) according to various parameters, such as a predefined weight given to features of that type, a determined degree of prominence of that feature in the segment, and/or other considerations).
The combination of Hirschberg and Arngren includes predictable results, such as generating a rank-ordered list, based on segment media types, of all possible topics and corresponding scores.
 
As per claim 18, Hirschberg in view of Arngren teaches all of the limitations of claim 17 above.
Hirschberg fails to specifically teach wherein the method further comprises transmitting a response to the querying system comprising the second audio content file based on the determined relevance.
However, Arngren does teach wherein the method further comprises transmitting a response to the querying system comprising the second audio content file based on the determined relevance (see Arngren [0079], which notes in 615 the query logic uses the index to identify files and/or segments that are most relevant to the identified query keywords, and which notes in 620 the query logic returns an indication of the most relevant files and/or segments identified in 615. In some embodiments, the query logic may return the files and/or segments themselves, where the returned segments are a shorter-form content file).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the secondary indicators of content of the systems and methods as taught by Hirschberg with the search indexes based on primary content of multimedia files as taught by Arngren to facilitate effective search and retrieval of multimedia content (see Arngren [0022], which notes traditional search engines are most naturally suited to finding textual content that is relevant to textual keywords. However, as the amount and importance of multimedia content increases, new techniques must be developed to facilitate search and retrieval of multimedia content. Traditional techniques have attempted to correlate keywords with textual metadata associated with a multimedia file (e.g., filename), but such approaches are inherently limited because they search only secondary indicators and ignore the primary content of those multimedia files; and see Arngren [0023], which notes a search engine may be configured to facilitate effective search and retrieval of multimedia content by building a search index based on primary content of multimedia files. In some embodiments, an indexer may segment each multimedia file into a plurality of segments (e.g., time segments of a movie), and for each segment, identify various "features" present in the segment. The different features may be of different media types, such as a visual feature (e.g., image of a bird), an audio feature (e.g., call of the bird), a textual feature (e.g., image text describing the bird), a voice feature (e.g., narration describing the bird), or others. For each identified feature, the indexer may employ various algorithmic techniques to identify one or more respective keywords that correspond to that feature. For each keyword identified for a given segment, the indexer may increase the relevance of the keyword (with relation to the segment) according to various parameters, such as a predefined weight given to features of that type, a determined degree of prominence of that feature in the segment, and/or other considerations).
The combination of Hirschberg and Arngren includes predictable results, such as generating a rank-ordered list, based on segment media types, of all possible topics and corresponding scores.

As per claim 19, Hirschberg in view of Arngren teaches all of the limitations of claim 18 above.
Hirschberg fails to specifically teach wherein the method further comprises scoring each segment of the plurality of segments, and wherein the determined relevance is based on a score of the at least one segment of the at least a subset of the plurality of segments.
However, Arngren does teach wherein the method further comprises scoring each segment of the plurality of segments, and wherein the determined relevance is based on a score of the at least one segment of the at least a subset of the plurality of segments (see Arngren [0075], which notes STOC 520 is an example of a static table of contents created from SSO table 500 and one or more other SSO objects relating to a separate file, where the STOC can be queried with keywords to identify relevant files and segments. For example, suppose a user queries for the keyword "Lisa." In response, the application may perform a lookup in STOC 520 to find all entries associated with the keyword "Lisa" and find relevance in files xxx and yyy. The application may then find the SSO structures associated with files xxx and yyy and determine and rank the relevance of each segment in those files to the keyword Lisa. For example, the application may use STOC 520 to determine that the most relevant segment is segment 3 of file yyy, then segment 1 of file xxx, and finally segment 2 of file xxx. The application may then return the results ranked in that order).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the secondary indicators of content of the systems and methods as taught by Hirschberg with the search indexes based on primary content of multimedia files as taught by Arngren to facilitate effective search and retrieval of multimedia content (see Arngren [0022], which notes traditional search engines are most naturally suited to finding textual content that is relevant to textual keywords. However, as the amount and importance of multimedia content increases, new techniques must be developed to facilitate search and retrieval of multimedia content. Traditional techniques have attempted to correlate keywords with textual metadata associated with a multimedia file (e.g., filename), but such approaches are inherently limited because they search only secondary indicators and ignore the primary content of those multimedia files; and see Arngren [0023], which notes a search engine may be configured to facilitate effective search and retrieval of multimedia content by building a search index based on primary content of multimedia files. In some embodiments, an indexer may segment each multimedia file into a plurality of segments (e.g., time segments of a movie), and for each segment, identify various "features" present in the segment. The different features may be of different media types, such as a visual feature (e.g., image of a bird), an audio feature (e.g., call of the bird), a textual feature (e.g., image text describing the bird), a voice feature (e.g., narration describing the bird), or others. For each identified feature, the indexer may employ various algorithmic techniques to identify one or more respective keywords that correspond to that feature. 
For each keyword identified for a given segment, the indexer may increase the relevance of the keyword (with relation to the segment) according to various parameters, such as a predefined weight given to features of that type, a determined degree of prominence of that feature in the segment, and/or other considerations).
The combination of Hirschberg and Arngren includes predictable results, such as generating a rank-ordered list, based on segment media types, of all possible topics and corresponding scores.

As per claim 21, Hirschberg in view of Arngren teaches all of the limitations of claim 18 above.
Hirschberg teaches wherein the response to the user selection further comprises the second text file (see Hirschberg col. 4, lines 6-19, which notes one or more speech file(s)/first and third audio content files are received in step 310, ASR is performed on such speech file(s) in step 320, the speech file(s) are indexed in step 330, a transcript of the indexed speech file(s) is provided to a user in step 340, the user's selection of one or more portion(s) of the speech file(s)/first and third audio content files’ transcript is received in step 350, and the respective selected portion(s) of the speech file transcript is provided to one or more entities or parties specified by the user in step 360).
Hirschberg fails to specifically teach wherein the response to the querying system further comprises the second text file.
However, Arngren does teach wherein the response to the querying system further comprises a text file (see Arngren [0079], which notes in 615 the query logic uses the index to identify files and/or segments that are most relevant to the identified query keywords, and which notes in 620 the query logic returns an indication of the most relevant files and/or segments identified in 615. In some embodiments, the query logic may return the files and/or segments themselves; see Arngren [0044], which notes in 315, the method determines a weight associated with the feature's media [file] type, where in some embodiments, each type of media (e.g., image, sound, text, voice, etc.) may be associated with a given weight that reflects how important that type of feature is to the relevance of a keyword; and see Arngren FIG. 5, which notes SSO table 500 shows a media type of “text” and weight table 510 shows a weight of “5” given for the media type of text, so that a file of media type “text” will be returned when the initial relevance ranking is comparatively high; and see Arngren [0030], which notes an application of 110 may include an interface for receiving a query, where application 110 may include a graphical user interface configured to receive a query from a user or where the interface may be a programmatic interface configured to receive a query from another application (e.g., a system from across a network)).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the secondary indicators of content of the systems and methods as taught by Hirschberg with the search indexes based on primary content of multimedia files as taught by Arngren to facilitate effective search and retrieval of multimedia content (see Arngren [0022], which notes traditional search engines are most naturally suited to finding textual content that is relevant to textual keywords. However, as the amount and importance of multimedia content increases, new techniques must be developed to facilitate search and retrieval of multimedia content. Traditional techniques have attempted to correlate keywords with textual metadata associated with a multimedia file (e.g., filename), but such approaches are inherently limited because they search only secondary indicators and ignore the primary content of those multimedia files; and see Arngren [0023], which notes a search engine may be configured to facilitate effective search and retrieval of multimedia content by building a search index based on primary content of multimedia files. In some embodiments, an indexer may segment each multimedia file into a plurality of segments (e.g., time segments of a movie), and for each segment, identify various "features" present in the segment. The different features may be of different media types, such as a visual feature (e.g., image of a bird), an audio feature (e.g., call of the bird), a textual feature (e.g., image text describing the bird), a voice feature (e.g., narration describing the bird), or others. For each identified feature, the indexer may employ various algorithmic techniques to identify one or more respective keywords that correspond to that feature. 
For each keyword identified for a given segment, the indexer may increase the relevance of the keyword (with relation to the segment) according to various parameters, such as a predefined weight given to features of that type, a determined degree of prominence of that feature in the segment, and/or other considerations).
The combination of Hirschberg and Arngren includes predictable results, such as generating a rank-ordered list, based on segment media types, of all possible topics and corresponding scores.

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Hirschberg (US 8600745 B2) in view of Arngren (US 20130226930 A1) and in further view of Surendran (US 20100318537 A1).  

As per claim 20, Hirschberg in view of Arngren teaches all of the limitations of claim 19 above.
Hirschberg in view of Arngren fails to specifically teach wherein the method further comprises generating a content graph associated with the first audio content file based on the one or more words extracted from the first text file, and wherein determining the relevance of the at least one of the one or more query terms relative to the at least one segment is further based on a comparison between the content graph and the at least one of the one or more query terms.

However, Surendran does teach wherein the method further comprises generating a content graph (see Surendran [0044], which notes the content graph of the knowledge content database 254 may include one or more attributes of entities, attributes comprising keywords, metadata, meanings, associations, properties, content, query, query results, annotation, and semantified data entities; and see Surendran [0047], which notes the knowledge content database 254 may additionally include a semantified data component (not shown) configured for storing semantified data in relational tables or in graph tables. Semantified data may be managed by the dimensional indexing sub-component 244 and/or other relational database managers (not shown)), and wherein determining the relevance of the at least one of the one or more query terms relative to the at least one segment is further based on a comparison between the content graph and the at least one of the one or more query terms
wherein determining the relevance of the at least one of the one or more query terms relative to the at least one segment is further based on a comparison between the content graph and the at least one of the one or more query terms (see Surendran [0044], which notes keywords entered into a query may be used to determine a best match within the knowledge base that corresponds best with the intended meaning behind the query. To do this, documents and queries are analyzed to discern the entities, relationships and facts contained in the documents. For example, a keyword phrase "President of the United States" may be understood as referencing knowledge related to a position of political office, but it may also be understood as referencing knowledge related to the country of the United States of America. Additionally, the keyword phrase may reference a series of time periods associated with past presidents, and/or it may reference a time period (e.g., 4 years) that comprises one term of a presidency. In order to present these relationships to the user, the keyword query may be translated to a query graph. The query graph may be a sub-graph that matches against a series of graphs in the knowledge database. The query graph may be presented to the user in response to a keyword query. The query graph may contain the type of references described above, and may be accessed using a pivot table functionality).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the systems and methods as taught by Hirschberg with the pivot table-based access of a query graph as taught by Surendran in order to increase the relationship between objects on the knowledge graph if it is determined that those relationships are tied to the search query (see Surendran [0010], which notes one example that may be used to distinguish the use of a pivot table over a general search engine includes the way in which a search query is interpreted using the system architecture 200. Similar to prior methods of searching, a user may input a search query comprising one or more keywords. In prior methods, the search query is then matched against a set of documents on an inverted index. In embodiments of the present invention, the inverted index may be replaced by a more powerful "pivot table" to pivot around large numbers of objects. In contrast to a search results ranking, the relationship between objects on the knowledge graph may be increased if it is determined that those relationships are tied to the search query. The user may then be presented with results to his or her search query).
The combination of Hirschberg and Surendran includes predictable results, such as determining a best match within the knowledge base that corresponds best with the intended meaning behind the query.
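
For illustration only, the comparison of a query graph against graphs in a knowledge base might be sketched as follows. This is not a characterization of Surendran: the cited passage does not say how the sub-graph match is computed. The sketch assumes graphs are represented as sets of (subject, relation, object) edges and scores a match as the fraction of query edges found in a candidate graph. All names and the example edges are hypothetical.

```python
def match_score(query_edges, graph_edges):
    """Return the fraction of query edges that also appear in the graph."""
    query = set(query_edges)
    if not query:
        return 0.0
    return len(query & set(graph_edges)) / len(query)


def best_match(query_edges, graphs):
    """Return the name of the graph with the highest match score.

    graphs: non-empty dict mapping a graph name to an iterable of edges.
    On a tie, the first graph in dict iteration order is returned.
    """
    return max(graphs, key=lambda name: match_score(query_edges, graphs[name]))


# Hypothetical query graph for the keyword phrase in the cited passage.
query = [
    ("president", "office_of", "united_states"),
    ("president", "term_length", "4_years"),
]
graphs = {
    "geography": [("united_states", "located_in", "north_america")],
    "politics": [
        ("president", "office_of", "united_states"),
        ("president", "term_length", "4_years"),
        ("senate", "part_of", "congress"),
    ],
}
print(best_match(query, graphs))
```

A score of 1.0 under this assumption means that the query graph is a full sub-graph of the candidate graph.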



Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Hirschberg (US 8600745 B2) in view of Arngren (US 20130226930 A1) and in further view of Fellenstein (US 20040062367 A1).  
As per claim 22, Hirschberg in view of Arngren teaches all of the limitations of claim 18 above.
Hirschberg in view of Arngren fails to specifically teach wherein the response to the querying system further comprises metadata associated with the second audio content file.
However, Fellenstein does teach wherein the response to the querying system further comprises metadata associated with the second audio content file (see Fellenstein Abstract, which notes the metadata includes an audible introduction which a user records and associates the audible introduction to a voicemail; see Fellenstein [0033], which notes the metadata manager 130 processes metadata information 175 and stores a voicemail message with metadata 180 in saved message store 190. The voicemail message with metadata 180 includes the original voicemail message, the audible introduction, and may include other types of metadata. The saved message store 190 may be stored on a non-volatile storage area, such as a computer hard drive, where a user 100 may access the voicemail message at a later time by first listening to the audible introduction and/or searching the metadata; and see Fellenstein [0035], which notes user 200 sends request 205 to voicemail manager 220 through phone network 210, where the request 205 includes a request to receive an audible introduction associated with a voicemail mail message, and where the request 205 may also include metadata search criteria that instructs metadata manager 225 to retrieve audible introductions that are associated with the search criteria).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the systems and methods as taught by Hirschberg with the indication of metadata-based audible introduction as taught by Fellenstein in order to personally create and automatically retrieve voicemail summaries (see Fellenstein [0010], which notes associating metadata to a voicemail message and using the metadata to manage the voicemail message. A user listens to a voicemail message and associates an audible introduction to the voicemail message. The user is also able to associate other types of metadata to the voicemail message in order to manage voicemail retrieval using a particular search criterion; and see Fellenstein [0011], which notes a scenario in which a caller leaves a message for the user on the user's voicemail system. The user accesses his voicemail system, listens to the voicemail message, and determines whether he wishes to associate an audible introduction with the voicemail message. For example, the user may have stepped out of a meeting to check his voicemail and may not have time at the moment to respond to the voicemail message. Another example is that the message may be lengthy and the user may wish to add an audible introduction which summarizes the voicemail message).
The combination of Hirschberg and Fellenstein includes predictable results, such as retrieval of voicemail messages using metadata as search criteria.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK R HENNINGS whose telephone number is (571) 272-9676. The examiner can normally be reached on Monday-Friday 8:00 am-5:00 pm. 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre-Louis Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/MARK HENNINGS/
Examiner, Art Unit 2659

/PIERRE LOUIS DESIR/
Supervisory Patent Examiner, Art Unit 2659