DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Information Disclosure Statement
The information disclosure statement (IDS) submitted on 27 August 2019 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.


Claim Objections
Claim 3 is objected to as being dependent upon itself. Correction is required.


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1 and 7-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by US 20080270138, hereinafter referred to as Knight et al.

Regarding claim 1, Knight et al. discloses a method for searching an audio recording using text, the method comprising: 

accepting a text search query (“In an operation 120, a search query is received from a user of the rich media content search system. The search query can be any type of query known to those of skill in the art. For example, the search query can be one or more words entered as text,” Knight et al., para [0043].); 

converting the text search query to a phonetic representation of the text search query (“In another exemplary embodiment, the system can generate the audio content index by converting the output of an automatic speech recognition (ASR) algorithm into phonetic data using a phonetic data algorithm. In an exemplary embodiment, correlated, time-stamped textual content can be used in conjunction with the ASR algorithm such that the accuracy and reliability of recognized words can be increased,” Knight et al., para [0048].); 

searching over an ASR index created for an audio file using the text search query to produce ASR search results wherein the ASR index comprises textual representations of words (“In another exemplary embodiment, the system can generate the audio content index by converting the output of an automatic speech recognition (ASR) algorithm into phonetic data using a phonetic data algorithm. In an exemplary embodiment, correlated, time-stamped textual content can be used in conjunction with the ASR algorithm such that the accuracy and reliability of recognized words can be increased,” Knight et al., para [0048].), each textual representation associated with a confidence score (“In an operation 250, a time-stamped transcript of the audio content is created based on the HMM evaluation and the results evaluation and refinement processes. The time-stamped transcript can be a best guess of the most likely sequence of words included within the audio content. The time-stamped transcript can include the starting time and ending time for each word within the transcript. In an exemplary embodiment, the ASR algorithm can also create a word lattice which includes word hypotheses, word times, word scores, and/or transition data regarding different paths used during the HMM evaluation,” Knight et al., para [0066]. A word score is interpreted as a confidence score.); 

searching over a phonetic representation of the audio file using the phonetic representation of the text search query to produce phonetic search results (“Phoneme matching can be implemented in a forward direction starting at the audio content index starting location and a backward direction starting at the audio content index starting location. In an exemplary embodiment, a score can be assigned to potential matches as the phoneme matching is being implemented. As such, each potential match can receive a score for matching in the forward direction and a score for matching in the backward direction. A composite score for the potential match can be obtained by summing the forward direction score and backward direction score. In an exemplary embodiment, if a score in either direction is less than a predetermined threshold, the matching process can be aborted and the phoneme matching and scoring algorithm can move on and begin performing one or more matches at the next audio content index starting location,” Knight et al., para [0099].); and 

(“FIG. 6 is a flow diagram illustrating operations performed by the system during the creation of audio content search results in accordance with an exemplary embodiment. Additional, fewer, or different operations may be performed in alternative embodiments. In an operation 600, an audio content index starting location is selected from the list of audio content index starting locations. The selected audio content index starting location can be any of the audio content index starting locations identified during the comparison of the extracted k-phoneme index to the k-phoneme search query index. In an exemplary embodiment, the selected audio content index starting location can be the starting location with the earliest timestamp. Alternatively, the audio content index starting location can be randomly selected or selected using any other criteria,” Knight et al., para [0096]. Thus, the returned search results depend on phonetic searching. See steps 600-615 of Knight et al., fig. 6.).  

Regarding claim 7, Knight et al. discloses a method for searching an audio recording using text, the method comprising: 

accepting a text search query comprising a plurality of words (“In an operation 120, a search query is received from a user of the rich media content search system. The search query can be any type of query known to those of skill in the art. For example, the search query can be one or more words entered as text,” Knight et al., para [0043].); 

searching over an ASR index created for an audio recording using the text search query to produce ASR search results (“In another exemplary embodiment, the system can generate the audio content index by converting the output of an automatic speech recognition (ASR) algorithm into phonetic data using a phonetic data algorithm. In an exemplary embodiment, correlated, time-stamped textual content can be used in conjunction with the ASR algorithm such that the accuracy and reliability of recognized words can be increased,” Knight et al., para [0048].), the ASR search results comprising words, each word associated with a confidence score (“In an operation 250, a time-stamped transcript of the audio content is created based on the HMM evaluation and the results evaluation and refinement processes. The time-stamped transcript can be a best guess of the most likely sequence of words included within the audio content. The time-stamped transcript can include the starting time and ending time for each word within the transcript. In an exemplary embodiment, the ASR algorithm can also create a word lattice which includes word hypotheses, word times, word scores, and/or transition data regarding different paths used during the HMM evaluation,” Knight et al., para [0066]. A word score is interpreted as a confidence score.); 

for each of the words comprised in the ASR search results associated with a confidence score below a threshold and having one or more preceding words in the ASR index and one or more subsequent words in the ASR index, searching over a phonetic representation of the audio recording for the word associated with a confidence score below the threshold where it occurs in the audio recording after the one or more preceding words and in the audio recording before the one or more subsequent words, to produce phonetic search results (“Phoneme matching can be implemented in a forward direction starting at the audio content index starting location and a backward direction starting at the audio content index starting location. In an exemplary embodiment, a score can be assigned to potential matches as the phoneme matching is being implemented. As such, each potential match can receive a score for matching in the forward direction and a score for matching in the backward direction. A composite score for the potential match can be obtained by summing the forward direction score and backward direction score. In an exemplary embodiment, if a score in either direction is less than a predetermined threshold, the matching process can be aborted and the phoneme matching and scoring algorithm can move on and begin performing one or more matches at the next audio content index starting location,” Knight et al., para [0099].); and
returning as search results ASR search results and phonetic search results (“FIG. 6 is a flow diagram illustrating operations performed by the system during the creation of audio content search results in accordance with an exemplary embodiment. Additional, fewer, or different operations may be performed in alternative embodiments. In an operation 600, an audio content index starting location is selected from the list of audio content index starting locations. The selected audio content index starting location can be any of the audio content index starting locations identified during the comparison of the extracted k-phoneme index to the k-phoneme search query index. In an exemplary embodiment, the selected audio content index starting location can be the starting location with the earliest timestamp. Alternatively, the audio content index starting location can be randomly selected or selected using any other criteria,” Knight et al., para [0096]. Thus, the returned search results depend on phonetic searching. See steps 600-615 of Knight et al., fig. 6.).  
As to claim 14, system claim 14 and method claim 7 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly, claim 14 is similarly rejected under the same rationale as applied above with respect to method claim 7. Further, Knight et al., para [0010], teaches memory, a processor, and computer code. 

Regarding claim 8, Knight et al. discloses the method of claim 7, comprising searching over a phonetic representation of the audio file before the end of a preceding word and after the beginning of a subsequent word, to produce phonetic search results (“Phoneme matching can be implemented in a forward direction starting at the audio content index starting location and a backward direction starting at the audio content index starting location. In an exemplary embodiment, a score can be assigned to potential matches as the phoneme matching is being implemented. As such, each potential match can receive a score for matching in the forward direction and a score for matching in the backward direction. A composite score for the potential match can be obtained by summing the forward direction score and backward direction score. In an exemplary embodiment, if a score in either direction is less than a predetermined threshold, the matching process can be aborted and the phoneme matching and scoring algorithm can move on and begin performing one or more matches at the next audio content index starting location,” Knight et al., para [0099]. See also Knight et al., fig. 7A.).  
As to claim 15, system claim 15 and method claim 8 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly, claim 15 is similarly rejected under the same rationale as applied above with respect to method claim 8. Further, Knight et al., para [0010], teaches memory, a processor, and computer code.

Regarding claim 9, Knight et al. discloses the method of claim 7 wherein the phonetic representation and ASR index are comprised within a composite index (Knight et al., para [0094]-[0095]. The ASR index (composite index) is derived from the phonetic index (Knight et al., fig. 4(445)). This corresponds to the applicant’s fig. 2.), and wherein the phonetic representation represents only portions of the audio file comprising words associated with a confidence score below the threshold (“As shown by threshold 320, words or phrases found by a phonetic search and assigned, or associated with, a score lower than a threshold (e.g., a score lower than 81) may be excluded from, or ignored in, further processing. As shown by block 325, a group or list of words or phrases that includes words and phrases produced by the ASR search may be created wherein the list is created or generated based on the threshold. For example, the word "computers" may be included in the list shown by block 325 since the phonetic score for this word is 0.94 which is higher than a threshold of 0.8 but the word "complain" may be excluded from the list since its phonetic score is 0.78 which is lower than the threshold of 0.8,” Morris et al., para [0051].).  
As to claim 16, system claim 16 and method claim 9 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly, claim 16 is similarly rejected under the same rationale as applied above with respect to method claim 9. Further, Knight et al., para [0010], teaches memory, a processor, and computer code. 

Regarding claim 10, Knight et al. discloses the method of claim 7 wherein the phonetic representation and ASR index are comprised within a composite index (Knight et al., para [0094]-[0095]. The ASR index (composite index) is derived from the phonetic index (Knight et al., fig. 4(445)). This corresponds to the applicant’s fig. 2.), and wherein the phonetic representation represents portions of the audio file comprising words associated with a confidence score below the threshold and an overlap portion including words associated with a confidence score not below the threshold (Knight et al., para [0121]. See also Knight et al., fig. 6(630).).  
As to claim 17, system claim 17 and method claim 10 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly, claim 17 is similarly rejected under the same rationale as applied above with respect to method claim 10. Further, Knight et al., para [0010], teaches memory, a processor, and computer code. 

Regarding claim 11, Knight et al. discloses the method of claim 7, wherein the confidence score indicates the confidence that the word accurately represents the corresponding word in the audio recording (“In an operation 640, the system calculates a confidence score for each valid sequence in the list of valid sequences. In an exemplary embodiment, a confidence score can be any score capable of indicating the likelihood that a given valid sequence is a true occurrence of the search query within the audio content and not a false positive. Confidence scores can be used to sort audio content time segments by relevance and/or to compare audio content search results with individual textual content search results and/or individual visual content search results,” Knight et al., para [0122].).  
As to claim 18, system claim 18 and method claim 11 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly, claim 18 is similarly rejected under the same rationale as applied above with respect to method claim 11. Further, Knight et al., para [0010], teaches memory, a processor, and computer code. 

Regarding claim 12, Knight et al. discloses the method of claim 7, wherein the search results comprise a location in the audio recording corresponding to the text search query (“In an exemplary embodiment, correlated, time-stamped textual content can be used in conjunction with the ASR algorithm such that the accuracy and reliability of recognized words can be increased. The correlated, time-stamped textual content can provide the ASR algorithm with clues regarding the likelihood that a particular word is contained within audio content corresponding to the rich media presentation,” Knight et al., para [0048].).  
As to claim 19, system claim 19 and method claim 12 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly, claim 19 is similarly rejected under the same rationale as applied above with respect to method claim 12. Further, Knight et al., para [0010], teaches memory, a processor, and computer code. 

Regarding claim 13, Knight et al. discloses the method of claim 7, wherein searching over the ASR index comprises: 

converting the text search query to a phoneme representation (“In an exemplary embodiment, the system can use the time-stamped transcript created by the ASR algorithm to create a phoneme-based audio content index corresponding to the audio content time segment. In an operation 255, the system uses a phonetic data algorithm to determine a phonetic pronunciation for words in the time-stamped transcript,” Knight et al., para [0068]. Also, “In an operation 260, the system uses the phonetic data algorithm to assemble a phoneme sequence corresponding to the time-stamped transcript. The phoneme sequence can include the phonemes determined in operation 255, the location (or order) of each phoneme within each word, and/or one or more timestamps associated with each phoneme or word,” Knight et al., para [0069].); and 

using the phoneme representation to access a phoneme sequence lookup table, to return an index to the ASR index (“In another alternative embodiment, the audio content index may not be in the form of a lookup table. For example, the audio content index can be in the form of any other data structure which can be used by the system to efficiently locate phonemes which occur in an audio content time segment,” Knight et al., para [0079].).  
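As to claim 20, system claim 20 and method claim 13 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly, claim 20 is similarly rejected under the same rationale as applied above with respect to method claim 13. Further, Knight et al., para [0010], teaches memory, a processor, and computer code.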

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-6 are rejected under 35 U.S.C. 103 as being unpatentable over US 20080270138, hereinafter referred to as Knight et al., in view of US 20180182378, hereinafter referred to as Morris et al.

Regarding claim 2, Knight et al. discloses the method of claim 1, but does not disclose wherein the phonetic representation represents portions of the audio file corresponding to low confidence scores. Morris et al. is cited to disclose wherein the phonetic representation represents portions of the audio file corresponding to low confidence scores (“As shown by threshold 320, words or phrases found by a phonetic search and assigned, or associated with, a score lower than a threshold (e.g., a score lower than 81) may be excluded from, or ignored in, further processing. As shown by block 325, a group or list of words or phrases that includes words and phrases produced by the ASR search may be created wherein the list is created or generated based on the threshold. For example, the word "computers" may be included in the list shown by block 325 since the phonetic score for this word is 0.94 which is higher than a threshold of 0.8 but the word "complain" may be excluded from the list since its phonetic score is 0.78 which is lower than the threshold of 0.8,” Morris et al., para [0051].). Morris et al. benefits Knight et al. by providing a method for avoiding false alarms in speech recognition caused by phonetically similar phrases (Morris et al., para [0003]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Knight et al. with those of Morris et al. to enhance the audio content search engine of Knight et al.  

Regarding claim 3, Knight et al., as modified by Morris et al., discloses the method of claim 3 wherein the phonetic representation and ASR index are comprised within a composite index (Knight et al., para [0094]-[0095]. The ASR index (composite index) is derived from the phonetic index (Knight et al., fig. 4(445); this corresponds to the applicant’s fig. 2) and a comparison between the phoneme index and the phoneme search query index.), and wherein the phonetic representation represents only portions of the audio file comprising words associated with a confidence score below the threshold (“As shown by threshold 320, words or phrases found by a phonetic search and assigned, or associated with, a score lower than a threshold (e.g., a score lower than 81) may be excluded from, or ignored in, further processing. As shown by block 325, a group or list of words or phrases that includes words and phrases produced by the ASR search may be created wherein the list is created or generated based on the threshold. For example, the word "computers" may be included in the list shown by block 325 since the phonetic score for this word is 0.94 which is higher than a threshold of 0.8 but the word "complain" may be excluded from the list since its phonetic score is 0.78 which is lower than the threshold of 0.8,” Morris et al., para [0051].).  

Regarding claim 4, Knight et al., as modified by Morris et al., discloses the method of claim 3 wherein the phonetic representation and ASR index are comprised within a composite index (Knight et al., para [0094]-[0095]. The ASR index (composite index) is derived from the phonetic index (Knight et al., fig. 4(445)). This corresponds to the applicant’s fig. 2.), and wherein the phonetic representation represents portions of the audio file comprising words associated with a confidence score below the threshold and an overlap portion including words associated with a confidence score not below the threshold (Knight et al., para [0121]. See also Knight et al., fig. 6(630).).  

Regarding claim 5, Knight et al., as modified by Morris et al., discloses the method of claim 3, wherein the confidence score indicates the confidence that the word accurately represents the corresponding word in the audio recording (“In an operation 640, the system calculates a confidence score for each valid sequence in the list of valid sequences. In an exemplary embodiment, a confidence score can be any score capable of indicating the likelihood that a given valid sequence is a true occurrence of the search query within the audio content and not a false positive. Confidence scores can be used to sort audio content time segments by relevance and/or to compare audio content search results with individual textual content search results and/or individual visual content search results,” Knight et al., para [0122].).  

Regarding claim 6, Knight et al., as modified by Morris et al., discloses the method of claim 3, wherein the search results comprise a location in the audio recording corresponding to the text search query (“In an exemplary embodiment, correlated, time-stamped textual content can be used in conjunction with the ASR algorithm such that the accuracy and reliability of recognized words can be increased. The correlated, time-stamped textual content can provide the ASR algorithm with clues regarding the likelihood that a particular word is contained within audio content corresponding to the rich media presentation,” Knight et al., para [0048].).  


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure and is listed on form PTO-892. In particular, the examiner notes US 20120116766 (Wasserblat et al.), US 20110295605 (Lin), US 20090043581 (Abbott et al.), and US 20020052870 (Charlesworth et al.).  
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANNE L THOMAS-HOMESCU whose telephone number is (571)272-0899.  The examiner can normally be reached on Mon-Fri 8-6.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre L Desir, can be reached on (571)272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.






/ANNE L THOMAS-HOMESCU/Primary Examiner, Art Unit 2656