DETAILED ACTION
1.	This communication is in response to the Application filed on 8/16/2019. Claims 1-13 are pending and have been examined.
Allowable Subject Matter
2.	Claims 5, 11 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claim Rejections - 35 USC § 103
3.	Claims 1-2, 4, 7-8, 10, 13 are rejected under 35 U.S.C. 103 as being unpatentable over Ellozy, et al. (US 5649060; hereinafter ELLOZY) in view of Lee, et al. (US 20070156404; hereinafter LEE).
As per claim 1, ELLOZY (Title: Automatic indexing and aligning of audio and text using speech recognition) discloses “A method for matching a speech with a text (ELLOZY, Abstract, The automatic speech recognizer decodes speech .. and produces a file with a decoded text. This decoded text is then matched with the original written transcript), comprising:    
acquiring a speech identification text by identifying a received speech signal (ELLOZY, Abstract, The automatic speech recognizer decodes speech .. and produces a file with a decoded text); 
comparing the speech identification text with multiple candidate texts in a first matching mode to determine a first matching text (ELLOZY, Abstract, This decoded text is then matched with the original written transcript via identification of similar words or clusters of words <read on a ready mechanism matching to any of multiple candidate texts> .. The results of this matching is an alignment of the speech with the original transcript); and
[ comparing phonetic symbols of the speech identification text with phonetic symbols of the multiple candidate texts ] in a second matching mode to determine a second matching text, in response to not determining the first matching text (ELLOZY, [col. 9, lines 23-28], Match this phonetic string with the decoded text DTi .. Use rules (or a table) to produce a phonetic string for the reference text Ti. Then consider words in texts DTi and Ti as being in the same place if they are surrounded by similar phonetic substrings; Abstract, The results of this matching is an alignment of the speech with the original transcript <the condition under which the second matching is conducted is a system design choice>).”
ELLOZY does not expressly disclose “comparing phonetic symbols of the speech identification text with phonetic symbols of the multiple candidate texts ..” However, the limitation is taught by LEE (Title: String matching method and system using phonetic symbols and computer-readable recording medium storing computer program for executing the string matching method).
In the same field of endeavor, LEE teaches: [0009] “a string matching method of outputting a representative string that matches an input string .. converting the input string into one or more phonetic symbol strings .. searching a representative list database ..” and [0047] “The representative string determination unit 103 may automatically determine a representative string corresponding to the phonetic symbol string that has the highest matching score among the phonetic symbol strings included in the candidate list for the input string as an output representative string for the input string ..”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of LEE in the system taught by ELLOZY to provide phonetic string conversion and comparison for the purpose of text matching. 
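For illustration only, the two-mode matching recited in claim 1 can be sketched as below. The sketch is an assumption offered as an example and is not taken from ELLOZY, LEE, or the application: `phonetic_key` is a hypothetical toy transcription rather than a real phonetic converter, and `match` is a hypothetical name. It shows only the control flow, in which the phonetic comparison runs when the direct text comparison yields no match.

```python
import re


def phonetic_key(text: str) -> str:
    """Toy phonetic transcription (hypothetical, for illustration only)."""
    s = re.sub(r"[^a-z ]", "", text.lower())
    # Crude spelling-to-sound substitutions; a real system would use a lexicon.
    for src, dst in (("ph", "f"), ("ck", "k"), ("c", "k"), ("z", "s")):
        s = s.replace(src, dst)
    # Collapse doubled letters, e.g. "ll" -> "l".
    return re.sub(r"(.)\1+", r"\1", s)


def match(recognized: str, candidates: list[str]):
    """Return (matched candidate, mode), or (None, None) if neither mode matches."""
    # First matching mode: direct comparison of the texts.
    for cand in candidates:
        if cand.lower() == recognized.lower():
            return cand, "text"
    # Second matching mode: reached only when the first mode found nothing.
    key = phonetic_key(recognized)
    for cand in candidates:
        if phonetic_key(cand) == key:
            return cand, "phonetic"
    return None, None
```

Under these assumptions, `match("fone kall", ["play music", "phone call"])` falls through the first mode and returns the candidate "phone call" from the second mode.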
As per claim 2 (dependent on claim 1), ELLOZY in view of LEE further discloses “further comprising:
outputting the first matching text as a matched candidate text, in response to determining the first matching text; and outputting the second matching text as the matched candidate text, in response to determining the second matching text (ELLOZY, Abstract, This decoded text is then matched with the original written transcript via identification of similar words or clusters of words. The results of this matching is an alignment of the speech with the original transcript <which matching text to output is a system design choice>; [col. 9, lines 23-28], Match this phonetic string with the decoded text DTi via the correspondence with the acoustic frame string. Use rules (or a table) to produce a phonetic string for the reference text Ti. Then consider words in texts DTi and Ti as being in the same place if they are surrounded by similar phonetic substrings <also see LEE above>).”
As per claim 4 (dependent on claim 1), ELLOZY in view of LEE further discloses “wherein the comparing phonetic symbols of the speech identification text with phonetic symbols of the multiple candidate texts in a second matching mode to determine a second matching text comprises:
converting the speech identification text into the phonetic symbols of the speech identification text and converting the multiple candidate texts into the phonetic symbols of the multiple candidate texts; calculating a similarity between the phonetic symbols of the speech identification text and the phonetic symbols of each of the multiple candidate texts; and determining a candidate text with a largest similarity as a matched candidate text in response to determining that the largest similarity is larger than a set threshold; and outputting the matched candidate text (LEE, [0009], converting the input string into one or more phonetic symbol strings .. searching a representative list database .. the representative list DB storing a plurality of records, each record comprising a representative string and a representative phonetic symbol string corresponding to the representative string; and determining a representative string included in one of the records included in the candidate list as an output representative string; [0047], The representative string determination unit 103 may automatically determine a representative string corresponding to the phonetic symbol string that has the highest matching score <read on a ready mechanism to calculate and determine ‘a largest similarity’ .. whether the largest similarity must be greater than a threshold is a system design choice> among the phonetic symbol strings included in the candidate list for the input string as an output representative string for the input string).”
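For illustration only, the steps recited in claim 4 (convert to phonetic symbols, score each candidate, keep the highest score if it exceeds a threshold) can be sketched as below. This is an assumed example, not the method of LEE or of the application: `to_phonetic` is a hypothetical placeholder, the 0.8 threshold is arbitrary, and the similarity measure is the ratio computed by `difflib.SequenceMatcher` from the Python standard library, chosen here only for convenience.

```python
from difflib import SequenceMatcher


def to_phonetic(text: str) -> str:
    """Hypothetical placeholder for a text-to-phonetic-symbol converter."""
    return text.lower().replace("ph", "f").replace("c", "k")


def best_phonetic_match(recognized: str, candidates: list[str], threshold: float = 0.8):
    """Return the candidate with the largest phonetic similarity, or None."""
    if not candidates:
        return None
    query = to_phonetic(recognized)
    # Similarity between the phonetic symbols of the input and of each candidate.
    scored = [
        (SequenceMatcher(None, query, to_phonetic(cand)).ratio(), cand)
        for cand in candidates
    ]
    score, best = max(scored)
    # Output the top candidate only when its similarity exceeds the set threshold.
    return best if score > threshold else None
```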
Claims 7-8, 10 (similar in scope to claims 1-2, 4) are rejected under the same rationale as applied above for claims 1-2, 4. 
Claim 13 (similar in scope to claim 1) is rejected under the same rationale as applied above for claim 1.  
4.	Claims 3, 6, 9, 12 are rejected under 35 U.S.C. 103 as being unpatentable over ELLOZY in view of LEE, and further in view of Modani, et al. (US 9454524; hereinafter MODANI).
As per claim 3 (dependent on claim 1), ELLOZY in view of LEE further discloses “[ calculating a similarity between a sentence vector of the speech identification text and a sentence vector of each of the multiple candidate texts ], in response to not determining the second matching text; and [ outputting a candidate text with a largest similarity ] as a matched candidate text (ELLOZY, Abstract, The results of this matching is an alignment of the speech with the original transcript <the condition under which the third matching is conducted is a system design choice>; LEE, [0047], The representative string determination unit 103 may automatically determine a representative string corresponding to the phonetic symbol string that has the highest matching score).”
ELLOZY in view of LEE does not expressly disclose “calculating a similarity between a sentence vector .. and a sentence vector .. outputting a candidate text with a largest similarity ..” However, the limitation is taught by MODANI (Title: Determining quality of a summary of multimedia content).
In the same field of endeavor, MODANI teaches: [col. 6, lines 2-8] “The vectors generated for each sentence are then used for computing a cosine similarity between a sentence of the multimedia content item and corresponding sentences of a summary. The similarity ST(u,v) between the sentences of the text portions of the multimedia content item and the text portions of the summary is determined based on cosine similarity ..” and [col. 3, lines 29-37] “determining coherence is accomplished by generating vectors from both segments of a text portion and from segments of an image portion and projecting the vectors onto a common unit 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of MODANI in the system taught by ELLOZY and LEE to provide similarity computation based on cosine similarity between sentence vectors representing two different text strings or segments.
As per claim 6 (dependent on claim 3), ELLOZY in view of LEE and MODANI further discloses “wherein the calculating a similarity between a sentence vector of the speech identification text and a sentence vector of each of the multiple candidate texts comprises:
segmenting the speech identification text and the multiple candidate texts into words (ELLOZY, Abstract, This decoded text is then matched with the original written transcript via identification of similar words or clusters of words <read on the associated ‘segmentation’>); 
acquiring a word vector of each word; adding word vectors of words of the speech identification text to obtain the sentence vector of the speech identification text, and adding word vectors of words of one of the multiple candidate texts to acquire a sentence vector of the one of the multiple candidate texts; and calculating a cosine similarity between the sentence vector of the speech identification text and the sentence vector of the one of the multiple candidate texts, as the similarity between the sentence vector of the speech identification text and the sentence vector of the one of the multiple candidate texts (MODANI, [col. 5, line 22 - col. 6, line 5], generating 216 vectors for sentences in the text portions .. first generates a syntactic parse tree for at least one training sentence. A semantic vector for each word and clause within each adding word vectors’>. The vectors generated for each sentence are then used for computing a cosine similarity between a sentence of the multimedia content item and corresponding sentences of a summary).”
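For illustration only, the computation recited in claim 6 (segment into words, add the word vectors to form a sentence vector, compare sentence vectors by cosine similarity) can be sketched as below. The sketch is an assumed example and does not reproduce MODANI's parse-tree-based vector generation: the three-dimensional `WORD_VECTORS` table is invented toy data, whitespace splitting stands in for word segmentation, and all function names are hypothetical.

```python
import math

# Invented toy word vectors (hypothetical data, for illustration only).
WORD_VECTORS = {
    "turn": (1.0, 0.0, 0.0),
    "on": (0.0, 1.0, 0.0),
    "light": (0.0, 0.0, 1.0),
    "lamp": (0.0, 0.1, 0.9),
    "play": (0.2, 0.0, 0.0),
    "music": (0.0, 0.2, 0.0),
}
DIM = 3


def sentence_vector(text: str) -> tuple[float, ...]:
    """Segment on whitespace and add the word vectors; unknown words add zero."""
    total = [0.0] * DIM
    for word in text.lower().split():
        for i, x in enumerate(WORD_VECTORS.get(word, (0.0,) * DIM)):
            total[i] += x
    return tuple(total)


def cosine(u, v) -> float:
    """Cosine similarity; defined here as 0.0 when either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def most_similar(recognized: str, candidates: list[str]) -> str:
    """Return the candidate whose sentence vector is closest to the input's."""
    query = sentence_vector(recognized)
    return max(candidates, key=lambda c: cosine(query, sentence_vector(c)))
```

With this toy table, `most_similar("turn on light", ["turn on lamp", "play music"])` selects "turn on lamp", because its summed vector points in nearly the same direction as that of the input.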
Claims 9, 12 (similar in scope to claims 3, 6) are rejected under the same rationale as applied above for claims 3, 6. 
Conclusion
5.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to FENG-TZER TZENG whose telephone number is (571)272-4609. The examiner can normally be reached on M-F (8:00-5:30). The fax phone number where this application or proceeding is assigned is 571-273-4609.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir (SPE) can be reached on (571)272-7799.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/FENG-TZER TZENG/	3/24/2021

Primary Examiner, Art Unit 2659