DETAILED ACTION
Applicant’s arguments filed in the reply on 6/1/2021 were received and fully considered. Claims 1, 3, 5, 7, 9, 10, and 14 were amended. The current Office action is FINAL. Please see the corresponding rejection headings and the Response to Arguments section below for more detail.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments regarding the claim objections to claim 4 raised in the previous office action have been fully considered and are persuasive in view of the amendment. Therefore, the claim objections are withdrawn.

Applicant's arguments regarding the 35 USC 101 rejections of claims 1-14 raised in the previous office action have been fully considered and are persuasive in view of the amendment. Therefore, the 35 USC 101 rejections are withdrawn.

Applicant’s arguments with respect to the prior art rejections raised in the previous office action have been considered but are moot because the new ground of rejection does not rely on the combination of references that are currently applied. The amended limitation that 
Applicant, in the arguments, states that “the "key phrase" in the present invention is used for retrieving key words or key phrases in audio contents accurately and efficiently”. In the remarks, Applicant traversed the previous office action for at least the following reasons.
Daya appears to disclose a method and apparatus for enhancing the accuracy and reducing errors in speech to text systems. See paragraph [0001]. In light of the technical object, Daya describes improving the accuracy in the speech to text system, rather than to retrieve key words from the speech, leading to a technical solution different from that of the present invention. Specifically, although Daya involves key phrases, the meaning of the "key phrase" is different from the "key phrase" in the present invention. Daya mainly involves assessing the importance or significance of key phrases detected in recognized speech (see paragraph [0007]). Daya further explains the meaning of the key phrases in paragraph [0037]: "Key phrases are generally combinations of one or more words which are logically related, whether linguistically or in the context of the environment. Such key phrases first have to be identified. The questions relevant to key phrases are their correctness, similarly to the word correctness disclosed above, and their importance or significance". As can be seen, the "key phrase" in Daya actually represents those words that are very important for converting the speech into text correctly. On the other hand, the "key phrase" in the present invention is used for retrieving key words or key phrases in audio contents accurately and efficiently. 

Daya states: Par.0008:”  In one embodiment of the disclosure there is thus provided a method for enhancing the analysis of one or more test words extracted from a test audio source, the test audio source captured within an environment and having an acoustic environment, the method comprising: a receiving step for receiving one or more training words extracted from a training audio source; a first feature extraction step for extracting a first feature from each training word, from the environment, or from the acoustic environment; a second receiving step for receiving an indication whether training word appears in the training audio source; and a model generation step for generating a model using the training words and the first features, and the indication: a third receiving step for receiving one or more test words extracted from the test audio source; a second feature extraction step for extracting second features from the test audio source, from the environment or from the acoustic environment; and a classification step for applying the word training model on the test words and the second features, thus obtaining a confidence score for the test words. The method optionally comprises a first text extraction step for extracting the training words from the training audio source, or a second text extraction step for extracting the test word from the test audio source.”
Daya also teaches keyword/key phrase extraction from audio content, as recited above. Consequently, Daya’s teaching is similar to the claimed key phrase identification. Daya states in Par. 0030: “... or by any other source of information on step 220. Training is preferably performed using methods such as Neural networks, Support Vector Machines (SVM) as described for example in.” Here, Daya reads on key phrase identification, in which a neural network can also be used for training on key phrases. Daya further teaches in Par. 0059: “Generating the key phrase confidence or correctness model is performed similarly to step 224 of FIG. 2. However, the model may be more complex, since there are also cases relating to partial recognition of a key phrase.” Also, in Par. 0030, Daya teaches: “The model will then provide an indication to whether the particular word is correct, i.e. appears in the audio, or not.” Therefore, Daya teaches key phrase identification as recited above, because the correctness determination addresses whether the word or key phrase exists in the audio, which further confirms the key phrase detection.

With respect to claim 8, Applicant states: “Second, the labelling method disclosed by Liu is only based on the part of speech for each word, which is significantly different from the scheme of the present invention which trains the key phrase recognition model by using both training samples processed by a natural language processing and training samples labeled manually in sequence.” However, Liu states in Par. 0019:” Further, the keyword tag in every sentence of the text message is the first preset mark, other words Labeled as the second preset keyword identifying device is additionally provided, including Vector determines Unit, for carrying out word segmentation processing to text to be measured, determines the corresponding term vector of each word indexing unit, uses In in units of the sentence in the text to be measured, the corresponding term vector of each word in every sentence is input in network model, the keyword in the text to be measured is marked using the neural network model.”
As recited, Liu also marks keywords “in units of the sentence”, not only by POS as argued by Applicant.
Furthermore, in the last portion of the argument with respect to claim 8, Applicant refers to natural language processing and training samples labeled manually in “sequence”, a feature that is not claimed.
In response to applicant's argument that the references fail to show certain features of applicant's invention, it is noted that the features upon which applicant relies (i.e., sequence) are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Examiner does not agree that the labeling method disclosed by Liu is only for part of speech, as is evident from the aforementioned paragraphs of Liu.
Applicant further states that: “Su also does not consider the key phrase (the second training data) determined based on the user's intent, nor does it consider using both the first 
Examiner respectfully disagrees. A phrase is a combination of two or more words and, per the paragraph quoted by Applicant, there exist entities with multiple words, which constitute phrases. Su, in Par. 0069, shows the label being applied to multiple words: “label can be defined by the following: NER name (PER), address (LOC), identifying the organization name (ORG) as an example, the following input text: " Zhangsan from Xian, Peking University, the sequence result is marked”
Lastly, per the MPEP, Applicant's arguments address the references individually and not in combination.
In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 3, 5, 7, 10 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Daya, Dolph et al. (US20190138270A1)(hereinafter “Dolph”), and Liu.

Daya and Liu were applied in the previous office action.
Regarding claim 1, Daya teaches a method for training a key phrase identification model, comprising: obtaining first training data for identifying feature information of words in a first training text; (Par. 0038:” Referring now to FIG. 4 showing a flowchart of the main steps in training a model for enhanced key phrase recognition …On step 412 the text undergoes NLP analysis, including for example POS tagging and stemming. On step 413 training key phrases are extracted from the text. … Step 413 uses a set of linguistic rules 414 for identifying potential key phrases.”, and Par. 0059:”On step 416 each key phrase is represented as a feature vector for further processing in training step 424”)
obtaining second training data for identifying a key phrase in a second training text;(Par. 0038:” On step 412 the text undergoes NLP analysis, including for example POS tagging and stemming. On step 413 training key phrases are extracted from the text. The training key phrases are extracted according to linguistic rules 414, which indicate combination having high likelihood of being key phrases.").
and training the key phrase identification model to be used as a neural network, based on the first training data and the second training data, so as to identify the key phrase in audio data by using the trained key phrase identification model, (Par. 0030:” On step 224 a model is trained, based upon the input training data which consist of a set of pairs, each pair comprising a feature vector constructed in step 216 and a corresponding correctness indication extracted from the manual transcription or received explicitly by manual tagging or by any other source of information on step 220. Training is preferably performed using methods such as Neural networks, Support Vector Machines (SVM) as described for example in.”, and Par. 0059:” On step 424 training is performed for generating key phrase confidence or correctness model 425 and key phrase importance model 426, which preferably include pairs, each pair consisting of a feature vector representation and an indication… Key phrase importance model 426 relates to the importance or significance of the detected key phrases. On step 416 each key phrase is represented as a feature vector for further processing in training step 424. On step 424 manual indication 415 is received, in which each key phrase is tagged as important or unimportant.”, and Par. 0023:” Each example in the training data consists of a pair of a feature vector that represents a single key phrase, and its class label or correctness indication.”, and Par. 0067:” Training components 644 further comprise optional phrase training component 647 for generating a model based on key phrases generated by key phrase extraction component 660 ....”). Note: Two sets of data are being fed into training block 424: the training data set from the main branch through 416, and the second set from 415.
wherein obtaining the first training data comprises: obtaining the first training text; (Par. 0027:” Training is performed on a training audio source. On step 204 a corpus comprising one or more training audio files or streams is received by a training system. … During step 209 training words are extracted.”)
splitting the first training text into at least one sentence; and (Par. 0038:” Training words extracted on textual extraction step 409 whether by speech to text, word spotting or otherwise receiving text extracted from the audio files, and acoustic features are extracted on step 410.”).
determining the feature information of the words in the at least one sentence using a natural language processing technology, and (Par. 0038:” On step 412 the text undergoes NLP analysis, including for example POS tagging and stemming. On step 413 training key phrases are extracted from the text. … Step 413 uses a set of linguistic rules 414 for identifying potential key phrases.”, and Par. 0059:”On step 416 each key phrase is represented as a feature vector for further processing in training step 424”).
wherein obtaining the second training data comprises: obtaining the second training text; (Par. 0038:”On step 412 the text undergoes NLP analysis, including for example POS tagging and stemming. On step 413 training key phrases are extracted from the text. The training key phrases are extracted according to linguistic rules 414, which indicate combination having high likelihood of being key phrases”).
Daya does not teach the key phrase being one or more key words determined based on a user's intent, splitting the second training text into at least one sentence; and identifying the key phrase in the at least one sentence using a plurality of labels.
Dolph teaches the key phrase being one or more key words determined based on a user's intent, (Par. 0047:” Speech training data corpus 432 is an area or portion of the memory [i.e., memory 220 or 320] that contains training data of the VUI of the application under development within computing system 200. Training data is words, phrases, sentences, or the like [collectively referred herein as training phrases], that are associated with user intents, that when conveyed to and understood by the VUI begins or continues an associated workflow of the application.”).
Therefore, it would have been prima facie obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Daya in view of Dolph to incorporate the key phrase being one or more key words determined based on a user's intent, in order to integrate audio variables into the synthesized audio speech output to simulate differences in user speech voices, accents, volume, background noise, distortions, or the like, as evidenced by Dolph (See Par. 0018).
Neither Daya nor Dolph teaches splitting the second training text into at least one sentence; and identifying the key phrase in the at least one sentence using a plurality of labels.
Liu teaches splitting the second training text into at least one sentence; (Par. 0020:” …vector determines Unit, for carrying out word segmentation processing to text to be measured…”, and Par. 0011:”… there is provided a kind of model training method, including: obtain and carry part of speech the text message of mark, wherein, the text message includes a plurality of sentences, and each word in every sentence is carried The part of speech mark of corresponding part of speech type”).
and identifying the key phrase in the at least one sentence using a plurality of labels. (Par. 0014:” Further, keywords mark the text information of each sentence in the first pre-set mark, other words labeled as second pre-set mark, so that when using the neural network model to identify words, marking the keyword is the first pre-set mark.”).
It would have been prima facie obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Daya and Dolph in view of Liu to include splitting the second training text into at least one sentence and identifying the key phrase in the at least one sentence using a plurality of labels, in order to provide a model training method and device and a keyword recognition method that solve the technical problem in the prior art concerning keyword identification accuracy in sentences, as evidenced by Liu (see Par. 0010).
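For illustration only, the two Liu limitations mapped above (splitting text into sentences, then marking each word with one of a plurality of labels) can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Liu or from the application: the function names, the label values "K" and "N" standing in for the first and second pre-set marks, the punctuation-based sentence splitter, and the sample keyword set are all hypothetical.

```python
import re

# Hypothetical label values standing in for Liu's "first pre-set mark"
# (keyword) and "second pre-set mark" (any other word).
FIRST_MARK = "K"
SECOND_MARK = "N"


def split_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace (assumed rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def label_sentence(sentence, keywords):
    """Mark each word in one sentence as keyword (FIRST_MARK) or other (SECOND_MARK)."""
    words = re.findall(r"\w+", sentence.lower())
    return [(w, FIRST_MARK if w in keywords else SECOND_MARK) for w in words]


# Hypothetical training text and keyword set.
text = "Please cancel my order. I want a refund!"
keywords = {"cancel", "refund"}

sentences = split_sentences(text)
labeled = [label_sentence(s, keywords) for s in sentences]
```

Each sentence yields a sequence of (word, label) pairs, which is the form of labeled sample that a sequence model could be trained on.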

extracted from audio source in which words were found and from the environment and acoustic environment”, and Par. 0008:” ...training words extracted from a training audio source”, and Par. 0027:” Training is performed on a training audio source. … During step 209 training words are extracted.”)
	and converting the first audio sample into the first training text using a speech recognition technology. (Par. 0027:” Substep 209 comprises text extraction by text -related analysis, such as STT or word spotting, otherwise receiving text extracted from the audio files. … During step 209 training words are extracted.”).

Regarding claim 5, Daya teaches the method of claim 4, wherein the feature information includes at least one of the following information about the words: text, part of speech, semantic meaning and syntax information. (Par. 0064:” Natural Language Processing [NLP] tagging components 624 comprise Parts of Speech [POS] tagging engine 628 for assigning a part of speech indication, such as noun, verb, preposition, adverb, adjective or others to words extracted by engine 604 or engine 608. NLP analyses components 624 further comprise stemming engine 632 for reducing words to their basic form, for example "books" will be stemmed to "book", "going" will be stemmed to "go" and the like.”).
Regarding claim 7, Daya teaches the method of claim 6, wherein obtaining the second training text includes: obtaining a second audio sample for training the key phrase identification model; (Par. 0011:”… a key phrase training component for receiving indications and generating key phrase training model between the training key phrase and the features, and an indication; and a classification engine for applying the key phrase training model on the test key phrase and the features… Within the apparatus the features optionally relate to a second audio source.”).
	and converting the second audio sample into the second training text using a speech recognition technology. (Par. 0038:” Referring now to FIG. 4 showing a flowchart of the main steps in training a model for enhanced key phrase recognition and importance testing. Training is performed on a training audio source coming through block 410.”, and Par. 0059:” Key phrase confidence or correctness model 425 relates to the confidence or the correctness of words and word combinations using manual transcription 420 [speech recognition technology].”).

Regarding claim 10, Daya teaches a method for identifying a key phrase in audio, comprising: obtaining audio data to be identified; (Par. 0021:” ... multiple features are determined or extracted from audio source in which words were found and from the environment and acoustic environment”, and Par. 0008:” ...training words extracted from a training audio source”, and Par. 0034:” Referring now to FIG. 3, showing a flowchart of the main steps in enhancing speech to text results, once training is completed”).
and identifying the key phrase in the audio data by using a trained key phrase identification model as a neural network, (Par. 0034:” FIG. 3, showing a flowchart of the main steps in enhancing speech to text results, once training is completed.”, and Par. 0035:”On step 320, the word training model generated on step 224 above and stored on step 228 above is ...”, and Par. 0037:”Key phrases are generally combinations of one or more words which are logically related, whether linguistically or in the context of the environment”, and Par. 0030:”Training is preferably performed using methods such as neural networks, Support Vector Machines (SVM) ...”).
	wherein the key phrase identification model is trained based on first training data for identifying feature information of words in a first training text and second training data for identifying the key phrase in a second training text.(Par. 0022:” A model is then trained on the set of feature vectors and their corresponding labels. After the training step is completed, during on-going usage, also referred to as testing step or production stage, the features are extracted for each found word, followed by the determination of a confidence score, according to the model. The confidence score is then used for determining whether the found word is correct or incorrect.”, and Par. 0023:” Key phrases are located in a text that has been extracted from an audio source, according to a set of linguistic rules, and additional or alternative features are determined for the key phrases”).

wherein obtaining the first training data comprises: obtaining the first training text; (Par. 0027:” Training is performed on a training audio source. On step 204 a corpus comprising one or more training audio files or streams is received by a training system. … During step 209 training words are extracted.”)
splitting the first training text into at least one sentence; and (Par. 0038:” Training words extracted on textual extraction step 409 whether by speech to text, word spotting or otherwise receiving text extracted from the audio files, and acoustic features are extracted on step 410.”).
determining the feature information of the words in the at least one sentence using a natural language processing technology, and (Par. 0038:” On step 412 the text undergoes NLP analysis, including for example POS tagging and stemming. On step 413 training key phrases are extracted from the text. … Step 413 uses a set of linguistic rules 414 for identifying potential key phrases.”, and Par. 0059:”On step 416 each key phrase is represented as a feature vector for further processing in training step 424”).
wherein obtaining the second training data comprises: obtaining the second training text; (Par. 0038:”On step 412 the text undergoes NLP analysis, including for example POS tagging and stemming. On step 413 training key phrases are extracted from the text. The training key phrases are extracted according to linguistic rules 414, which indicate combination having high likelihood of being key phrases”).
Daya does not teach the key phrase being one or more key words determined based on a user's intent, splitting the second training text into at least one sentence; and identifying the key phrase in the at least one sentence using a plurality of labels.
Dolph teaches the key phrase being one or more key words determined based on a user's intent, (Par. 0047:” Speech training data corpus 432 is an area or portion of the memory [i.e., memory 220 or 320] that contains training data of the VUI of the application under development within computing system 200. Training data is words, phrases, sentences, or the like [collectively referred herein as training phrases], that are associated with user intents, that when conveyed to and understood by the VUI begins or continues an associated workflow of the application.”).
Therefore, it would have been prima facie obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Daya in view of Dolph to incorporate the key phrase being one or more key words determined based on a user's intent, in order to integrate audio variables into the synthesized audio speech output to simulate differences in user speech voices, accents, volume, background noise, distortions, or the like, as evidenced by Dolph (See Par. 0018).
Neither Daya nor Dolph teaches splitting the second training text into at least one sentence; and identifying the key phrase in the at least one sentence using a plurality of labels.
Liu teaches splitting the second training text into at least one sentence; (Par. 0020:” …vector determines Unit, for carrying out word segmentation processing to text to be measured…”, and Par. 0011:”… there is provided a kind of model training method, including: obtain and carry part of speech the text message of mark, wherein, the text message includes a plurality of sentences, and each word in every sentence is carried The part of speech mark of corresponding part of speech type”).
and identifying the key phrase in the at least one sentence using a plurality of labels. (Par. 0014:” Further, keywords mark the text information of each sentence in the first pre-set mark, other words labeled as second pre-set mark, so that when using the neural network model to identify words, marking the keyword is the first pre-set mark.”).


Regarding claim 14, Daya teaches an apparatus for training a key phrase identification model, comprising: one or more processors, and a storage device, configured to store one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors are configured to implement a method for training a key phrase identification model, comprising: (Par. 0026:” ...a Central Processing Unit (CPU) or microprocessor device, … computing platform provisioned with a memory device...Each application is a set of logically interrelated computer programs, modules, or other units and associated data structures that interact to perform one or more specific tasks.” and Par. 0010:” ...The apparatus can further comprise a key phrase extraction component for extracting a training key phrase from the at least one training word and a test key phrase from the at least one test word ...”).
	obtaining first training data for identifying feature information of words in a first training text; (Par. 0059:” On step 424 training is performed for generating key phrase confidence or correctness model 425 and key phrase importance model 426, which preferably include pairs, each pair consisting of a feature vector representation and an indication… Key phrase importance model 426 relates to the importance or significance of the detected key phrases. On step 416 each key phrase is represented as a feature vector for further processing in training step 424. On step 424 manual indication 415 is received, in which each key phrase is tagged as important or unimportant.”).
obtaining second training data for identifying a key phrase in a second training text; (Par. 0038:” On step 412 the text undergoes NLP analysis, including for example POS tagging and stemming. On step 413 training key phrases are extracted from the text. The training key phrases are extracted according to linguistic rules 414, which indicate combination having high likelihood of being key phrases"). Note: Two sets of data are being fed into training block 424, training data set from main branch through 416 and the second set from 415.
and training the key phrase identification model to be used as a neural network, based on the first training data and the second training data, so as to identify the key phrase in audio data by using the trained key phrase identification model, (Par. 0059:” On step 424 training is performed for generating key phrase confidence or correctness model 425 and key phrase importance model 426, which preferably include pairs, each pair consisting of a feature vector representation and an indication… Key phrase importance model 426 relates to the importance or significance of the detected key phrases. On step 416 each key phrase is represented as a feature vector for further processing in training step 424. On step 424 manual indication 415 is received, in which each key phrase is tagged as important or unimportant.”, and Par. 0030:” Training is preferably performed using methods such as neural networks, Support Vector Machines [SVM] ...”, and Par. 0023:” Each example in the training data consists of a pair of a feature vector that represents a single key phrase, and its class label or correctness indication.”, and Par. 0067:” Training components 644 further comprise optional phrase training component 647 for generating a model based on key phrases generated by key phrase extraction component 660 ....”). Note: Two sets of data are being fed into training block 424: the training data set from the main branch through 416, and the second set from 415.
wherein obtaining the first training data comprises: obtaining the first training text; (Par. 0027:” Training is performed on a training audio source. On step 204 a corpus comprising one or more training audio files or streams is received by a training system. … During step 209 training words are extracted.”)
splitting the first training text into at least one sentence; and (Par. 0038:” Training words extracted on textual extraction step 409 whether by speech to text, word spotting or otherwise receiving text extracted from the audio files, and acoustic features are extracted on step 410.”).
determining the feature information of the words in the at least one sentence using a natural language processing technology, and (Par. 0038:” On step 412 the text undergoes NLP analysis, including for example POS tagging and stemming. On step 413 training key phrases are extracted from the text. … Step 413 uses a set of linguistic rules 414 for identifying potential key phrases.”, and Par. 0059:”On step 416 each key phrase is represented as a feature vector for further processing in training step 424”).
wherein obtaining the second training data comprises: obtaining the second training text; (Par. 0038:”On step 412 the text undergoes NLP analysis, including for example POS tagging and stemming. On step 413 training key phrases are extracted from the text. The training key phrases are extracted according to linguistic rules 414, which indicate combination having high likelihood of being key phrases”).
Daya does not teach the key phrase being one or more key words determined based on a user's intent, splitting the second training text into at least one sentence; and identifying the key phrase in the at least one sentence using a plurality of labels.
Dolph teaches the key phrase being one or more key words determined based on a user's intent, (Par. 0047:” Speech training data corpus 432 is an area or portion of the memory [i.e., memory 220 or 320] that contains training data of the VUI of the application under development within computing system 200. Training data is words, phrases, sentences, or the like [collectively referred herein as training phrases], that are associated with user intents, that when conveyed to and understood by the VUI begins or continues an associated workflow of the application.”).
Therefore, it would have been prima facie obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Daya in view of Dolph to incorporate the key phrase being one or more key words determined based on a user's intent, in order to integrate audio variables into the synthesized audio speech output to simulate differences in user speech voices, accents, volume, background noise, distortions, or the like, as evidenced by Dolph (See Par. 0018).
Neither Daya nor Dolph teaches splitting the second training text into at least one sentence; and identifying the key phrase in the at least one sentence using a plurality of labels.
Liu teaches splitting the second training text into at least one sentence; (Par. 0020:” …vector determines Unit, for carrying out word segmentation processing to text to be measured…”, and Par. 0011:”… there is provided a kind of model training method, including: obtain and carry part of speech the text message of mark, wherein, the text message includes a plurality of sentences, and each word in every sentence is carried The part of speech mark of corresponding part of speech type”).
and identifying the key phrase in the at least one sentence using a plurality of labels. (Par. 0014:” Further, keywords mark the text information of each sentence in the first pre-set mark, other words labeled as second pre-set mark, so that when using the neural network model to identify words, marking the keyword is the first pre-set mark.”).
It would have been prima facie obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Daya in view of Liu to include splitting the second training text into at least one sentence and identifying the key phrase in the at least one sentence using a plurality of labels, in order to provide a model training method and device and a keyword recognition method that address the prior-art problem of poor keyword identification accuracy in sentences, as evidenced by Liu (see Par. 0010).
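
For illustration only, the two limitations discussed above (splitting a training text into sentences, then marking key words with one label and all other words with another) can be sketched as follows. This is a minimal sketch and not a characterization of the claims or of Liu: the function names `split_sentences` and `mark_keywords`, the punctuation-based splitting rule, and the mark values 1 and 0 are all assumptions introduced here.

```python
import re


def split_sentences(text):
    """Split text into sentences at whitespace that follows '.', '!' or '?'.

    Hypothetical helper: the splitting rule is an assumption for illustration.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def mark_keywords(sentence, keywords, key_mark=1, other_mark=0):
    """Pair every word with key_mark if it is a key word, else other_mark.

    The two marks stand in for a "first pre-set mark" and a "second
    pre-set mark"; the concrete values are assumptions.
    """
    words = re.findall(r"\w+", sentence)
    keyset = {k.lower() for k in keywords}
    return [(w, key_mark if w.lower() in keyset else other_mark) for w in words]


if __name__ == "__main__":
    for sentence in split_sentences("Play some jazz. Stop the music!"):
        print(mark_keywords(sentence, ["jazz", "music"]))
```

In this sketch the marked pairs would serve as training examples; the model that consumes them is outside the scope of the illustration.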

	Claims 9, 11, 12, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Daya, Dolph, and Liu as applied to claims 1, 10, 11, and 12, respectively, and further in view of Su.

Su was applied in the previous office action.
Regarding claim 9, Daya, Dolph, and Liu teach a method for training a key phrase identification model.


Su teaches identifying a starting character of the key phrase using a first label; (Par. 0068:” Process of NER is sequence annotation problem, namely for the given input text sequence, to each character (or word) label”, and Par. 0072:” B-PER labelling meanings name start character I-PER name of intermediate and end character B-LOC name start character I-LOC name in the middle and ending character B-ORG organization name of the initial character I-ORG organization name  ...”)
identifying a subsequent character of the key phrase using a second label, the subsequent character following the starting character; (Par. 0069:” label can be defined by the following: NER name [PER], address [LOC], identifying the organization name [ORG] as an example, the following input text: " Zhangsan from Xian, Peking University, the sequence result is marked:”, and Par. 0070:” Zhang B-PER /I-PER to I/O from the I/O /B - LOC /I-LOC, I/O I/O I/O to I/O /B - ORG /I-ORG large /I-ORG /I-ORG…”).
and identifying a character in the at least one sentence that does not belong to the key phrase using a third label. (Par. 0072:” ...start character I-LOC name in the middle and ending other characters”).

It would have been prima facie obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Daya in view of Su to include identifying a starting character of the key phrase using a first label, identifying a subsequent character of the key phrase using a second label, and identifying a character that does not belong to the key phrase using a third label, in order to process the text character sequence to obtain named entity recognition results corresponding to the input sequence, as evidenced by Su (see Par. 0012).
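
For illustration only, the three-label character scheme discussed above can be sketched as follows. This is a minimal sketch and not a characterization of the claims or of Su: the label names "B", "I", and "O" and the helper `label_characters` are assumptions introduced here, and the exact-substring matching stands in for whatever annotation procedure is actually used.

```python
def label_characters(sentence, key_phrase):
    """Return one label per character of the sentence.

    "B" marks the starting character of a key phrase occurrence,
    "I" marks each subsequent character of that occurrence,
    "O" marks every character outside a key phrase.
    Label names and matching rule are assumptions for illustration.
    """
    labels = ["O"] * len(sentence)
    if not key_phrase:
        return labels
    start = sentence.find(key_phrase)
    while start != -1:
        labels[start] = "B"
        for i in range(start + 1, start + len(key_phrase)):
            labels[i] = "I"
        # Continue searching after the occurrence just labeled.
        start = sentence.find(key_phrase, start + len(key_phrase))
    return labels


if __name__ == "__main__":
    print(list(zip("go to Xian", label_characters("go to Xian", "Xian"))))
```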

Regarding claim 11, Daya, Dolph, and Liu teach a method for training a key phrase identification model.
Daya further teaches converting the audio data into a text corresponding to the audio data using a speech recognition technology; (Par. 0022:” In addition, manual transcription or an indication to particular words spotted in the audio source is provided for the same audio files or streams”, and Par. 0025:” …capturing/logging unit 132 without being stored, to an enhanced speech to text (STT) engine 136 which transcribes or spots the words...”).

Regarding claim 11, Daya, Dolph, and Liu do not teach the method of claim 10, wherein identifying the key phrase in the audio data includes: splitting the text into at least one sentence; determining a corresponding label for a character in the at least one sentence using the key phrase identification model; and identifying the key phrase in the audio data based on the corresponding label.
Su teaches splitting the text into at least one sentence; (Par. 0083:” when the input sequence is a text sequence, the input sequence character segmentation processing to obtain the input sequence of the character sequence (x1, x2. .., Xn)…”, and Par. 0012:” processing the text character sequence [sentence] using the condition random field to obtain named entity [key phrase] recognition results corresponding to the input sequence [input text]”).
determining a corresponding label for a character in the at least one sentence using the key phrase identification model; (Par. 0110:” In step 303, processing the character vector sequence using a neural network algorithm [key phrase identification model] to obtain the input sequence of the text character sequence”, and Par. 0111:”It can be understood that, for each character in the input sequence of marking the marking action, can be abstracted as a sequence labeling problem, which essentially is a classification task, namely to determine classification category of each character.”, and Par.  0068:”Process of NER is sequence annotation problem, namely for the given input text sequence, to each character (or word) label”).
and identifying the key phrase in the audio data based on the corresponding label. (Par. 0068:” Process of NER is sequence annotation problem, namely for the given input text sequence, to each character [or word] label.”).
It would have been prima facie obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Daya, Dolph, and Liu in view of Su to include converting the audio data into a text, splitting the text into at least one sentence; 

Regarding claim 12, Daya, Dolph, and Liu teach a method for training a key phrase identification model.
Regarding claim 12, Daya, Dolph, and Liu do not teach the method of claim 11, wherein the corresponding label includes one of: a first label for indicating the character as being a starting character of the key phrase, a second label for indicating the character as being a subsequent character of the key phrase, the subsequent character following the starting character; and a third label for indicating the character as not belonging to the key phrase.
Su teaches the method of claim 11, wherein the corresponding label includes one of: a first label for indicating the character as being a starting character of the key phrase; (Par. 0072:” B-PER labelling meanings name start character I-PER name of intermediate and end character B-LOC name start character I-LOC name in the middle and ending character B-ORG organization name of the initial character I-ORG organization name ...”).
a second label for indicating the character as being a subsequent character of the key phrase, the subsequent character following the starting character; (Par. 0072:” B-PER labelling meanings name start character I-PER name of intermediate and end character B-LOC name start character I-LOC name in the middle and ending character B-ORG organization name of the initial character I-ORG organization name ...”).
and a third label for indicating the character as not belonging to the key phrase. (Par. 0072:” ... other characters.”).
It would have been prima facie obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Daya, Dolph, and Liu in view of Su to include a first label for indicating the character as being a starting character, a second label for indicating the character as being a subsequent character, and a third label for indicating the character as not belonging to the key phrase, in order to process the text character sequence to obtain named entity recognition results corresponding to the input sequence, as evidenced by Su (see Par. 0012).

Regarding claim 13, Daya, Dolph, and Liu teach a method for training a key phrase identification model.
Regarding claim 13, Daya, Dolph, and Liu do not teach the method of claim 12, wherein identifying the key phrases in the audio data based on the corresponding labels includes: identifying a set consisting of the starting character identified by the first label and the subsequent character identified by the second label, as the key phrase.
Su teaches identifying a set consisting of the starting character identified by the first label and the subsequent character identified by the second label, as the key phrase. (Par. 0081:” In the embodiment of input sequence can be the text sequence, also can be the speech segment”, and Par. 0068:”Process of NER is sequence annotation problem, namely for the given input text sequence, to each character [or word] label”, and Par. 0069:”label can be defined by the following: NER name [PER], address [LOC], identifying the organization name [ORG] as an example, the following input text: " Zhangsan from Xian, Peking University, the sequence result is marked…”).
It would have been prima facie obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Daya, Dolph, and Liu in view of Su to include identifying a set consisting of the starting character identified by the first label and the subsequent character identified by the second label, in order to process the text character sequence to obtain named entity recognition results corresponding to the input sequence, as evidenced by Su (see Par. 0012).
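
For illustration only, grouping a starting character and its subsequent characters into a key phrase, as discussed above, can be sketched as follows. This is a minimal sketch and not a characterization of the claims or of Su: the label names "B", "I", and "O" and the helper `extract_key_phrases` are assumptions introduced here.

```python
def extract_key_phrases(characters, labels):
    """Collect each run of one "B" character followed by "I" characters.

    Any other label (for example "O"), or an "I" with no preceding "B",
    closes the current run. Label names are assumptions for illustration.
    """
    phrases = []
    current = []
    for ch, label in zip(characters, labels):
        if label == "B":
            # A new starting character closes any phrase still open.
            if current:
                phrases.append("".join(current))
            current = [ch]
        elif label == "I" and current:
            current.append(ch)
        else:
            if current:
                phrases.append("".join(current))
            current = []
    if current:
        phrases.append("".join(current))
    return phrases


if __name__ == "__main__":
    labels = ["O"] * 6 + ["B", "I", "I", "I"]
    print(extract_key_phrases("go to Xian", labels))
```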

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARIOUSH AGAHI whose telephone number is (408)918-7689.  
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/D.A./Examiner, Art Unit 2656
/Paras D Shah/Primary Examiner, Art Unit 2659

07/20/2021