DETAILED ACTION
Notice of Pre-AIA or AIA Status
	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 02/17/2022 has been entered.

Response to Arguments
Applicant’s arguments, see Remarks, filed 02/17/2022, with respect to the rejection(s) of claim(s) 1-3 and 9-12 under 35 U.S.C. 102(a)(1) have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of Konopka and Weber for claims 1-3 and 9-12. Furthermore, a rejection under 35 U.S.C. 103 has been made in view of Konopka, Weber and Yoshizawa for claims 5 and 6; and Konopka, Weber and Scarano for claims 7 and 8.

Claim Rejections - 35 USC § 103
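The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

	A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
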
Claims 1-3 and 9-12 are rejected under 35 U.S.C. 103 as being unpatentable over Konopka et al. (US PG Pub 20030036903) in view of Weber et al. (US Patent 9495955; hereinafter “Weber”).

With respect to Claim 1, Konopka discloses: 	A recognition device (Figure 1, 10-1, block diagram of the speech recognition system…system includes user sites 10-1, 10-2, …10-n, [0019]), comprising: 	storage (Figure 1, Database 18, best-matched speech model and the correction data needed to correct that speech model to achieve a closer match to the user's utterance are accumulated for later transmission to database 18… data is transferred to database 18 whereat it is associated with the class registered by the user…database 18 collects utterances, speech models and correction data from several different users, and this information is stored in association with the individual classes that have been registered, [0024]) configured to store first recognition models created based on first data sets, each of the first data sets comprising first recognition target data collected under a predetermined condition and first correct data to be recognized from the first recognition target data, the first data sets, and tags indicative of the predetermined condition in each of the first data sets (…samples of calibrated utterances of the user are entered by way of user input 12. Such calibrated utterances are predetermined words, phrases and sentences which are compared to the stored speech models in speech recognition apparatus 14 for determining the rejection score of those calibrated utterances and for establishing the type and degree of correction needed to conform the stored speech models to those utterances, [0023]… Criteria that identify a user's class may include, but are not limited to, the primary language spoken by the user, the user's gender, the user's age, the number of years the user has spoken the system target language, user's height, user's weight, the age of the user when he first learned the system target language, and the like, [0023]); and 	a processor configured to (Figure 1, 20, Speech Recognition Processor Module, …a speech recognition processor module 20 that operates to correct and retrain speech models stored in central database 18, [0019]): 	acquire a second data set comprising second recognition target data and second correct data to be recognized from the second recognition target data (Then, as depicted in step 44, the new data presented to this fully shared network is used to find transcriptions, or paths, through those models whose rejection scores are greater than the rejection scores for the initial transcriptions. This results in improved subword transcriptions that return higher rejection scores, [0030]; see also [0023]); 	execute recognition processing of the second recognition target data in the second data set by using each of the first recognition models stored in the storage (Figure 2a, step 34…the user also enters utterances which are sampled and compared to a predetermined set of stored speech models by speech recognition apparatus 14, [0028]); 	extract a particular tag of the tags stored in the storage (…sensed utterance is sampled to extract therefrom identifiable speech features, [0028]) based on the acquired representative value (These speech features are compared to the stored speech models and the optimal match between a sequence of features and the stored models is obtained, [0028]; When the accumulated number of such underperforming utterances exceeds a preset threshold, it is concluded that a sufficient number of underperforming utterances has been accumulated and either retraining of the set of speech which resulted in these Viterbi scores models or the derivation of a new class and corresponding speech models is initiated, [0031]); and 	create a second recognition model based on the acquired second data set and a first data set comprising first recognition target data collected under a condition indicated by the extracted tag and … (…the best-matched model nevertheless differs from the sampled feature sequence to the extent that an improved set of speech models is needed to optimize recognition performance...; the speech recognition apparatus at the user's site uses both the updated speech models and the original speech models stored thereat to determine respective best matches to new utterances. If subsequent utterances are determined to be better matched to the updated speech models, the original speech models are replaced by the updated models, [0028; 0036]).
Konopka, however, does not teach: compute a recognition accuracy of each of the first recognition models by comparing a result of the recognition processing of the second recognition target data with the second correct data; use the computed recognition accuracy of each of the first recognition models as an accuracy of each of the tags indicative of the predetermined condition in each of the first data set used to create each of the first recognition models; acquire a representative value of the accuracies of the tags.
Within the same field of speech recognition, Weber teaches: compute a recognition accuracy of each of the first recognition models by comparing a result of the recognition processing of the second recognition target data with the second correct data; use the computed recognition accuracy of each of the first recognition models as an accuracy of each of the tags indicative of the predetermined condition in each of the first data set used to create each of the first recognition models; acquire a representative value of the accuracies of the tags (In some implementations, speech recognition results for an existing acoustic model may be generated and evaluated. Speech recognition results having low confidence (accuracy) values may be identified and the acoustic model may be updated or reconfigured by finding examples of utterances within the corpus having characteristics (tags) in common with the utterances resulting in the low confidence values. For example, speech recognition may provide poor results (have low confidence values) for men with deep voices (acquiring tag representing men with deep voices) or for words with relatively rare triphones. In such instances, examples of utterances or portions of utterances within the corpus from male speakers with deep voices or of words with the relatively rare triphones may be identified and used to update or adapt an acoustic model. This process of identifying weak areas in an acoustic model and updating or adapting the model may advantageously improve the accuracy of acoustic models, Col. 3, lines 58-67 & Col. 4, lines 1-6; see also Col. 5, lines 38-54).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the recognition device of Konopka to compute a recognition accuracy of each of the first recognition models by comparing a result of the recognition processing of the second recognition target data with the second correct data, use the computed recognition accuracy of each of the first recognition models as an accuracy of each of the tags indicative of the predetermined condition in each of the first data set used to create each of the first recognition models, and acquire a representative value of the accuracies of the tags, as taught by Weber, in order to improve the training of acoustic models by obtaining training data from a pre-existing corpus of audio data and corresponding transcription data (Weber, Col. 2, lines 9-19).

With respect to Claim 2, Konopka discloses: 	The recognition device of Claim 1, wherein the first recognition target data in the first data set comprises first voice data (…samples of calibrated utterances of the user are entered by way of user input 12. Such calibrated utterances are predetermined words, phrases and sentences which are compared to the stored speech models in speech recognition apparatus 14 for determining the rejection score of those calibrated utterances and for establishing the type and degree of correction needed to conform the stored speech models to those utterances, [0023]), the first correct data in the first data set comprises a first text written from the first voice data (...user input 12 is provided with, for example, a keyboard or other data input device, by which the user may enter predetermined criteria that characterize his speech as being spoken by users of a particular class, [0023]), the second recognition … (…the class of the user may be determined to be French male, age 35, the system target language spoken for 15 years; or Japanese female, age 22, the system target language learned at the age of 14 and spoken for eight years; or an Australian male, [0023]; Examiner interprets that multiple utterances and classes of users may be accepted, since multiple users and their associated utterances can be stored), and the recognition processing comprises processing of recognizing voice from the voice data and converting the voice into a text (…the samples that are uploaded to the central site are phonetically transcribed using a context-constrained (e.g. phonotactically-constrained) network at that site, as represented by step 40 [Fig. 2a]. That is, the system attempts to transcribe speech in a subword-by-subword manner and then link, or string those subwords together to form words or speech passages…transcribed samples are linked, or strung together using canonical (i.e. dictionary-supplied) subword spellings, resulting in words, as represented by step 42…Inquiry 46 then is made to determine if there are, in fact, a sufficient number of utterances of improved rejection scores…step 56 is carried out to return the transcriptions created from such utterances as updated models for use in the lexica at those sites of users of the class from which the improved utterances were created, [0029]).

With respect to Claim 3, Konopka discloses: 	The recognition device of Claim 2, wherein the processor is configured to: input third voice data (In addition to entering criteria data, the user also enters utterances which are sampled and compared to a predetermined set of stored speech models by speech recognition apparatus 14, [0028]; Examiner interprets that multiple voice data can be input, since the user can enter multiple utterances), and convert the third voice data into a third text with the created second recognition model (…samples of calibrated utterances of the user are entered by way of user input 12. Such calibrated utterances are predetermined words, phrases and sentences which are compared to the stored speech models in speech recognition apparatus 14 for determining the rejection score of those calibrated utterances and for establishing the type and degree of correction needed to conform the stored speech models to those utterances, [0023]).

With respect to Claim 9, Konopka discloses: 	The recognition device of Claim 2, wherein the processor is configured to: determine whether a data amount of the acquired second data set and the first data set stored in the storage in association with the extracted tag is sufficient or not (…acoustic subword data that differs from a best-matched speech model by at least a predetermined amount…is identified. The corresponding best-matched speech model and the correction data needed to correct that speech model to achieve a closer match to the user's utterance are accumulated for later transmission to database 18 if batch processing is implemented, or forwarded immediately, if interactive mode is implemented. In either mode, the data is transferred to database 18 whereat it is associated with the class registered by the user, [0024]);
create a third data set based on the second data set if it is determined that the data amount of the second data set and the first data set is not sufficient (After a sufficient quantity, or so-called "critical mass", of subword and correction data has been collected at database 18, module 20 either creates, or retrains the speech models stored in the database, resulting in an updated set of speech models, [0024]);
create a second recognition model, based on the acquired second data set, the first data set stored in the storage in association with the extracted tag, and the created third data set (…the best-matched model nevertheless differs from the sampled feature sequence to the extent that an improved set of speech models is needed to optimize recognition performance...; …database 18 collects utterances, speech models and correction data from several different users, and this information is stored in association with the individual classes that have been registered…the speech recognition apparatus at the user's site uses both the updated speech models and the original speech models stored thereat to determine respective best matches to new utterances. If subsequent utterances are determined to be better matched to the updated speech models, the original speech models are replaced by the updated models, [0028; 0036]).

With respect to Claim 10, Konopka discloses: 	The recognition device of Claim 9, wherein the processor is configured to: create third voice data from a third text acquired based on a keyword extracted from the second text in the second data set (In addition to entering criteria data, the user also enters utterances which are sampled and compared to a predetermined set of stored speech models by speech recognition apparatus 14, [0028]; Examiner interprets that multiple voice data can be input, since the user can enter multiple utterances); and create a third data set comprising the third voice data and the third text (…samples of calibrated utterances of the user are entered by way of user input 12. Such calibrated utterances are predetermined words, phrases and sentences which are compared to the stored speech models in speech recognition apparatus 14 for determining the rejection score of those calibrated utterances and for establishing the type and degree of correction needed to conform the stored speech models to those utterances, [0023]).

With respect to Claim 11, Konopka discloses: 	A method executed by a recognition device (Figure 1, 10-1, block diagram of the speech recognition system…system includes user sites 10-1, 10-2, …10-n, [0019]). Claim 11 contains subject matter similar to that of claim 1 and is thus rejected on the same grounds.

With respect to Claim 12, Konopka discloses: 	A non-transitory computer-readable storage medium having stored thereon a computer program which is executable by a computer (…the retrained speech models may be recorded on a CD-ROM or other portable storage device and delivered to a user site to replace the original speech models stored in speech recognition apparatus 14, [0024]) using storage (Figure 1, Database 18, best-matched speech model and the correction data needed to correct that speech model to achieve a closer match to the user's utterance are accumulated for later transmission to database 18… data is transferred to database 18 whereat it is associated with the class registered by the user…database 18 collects utterances, speech models and correction data from several different users, and this information is stored in association with the individual classes that have been registered, [0024]). Claim 12 contains subject matter similar to that of claim 1 and is thus rejected on the same grounds.

	Claims 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Konopka and Weber in view of Yoshizawa [US Patent 7603276].

With respect to Claim 5, Konopka discloses: 	The recognition device of Claim 2.
Konopka fails to teach: wherein the processor is configured to: display the extracted tag; and create a second recognition model based on the acquired second data set and the first data set stored in the storage in association with a tag designated by the user, of the displayed tags.
In the same field of speech recognition, Yoshizawa teaches: wherein the processor is configured to: display the extracted tag (Figure 6B,…user selects acoustic models corresponding to the family members (i.e., those who use the speech recognition) using a display connected to the PC (the server 101), as shown by screen display examples in FIG. 6A and FIG. 6B. FIG. 6 shows that the acoustic models stored in the CD-ROM are displayed in a box indicated as "CD-ROM" and that the acoustic models selected from among these models have been copied into a box indicated as "USERS".…More specifically, the reading unit 111 reads the three reference models, which are then stored into the reference model storing unit 103 via the reference model preparing unit 102, [Column 21, lines 37-54]); and create a second recognition model based on the acquired second data set and the first data set stored in the storage in association with a tag designated by the user, of the displayed tags (…The reference model selecting unit 305 selects the car-A reference model and the car-B reference model which are acoustically similar to the car noise created as the usage information 324, from among the reference models 321 stored in the reference model storing unit 303 (step S302)…the specification information creating unit 307 creates the specification information 325 on the basis of the specifications of the PDA 301 (step S303). In the present example, the specification information 325 indicating that the CPU power is small is created, on the basis of the specifications of the CPU provided for the PDA 301. The standard model creating unit 306 creates the standard model 322 so as to maximize or locally maximize the probability or likelihood with respect to the reference models 323 selected by the reference model selecting unit 305, on the basis of the created specification information 325 (step S304)…Finally, the noise identifying unit 313 performs noise identification on the noise inputted from the microphone 312 by the user, using the standard model 322 (step S305), [Column 32, lines 47-65]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the displaying of the extracted tag of Yoshizawa with the system of Konopka, as modified by Weber, in order to “provide a user with, in a short period of time, a high-precision recognition model (standard model) suitable for the specifications of apparatuses and applications and for usage environments, by effectively utilizing many recognition models (reference models)” (Yoshizawa, [Column 1, lines 62-67]).

With respect to Claim 6, Yoshizawa teaches: 	wherein the processor is configured to display a tag related to the extracted tag (Figure 6B,…user selects acoustic models corresponding to the family members (i.e., those who use the speech recognition) using a display connected to the PC (the server 101), as shown by screen display examples in FIG. 6A and FIG. 6B. FIG. 6 shows that the acoustic models stored in the CD-ROM are displayed in a box indicated as "CD-ROM" and that the acoustic models selected from among these models have been copied into a box indicated as "USERS".…More specifically, the reading unit 111 reads the three reference models, which are then stored into the reference model storing unit 103 via the reference model preparing unit 102, [Column 21, lines 37-54]); (Figure 14, …in advance of the standard model creation, reference models serving as criteria are prepared (step S300). To be more specific: the reading unit 311 reads the noise reference models written on the storage device such as a CD-ROM; the reference model preparing unit 302 transmits the read reference models 321 to the reference model storing unit 303; and the reference model storing unit 303 stores the reference models 321…the usage information creating unit 304 creates the usage information 324, i.e., the noise type to be identified (step S301)… In this example, car noise has been selected…the noise identifying unit 313 performs noise identification on the noise inputted from the microphone 312 by the user, using the standard model 322 (step S305) [Column 32, lines 30-37, 46, 62-65]).

	Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Konopka and Weber, in further view of Scarano et al. [US Patent 8055503].

	With respect to Claim 7, Konopka discloses: 	The recognition device of Claim 2, wherein the processor is configured to: execute recognition processing of the second voice data in the second data set by using the created second recognition model (Figure 2a, step 34…the user also enters utterances which are sampled and compared to a predetermined set of stored speech models by speech recognition apparatus 14, [0028]).
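Konopka fails to teach: compute a recognition accuracy of the second recognition model by comparing the recognition processing result of the second voice data using the created second recognition model with the second text; and display the computed recognition accuracy of the second recognition model.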
Within the same field of speech recognition, Weber teaches: compute a recognition accuracy of the second recognition model by comparing the recognition processing result of the second voice data using the created second recognition model with the second text (In some implementations, speech recognition results for an existing acoustic model may be generated and evaluated. Speech recognition results having low confidence (accuracy) values may be identified and the acoustic model may be updated or reconfigured by finding examples of utterances within the corpus having characteristics (tags) in common with the utterances resulting in the low confidence values. For example, speech recognition may provide poor results (have low confidence values) for men with deep voices (acquiring tag representing men with deep voices) or for words with relatively rare triphones. In such instances, examples of utterances or portions of utterances within the corpus from male speakers with deep voices or of words with the relatively rare triphones may be identified and used to update or adapt an acoustic model. This process of identifying weak areas in an acoustic model and updating or adapting the model may advantageously improve the accuracy of acoustic models, Col. 3, lines 58-67 & Col. 4, lines 1-6; see also Col. 5, lines 38-54).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the recognition device of Konopka to compute a recognition accuracy of the second recognition model by comparing the recognition processing result of the second voice data using the created second recognition model with the second text, as taught by Weber, in order to improve the training of acoustic models by obtaining training data from a pre-existing corpus of audio data and corresponding transcription data (Weber, Col. 2, lines 9-19).

Within the same field of speech recognition, Scarano teaches: display the computed recognition accuracy of the second recognition model (…FIG. 14 depicts a speech statistics setup display. The speech statistics component is used for displaying real-time graphics of statistics…a statistic can be created to count the number of times that a specific phrase is heard, is missing, or to calculate statistics based on any other measures, [Column 11, lines 65-67, Column 12, lines 1-2]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the displaying of the recognition accuracy taught by Scarano with the system of Konopka, as modified by Weber to compute a recognition accuracy, in order to leverage voice recognition technology to provide new and improved features and functionality for use in audio data analysis (Scarano, [Column 1, lines 27-29]).

	With respect to Claim 8, Konopka discloses: 	The recognition device of Claim 7, wherein the processor is configured to: create a third recognition model based on the second data set (…the best-matched model nevertheless differs from the sampled feature sequence to the extent that an improved set of speech models is needed to optimize recognition performance...; the speech recognition apparatus at the user's site uses both the updated speech models and the original speech models stored thereat to determine respective best matches to new utterances. If subsequent utterances are determined to be better matched to the updated speech models, the original speech models are replaced by the updated models, [0028; 0036]).
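Konopka fails to teach: compute a recognition accuracy of the third recognition model by comparing the recognition processing result of the second voice data using the created third recognition model with the second text in the second data set; and display the recognition accuracy of the second recognition model and the recognition accuracy of the third recognition model.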
Within the same field of speech recognition, Weber teaches: compute a recognition accuracy of the third recognition model by comparing the recognition processing result of the second voice data using the created third recognition model with the second text in the second data set (In some implementations, speech recognition results for an existing acoustic model may be generated and evaluated. Speech recognition results having low confidence (accuracy) values may be identified and the acoustic model may be updated or reconfigured by finding examples of utterances within the corpus having characteristics (tags) in common with the utterances resulting in the low confidence values. For example, speech recognition may provide poor results (have low confidence values) for men with deep voices (acquiring tag representing men with deep voices) or for words with relatively rare triphones. In such instances, examples of utterances or portions of utterances within the corpus from male speakers with deep voices or of words with the relatively rare triphones may be identified and used to update or adapt an acoustic model. This process of identifying weak areas in an acoustic model and updating or adapting the model may advantageously improve the accuracy of acoustic models, Col. 3, lines 58-67 & Col. 4, lines 1-6; see also Col. 5, lines 38-54).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the recognition device of Konopka to compute a recognition accuracy of the third recognition model by comparing the recognition processing result of the second voice data using the created third recognition model with the second text in the second data set, as taught by Weber, in order to improve the training of acoustic models by obtaining training data from a pre-existing corpus of audio data and corresponding transcription data (Weber, Col. 2, lines 9-19).

	Within the same field of speech recognition, Scarano teaches: display the recognition accuracy of the second recognition model and the recognition accuracy of the third recognition model (…FIG. 14 depicts a speech statistics setup display. The speech statistics component is used for displaying real-time graphics of statistics…a statistic can be created to count the number of times that a specific phrase is heard, is missing, or to calculate statistics based on any other measures, [Column 11, lines 65-67, Column 12, lines 1-2]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the displaying of the recognition accuracies taught by Scarano with the system of Konopka, as modified by Weber to compute a recognition accuracy, in order to leverage voice recognition technology to provide new and improved features and functionality for use in audio data analysis (Scarano, [Column 1, lines 27-29]).


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art made of record and not relied upon includes:
Senior et al. (US Patent 9786270), which provides for generating acoustic models for speech recognition.
Qian (US PG Pub 20190304437), which provides for acquiring a multi-talker mixed speech signal from a plurality of speakers, performing permutation invariant training (PIT) model training on the multi-talker mixed speech signal based on knowledge from a single-talker …
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Rodrigo A Chavez whose telephone number is (571)270-0139. The examiner can normally be reached Monday - Friday 9-6 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/RODRIGO A CHAVEZ/Examiner, Art Unit 2658
/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658