DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claim 4 (and therefore claims 5-7, which depend therefrom), claim 13 (and therefore claims 14-16, which depend therefrom), and claims 9 and 18 are objected to because of the following informalities: claim 4 in line 7, claim 13 in line 6, claim 9 in line 2, and claim 18 in lines 1-2 all recite “change a parameter” but should recite “change the parameter.”  Appropriate correction is required.



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-18 are rejected under 35 U.S.C. 103 as being unpatentable over Yunhao et al., CN108133705A (with reference to EPO Machine generated English language translation, herein “Yunhao”) further in view of Radzikowski et al. Dual supervised learning for non-native speech recognition. J AUDIO SPEECH MUSIC PROC. 2019, 3 (January 14, 2019). https://doi.org/10.1186/s13636-018-0146-4 (herein “Radzikowski NPL”).
Regarding claim 1, Yunhao teaches a speech recognition method comprising (Yunhao, bottom of page 1, the invention is a method to train a speech recognition model, which outputs recognized speech in a dual-learning method) 
learning a first learning model (Yunhao page 2, 8th paragraph down starting with “Finally, returning to step S1,” with speech synthesis regarded as the “main task”, steps S1 to S8 are for training (learning) the speech synthesis model (first learning model), and the data in the next step (step S2) are “symmetrically exchanged”) to obtain first speech data corresponding to first training data (Yunhao page 3, first and fifth paragraphs disclosing aspects of steps S2 and S5, training data are selected from voice data set DA and text data set DB to have training data in the form of voice A, text B, and using the speech synthesis model (first learning model), the text B (symmetrical exchange of training data text B) is converted into speech data A’ (first speech data corresponding to the text B (first training data))).
learning a second learning model (Yunhao page 1, last paragraph, and page 2, steps S2-S8, with speech synthesis regarded as the “main task”, steps S1 to S8 train (learning) the speech recognition model (second learning model) as a dual task).
While Yunhao teaches that the training of its speech recognition model and speech synthesis models is by a duality learning approach, and generally teaches on page 2 that this approach involves switching from the speech recognition model being the main task to the speech synthesis model being the main task, Yunhao does not provide adequate detail to one of ordinary skill as to the specific data inputs/outputs involved when the training of the speech synthesis model is the main task, beyond generic teachings of “data in the next step is symmetrically exchanged.” Therefore, to supplement these omitted details, Radzikowski NPL is herein relied upon as follows.
Radzikowski NPL teaches to obtain a first speech recognition result corresponding to second training data (Radzikowski NPL page 3, right column, feedback loop in a last step 4, the speech recognition model transfers a previously synthesized sample (corresponding to second training data) into textual form (first speech recognition result)); and 
controlling to change a parameter of the first learning model based on an error of the obtained first speech recognition result (Radzikowski NPL page 3, right column, a score $a_k^{lt}$ is calculated that represents how correctly (thus considering error) the speech recognition model recognizes the synthesized speech as text (obtained first speech recognition result), where page 4, second column teaches that the score $a_k^{lt}$ is used to determine a reward value $a_k$ that page 5 details is used in a gradient based method of optimization to modify the weights (thus change at least one weight/parameter) of the trainable model MTTS where page 3, right column teaches the MTTS model to be the speech generation model (corresponding to the claimed speech synthesis model of Yunhao/ the claimed first learning model)).
Therefore, taking the teachings of Yunhao and Radzikowski NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition model and speech synthesis model training teachings of Yunhao to include the detailed teachings of the training loop including error calculation and model adjustment when the error calculated is of the speech recognition model recognizing a synthesized speech input, as detailed in Radzikowski NPL at least because doing so would make it possible for the models to learn weights which will lead to better (in terms of the feedback-giving model) conversion results with further iterations in the training process (see Radzikowski NPL page 3, bottom of left column).
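For orientation only, the data flow mapped in the claim 1 rejection (text, then synthesized speech, then recognized text, then a score, then a parameter change) might be sketched in Python as follows. This is an illustration under stated assumptions and is not code from Yunhao or Radzikowski NPL: the name `feedback_step` and the stand-in callables `synthesize`, `recognize`, `score`, and `update` are hypothetical placeholders for the speech synthesis model, the speech recognition model, the score calculation, and the weight update, and the toy functions exist only to make the sketch runnable.

```python
from typing import Callable, Dict

Params = Dict[str, float]


def feedback_step(
    text: str,
    params: Params,
    synthesize: Callable[[str, Params], str],
    recognize: Callable[[str], str],
    score: Callable[[str, str], float],
    update: Callable[[Params, float], Params],
) -> Params:
    """One pass of the loop: text -> speech -> text -> score -> update."""
    speech = synthesize(text, params)  # first speech data from the first training data
    recognized = recognize(speech)     # first speech recognition result
    s = score(text, recognized)        # how correctly the synthesized speech was recognized
    return update(params, s)           # change a parameter of the synthesis model


# Toy stand-ins (hypothetical, for illustration only).
def toy_synthesize(text: str, params: Params) -> str:
    # "Speech" is just the text, truncated when the toy gain parameter is low.
    return text if params["gain"] >= 1.0 else text[:-1]


def toy_recognize(speech: str) -> str:
    return speech


def toy_score(reference: str, hypothesis: str) -> float:
    return 1.0 if reference == hypothesis else 0.0


def toy_update(params: Params, s: float) -> Params:
    # Nudge the parameter only when recognition failed (score below 1).
    return {"gain": params["gain"] + 0.5 * (1.0 - s)}
```

In this toy setup, a first call with a low `gain` produces a failed recognition and raises the parameter, and a second call with the raised parameter leaves it unchanged.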
Regarding claims 2 and 11, Yunhao teaches wherein the first training data comprises a pair of text data and speech data corresponding to the text data (Yunhao page 3, first paragraph, training data are selected from voice data set DA and text dataset DB where the training data is in the form “voice A text B”, and the remainder of the steps disclosed on page 3 of Yunhao disclose the relationship between A and B as they are processed through the speech synthesizer and speech recognizer, thus A and B correspond to each other).
Regarding claims 3 and 12, while Yunhao teaches that in the dual-learning system, the “data in the next step is symmetrically exchanged,” this does not explicitly teach that the second training data is the first speech data.
Radzikowski NPL teaches wherein the second training data is the first speech data (Radzikowski NPL page 3, right column, feedback loop in a last step 4, the speech recognition model transfers a previously synthesized sample (corresponding to second training data), where the previously synthesized sample is generated as output from the MTTS model (thus corresponding to the claimed first speech data)).
Therefore, taking the teachings of Yunhao and Radzikowski NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition model and speech synthesis model training teachings of Yunhao to include the detailed teachings of the training loop using the outputs from another model upstream in the processing loop, as detailed in Radzikowski NPL at least because doing so would make it possible for the models to learn weights which will lead to better (in terms of the feedback-giving model) conversion results with further iterations in the training process (see Radzikowski NPL page 3, bottom of left column).
Regarding claim 4, Yunhao teaches wherein the learning of the second learning model comprises learning the second learning model to obtain a second speech recognition result corresponding to second speech data (Yunhao page 2, detailing steps S2-S8 when the voice recognition is the main task, and where using the speech recognition model to be trained (learning of the second learning model) includes inputting training data of “voice A text B” and converting the speech data A into text B’ (second speech recognition result) which corresponds to the voice A training data (second speech data)).
Yunhao does not explicitly teach wherein the controlling to change the parameter of the first learning model comprises: obtaining an error value corresponding to a difference between the obtained first speech recognition result and the second speech recognition result; and 
controlling to change a parameter of the first learning model based on the obtained error value.
Radzikowski NPL teaches wherein the controlling to change the parameter of the first learning model comprises: obtaining an error value corresponding to a difference between the obtained first speech recognition result and the second speech recognition result (Radzikowski NPL page 5, and fig. 3, taking note that the claim is broadly reciting the error value to “correspond to” “a difference” thus permitting a broadest reasonable interpretation of any correspondence to any type of difference - “Is error growing” step obtains an error of the DSL method to decide whether convergence is reached for training, where page 7, section 2.6.4 teaches the error to be a character error rate (of the speech recognition) where this error and adjustment of the MSTT (corresponding to the second learning model) corresponds to the accuracy of the MSTT from both loops (how far off/difference the MSTT performance from one loop is from the other), where loop one/loop L calculates the score of the first speech recognition result (see page 3 of Radzikowski NPL), and loop two/loop S calculates the score of the second speech recognition result (see page 4 of Radzikowski NPL)); and 
controlling to change a parameter of the first learning model based on the obtained error value (Radzikowski NPL page 5, fig. 3, based on whether the error is growing, the DSL process will either loop again (and update (change) the weights (at least one parameter) of the MTTS (first learning model)), or simply stop updating).
Therefore, taking the teachings of Yunhao and Radzikowski NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition model and speech synthesis model training teachings of Yunhao to include the detailed teachings of the training loop including error calculation and model adjustment when the error calculated is of the speech recognition model recognizing a synthesized speech input, as detailed in Radzikowski NPL at least because doing so would make it possible for the models to learn weights which will lead to better (in terms of the feedback-giving model) conversion results with further iterations in the training process (see Radzikowski NPL page 3, bottom of left column).
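As a point of reference for the error discussed in the claim 4 rejection, a character error rate is conventionally computed as the edit (Levenshtein) distance between a reference text and a recognized text, divided by the reference length. The Python sketch below shows that conventional calculation. It is not taken from Radzikowski NPL: the function names are hypothetical, and `error_is_growing`, which simply compares the two most recent error values, is an assumed reading of the “Is error growing” step of fig. 3 rather than the criterion the reference actually uses.

```python
from typing import List


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute (or match at zero cost)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def error_is_growing(history: List[float]) -> bool:
    """Assumed stopping check: the latest error exceeds the previous one."""
    return len(history) >= 2 and history[-1] > history[-2]
```

Under this assumed check, training would continue while the error keeps falling and stop once it rises from one iteration to the next.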
Regarding claims 5 and 14, Yunhao teaches wherein the second speech data is data based on an actual speech (Yunhao page 2, detailing steps S2-S8 when the voice recognition is the main task, and where using the speech recognition model to be trained includes inputting training data of “voice A text B” where voice A is from voice data set DA, page 1 teaches that the data used to train the models is from large untagged data sources where such untagged data sources are disclosed as WeChat voice (thus from actual speech)).
Regarding claims 6 and 15, Yunhao teaches wherein the first training data, the first speech data, and the second speech data are data based on the same text (Yunhao page 2, in step S2 training data are selected from voice data set DA and text dataset DB and the training data is in the form of “Voice A text B”, and where the model obtained by the disclosed method uses one-to-one corresponding standard data in a supervised manner, thus the text data B corresponds to Voice A and is used as a basis for both Voice A (second speech data) and the synthesized speech data A’ (first speech data)).
Regarding claims 7 and 16, Yunhao teaches wherein the second speech recognition result is text data (Yunhao page 2, detailing steps S2-S8 when the voice recognition is the main task, and where using the speech recognition model to be trained (learning of the second learning model) includes inputting training data of “voice A text B” and converting the speech data A into text B’ (second speech recognition result) which corresponds to the voice A training data (second speech data)).
While Yunhao discloses the duality learning approach, in which steps S1-S8 are described with speech recognition as the “main task” and the data is symmetrically exchanged when speech synthesis is the main task, Yunhao does not explicitly teach the first speech recognition result or that it is text data.
Radzikowski NPL teaches the first speech recognition result is text data (Radzikowski NPL page 3, right column, feedback loop in a last step 4, the speech recognition model transfers a previously synthesized sample (corresponding to second training data) into textual form (first speech recognition result)).
Therefore, taking the teachings of Yunhao and Radzikowski NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition model and 
Regarding claim 8, Yunhao teaches further comprising learning at least one of the first learning model or the second learning model using supervised learning (Yunhao page 2, middle of the page, the model obtained by the method provided in the present invention can achieve and use a large number of one-to-one corresponding standard data to train the obtained model in a supervised manner (supervised learning)).
Regarding claim 9, Yunhao teaches wherein the controlling to change the parameter of the first learning model comprises controlling to change a parameter of the first learning model using reinforcement learning (Yunhao on page 3 also teaches controlling parameters of the speech synthesis model (first learning model), and also discloses that a REINFORCE algorithm in the reinforcement learning technique is used in updating the parameters (controlling to change the parameter)).
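For readers unfamiliar with the REINFORCE algorithm named above, a minimal Python sketch of a REINFORCE-style parameter update follows. It is an illustration under stated assumptions, not Yunhao's implementation: it assumes a toy categorical policy whose parameters are softmax logits, whereas the reference applies the technique to a speech synthesis model, and the names `softmax` and `reinforce_update` and the learning rate `lr` are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(
    logits: List[float], action: int, reward: float, lr: float = 0.1
) -> List[float]:
    """One REINFORCE step for a softmax policy.

    The gradient of log pi(action) with respect to logit i is
    (1 if i == action else 0) - pi(i); each logit moves along that
    gradient, scaled by the reward and the learning rate.
    """
    probs = softmax(logits)
    return [
        x + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, (x, p) in enumerate(zip(logits, probs))
    ]
```

A positive reward raises the probability of the action that was taken, a negative reward lowers it, and a zero reward leaves the parameters unchanged.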
Regarding claim 10, Yunhao teaches learns the first learning model (Yunhao page 2, 8th paragraph down starting with “Finally, returning to step S1,” with speech synthesis regarded as the “main task”, steps S1 to S8 are for training (learning) the speech synthesis model (first learning model), and the data in the next step (step S2) are “symmetrically exchanged”)  to obtain first speech data corresponding to first training data (Yunhao page 3, first and fifth paragraphs disclosing aspects of steps S2 and S5, training data are selected from voice data set DA and text data set DB to have training data in the form of voice A, text B, and using the speech synthesis model (first learning model), the text B (symmetrical exchange of training data text B) is converted into speech data A’ (first speech data corresponding to the text B (first training data))), 
learns the second learning model (Yunhao page 1, last paragraph, and page 2, steps S2-S8, with speech synthesis regarded as the “main task”, steps S1 to S8 train (learning) the speech recognition model (second learning model) as a dual task).
Yunhao does not explicitly teach a speech recognition device comprising: a memory configured to store a first learning model and a second learning model; and a processor, wherein the processor.
Further, while Yunhao teaches that the training of its speech recognition model and speech synthesis models is by a duality learning approach, and generally teaches on page 2 that this approach involves switching from the speech recognition model being the main task to the speech synthesis model being the main task, Yunhao does not provide adequate detail to one of ordinary skill as to the specific data inputs/outputs involved when the training of the speech synthesis model is the main task, beyond generic teachings of “data in the next step is symmetrically exchanged.” Therefore, to supplement these omitted details, Radzikowski NPL is herein relied upon as follows.
Radzikowski NPL teaches to obtain a first speech recognition result corresponding to second training data (Radzikowski NPL page 3, right column, feedback loop in a last step 4, the speech recognition model transfers a previously synthesized sample (corresponding to second training data) into textual form (first speech recognition result)); and 
controls to change a parameter of the first learning model based on an error of the obtained first speech recognition result (Radzikowski NPL page 3, right column, a score $a_k^{lt}$ is calculated that represents how correctly (thus considering error) the speech recognition model recognizes the synthesized speech as text (obtained first speech recognition result), where page 4, second column teaches that the score $a_k^{lt}$ is used to determine a reward value $a_k$ that page 5 details is used in a gradient based method of optimization to modify the weights (thus change at least one weight/parameter) of the trainable model MTTS where page 3, right column teaches the MTTS model to be the speech generation model (corresponding to the claimed speech synthesis model of Yunhao/ the claimed first learning model)).
Radzikowski NPL further teaches a speech recognition device comprising (Radzikowski NPL page 8 under results section – multiple GTX 1080 Ti graphics cards used to run/execute the disclosed DSL method shown in fig. 3 on page 5 (including speech recognition)): a memory configured to store a first learning model and a second learning model; and a processor, wherein the processor (Radzikowski NPL page 8 and 5, where fig. 3 illustrating the DSL process includes updates to the MSTT and MTTS models, they are stored somewhere in memory on the graphics cards, and where the graphics cards contain a processor).
Therefore, taking the teachings of Yunhao and Radzikowski NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition model and speech synthesis model training teachings of Yunhao to include the detailed teachings of the training loop including error calculation and model adjustment when the error 
Regarding claim 13, Yunhao teaches wherein learns the second learning model to obtain a second speech recognition result corresponding to second speech data (Yunhao page 2, detailing steps S2-S8 when the voice recognition is the main task, and where using the speech recognition model to be trained (learning of the second learning model) includes inputting training data of “voice A text B” and converting the speech data A into text B’ (second speech recognition result) which corresponds to the voice A training data (second speech data)).
Yunhao does not explicitly teach wherein the processor obtains an error value corresponding to a difference between the obtained first speech recognition result and the second speech recognition result; and 
controls to change a parameter of the first learning model based on the obtained error value.
Radzikowski NPL teaches wherein the processor (Radzikowski NPL page 8, section 3, GTX 1080 Ti graphics card, which will have a processor) obtains an error value corresponding to a difference between the obtained first speech recognition result and the second speech recognition result (Radzikowski NPL page 5, and fig. 3, taking note that the claim is broadly reciting the error value to “correspond to” “a difference” thus permitting a broadest reasonable interpretation of any correspondence to any type of difference - “Is error growing” step obtains an error of the DSL method to decide whether convergence is reached for training, where page 7, section 2.6.4 teaches the error to be a character error rate (of the speech recognition) where this error and adjustment of the MSTT (corresponding to the second learning model) corresponds to the accuracy of the MSTT from both loops (how far off/difference the MSTT performance from one loop is from the other), where loop one/loop L calculates the score of the first speech recognition result (see page 3 of Radzikowski NPL), and loop two/loop S calculates the score of the second speech recognition result (see page 4 of Radzikowski NPL)); and 
controls to change a parameter of the first learning model based on the obtained error value (Radzikowski NPL page 5, fig. 3, based on whether the error is growing, the DSL process will either loop again (and update (change) the weights (at least one parameter) of the MTTS (first learning model)), or simply stop updating).
Therefore, taking the teachings of Yunhao and Radzikowski NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition model and speech synthesis model training teachings of Yunhao to include the detailed teachings of the training loop including error calculation and model adjustment when the error calculated is of the speech recognition model recognizing a synthesized speech input, as detailed in Radzikowski NPL at least because doing so would make it possible for the models to learn weights which will lead to better (in terms of the feedback-giving 
Regarding claim 17, Yunhao teaches learns at least one of the first learning model or the second learning model using supervised learning (Yunhao page 2, middle of the page, the model obtained by the method provided in the present invention can achieve and use a large number of one-to-one corresponding standard data to train the obtained model in a supervised manner (supervised learning)).
Yunhao does not explicitly teach wherein the processor.
Radzikowski NPL teaches wherein the processor (Radzikowski NPL page 8, section 3, GTX 1080 Ti graphics card, which will have a processor).
Therefore, taking the teachings of Yunhao and Radzikowski NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition model and speech synthesis model training teachings of Yunhao to include the GTX 1080 Ti graphics card, as detailed in Radzikowski NPL at least because doing so would save time in running/executing the training method (Radzikowski NPL page 8).
Regarding claim 18, Yunhao teaches changes a parameter of the first learning model using reinforcement learning (Yunhao on page 3 also teaches controlling parameters of the speech synthesis model (first learning model), and also discloses that a REINFORCE algorithm in the reinforcement learning technique is used in updating the parameters (controlling to change the parameter)).
Yunhao does not explicitly teach wherein the processor.
Radzikowski NPL teaches wherein the processor (Radzikowski NPL page 8, section 3, GTX 1080 Ti graphics card, which will have a processor).
Therefore, taking the teachings of Yunhao and Radzikowski NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition model and speech synthesis model training teachings of Yunhao to include the GTX 1080 Ti graphics card, as detailed in Radzikowski NPL at least because doing so would save time in running/executing the training method (Radzikowski NPL page 8).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Stephens, US 2010/0042410 A1, directed towards training a model for speech synthesis using a speech recognition engine processing speech to provide annotated (labeled) text data for use in training the model for speech synthesis.
Kim et al., US 2020/0394998 A1, directed towards a text to speech synthesis method that uses machine learning to receive input text, and output speech data for the input text.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908.  The examiner can normally be reached on Monday-Friday, 9:30am-6:30pm, eastern time zone.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


MICHELLE M. KOETH
Primary Examiner
Art Unit 2656



/MICHELLE M KOETH/Primary Examiner, Art Unit 2656