DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Receipt is acknowledged of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file. 

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 09/09/2019 has been considered by the examiner and placed of record in the file.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 5-6, 8-11, 15-16 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kingsbury et al. (US 2016/0005398 A1).

Claim 1. Kingsbury et al. disclose an apparatus (read as computer program product [0061]; FIG. 8) comprising:
at least one memory configured to store computer program code (read as The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention [0061]); 
at least one hardware processor configured to access said computer program code and operate as instructed by said computer program code, said computer program code including (read as The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention [0061]): 
minimum Bayes risk (MBR) training code (read as Training occurs in three phases: …distributed Hessian-free training using the state-level minimum Bayes risk criterion [0054]) configured to cause said at least one hardware processor to train a sequence-to-sequence model (read as phone sequences to word sequences [0048]); and 
smoothing code configured to cause said at least one hardware processor to apply softmax smoothing (read as contains 5 hidden layers with 1,024 logistic units per layer, and has a final softmax output with 1,000 targets [0054]) to an N-best generation (read as N-best paths are extracted [0040]) of the MBR training (read as Training occurs in three phases: …distributed Hessian-free training using the state-level minimum Bayes risk criterion [0054]).
Multiple embodiments were used in the rejection. All the limitations of the claim were disclosed by Kingsbury et al. although not in the exact order.
Therefore, it would have been obvious to a person of ordinary skill in the art, at the time the invention was filed, to use the teaching of Kingsbury et al. in order to realize all the limitations of the claim.
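For illustration only, the claimed combination of softmax smoothing and MBR training over an N-best list can be sketched as follows. This sketch is not taken from Kingsbury et al. or from the application as filed. It assumes that "softmax smoothing" means scaling the logits by a factor beta below 1 before normalization, and that the risk is a word-level edit distance; the names smoothed_softmax, edit_distance, mbr_loss and beta are hypothetical.

```python
import math


def smoothed_softmax(logits, beta=0.8):
    """Softmax over beta-scaled logits (assumed form of smoothing).

    With beta < 1 the distribution is flatter than a plain softmax.
    """
    m = max(logits)
    exps = [math.exp(beta * (z - m)) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def edit_distance(hyp, ref):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (h != r)))  # substitution or match
        prev = cur
    return prev[-1]


def mbr_loss(nbest, log_probs, reference):
    """Expected risk over an N-best list.

    Hypothesis posteriors are renormalized over the list, so the loss is
    sum_i P(y_i | x) * risk(y_i, reference).
    """
    m = max(log_probs)
    weights = [math.exp(lp - m) for lp in log_probs]
    z = sum(weights)
    posteriors = [w / z for w in weights]
    risks = [edit_distance(h, reference) for h in nbest]
    return sum(p * r for p, r in zip(posteriors, risks))
```

Under these assumptions, a smaller beta spreads probability mass over more labels, which tends to make the N-best hypotheses more diverse before the expected risk is computed.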

Claim 5. The apparatus according to claim 1, Kingsbury et al. disclose,
wherein the MBR training code is configured to cause said at least one processor to apply an MBR loss operation (read as Training occurs in three phases: …distributed Hessian-free training using the state-level minimum Bayes risk criterion [0054]) to a plurality of pairs of training data and corresponding reference label sequences (read as phone sequences to word sequences [0048]).

Claim 6. The apparatus according to claim 5, Kingsbury et al. disclose,
wherein the training data comprises training speech utterance data (read as for training was received, 20 hours of audio for each language [0052]).

Claim 8. The apparatus according to claim 5, Kingsbury et al. disclose,
wherein the MBR loss operation comprises a risk operation between a hypothesized label sequence and ones of the reference label sequences (read as The N-best hypotheses represented by each expanded WFST are extracted (block 406), and then mapped back to a set of N or fewer word sequences through composition with a finite state transducer that maps from phone sequences to word sequences (block 408) [0039] …for training was received, 20 hours of audio for each language [0052]).

Claim 9. The apparatus according to claim 5, Kingsbury et al. disclose,
wherein the MBR loss operation comprises a sequence probability given the training data (read as Training occurs in three phases: …distributed Hessian-free training using the state-level minimum Bayes risk criterion [0054]).

Claim 10. The apparatus according to claim 8, Kingsbury et al. disclose,
wherein the MBR training further comprises deriving gradients of the MBR loss operation with respect to a probability (read as Training occurs in three phases: …distributed Hessian-free training using the state-level minimum Bayes risk criterion [0054]), of the sequence-to-sequence model emitting a particular label of the label prediction distribution, and the risk operation (read as phone sequences to word sequences [0048]).
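For illustration only, the gradient recited in this limitation can be sketched under stated assumptions. The sketch is not drawn from Kingsbury et al. or from the application as filed. It assumes the MBR loss is the expected risk over an N-best list with posteriors renormalized over that list, in which case the gradient with respect to the log-score of hypothesis k is P(y_k | x) * (risk_k - expected risk). If a hypothesis score is further assumed to be the sum of its per-label log-probabilities, the same value is the gradient with respect to each per-label log-probability along that hypothesis. All function names are hypothetical.

```python
import math


def nbest_posteriors(scores):
    """Renormalize hypothesis log-scores over the N-best list."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


def expected_risk(scores, risks):
    """MBR loss: posterior-weighted average of the per-hypothesis risks."""
    return sum(p * r for p, r in zip(nbest_posteriors(scores), risks))


def expected_risk_grad(scores, risks):
    """Analytic gradient of the expected risk w.r.t. each log-score."""
    post = nbest_posteriors(scores)
    avg = sum(p * r for p, r in zip(post, risks))
    return [p * (r - avg) for p, r in zip(post, risks)]


def numerical_grad(scores, risks, eps=1e-6):
    """Central finite differences, used only to check the analytic form."""
    grads = []
    for k in range(len(scores)):
        up = list(scores)
        down = list(scores)
        up[k] += eps
        down[k] -= eps
        grads.append((expected_risk(up, risks) - expected_risk(down, risks)) / (2 * eps))
    return grads
```

The gradient entries sum to zero, which reflects that raising the score of a below-average-risk hypothesis lowers the loss only at the expense of the other hypotheses in the list.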

Claim 11. Kingsbury et al. disclose a method performed by at least one computer processor comprising:
minimum Bayes risk (MBR) training a sequence-to-sequence model (read as phone sequences to word sequences [0048]…training occurs in three phases: …distributed Hessian-free training using the state-level minimum Bayes risk criterion [0054]); and 
applying softmax smoothing (read as final softmax output with 1,000 targets [0054]) to an N-best generation (read as N-best paths are extracted [0040]) of the MBR training obtaining medical records (read as for training was received, 20 hours of audio for each language [0052]. The audio recordings could be from medical records. Also, using medical records for training is an intended use of the invention.).

Claim 15. The method according to claim 11, Kingsbury et al. disclose,
wherein the MBR training comprises applying an MBR loss operation (read as Training occurs in three phases: …distributed Hessian-free training using the state-level minimum Bayes risk criterion [0054]) to a plurality of pairs of training data and corresponding reference label sequences (read as phone sequences to word sequences [0048]).

Claim 16. The method according to claim 15, Kingsbury et al. disclose,
wherein the training data comprises training speech utterance data (read as for training was received, 20 hours of audio for each language [0052]).

Claim 18. The method according to claim 15, Kingsbury et al. disclose,
wherein the MBR loss operation comprises a risk operation between a hypothesized label sequence and ones of the reference label sequences (read as The N-best hypotheses represented by each expanded WFST are extracted (block 406), and then mapped back to a set of N or fewer word sequences through composition with a finite state transducer that maps from phone sequences to word sequences (block 408) [0039] …for training was received, 20 hours of audio for each language [0052]).

Claim 19. The method according to claim 15, Kingsbury et al. disclose,
wherein the MBR loss operation comprises a sequence probability given the training data (read as Training occurs in three phases: …distributed Hessian-free training using the state-level minimum Bayes risk criterion [0054]).

Claim 20. Kingsbury et al. disclose a non-transitory computer readable medium storing a program causing a computer to execute a process (read as The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention [0061]), the process comprising: 
minimum Bayes risk (MBR) training (read as Training occurs in three phases: …distributed Hessian-free training using the state-level minimum Bayes risk criterion [0054]) a sequence-to-sequence model (read as phone sequences to word sequences [0048]); and 
applying softmax smoothing (read as contains 5 hidden layers with 1,024 logistic units per layer, and has a final softmax output with 1,000 targets [0054]) to an N-best generation (read as N-best paths are extracted [0040]) of the MBR training obtaining (read as Training occurs in three phases: …distributed Hessian-free training using the state-level minimum Bayes risk criterion [0054]) medical records (read as for training was received, 20 hours of audio for each language [0052]. The audio recordings could be from medical records. Also, using medical records for training is an intended use of the invention.).
Multiple embodiments were used in the rejection. All the limitations of the claim were disclosed by Kingsbury et al. although not in the exact order.
Therefore, it would have been obvious to a person of ordinary skill in the art, at the time the invention was filed, to use the teaching of Kingsbury et al. in order to realize all the limitations of the claim.


Claims 2-4, 7, 12-14 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Kingsbury et al. (US 2016/0005398 A1) in view of Wang et al. (US 2019/0266246 A1).

Claim 2. The apparatus according to claim 1, Kingsbury et al. do not explicitly disclose,
wherein the computer program code further includes beam search code configured to cause said at least one hardware processor to perform a beam search during the MBR training.
However, in the related field of endeavor Wang et al. disclose: The decoder 1208 may implement algorithms for feeding the input 1216 to the neural network(s) 1202 for generating the output sequence 1218 or elements thereof. For example, a decoder for SWAN may include program code implementing the beam search algorithm described with reference to FIG. 6 [0074].
Therefore, it would have been obvious to a person of ordinary skill in the art, at the time the invention was filed, to modify the teaching of Kingsbury et al. with the teaching of Wang et al. in order to reduce the computational cost of computing the segment probabilities themselves, as well as their gradients as needed for the backward phase, by limiting to a fixed maximum value the length of the segments, and reusing computations for longer segments for shorter segments contained in the longer segments (Wang et al. [0006]). 

Claim 3. The apparatus according to claim 2, the combination of Kingsbury et al. and Wang et al. teaches,
wherein the beam search code is further configured to, during each iteration of the beam search (Wang et al.: read as include program code implementing the beam search algorithm described with reference to FIG. 6 [0074]), apply the softmax smoothing to a label prediction distribution (Kingsbury et al.: read as contains 5 hidden layers with 1,024 logistic units per layer, and has a final softmax output with 1,000 targets [0054]).
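For illustration only, applying softmax smoothing at each iteration of a beam search can be sketched as follows. The sketch does not reproduce the beam search of Wang et al. FIG. 6 or the implementation of the application. It assumes smoothing means scaling logits by a factor beta before normalization, and it uses a toy model whose per-step logits do not depend on the prefix; the names smoothed_log_softmax, beam_search, step_logits and beta are hypothetical.

```python
import math


def smoothed_log_softmax(logits, beta=0.8):
    """Log-softmax over beta-scaled logits (assumed form of smoothing)."""
    scaled = [beta * z for z in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def beam_search(step_logits, beam_size=2, beta=0.8):
    """Return the beam_size best (label_sequence, log_prob) pairs.

    step_logits is a list of logit vectors, one per decoding step.
    In a real sequence-to-sequence model the logits would depend on the
    prefix; here they are fixed per step to keep the sketch short.
    """
    beams = [((), 0.0)]
    for logits in step_logits:
        # Smoothing is applied to the label prediction distribution at every step.
        log_probs = smoothed_log_softmax(logits, beta)
        candidates = [(prefix + (label,), score + lp)
                      for prefix, score in beams
                      for label, lp in enumerate(log_probs)]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]  # surviving hypotheses form the N-best list
    return beams
```

The hypotheses returned by beam_search would serve as the N-best list over which an MBR loss is then evaluated.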

Claim 4. The apparatus according to claim 3, the combination of Kingsbury et al. and Wang et al. teaches,
wherein the computer program code further comprises obtaining code configured to cause said at least one processor to obtain, as a result of applying the softmax smoothing (Kingsbury et al.: read as contains 5 hidden layers with 1,024 logistic units per layer, and has a final softmax output with 1,000 targets [0054]), a plurality of hypothesized outputs applied to a hypothesis space for the MBR training (Kingsbury et al.: read as contains 5 hidden layers with 1,024 logistic units per layer, and has a final softmax output with 1,000 targets [0054]).

Claim 7. The apparatus according to claim 5, Kingsbury et al. do not explicitly disclose,
wherein the training data comprises training machine translation data.
However, in the related field of endeavor Wang et al. disclose: sequence modeling approach described herein is applied to the problem of machine translation of text or speech from one human language to another [0007].
Therefore, it would have been obvious to a person of ordinary skill in the art, at the time the invention was filed, to modify the teaching of Kingsbury et al. with the teaching of Wang et al. in order to reduce the computational cost of computing the segment probabilities themselves, as well as their gradients as needed for the backward phase, by limiting to a fixed maximum value the length of the segments, and reusing computations for longer segments for shorter segments contained in the longer segments (Wang et al. [0006]).

Claim 12. The method according to claim 11, Kingsbury et al. do not explicitly disclose,
further comprising: performing a beam search during the MBR training.
However, in the related field of endeavor Wang et al. disclose: The decoder 1208 may implement algorithms for feeding the input 1216 to the neural network(s) 1202 for generating the output sequence 1218 or elements thereof. For example, a decoder for SWAN may include program code implementing the beam search algorithm described with reference to FIG. 6 [0074].
Therefore, it would have been obvious to a person of ordinary skill in the art, at the time the invention was filed, to modify the teaching of Kingsbury et al. with the teaching of Wang et al. in order to reduce the computational cost of computing the segment probabilities themselves, as well as their gradients as needed for the backward phase, by limiting to a fixed maximum value the length of the segments, and reusing computations for longer segments for shorter segments contained in the longer segments (Wang et al. [0006]).

Claim 13. The method according to claim 12, the combination of Kingsbury et al. and Wang et al. teaches,
further comprising:
during each step of the beam search (Wang et al.: read as include program code implementing the beam search algorithm described with reference to FIG. 6 [0074]), applying the softmax smoothing to a label prediction distribution (Kingsbury et al.: read as contains 5 hidden layers with 1,024 logistic units per layer, and has a final softmax output with 1,000 targets [0054]).

Claim 14. The method according to claim 13, the combination of Kingsbury et al. and Wang et al. teaches,
further comprising: 
obtaining, as a result of applying the softmax smoothing (Kingsbury et al.: read as contains 5 hidden layers with 1,024 logistic units per layer, and has a final softmax output with 1,000 targets [0054]), a plurality of hypothesized outputs applied to a hypothesis space for the MBR training (Kingsbury et al.: read as contains 5 hidden layers with 1,024 logistic units per layer, and has a final softmax output with 1,000 targets [0054]).

Claim 17. The method according to claim 15, Kingsbury et al. do not explicitly disclose,
wherein the training data comprises training machine translation data.
However, in the related field of endeavor Wang et al. disclose: sequence modeling approach described herein is applied to the problem of machine translation of text or speech from one human language to another [0007].
Therefore, it would have been obvious to a person of ordinary skill in the art, at the time the invention was filed, to modify the teaching of Kingsbury et al. with the teaching of Wang et al. in order to reduce the computational cost of computing the segment probabilities themselves, as well as their gradients as needed for the backward phase, by limiting to a fixed maximum value the length of the segments, and reusing computations for longer segments for shorter segments contained in the longer segments (Wang et al. [0006]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Refer to PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMED RACHEDINE whose telephone number is (571)272-9249. The examiner can normally be reached Mon-Fri 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lester Kincaid, can be reached at (571)272-7922. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MOHAMMED RACHEDINE
Examiner
Art Unit 2649



/MOHAMMED RACHEDINE/Primary Examiner, Art Unit 2646