Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification

The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed.  Examiner recommends a few modifications to detail the ‘spike’ and the types of ‘models’.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 18-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claims do not fall within at least one of the four categories of patent eligible subject matter because the claimed computer product relates only to a computer readable storage medium which, by the specification’s definition, is not limited to a non-transitory type of computer readable storage medium.  Although para 0155 of applicant’s specification (PGPUB 20210082399) explicitly states “a computer readable storage medium, as used herein, is not to be construed as being a transitory signals per se” and “computer readable can be a tangible device…”, examiner notes that the language in the specification is not definitive; more particularly, ‘per se’ and ‘can be’ create an open-ended list and do not preclude the embodiment of a transitory type computer readable medium.  When the broadest reasonable interpretation of a claim covers a signal per se, the claim must be rejected under 35 U.S.C. § 101 as covering non-statutory subject matter (see MPEP § 2106, subsection I).  See In re Nuijten, 500 F.3d 1346, 1356-57 (Fed. Cir. 2007) (transitory embodiments are not directed to statutory subject matter).

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claim(s) 1-9, 11-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Rao et al (10229672).
As per claim 1, Rao et al (10229672) teaches a computer-implemented method for aligning spike timing of models (as phone label spikes in an alignment technique – col. 5 lines 50-60), comprising:
generating a first model having a first architecture trained with a set of training samples each including an input sequence of observations and an output sequence of symbols having different length from the input sequence (as, using a CTC model to generate first approximation alignments – col. 3 lines 62-64; wherein the CTC generates a sequence of symbols – labels – col. 5 lines 54-62, with an additional blank output label – col. 5 line 67 – col. 6 line 3 – examiner notes that this additional blank output label creates a different length sequence compared to the input sequence);
training one or more second models with the trained first model (as training a second CTC with high recognition accuracy using the first CTC model – col. 3 lines 64-66) by minimizing a guide loss jointly with a normal loss for each second model (as, the CTC calculates a loss function optimizing the total likelihood between source and target – col. 6 lines 28-35 – examiner notes that the interleaving of the blank label teaches that the source and target are the first and second CTC respectively), the guide loss evaluating dissimilarity in spike timing between the trained first model and each second model being trained (when applied to the acoustic modeling – col. 6 lines 34-36, and referring back to the phonetic alignments and labeling – col. 5 lines 22-30, col. 5 lines 50-57);
and performing a sequence recognition task using the one or more second models (and performing the recognition task with the more accurate second CTC model – col. 3 lines 64-67).
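
The blank-label point noted for the first limitation can be pictured with a minimal sketch. It assumes the usual CTC convention that a frame-level label path is collapsed by merging repeated labels and dropping blanks; the blank symbol "_", the function name, and the sample labels are hypothetical and are not taken from Rao or from the application.

```python
BLANK = "_"  # hypothetical blank symbol, for illustration only


def ctc_collapse(path):
    """Collapse a frame-level label path: merge repeats, then drop blanks."""
    out = []
    prev = None
    for label in path:
        # Keep a label only when it starts a new run and is not the blank.
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return out


# Ten input frames (one label per observation) collapse to three symbols.
frames = ["_", "k", "k", "_", "ae", "_", "_", "t", "t", "_"]
symbols = ctc_collapse(frames)
print(len(frames), len(symbols), symbols)  # 10 3 ['k', 'ae', 't']
```

Under this convention the output symbol sequence is generally shorter than the input sequence of observations, which is the length difference the examiner's note refers to.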

As per claim 2, Rao et al (10229672) teaches the method of claim 1, wherein the training of the one or more second models comprises:
preparing a mask using posterior distributions obtained from the trained first model by feeding an input sample thereto (as, mapping a relationship between the phonetic alignments vs the CTC phone posteriors – col. 6 lines 20-25; wherein the CI labels are used to build a phone-based model – col. 5 lines 38-43);
and applying the mask to outputs obtained for the input sample from each second model being trained to obtain masked posterior distributions, the masked posterior distributions at least partially determining the guide loss (as applying the CI label-based phone model – col. 5 lines 38-40, and then the CTC CD phone directory is used – col. 5 lines 40-41, to train the secondary CTC model – col. 5 lines 58-62).

As per claim 3, Rao et al (10229672) teaches the method of claim 2, wherein the mask is configured to pass at least an output value corresponding to a spike emitted from the trained first model at each time index, the spike representing an output symbol other than at least a blank symbol (as, the spike represents positions in the sequence of the most appropriate phone label -- col. 6 lines 9-15). 
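
The mask and guide loss recited in claims 2 and 3 can be pictured with a minimal sketch. The claim language quoted above states only that the mask passes the output value at a non-blank spike of the first model and that the masked posteriors at least partially determine the guide loss; the specific loss form used here (negative log of the passed probability), the blank index, and all function names are assumptions made for illustration and are not drawn from Rao or from the application.

```python
import math

BLANK = 0  # hypothetical index of the blank symbol


def spike_mask(first_posteriors):
    """Mask that passes only the first model's non-blank top symbol per time index."""
    mask = []
    for frame in first_posteriors:
        top = max(range(len(frame)), key=frame.__getitem__)
        mask.append([1.0 if (k == top and top != BLANK) else 0.0
                     for k in range(len(frame))])
    return mask


def guide_loss(mask, second_posteriors, eps=1e-12):
    """Assumed loss form: negative log of the second model's masked probability,
    summed over the time indices where the first model emitted a spike."""
    loss = 0.0
    for m_row, p_row in zip(mask, second_posteriors):
        if any(m_row):  # skip frames where the first model emitted blank
            passed = sum(m * p for m, p in zip(m_row, p_row))
            loss -= math.log(passed + eps)
    return loss


# Rows are time indices; columns are [blank, symbol 1, symbol 2].
first = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]]
aligned = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
shifted = [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]

mask = spike_mask(first)
print(guide_loss(mask, aligned) < guide_loss(mask, shifted))  # True
```

In this sketch a second model whose spikes fall on the same time indices as the first model's spikes receives the smaller guide loss. A joint objective would add this term to the normal loss, with any weighting between the two being a further assumption.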

As per claim 4, Rao et al (10229672) teaches the method of claim 2, wherein the mask has a value representing whether to pass or not pass an output value for each output symbol and each time index in a manner depending on a corresponding posterior distribution for each time index in the posterior distributions (wherein, if the mask is at a certain threshold, col. 9 lines 25-

As per claim 5, Rao et al (10229672) teaches the method of claim 2, wherein the mask has a factor representing degree of passing an output value for each output symbol and each time index in a manner depending on a corresponding probability in the posterior distributions (as, using mean and diagonal covariances – col. 8 lines 58-62, to measure accuracy on the alignment – col. 8 lines 52-58). 

As per claim 6, Rao et al (10229672) teaches the method of claim 1, wherein at least one of the one or more second models trained with the first model is used for posterior fusion (as, the CTC phone posteriors are used to train the DNN phone alignments, hence the posterior labels are tied to the DNN – col. 6 lines 22-28).

As per claim 7, Rao et al (10229672) teaches the method of claim 1, wherein at least one of the one or more second models trained with the first model is used as a teacher model (as, the trained CI model – col. 6 lines 19-25) for knowledge distillation to train a student model, the student model having an architecture matched to the first architecture of the first model (as, using the CI model to build a CD phone inventory – col. 5 lines 37-42).

As per claims 8 and 9, Rao et al (10229672) teaches the method of claim 8, wherein the first architecture is unidirectional and the second architecture is bidirectional (col. 2, lines 27-30, 

As per claim 11, Rao et al (10229672) teaches the method of claim 1, wherein each second model constitutes an end-to-end speech recognition model (col. 5 lines 18-25), each observation in the input sequence of the training sample represents an acoustic feature (as using acoustic features as the sequence – col. 9 lines 39-44), and each symbol in the output sequence of the training sample represents at least one of a phone, a context dependent phone, a character, a word-piece, and a word (Examiner notes that the claim language states ‘one of’ – see col. 1 lines 23-34, showing context dependent phones). 

As per claim 12, Rao et al (10229672) teaches the method of claim 1, wherein the first model and the second model are alignment-free models (as the CTC can be not-fixed-alignment models – col. 1 lines 28-33). 

As per claim 13, Rao et al (10229672) teaches the method of claim 1, wherein the normal loss is CTC (Connectionist Temporal Classification) loss (as, using a CTC model to generate first approximation alignments – col. 3 lines 62-64; wherein the CTC generates a sequence of symbols – labels – col. 5 lines 54-62, with an additional blank output label – col. 5 line 67 – col. 6 line 3 – examiner notes that this additional blank output label creates a different length sequence compared to the input sequence);
the first model and the second model are CTC models and the guide loss evaluates the dissimilarity in the spike timing while ignoring at least blank symbols ((as, the CTC calculates a 

Claims 14-17 are computer system claims that perform method steps that are found in claims 1-9 and 11-13 above and as such, claims 14-17 are similar in scope and content to claims 1-9 and 11-13 above; therefore, claims 14-17 are rejected under similar rationale as presented against claims 1-9 and 11-13 above.  Furthermore, Rao et al (10229672) teaches computer systems (Figs. 3, 6) that have CPUs/memories; see also col. 16 lines 25-30, lines 40-49, lines 57-66.

Claims 18-20 are computer product claims that perform method steps that are found in claims 1-9 and 11-13 above and as such, claims 18-20 are similar in scope and content to claims 1-9 and 11-13 above; therefore, claims 18-20 are rejected under similar rationale as presented against claims 1-9 and 11-13 above.  Furthermore, Rao et al (10229672) teaches computer products/storage (Figs. 3, 6); see also col. 16 lines 57-66, being executed by the processor in col. 16 lines 26-35.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Rao et al (10229672) in view of Senior (20170011738).

As per claim 10, Rao et al (10229672) teaches the second models trained with the first model, as shown above; however, Rao et al (10229672) does not explicitly teach using the model for ROVER, i.e., “the method of claim 1, wherein at least one of the one or more second models trained with the first model is used for ROVER (Recognizer Output Voting Error Reduction).”  Senior (20170011738) teaches the concept of using ROVER in speech recognition systems (para 0062), and in combination with CTC networks (para 0064).  Therefore, it would have been obvious to one of ordinary skill in the art of CTC networks to implement the second models of Rao et al (10229672) in a ROVER technique, as taught by Senior (20170011738) above, because it would advantageously provide sharing of the intermediate representation (such as the CD state), and because the combining of alternative hypotheses (para 0062) would lead to improved accuracy (para 0023-0025).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  Please see the prior art listed on the PTO-892 form, and the detailed comments below:
Catanzaro et al (20170148431) teaches neural networks with a Connectionist Temporal Classification (CTC) loss function to predict speech transcription (para 0037).
Chua (20170358293) teaches CTC for use in predicting speech pronunciations (para 0106).
Battenberg et al (20180247643) teaches a CTC loss function (para 0035).


Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Opsasnick, telephone number (571)272-7623, who is available Monday-Friday, 9am-5pm. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mr. Richemond Dorvil, can be reached at (571)272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).


/Michael N Opsasnick/ Primary Examiner, Art Unit 2658
06/01/2021