Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claim(s) 1, 5, 7, 10-11, 15, 17, and 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Li et al (20190362741).

As per claim 1, Li et al (20190362741) teaches a method of identifying a user utterance from a digital audio signal (as performing voice recognition – para 0003, 0004), the digital audio 
acquiring, by the electronic device, a set of features for a respective segment of the digital audio signal, each set of features comprising at least acoustic-type features derivable from a respective segment of the digital audio signal, the segments of the digital audio signal being associated with respective time intervals of a pre-determined time length (as acquiring features in a fixed time length – para 0019, 0046, 0047); 
acquiring, by the electronic device, an indication of an end-of-utterance moment in time in the digital audio signal corresponding to a given moment in time after which the user utterance has ended (as finding the voice endpoints – para 0044-0047, para 0058);
 determining, by the electronic device, an adjusted end-of-utterance moment in time by adding a pre-determined temporal offset to the end-of-utterance moment in time (as adjusting the end of the utterance – para 0125-0128); 
determining, by the electronic device, labels for respective sets of features based on the adjusted end-of-utterance moment in time and the time intervals of the corresponding segments of the digital audio signal, a given label being indicative of whether the user utterance has ended during the corresponding segment of the digital audio signal associated with the respective set of features (as tracking and taking feature and annotation information – para 0133, during the various time segments – para 0125-0136);
using, by the electronic device, the sets of features and the respective labels for training a Neural Network (NN) to predict during which segment of the digital audio signal the user utterance has ended (as using various neural networks – para 0135-0136, to perform the feature/training endpoint detection – para 0127-0137);

acquiring, by the electronic device, at least some of an in-use digital audio signal; determining, by the electronic device, a first set of features for a first segment of the in-use digital audio signal, the first segment including a most recently acquired portion of the in-use digital audio signal, the in-use digital audio signal being recorded in real-time during an in-use utterance (as acquiring the audio signal and deriving features, using acoustic features at the start point – para 0046); the first set of features comprising at least acoustic-type features derivable from the first segment of the digital audio signal (as deriving features from the beginning of the segment – para 0046);
using, by the electronic device, the NN to determine based on the first set of features a first value indicative of a probability of the in-use user utterance having ended during the first segment of the in-use digital audio signal (as using the neural network to determine a probability value for the segments – para 0063-0065);
responsive to the first value being above the pre-determined threshold, determining, by the electronic device, that the in-use user utterance ended during the first segment of the in-use digital audio signal (as using the estimate section for the voice finish point – para 0055, 0056 reflecting back on para 0047);
responsive to the first value being below the pre-determined threshold: determining, by the electronic device, a second set of features for a second segment of the in-use digital audio signal, the second segment being sequential to the first segment in the in-use digital audio signal, the second set of features comprising at least acoustic-type features derivable from the second 
 using, by the electronic device, the NN to determine based on the first set of features and the second set of features a second value indicative of a probability of the in-use user utterance having ended during the second segment of the in-use digital audio signal (as deriving features from the beginning of the segment – para 0046); and responsive to the second value being above the pre-determined threshold, determining, by the electronic device, that the in-use user utterance ended during the second segment of the in-use digital audio signal (as using the estimate section for the voice finish point – para 0055, 0056 reflecting back on para 0047).
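For illustration only, the labeling and threshold steps recited in claim 1 can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from the application or from Li et al: the function names label_segments and detect_end_segment, the millisecond units, the rule that a segment is labeled 1 once its interval extends past the adjusted end-of-utterance moment, and the predict callable standing in for the trained NN are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def label_segments(num_segments: int, segment_ms: int,
                   end_of_utterance_ms: int, offset_ms: int) -> List[int]:
    """Label fixed-length segments against an adjusted end-of-utterance time.

    Assumption: segment i covers [i * segment_ms, (i + 1) * segment_ms) and is
    labeled 1 when its interval extends past the adjusted moment, else 0.
    """
    adjusted_ms = end_of_utterance_ms + offset_ms  # add the temporal offset
    return [1 if (i + 1) * segment_ms > adjusted_ms else 0
            for i in range(num_segments)]


def detect_end_segment(feature_stream: Sequence[Sequence[float]],
                       predict: Callable[[List[Sequence[float]]], float],
                       threshold: float) -> Optional[int]:
    """Return the index of the first segment whose predicted value exceeds
    the threshold, or None if no segment does.

    `predict` is a hypothetical stand-in for the trained NN; it receives all
    feature sets seen so far, so the second value can depend on both the
    first and the second sets of features.
    """
    history: List[Sequence[float]] = []
    for index, features in enumerate(feature_stream):
        history.append(features)
        if predict(history) > threshold:
            return index  # utterance deemed to have ended in this segment
    return None


if __name__ == "__main__":
    # Six 100 ms segments, end of utterance at 250 ms, offset of 100 ms.
    print(label_segments(6, 100, 250, 100))
    # Toy predictor: treats the newest feature value as the probability.
    toy_predict = lambda history: history[-1][0]
    print(detect_end_segment([[0.1], [0.2], [0.9]], toy_predict, 0.5))
```

In this sketch the first segment is tested first, and a later segment is evaluated only when every earlier value fell at or below the threshold, which mirrors the "responsive to the first value being below the pre-determined threshold" branch of the claim.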


As per claim 5, Li et al (20190362741) teaches the method of claim 1, wherein the using the sets of features and the respective labels for training the NN comprises: organizing the sets of features and the respective labels in a same order as the order in which the corresponding segments occur in the digital audio signal (as ordering/organizing the sets according to time and annotations – para 0097, in view of para 0126-0137). 


As per claim 7, Li et al (20190362741) teaches the method of claim 6, wherein the method further comprises: generating, by the electronic device, a trigger for stopping recording the in-use digital audio signal (as using the finish point to stop the next audio frame from being inputted – para 0055). 


As per claim 10, Li et al (20190362741) teaches the method of claim 1, wherein the electronic device comprises one of: a user electronic device; and a server coupled to the user electronic device by a communication network (as server/device – para 0177). 

	Claims 11, 15, 17, and 20 are device claims that perform the method steps of claims 1, 5, 7, and 10; as such, claims 11, 15, 17, and 20 are similar in scope and content to claims 1, 5, 7, and 10 above and are therefore rejected under similar rationale as presented against claims 1, 5, 7, and 10 above.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 2-4, 8, 12-14, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al (20190362741) in view of Hoffmeister (20160379632).

As per claims 2 and 3, Li et al (20190362741) teaches the method of claim 1, but does not explicitly teach linguistic/language type recognition/parameters. However, Hoffmeister (20160379632) teaches endpoint voice recognition (para 0051) using a natural language/linguistic model. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to modify Li et al (20190362741) with linguistic/language model speech recognition, as taught by Hoffmeister (20160379632), because it would advantageously improve upon distinguishing mid-pauses in a user's voice so as to increase the accuracy of the recognition system (Hoffmeister (20160379632), para 0061).

As per claim 4, the combination of Li et al (20190362741) in view of Hoffmeister (20160379632) teaches the method of claim 1, wherein the indication of the end-of-utterance moment in time is determined by at least one of a human-assessor and an Automatic Speech-to-text Alignment (ASA) algorithm (as aligning the sounds with a known word sequence – Hoffmeister (20160379632), para 0023).
 
As per claim 8, the combination of Li et al (20190362741) in view of Hoffmeister (20160379632) teaches the method of claim 6, wherein the method further comprises: providing, by the electronic device, an Automatic Speech Recognition (ASR) algorithm with at least some of the in-use digital audio signal for determining a textual representation of the in-use utterance (Hoffmeister (20160379632) – para 0018).

	Claims 12-14 and 18 are device claims that perform the method steps of claims 2-4 and 8 above; as such, claims 12-14 and 18 are similar in scope and content to claims 2-4 and 8 above and are therefore rejected under similar rationale as presented against claims 2-4 and 8 above.

Response to Arguments

Applicant's arguments filed 6/8/2021 have been fully considered but they are not persuasive. Applicant's remarks/arguments start on p. 10 of the response. Applicant provides a review of the prior art references from p. 10 to the middle of p. 13; from the middle of p. 13 to the end of p. 14, applicant argues that the claimed first and second features, from different segments, are not taught. Examiner disagrees and points to the cited sections of Li above, wherein segments are established and features are extracted (para 0019, 0046, 0047), and wherein features and annotations are extracted over a series of segments (para 0133 and 0125-0136).

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  


The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 

Bak et al (US-20150205779-A1) teaches voice recognition alignment (para 0019).
Kahn (US-20050222843-A1) teaches voice recognition with text alignment (para 0016).
Reiner et al (US-20130262096-A1) teaches text alignment (para 0026) in voice recognition (155).
Kahn et al (US-20020152076-A1) teaches voice recognition with text alignment (para 0016).
Brandow et al (US-6064957-A) In accordance with a second embodiment of this invention, there is provided a system for improving speech recognition. The system comprises a text aligner for aligning text data generated from a speech recognition system with a corresponding true transcription of the speech recognition text data. A rule generator coupled to the text aligner, generates a plurality of correction rules from differences in alignment between the speech recognition text data and the corresponding true transcription of text data. A rule administrator 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Opsasnick, telephone number (571)272-7623, who is available Monday-Friday, 9am-5pm. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mr. Richemond Dorvil, can be reached at (571)272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).


/Michael N Opsasnick/
Primary Examiner, Art Unit 2658
06/14/2021