Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings submitted on 05/17/2021 are being considered by the examiner.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 21 and 32 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 5 and 13 of U.S. Patent No. 11043218. Although the claims at issue are not identical, they are not patentably distinct from each other because the pending claims and the patented claims share common subject matter, and the pending claims are broader than, and thus anticipated by, the patented claims.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 21-42 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Wang et al. (US 20200286465 A1).

Regarding Claims 21, and 32,  Wang et al. teach: A computer-implemented method, comprising: determining a feature vector representing at least one frame of audio data ([0033] S202. Obtain first speech segments based on a to-be-recognized speech signal. [0039] If the to-be-recognized speech signal includes N unit frames, the N unit frames are respectively the first unit frame, the second unit frame, the third unit frame, . . . , and the N.sup.th unit frame from front to back according to an appearing sequence of the N unit frames in the to-be-recognized speech signal. [0043] An acoustic characteristic of a first speech segment may include acoustic characteristics of unit frames included in the first speech segment. [0044] An acoustic characteristic of a unit frame is obtained by performing acoustic characteristic extraction on the unit frame. Specifically, a waveform corresponding to a unit frame is converted into a multi-dimensional vector. The multi-dimensional vector may be used for indicating content information included in the unit frame, and may be an acoustic characteristic of the unit frame.); determining, using a first model and the feature vector, first output data corresponding to a likelihood that the at least one frame includes a representation of at least part of a word ([0029] Specifically, the user terminal 110 obtains a to-be-recognized speech signal, and then transmits the to-be-recognized speech signal to the server 120 by using the network. 
The server 120 obtains first speech segments based on the to-be-recognized speech signal, and then obtains first probabilities respectively corresponding to the first speech segments by using a preset first classification model, where the first probabilities include probabilities that the first speech segments respectively correspond to pre-determined word segmentation units of a pre-determined keyword; then, obtains second speech segments based on the to-be-recognized speech signal, and respectively generates first prediction characteristics of the second speech segments based on first probabilities corresponding to first speech segments that correspond to each second speech segment; then, performs classification based on the first prediction characteristics by using a preset second classification model, to obtain second probabilities respectively corresponding to the second speech segments, where the second probabilities include at least one of probabilities that the second speech segments correspond to the pre-determined keyword and probabilities that the second speech segments do not correspond to the pre-determined keyword; and then, determines, based on the second probabilities, whether the pre-determined keyword exists in the to-be-recognized speech signal. [0041] S204. Obtain first probabilities respectively corresponding to the first speech segments by using a preset first classification model. [0042] The first probabilities corresponding to the first speech segments may include probabilities that the first speech segments respectively correspond to the pre-determined word segmentation units of the pre-determined keyword. The first probabilities may be posterior probabilities. 
); and determining, using a second model and the feature vector, second output data corresponding to a likelihood that the at least one frame includes a representation of at least part of a non-speech (non-keyword or other information) acoustic event ([0029] then, performs classification based on the first prediction characteristics by using a preset second classification model, to obtain second probabilities respectively corresponding to the second speech segments, where the second probabilities include at least one of probabilities that the second speech segments correspond to the pre-determined keyword and probabilities that the second speech segments do not correspond to the pre-determined keyword.  [0047] The first padding information refers to other information other than the pre-determined word segmentation units. For example, for a case in which the pre-determined word segmentation units are respectively “er” and “duo”, all other information other than “er” and “duo” is the first padding information. [0062] In another embodiment, the second probabilities may include only the probabilities that the second speech segments do not correspond to (that is, being in non-correspondence to) the pre-determined keyword. Using an example in which the pre-determined keyword is “er duo”, a second probability corresponding to a second speech segment may include only a probability that the second speech segment corresponds to other information other than “er duo”. [0070] If a probability that the last second speech segment corresponds to the pre-determined keyword is still less than the pre-determined probability threshold, it is determined that the pre-determined keyword does not exist in the to-be-recognized speech signal, a recognition result representing that the pre-determined keyword does not exist in the to-be-recognized speech signal is outputted, and the recognition process ends.).
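The two-level arrangement cited from Wang at [0029] can be sketched as follows. This is a minimal illustration only, assuming that each second speech segment spans a fixed number of consecutive first speech segments; the function name, the `window` parameter, and the stand-in models are hypothetical and are not taken from Wang.

```python
def two_level_scores(first_segments, first_model, second_model, window):
    """Score first speech segments, then score windows built from those scores.

    first_model and second_model are caller-supplied callables (hypothetical
    stand-ins for the first and second classification models).
    """
    # Level 1: one list of first probabilities per first speech segment.
    first_probs = [first_model(segment) for segment in first_segments]

    # Level 2: each second segment is assumed to cover `window` consecutive
    # first segments; its prediction characteristic is the concatenation of
    # their first probabilities.
    second_probs = []
    for start in range(len(first_probs) - window + 1):
        characteristic = [
            p for probs in first_probs[start:start + window] for p in probs
        ]
        second_probs.append(second_model(characteristic))
    return first_probs, second_probs


# Toy stand-in models: the mean of the inputs, used only to exercise the flow.
def toy_first_model(segment):
    return [sum(segment) / len(segment)]


def toy_second_model(characteristic):
    return sum(characteristic) / len(characteristic)


segments = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
first, second = two_level_scores(segments, toy_first_model, toy_second_model, window=2)
```

In this sketch the second model never sees the raw segments; it operates only on the first probabilities, which mirrors the layer-by-layer recognition described in the cited paragraph.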

Regarding Claims 22 and 33, Wang et al. teach: The computer-implemented method of claim 21, further comprising: processing the first output data using a normalization component to determine first probability data (See rejection of claim 21 and [0050] Then, classification processing is performed on the s dimension-reduced feature maps by using the fully connected layer, and outputs of the fully connected layer are fed into the softmax layer. After that, normalization processing is performed on the outputs of the fully connected layer by using the softmax layer, to obtain the first probabilities corresponding to the first speech segments.).
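The softmax normalization referred to in Wang at [0050] can be illustrated with a short sketch. The function below is the standard softmax; the input scores and the three output classes ("er", "duo", and padding) are hypothetical values chosen for illustration and do not appear in Wang.

```python
import math


def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical fully connected layer outputs for "er", "duo", and padding.
first_probabilities = softmax([2.0, 1.0, 0.1])
```

The largest raw score receives the largest probability, and the outputs can be compared directly against a probability threshold.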

Regarding Claims 23 and 34, Wang et al. teach: The computer-implemented method of claim 21, further comprising: processing the second output data using at least one activation function component to determine the second output data (See rejection of claim 21 and [0070] When a probability that a second speech segment corresponds to the pre-determined keyword is greater than the pre-determined probability threshold, it is determined that the pre-determined keyword exists in the second speech segment, a recognition result representing that the pre-determined keyword exists in the to-be-recognized speech signal is outputted, and the recognition process ends. If a probability that the last second speech segment corresponds to the pre-determined keyword is still less than the pre-determined probability threshold, it is determined that the pre-determined keyword does not exist in the to-be-recognized speech signal, a recognition result representing that the pre-determined keyword does not exist in the to-be-recognized speech signal is outputted, and the recognition process ends.).

Regarding Claims 24 and 35, Wang et al. teach: The computer-implemented method of claim 23, further comprising: processing the second output data using a classifier to detect an occurrence of the non-speech acoustic event (See rejection of claim 23.).

Regarding Claims 25 and 36, Wang et al. teach: The computer-implemented method of claim 21, wherein the non-speech acoustic event comprises a non-speech sound made by a human (See rejection of claim 21 and [0062] In another embodiment, the second probabilities may include only the probabilities that the second speech segments do not correspond to (that is, being in non-correspondence to) the pre-determined keyword. Using an example in which the pre-determined keyword is “er duo”, a second probability corresponding to a second speech segment may include only a probability that the second speech segment corresponds to other information other than “er duo”.).

Regarding Claims 26 and 37, Wang et al. teach: The computer-implemented method of claim 21, wherein the first output data corresponds to a likelihood that the at least one frame includes a representation of at least part of a first wakeword (See rejection of claim 21 and [0046] Correspondingly, for any first speech segment, a first probability that corresponds to the first speech segment and that is outputted by the first classification model may include a probability that the first speech segment corresponds to “er”, and a probability that the first speech segment corresponds to “duo”.).

Regarding Claims 27, and 38,  Wang et al. teach: The computer-implemented method of claim 26, further comprising: determining, using the feature vector, third output data corresponding to a likelihood that the at least one frame includes a representation of at least part of a second wakeword (See rejection of claim 21 and [0070] If a probability that the 1st second speech segment (a second speech segment to which the foremost unit frame appearing in the to-be-recognized speech signal corresponds) corresponds to the pre-determined keyword is greater than the pre-determined probability threshold, it is determined that the pre-determined keyword exists in the 1st second speech segment, a recognition result representing that the pre-determined keyword exists in the to-be-recognized speech signal is outputted, and a recognition process ends. Conversely, if the probability that the 1st second speech segment corresponds to the pre-determined keyword is less than the pre-determined probability threshold, it is determined that the pre-determined keyword does not exist in the 1st second speech segment. A probability that the 2nd second speech segment corresponds to the pre-determined keyword continues to be compared with the pre-determined probability threshold. The rest is deduced by analogy. When a probability that a second speech segment corresponds to the pre-determined keyword is greater than the pre-determined probability threshold, it is determined that the pre-determined keyword exists in the second speech segment, a recognition result representing that the pre-determined keyword exists in the to-be-recognized speech signal is outputted, and the recognition process ends. 
If a probability that the last second speech segment corresponds to the pre-determined keyword is still less than the pre-determined probability threshold, it is determined that the pre-determined keyword does not exist in the to-be-recognized speech signal, a recognition result representing that the pre-determined keyword does not exist in the to-be-recognized speech signal is outputted, and the recognition process ends.).
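The sequential threshold comparison quoted from Wang at [0070] can be sketched as below. This is an illustrative reading of the quoted passage, not Wang's implementation; the function name, the return format, and the sample probabilities are assumptions.

```python
def keyword_exists(second_probabilities, threshold):
    """Compare each second-segment probability with the threshold, in order.

    Returns (True, index) for the first segment whose probability exceeds
    the threshold, or (False, None) when no segment does.
    """
    for index, probability in enumerate(second_probabilities):
        if probability > threshold:
            # Keyword found: the recognition process ends here.
            return True, index
    # The last segment was still below the threshold: keyword absent.
    return False, None


# Hypothetical probabilities for four consecutive second speech segments.
found, position = keyword_exists([0.1, 0.4, 0.9, 0.95], threshold=0.8)
```

The loop stops at the first segment that exceeds the threshold, so later segments are not examined once a detection is made.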

Regarding Claims 28, and 39,  Wang et al. teach: The computer-implemented method of claim 21, further comprising: receiving the at least one frame of audio data; and processing the at least one frame of audio data using a feature-extraction model to determine the feature vector, the feature-extraction model configured to determine feature output data operable by both the first model and the second model, wherein determining the first output data comprises processing the feature vector using the first model, and wherein determining the second output data comprises processing the feature vector using the second model (See rejection of claim 21 and [0075] However, in the embodiments of the present disclosure, the pre-determined keyword is recognized layer by layer (level by level) by using the first classification model and the second classification model (e.g., first at word segmentation units level such as single pinyin or phenome, and then at speech segment level such as combined pinyin, word or phrase). The first probabilities that are in a one-to-one correspondence with the first speech segments are first obtained, and then the second probabilities that are in a one-to-one correspondence with the second speech segments are obtained based on first probabilities corresponding to first speech segments that correspond to each second speech segment.).

Regarding Claims 29, and 40,  Wang et al. teach:  The computer-implemented method of claim 21, wherein the feature vector represents acoustic feature data and the method further comprises: processing the feature vector using a feature-extraction model to determine a second feature vector, the feature-extraction model configured to determine feature output data operable by both the first model and the second model, wherein determining the first output data comprises processing the second feature vector using the first model, and wherein determining the second output data comprises processing the second feature vector using the second model (See rejection of claim 21 and [0070] If a probability that the 1st second speech segment (a second speech segment to which the foremost unit frame appearing in the to-be-recognized speech signal corresponds) corresponds to the pre-determined keyword is greater than the pre-determined probability threshold, it is determined that the pre-determined keyword exists in the 1st second speech segment, a recognition result representing that the pre-determined keyword exists in the to-be-recognized speech signal is outputted, and a recognition process ends. Conversely, if the probability that the 1st second speech segment corresponds to the pre-determined keyword is less than the pre-determined probability threshold, it is determined that the pre-determined keyword does not exist in the 1st second speech segment. A probability that the 2nd second speech segment corresponds to the pre-determined keyword continues to be compared with the pre-determined probability threshold. The rest is deduced by analogy. 
When a probability that a second speech segment corresponds to the pre-determined keyword is greater than the pre-determined probability threshold, it is determined that the pre-determined keyword exists in the second speech segment, a recognition result representing that the pre-determined keyword exists in the to-be-recognized speech signal is outputted, and the recognition process ends. If a probability that the last second speech segment corresponds to the pre-determined keyword is still less than the pre-determined probability threshold, it is determined that the pre-determined keyword does not exist in the to-be-recognized speech signal, a recognition result representing that the pre-determined keyword does not exist in the to-be-recognized speech signal is outputted, and the recognition process ends. [0075] However, in the embodiments of the present disclosure, the pre-determined keyword is recognized layer by layer (level by level) by using the first classification model and the second classification model (e.g., first at word segmentation units level such as single pinyin or phenome, and then at speech segment level such as combined pinyin, word or phrase). The first probabilities that are in a one-to-one correspondence with the first speech segments are first obtained, and then the second probabilities that are in a one-to-one correspondence with the second speech segments are obtained based on first probabilities corresponding to first speech segments that correspond to each second speech segment.).

Regarding Claims 30, and 41,  Wang et al. teach: The computer-implemented method of claim 21, wherein: determining the first output data comprises: processing the feature vector using a feature extraction component to determine first feature data, and processing the first feature data using the first model to determine the first output data; and determining the second output data comprises: processing the feature vector using the feature extraction component to determine second feature data, and processing the second feature data using the second model to determine the second output data (See rejection of claim 21 and [0050] As shown in FIG. 3, convolution processing may be performed on the eigenvectors whose dimensions are t×f corresponding to the first speech segments and convolution kernels (that is, a filtering weight matrix) whose dimensions are s×v×w by using the convolutional layer, to obtain s feature maps…Then, max-pooling processing (that is, processing of selecting a maximum feature point in a neighborhood, that is, sampling processing) is respectively performed on the s feature maps by using the max-pooling layer, to reduce a magnitude of a time frequency dimension, and obtain s dimension-reduced feature maps. Then, classification processing is performed on the s dimension-reduced feature maps by using the fully connected layer, and outputs of the fully connected layer are fed into the softmax layer. After that, normalization processing is performed on the outputs of the fully connected layer by using the softmax layer, to obtain the first probabilities corresponding to the first speech segments.).
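The processing order quoted from Wang at [0050] (convolutional layer, max-pooling layer, fully connected layer, softmax layer) can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: the convolution is implemented as a "valid" cross-correlation without padding, the pooling windows do not overlap, and all sizes, weights, and function names are hypothetical rather than taken from Wang.

```python
import math


def conv2d_valid(feature, kernel):
    """Slide a v x w kernel over a t x f feature matrix (no padding)."""
    t, f = len(feature), len(feature[0])
    v, w = len(kernel), len(kernel[0])
    return [
        [
            sum(feature[i + a][j + b] * kernel[a][b]
                for a in range(v) for b in range(w))
            for j in range(f - w + 1)
        ]
        for i in range(t - v + 1)
    ]


def max_pool(feature_map, size):
    """Keep the maximum of each non-overlapping size x size neighborhood."""
    rows, cols = len(feature_map) // size, len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + a][c * size + b]
                for a in range(size) for b in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def fully_connected(inputs, weights, biases):
    """One output score per weight row: dot(inputs, row) + bias."""
    return [sum(x * wt for x, wt in zip(inputs, row)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Normalize scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def first_probabilities(feature, kernels, weights, biases, pool_size=2):
    """conv -> max-pool -> flatten -> fully connected -> softmax."""
    pooled = [max_pool(conv2d_valid(feature, k), pool_size) for k in kernels]
    flat = [value for fmap in pooled for row in fmap for value in row]
    return softmax(fully_connected(flat, weights, biases))


# Hypothetical 5 x 5 feature matrix, s = 2 kernels of size 2 x 2, 3 classes.
feature = [[float(i + j) for j in range(5)] for i in range(5)]
kernels = [[[1.0, 0.0], [0.0, -1.0]], [[0.5, 0.5], [0.5, 0.5]]]
weights = [[0.1] * 8, [0.0] * 8, [-0.1] * 8]
biases = [0.0, 0.0, 0.0]
probs = first_probabilities(feature, kernels, weights, biases)
```

Each of the s kernels yields one feature map, pooling reduces its time-frequency size, and the flattened result passes through the fully connected layer before normalization.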

Regarding Claims 31 and 42, Wang et al. teach: The computer-implemented method of claim 21, wherein: the feature vector represents acoustic feature data; the first model comprises a feature extraction component; determining the first output data comprises: processing the feature vector using the first model to determine a second feature vector, and using the second feature vector to determine the first output data; and determining the second output data comprises using the second feature vector and the second model to determine the second output data (See the rejection of claim 30 and [0029]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art of record, Zhang et al. (US 2020/0176014 A1), teaches a method that can include: processing the speech signal to generate a plurality of speech frames; generating a first number of acoustic features based on the plurality of speech frames using a frame shift at a given frequency; and generating a second number of posteriori probability vectors based on the first number of acoustic features using an acoustic model, wherein each of the posteriori probability vectors comprises probabilities of the acoustic features corresponding to a plurality of modeling units, respectively.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM, whose telephone number is (571) 270-5878. The examiner can normally be reached Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/MOHAMMAD K ISLAM/Primary Examiner, Art Unit 2656