Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 2-12, 14-20 and 23-26 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.


Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1, 2-12, 14-20 and 23-26 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
In the present application, the term “acoustic features” refers to representations of frames of a voice signal determined using operations such as a Fourier transform (paragraph [0071]). Examples of acoustic features comprise linear prediction coefficients, cepstral coefficients, etc. The term “acoustic feature set” refers to a set of acoustic features that have a correspondence with phonemes (paragraph [0072]). An acoustic feature set with known correspondences to phonemes may be used to train a classifier to determine the conditional probability P(O|W).
Claim 1, as amended, recites an acoustic feature set that comprises acoustic features corresponding to phonemes, where “the acoustic features are calculated by calculating a probability that an acoustic feature of each of a plurality of frames is generated by a corresponding phoneme and multiplying probabilities for the plurality of frames to obtain a conditional probability”. However, as noted above, the specification discloses acoustic features are calculated using operations such as a Fourier transform. Additionally, the specification discloses the acoustic feature set may be used to train a classifier (such as a GMM) to calculate a probability that an acoustic feature of each of a plurality of frames is generated by a corresponding phoneme and multiplying probabilities for the plurality of frames to obtain a conditional probability (paragraph [0072]).  However, the described calculated probability is not an acoustic feature.
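The distinction drawn above can be illustrated with a minimal sketch, assuming one-dimensional acoustic features and a single Gaussian per phoneme as a stand-in for a trained GMM. The names `gaussian_pdf` and `conditional_probability` and all numeric values are hypothetical and are not taken from the specification. The acoustic features are the inputs to the calculation; the product of the per-frame probabilities is a separate quantity computed from them.

```python
import math

def gaussian_pdf(x, mean, var):
    """Likelihood of a scalar feature x under a Gaussian (stand-in for a trained GMM)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def conditional_probability(features, phonemes, models):
    """Multiply per-frame likelihoods p(o_t | phoneme_t) to approximate P(O|W)."""
    prob = 1.0
    for x, ph in zip(features, phonemes):
        mean, var = models[ph]
        prob *= gaussian_pdf(x, mean, var)
    return prob

# Hypothetical per-phoneme models: phoneme -> (mean, variance)
models = {"a": (0.0, 1.0), "b": (2.0, 1.0)}
features = [0.1, 1.9, 0.0]   # acoustic features (the inputs), one per frame
phonemes = ["a", "b", "a"]   # phoneme aligned to each frame
p = conditional_probability(features, phonemes, models)
```

In this sketch `features` plays the role of the acoustic features and `p` plays the role of the conditional probability; the latter is derived from the former and is not itself a feature.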
Therefore, the specification as filed merely describes the conventional technique of calculating acoustic features by performing operations such as a Fourier transform on frames of a voice signal. There is no support, however, to calculate the acoustic features “by calculating a probability that an acoustic feature of each of a plurality of frames is generated by a corresponding phoneme and multiplying probabilities for the plurality of frames to obtain a conditional probability”, as recited in 


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 3-9, 11-12, 14-20, and 23-26 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kurian (U.S. Patent No. 10,474,800), in view of Scheffer et al. (U.S. Patent No. 9,336,781, hereinafter “Scheffer”).
In regard to claim 1, Kurian discloses a voice recognition method based on identity recognition (Figs. 2A-2F), the method comprising: 
judging an environmental state (step 201, a vehicle computing platform receives driver sensor data indicating the presence of the driver, column 11, lines 14-29); 
activating an identity recognition function in response to the environmental state meeting a predetermined condition (step 202, in response, images of the driver are captured, column 11, lines 30-38); 
receiving a result of recognizing an identity (step 203, the images are analyzed using facial recognition to identify the driver, column 11, lines 38-44).
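Kurian further discloses recognizing a voice to determine the identity of the driver (column 11, lines 45-53). However, Kurian does not disclose determining an acoustic feature set corresponding to the recognized identity based on the result, or recognizing a voice corresponding to the recognized identity from received sound based on the determined acoustic feature set.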
Scheffer discloses a method for recognizing a voice based on the result of recognizing an identity (Fig. 1, input speaker identifier 132, column 3, lines 3-16), comprising:
determining an acoustic feature set corresponding to the recognized identity based on the result (Fig. 2, a speaker specific acoustic model is selected, column 5, lines 17-31), wherein the acoustic feature set comprises acoustic features corresponding to phonemes (the acoustic model 116 associates phonetic content with the current speech sample, column 5, lines 32-44), the acoustic features comprise at least one of a linear prediction coefficient, a cepstral coefficient, an Mel frequency cepstral coefficient, or a perceptual linear prediction coefficient (MFCC, etc., column 4, lines 18-27), and the acoustic features are calculated by calculating a probability that an acoustic feature of each of a plurality of frames is generated by a corresponding phoneme and multiplying probabilities for the plurality of frames to obtain a conditional probability (each speech segment is associated with corresponding phones, column 7, lines 20-38; and combined to indicate the conditional probability that the phonemes match the voice segments, column 9, line 59 to column 10, line 14); and 
recognizing a voice corresponding to the recognized identity from received sound based on the determined acoustic feature set (the current phonetic speaker model is compared to a stored phonetic speaker model for the asserted speaker identity to determine if there is a match, column 10, lines 15-26).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to determine an acoustic feature set corresponding to the recognized identity based on the result and recognize a voice corresponding to the recognized identity from received sound based on the determined acoustic feature set, because this allows the system to compare portions of the audio signal that correspond to similar phonetic units, and thereby more accurately detect mismatches, as taught by Scheffer (column 4, line 51 to column 5, line 2).

In regard to claim 3, Kurian discloses the step of judging an environmental state comprises: 
receiving sensor data from at least one environmental sensor (driver sensor data, column 11, lines 14-29); and 
determining whether the environmental state meets the predetermined condition (the driver sensor data is used to determine whether a driver seat in the vehicle is occupied, column 11, lines 14-29).

In regard to claim 4, Kurian discloses the step of receiving a result of recognizing an identity comprises: 
receiving the result from an external identification apparatus (see Fig. 1B, vehicle computing platform 110 includes driver identification module 113 which may be implemented as a separate device, column 9, lines 49-58 and column 10, lines 40-65), wherein the external identification apparatus recognizes the identity with at least one of the following techniques: 
face recognition, iris recognition, fingerprint recognition, password recognition, and login information recognition (facial recognition, column 11, lines 30-44).

In regard to claim 5, Kurian does not expressly disclose creating an identity and establishing a corresponding acoustic feature set for the identity.
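Scheffer discloses creating an identity and establishing a corresponding acoustic feature set for the identity (acoustic feature vectors that have been previously computed for a number of different speaker specific speech samples, column 5, lines 27-31).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to create an identity and establish a corresponding acoustic feature set for the identity, because this would allow new speakers to enroll with the system and allow them to be identified, as suggested by Scheffer (column 8, lines 38-53).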


In regard to claim 6, Kurian does not expressly disclose creating an identity and establishing a corresponding acoustic feature set for the identity.
Scheffer discloses the step of establishing a corresponding acoustic feature set for the identity comprises: 
receiving voice corresponding to the identity (an identified speaker records voice samples, column 8, lines 41-44); 
extracting acoustic features corresponding to the identity from the received voice (acoustic features are computed for the voice samples, column 5, lines 27-31); and 
establishing an acoustic feature set corresponding to the identity according to the extracted acoustic features (the extracted acoustic features are stored as speaker specific training data, column 5, lines 27-31).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to create an identity and establish a corresponding acoustic feature set for the identity, because this would allow new speakers to enroll with the system and allow them to be identified, as suggested by Scheffer (column 8, lines 38-53).

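In regard to claim 7, Kurian does not expressly disclose creating an identity and establishing a corresponding acoustic feature set for the identity.
Scheffer discloses the step of establishing a corresponding acoustic feature set for the identity comprises: 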
receiving voice corresponding to the identity (an identified speaker records voice samples, column 8, lines 41-44); 
transmitting the received voice to a server (see Fig. 6, all processing may be performed by a server computing device 660 in communication over network 650, column 11, line 48 to column 12, line 10); and 
receiving an acoustic feature set corresponding to the identity from the server (the extracted acoustic features are stored as speaker specific training data, column 5, lines 27-31).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to create an identity and establish a corresponding acoustic feature set for the identity, because this would allow new speakers to enroll with the system and allow them to be identified, as suggested by Scheffer (column 8, lines 38-53).

In regard to claim 8, while Kurian discloses recognizing a voice corresponding to the recognized identity, Kurian is silent as to the details of the voice recognition process.
Scheffer discloses the step of recognizing voice corresponding to the recognized identity from received sound further comprises: 
extracting acoustic features from the received sound (a set of acoustic features representing the sound signal, column 3, lines 31-49 and column 8, lines 4-10); 
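matching the extracted acoustic features with the acquired acoustic feature set corresponding to the recognized identity (the current speaker model is compared to the stored speaker model that corresponds to the asserted identity, column 8, lines 31-62); and 
recognizing the received sound as the voice corresponding to the recognized identity if the extracted acoustic features match the determined acoustic feature set (based on the comparison, the system determines if there is a match between the current speaker model and the stored speaker model, column 9, lines 12-27).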
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to perform the voice recognition utilizing the techniques disclosed by Scheffer, because this allows the system to compare portions of the audio signal that correspond to similar phonetic units, and thereby more accurately detect mismatches, as taught by Scheffer (column 4, line 51 to column 5, line 2).

In regard to claim 9, while Kurian discloses recognizing a voice corresponding to the recognized identity, Kurian is silent as to the details of the voice recognition process.
Scheffer discloses the step of matching the extracted acoustic features with the determined acoustic feature set corresponding to the recognized identity comprises:
calculating a maximum likelihood probability of the extracted acoustic features based on the determined acoustic feature set corresponding to the recognized identity (HMM 212 and DNN 214 calculate the posterior probability distribution 222 of the acoustic features with respect to the phonemes, column 5, line 66 to column 6, line 29); and
determining that the extracted acoustic features match the acoustic feature set corresponding to the recognized identity when the calculated probability is greater than a first threshold, wherein the first threshold is a probability threshold indicating whether the recognized identity is the identity corresponding to the extracted acoustic features (the posteriors provided by the DNN are used to determine a biometric score, which determines whether there is a match to the stored speaker model (above a threshold) or not (below a threshold), column 7, lines 1-19 and column 8, line 54 to column 9, line 11).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to perform the voice recognition utilizing the techniques disclosed by Scheffer, because this allows the system to compare portions of the audio signal that correspond to similar phonetic units, and thereby more accurately detect mismatches, as taught by Scheffer (column 4, line 51 to column 5, line 2).

In regard to claim 11, Kurian discloses recognizing an operation to be performed from the voice corresponding to the recognized identity (if an activation phrase is received, further actions are performed, column 7, lines 37-58).

In regard to claim 12, Kurian discloses a voice recognition apparatus based on identity recognition (Figs. 1A and 1B), the voice recognition apparatus comprising:
a processor (computing devices comprising processors, column 9, lines 27-48);
a memory having instructions stored thereon (memories, column 9, lines 27-48), which when executed by the processor, cause the processor to:
judge an environmental state (step 201, a vehicle computing platform receives driver sensor data indicating the presence of the driver, column 11, lines 14-29); 
activate an identity recognition function in response to the environmental state meeting a predetermined condition (step 202, in response, images of the driver are captured, column 11, lines 30-38); 
receive a result of recognizing an identity (step 203, the images are analyzed using facial recognition to identify the driver, column 11, lines 38-44).
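Kurian further discloses recognizing a voice to determine the identity of the driver (column 11, lines 45-53). However, Kurian does not disclose determining an acoustic feature set corresponding to the recognized identity based on the result, or recognizing a voice corresponding to the recognized identity from received sound based on the determined acoustic feature set.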
Scheffer discloses a system for recognizing a voice based on the result of recognizing an identity (Fig. 1, input speaker identifier 132, column 3, lines 3-16), comprising means to:
determine an acoustic feature set corresponding to the recognized identity based on the result (Fig. 2, a speaker specific acoustic model is selected, column 5, lines 17-31), wherein the acoustic feature set comprises acoustic features corresponding to phonemes (the acoustic model 116 associates phonetic content with the current speech sample, column 5, lines 32-44), the acoustic features comprise at least one of a linear prediction coefficient, a cepstral coefficient, an Mel frequency cepstral coefficient, or a perceptual linear prediction coefficient (MFCC, etc., column 4, lines 18-27), and the acoustic features are calculated by calculating a probability that an acoustic feature of each of a plurality of frames is generated by a corresponding phoneme and multiplying probabilities for the plurality of frames to obtain a conditional probability (each speech segment is associated with corresponding phones, column 7, lines 20-38; and combined to indicate the conditional probability that the phonemes match the voice segments, column 9, line 59 to column 10, line 14); and 
recognize a voice corresponding to the recognized identity from received sound based on the determined acoustic feature set (the current phonetic speaker model is compared to a stored phonetic speaker model for the asserted speaker identity to determine if there is a match, column 10, lines 15-26).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to determine an acoustic feature set corresponding to the recognized identity based on the result and recognize a voice corresponding to the recognized identity from received sound based on the determined acoustic feature set, because this allows the system to compare portions of the audio signal that correspond to similar phonetic units, and thereby more accurately detect mismatches, as taught by Scheffer (column 4, line 51 to column 5, line 2).

In regard to claim 14, Kurian discloses the instructions, which when executed by the processor, further cause the processor to:
receive sensor data from at least one environmental sensor (driver sensor data, column 11, lines 14-29); and 
determine whether the environmental state meets the predetermined condition (the driver sensor data is used to determine whether a driver seat in the vehicle is occupied, column 11, lines 14-29).

In regard to claim 15, Kurian discloses the instructions, which when executed by the processor, further cause the processor to:
receive the result from an external identification apparatus (see Fig. 1B, vehicle computing platform 110 includes driver identification module 113 which may be implemented as a separate device, column 9, lines 49-58 and column 10, lines 40-65), wherein the external identification apparatus recognizes the identity with at least one of the following technologies: 
face recognition, iris recognition, fingerprint recognition, password recognition, and login information recognition (facial recognition, column 11, lines 30-44).

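In regard to claim 16, Kurian does not expressly disclose creating an identity and establishing a corresponding acoustic feature set for the identity.
Scheffer discloses creating an identity and establishing a corresponding acoustic feature set for the identity (acoustic feature vectors that have been previously computed for a number of different speaker specific speech samples, column 5, lines 27-31).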
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to create an identity and establish a corresponding acoustic feature set for the identity, because this would allow new speakers to enroll with the system and allow them to be identified, as suggested by Scheffer (column 8, lines 38-53).

In regard to claim 17, Kurian does not expressly disclose creating an identity and establishing a corresponding acoustic feature set for the identity.
Scheffer discloses the instructions, which when executed by the processor, further cause the processor to:
receive voice corresponding to the identity (an identified speaker records voice samples, column 8, lines 41-44); 
extract acoustic features corresponding to the identity from the received voice (acoustic features are computed for the voice samples, column 5, lines 27-31); and 
establish an acoustic feature set corresponding to the identity according to the extracted acoustic features (the extracted acoustic features are stored as speaker specific training data, column 5, lines 27-31).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to create an identity and establish a corresponding acoustic feature set for the identity, because this would allow new speakers to enroll with the system and allow them to be identified, as suggested by Scheffer (column 8, lines 38-53).


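In regard to claim 18, Kurian does not expressly disclose creating an identity and establishing a corresponding acoustic feature set for the identity.
Scheffer discloses the instructions, which when executed by the processor, further cause the processor to: 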
receive voice corresponding to the identity (an identified speaker records voice samples, column 8, lines 41-44); 
transmit the received voice to a server (see Fig. 6, all processing may be performed by a server computing device 660 in communication over network 650, column 11, line 48 to column 12, line 10); and 
receive an acoustic feature set corresponding to the identity from the server (the extracted acoustic features are stored as speaker specific training data, column 5, lines 27-31).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to create an identity and establish a corresponding acoustic feature set for the identity, because this would allow new speakers to enroll with the system and allow them to be identified, as suggested by Scheffer (column 8, lines 38-53).

In regard to claim 19, while Kurian discloses recognizing a voice corresponding to the recognized identity, Kurian is silent as to the details of the voice recognition process.
Scheffer discloses the instructions, which when executed by the processor, further cause the processor to: 
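extract acoustic features from the received sound (a set of acoustic features representing the sound signal, column 3, lines 31-49 and column 8, lines 4-10); 
match the extracted acoustic features with the acquired acoustic feature set corresponding to the recognized identity (the current speaker model is compared to the stored speaker model that corresponds to the asserted identity, column 8, lines 31-62); and 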
recognize the received sound as the voice corresponding to the recognized identity if the extracted acoustic features match the determined acoustic feature set (based on the comparison, the system determines if there is a match between the current speaker model and the stored speaker model, column 9, lines 12-27).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to perform the voice recognition utilizing the techniques disclosed by Scheffer, because this allows the system to compare portions of the audio signal that correspond to similar phonetic units, and thereby more accurately detect mismatches, as taught by Scheffer (column 4, line 51 to column 5, line 2).

In regard to claim 20, while Kurian discloses recognizing a voice corresponding to the recognized identity, Kurian is silent as to the details of the voice recognition process.
Scheffer discloses the instructions, when executed by the processor, further cause the processor to:
calculate a maximum likelihood probability of the extracted acoustic features based on the determined acoustic feature set corresponding to the recognized identity (HMM 212 and DNN 214 calculate the posterior probability distribution 222 of the acoustic features with respect to the phonemes, column 5, line 66 to column 6, line 29); and
determine that the extracted acoustic features match the acoustic feature set corresponding to the recognized identity when the calculated probability is greater than a first threshold, wherein the first threshold is a probability threshold indicating whether the recognized identity is the identity corresponding to the extracted acoustic features (the posteriors provided by the DNN are used to determine a biometric score, which determines whether there is a match to the stored speaker model (above a threshold) or not (below a threshold), column 7, lines 1-19 and column 8, line 54 to column 9, line 11).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to perform the voice recognition utilizing the techniques disclosed by Scheffer, because this allows the system to compare portions of the audio signal that correspond to similar phonetic units, and thereby more accurately detect mismatches, as taught by Scheffer (column 4, line 51 to column 5, line 2).

In regard to claim 23, Kurian does not expressly disclose creating an identity and establishing a corresponding acoustic feature set for the identity.
Scheffer discloses storing data required for processing voice corresponding to the created identity in a database associated with the created identity (speaker models are stored in a memory or data storage associated with a speaker identity, column 8, lines 31-53).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to store data required for processing voice corresponding to the created identity in a database associated with the created identity, because this would allow new speakers to enroll with the system and allow them to be identified, as suggested by Scheffer (column 8, lines 38-53).


In regard to claim 24, Kurian discloses:
an identification apparatus configured to identify the identity (see Fig. 1B, vehicle computing platform 110 includes driver identification module 113 which may be implemented as a separate device, column 9, lines 49-58 and column 10, lines 40-65); and
at least one environmental sensor configured to detect environmental data (driver sensor data, column 11, lines 14-29).

In regard to claim 25, Kurian discloses determining whether the environmental state meets a predetermined condition according to the sensor data comprises:
determining whether the sensor data reaches a predetermined threshold (environmental sensors include seat weight sensors, which inherently compare the detected weight to a threshold, to determine whether the driver seat is occupied, column 11, lines 14-29); and
determining that the environmental state meets a predetermined condition in response to determining that the sensor data reaches the predetermined threshold (when the weight threshold is exceeded, the driver is considered to be occupying the driver seat, column 11, lines 14-29).
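The two determinations mapped above can be sketched as follows, assuming a seat weight sensor that reports a scalar reading. The function names and the 20 kg threshold are hypothetical and are not taken from Kurian or from the present application.

```python
def reaches_threshold(sensor_value, threshold):
    """First determination: does the sensor data reach the predetermined threshold?"""
    return sensor_value >= threshold

def environmental_condition_met(seat_weight_kg, weight_threshold_kg=20.0):
    """Second determination: the condition is met when the threshold is reached.

    The default threshold is an assumed value chosen only for illustration.
    """
    return reaches_threshold(seat_weight_kg, weight_threshold_kg)

# An empty seat does not meet the condition; an occupied seat does.
empty_seat = environmental_condition_met(0.0)
occupied_seat = environmental_condition_met(72.5)
```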

In regard to claim 26, Kurian discloses the at least one environmental sensor comprises at least one of a vehicle door sensor, a driving position sensor, or a driving state sensor (state of the vehicle, column 13, lines 52-63).


Claim(s) 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kurian, in view of Scheffer, and further in view of Strope et al. (U.S. Patent No. 8,185,392, hereinafter “Strope”).

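In regard to claim 10, Kurian in view of Scheffer does not disclose updating the acoustic feature set when the calculated probability is greater than the first threshold and less than a second threshold.
Strope discloses updating an acoustic feature set when the calculated probability is greater than the first threshold and less than a second threshold (audio data that has a likelihood above a first threshold and below a second threshold is selected to update an acoustic model, column 3, lines 37-59).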
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to update the acoustic feature set when the calculated probability is greater than the first threshold and less than a second threshold, because the audio data with a likelihood above a first threshold is likely to be correct, while audio data with a likelihood above the second threshold would not need further refinement, as taught by Strope (column 3, lines 37-59).
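The combined threshold logic can be sketched as follows, assuming the calculated probability is already available as a number between 0 and 1. The function name `decide` and the threshold values are hypothetical and are not taken from Scheffer, Strope, or the present application.

```python
def decide(probability, first_threshold, second_threshold):
    """Return (is_match, should_update) for a calculated probability.

    is_match: the probability exceeds the first threshold.
    should_update: the probability lies strictly between the two thresholds,
    so the match is likely correct but the stored set may still benefit
    from refinement.
    """
    if not first_threshold < second_threshold:
        raise ValueError("first_threshold must be below second_threshold")
    is_match = probability > first_threshold
    should_update = first_threshold < probability < second_threshold
    return is_match, should_update

# Hypothetical thresholds chosen only for illustration.
below = decide(0.50, 0.60, 0.90)    # no match, no update
between = decide(0.70, 0.60, 0.90)  # match, update the feature set
above = decide(0.95, 0.60, 0.90)    # match, no update needed
```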


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN LOUIS ALBERTALLI whose telephone number is (571)272-7616.  The examiner can normally be reached on W: 12-4, Th: 9-3, F: 9-3.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






BLA 7/29/21
/BRIAN L ALBERTALLI/Primary Examiner, Art Unit 2656