DETAILED ACTION
Claims 1, 6-14 and 19-25 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
With regard to the Advisory Action from 21 January 2022, the Applicant has filed a response on 28 January 2022.
Claims 2-5 and 15-18 have been cancelled.
Claim objections were raised particularly with regard to claims 1, 14, 20, 21 and 25 for minor informalities. The issue raised has been resolved and the Examiner hereby withdraws the objection.
Response to Arguments
Applicant’s arguments (Remarks: page 14 par 2 – page 15 par 2), filed 28 January 2022, with respect to the rejection of the independent claims 1, 14, 20, 21 and 24 under 35 U.S.C. 103 have been fully considered and are persuasive enough regarding the claims as currently presented. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made in view of the newly added limitation indicating ‘comparing a value estimated in the model from the inputting of the normalized target speech signal with the class information.’ The claims will be addressed by the currently presented claim set in the following section.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 10, 11, 12, 13, 14, 20 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Xue (US 2018/0005628 A1) in view of SUN et al (US 2017/0092276 A1: hereafter – Sun) further in view of Hong (US 2003/0233233 A1) and further in view of Hoffmeister et al (US 2014/0163977 A1: hereafter – Hoffmeister).
For claim 1, Xue discloses a recognition method performed in a user terminal (Xue: [0007] — a device for speech recognition), the recognition method comprising:
determining a characteristic parameter personalized to a speech of a user based on a reference speech signal input by the user (Xue: [0075] — feature training Deep Neural Network which involves obtaining voiceprint features (which are characteristic parameters personalised to a user); [0079], FIG. 2 Step 202 — obtaining voice training data; [0007] — “[a]n audio capturing device records audios, such as a few sentences narrated by a user, as training data”),
receiving, as an input, a target speech signal to be recognized from the user (Xue: FIG. 2 Step 214, [0085] — receiving voice data to be recognised); and
outputting a recognition result of the target speech signal (Xue: FIG. 2 — obtaining speech recognition results; [0065] — “[b]y compensating the data to be recognized using the training data that is relatively more accurate, more accurate results of speech recognition may be obtained”),
wherein the characteristic parameter personalized to the user includes [[normalization information to be used for normalizing the target speech signal,]] identification information indicating a speech characteristic of the user, [[and class information to be used for classifying in a model]] (Xue: [0075] — for speech recognition, feature vectors of the data to be recognised as well as the voiceprint feature vectors are used for decoding recognition),
wherein the recognition result of the target speech signal is determined through speech recognition personalized to speech characteristic of the user by normalizing the target speech signal based on the normalization information, inputting the normalized target speech signal and the identification information to a model of general speech recognition for a plurality of users, and comparing a value estimated in the model from the inputting of the normalized target speech signal with the class information (Xue: [0075] — for speech recognition, feature vectors of the data to be recognised (the target speech signal) as well as the voiceprint feature vectors (identification information) are used for decoding recognition; [0087] — inputting voiceprint features to a model related to a speaker).
The reference of Xue fails to teach updating the characteristic parameter personalized to a speech of a user. The reference of Sun is now introduced to teach this as:
wherein the determining includes: updating the characteristic parameter selectively using other reference speech signals, input by the user; and satisfying a predetermined condition associated with at least one of a length of and an intensity of a speech signal, of the other reference speech signals (Sun: [0156] — a preset condition for generating voiceprint data being that the registration voice data (user speech input) reaches a preset time period/duration; [0176] — obtaining duration information of input speech signal that is to be used for verification; Claim 8 — updating voiceprint based on input speech signal and reference speech signal).
The reference of Xue provides teaching for determining a characteristic parameter personalized to a speech of a user. It differs from the claimed invention in that the claimed invention further provides that the determining includes updating the characteristic parameter based on reference signals that satisfy a predetermined length condition. This is not new to the art, as the reference of Sun shows updating a voiceprint based on received speech signals, in comparison with reference speech signals used for enrolment, wherein the enrolment speech signals meet a predetermined length condition. Hence, at the time the application was effectively filed, one of ordinary skill in the art would have found it obvious to incorporate the teaching of Sun into that of Xue, given the predictable result of tracking possible changes to the characteristic parameter of the user over time by updating the parameter, to ensure the user’s continued access.
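For illustration only, the selective-update limitation discussed above may be sketched as follows. The sketch is not taken from Xue, Sun, or the present application. The names `CharacteristicParameter`, `satisfies_condition` and `update_parameter`, the thresholds `MIN_DURATION_S` and `MIN_RMS`, and the running-mean update are all hypothetical assumptions, chosen only to show a parameter being updated solely by reference speech signals that meet a predetermined length and intensity condition.

```python
import math
from dataclasses import dataclass
from typing import Sequence

# Hypothetical thresholds (assumptions, not values from any cited reference).
MIN_DURATION_S = 1.0   # minimum length of a usable reference signal, in seconds
MIN_RMS = 0.01         # minimum intensity (root-mean-square amplitude)


@dataclass
class CharacteristicParameter:
    """Toy stand-in for a personalized parameter: a running mean."""
    mean: float = 0.0
    count: int = 0


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude; 0.0 for an empty signal."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def satisfies_condition(samples: Sequence[float], sample_rate: int) -> bool:
    """Check the assumed length and intensity condition."""
    duration_s = len(samples) / sample_rate
    return duration_s >= MIN_DURATION_S and rms(samples) >= MIN_RMS


def update_parameter(param: CharacteristicParameter,
                     samples: Sequence[float],
                     sample_rate: int) -> bool:
    """Update the parameter only if the signal qualifies.

    Returns True when the update was applied, False when the signal
    was skipped for being too short or too quiet.
    """
    if not satisfies_condition(samples, sample_rate):
        return False
    signal_mean = sum(samples) / len(samples)
    param.count += 1
    # Incremental running mean over the accepted reference signals.
    param.mean += (signal_mean - param.mean) / param.count
    return True
```

In this sketch a signal that fails either check leaves the parameter unchanged. The claim language requires a condition associated with at least one of length and intensity, so checking both is an assumption of the sketch rather than a requirement.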
The combination of Xue in view of Sun fails to disclose the further limitation of this claim, for which Hong is now introduced to teach as:
wherein the characteristic parameter personalized to the user includes normalization information to be used for normalizing the target speech signal, [[identification information indicating a speech characteristic of the user,]] and class information to be used for classifying in a model (Hong: [0045] — normalisation information for normalising input speech (recorded speech signal); [0017] — a classifier coupled to a model selector),
wherein the recognition result of the target speech signal is determined through speech recognition personalized to speech characteristic of the user by normalizing the target speech signal based on the normalization information, inputting the normalized target speech signal and the identification information to a model of general speech recognition for a plurality of users, and comparing a value estimated in the model from the inputting of the normalized target speech signal with the class information (Hong: [0008] — performing speech recognition through selecting a model based on the determined classification of the received information of input speech; [0045] — performing normalisation of the input speech; [0046] — obtaining diagonal covariance matrices to obtain likelihood probabilities (as the value) in order to determine class information).
The combination of Xue in view of Sun provides teaching for obtaining characteristic parameters such as identification information indicating a speech characteristic of the user. It differs from the claimed invention in that the claimed invention further provides for obtaining class information to be used for classifying in a model, wherein a recognition result is obtained from inputting a normalised target speech into a model, the model being determined based on class information obtained from a value estimated in the model. This is not new to the art, as the reference of Hong is seen to teach above. Hence, at the time the application was effectively filed, one of ordinary skill in the art would have found it obvious to incorporate the teaching of Hong into that of the combination, given the predictable result of ensuring an appropriate speech recognition by assigning speech recognition features to their most suitable recognition classes.
The combination of Xue in view of Sun further in view of Hong teaches obtaining training characteristic parameters for a user to be used in obtaining a target speech recognition result. This combination fails to teach the determination of a recognition result of a target speech signal through inputting the identification information to a model of general speech recognition for a plurality of users.
This is not new to the art as the reference of Hoffmeister teaches this:
(Hoffmeister: [0017] — speech recognition through a base model (a model for general speech recognition) as well as a model specific to a user; [0049], Fig. 3 Steps 308-314 — making use of a base model as well as a (user) specific model for recognition of the target speech).
Hence, at the time the application was effectively filed, one of ordinary skill in the art would have found it obvious to incorporate the teaching of Hoffmeister into that of the combination of Xue in view of Sun further in view of Hong, given the predictable result of providing a general solution to speech recognition without the added task of focusing the recognition resources on an individual user.
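For illustration only, the recognition limitation addressed in the rejection of claim 1 (normalizing the target speech signal, inputting the normalized signal together with identification information to a general model, and comparing the estimated value with class information) may be sketched as follows. This is not the method of the application or of any cited reference. The names `UserParameter`, `normalize`, `general_model` and `recognize`, the mean/standard-deviation normalization, and the toy scoring function are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class UserParameter:
    """Hypothetical container for the three kinds of personalized information."""
    norm_mean: float                 # normalization information (assumed: mean)
    norm_std: float                  # normalization information (assumed: std dev)
    identification: List[float]      # identification information (toy vector)
    class_info: Dict[str, float]     # class label -> reference value


def normalize(signal: Sequence[float], p: UserParameter) -> List[float]:
    """Normalize the target signal with the user's normalization information."""
    if p.norm_std == 0:
        raise ValueError("norm_std must be non-zero")
    return [(x - p.norm_mean) / p.norm_std for x in signal]


def general_model(features: Sequence[float],
                  identification: Sequence[float]) -> float:
    """Toy stand-in for a general (multi-user) model.

    Returns a single estimated value: the mean of the features plus
    the mean of the identification vector.
    """
    feat = sum(features) / len(features)
    ident = sum(identification) / len(identification) if identification else 0.0
    return feat + ident


def recognize(signal: Sequence[float],
              p: UserParameter,
              model: Callable[[Sequence[float], Sequence[float]], float] = general_model
              ) -> str:
    """Normalize, run the general model, then compare with class information."""
    normalized = normalize(signal, p)
    estimate = model(normalized, p.identification)
    # Choose the class whose reference value is closest to the estimate.
    return min(p.class_info, key=lambda label: abs(p.class_info[label] - estimate))
```

A nearest-value comparison is only one way to compare an estimated value with class information; it is used here because it keeps the three steps visibly separate.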
For claim 10, claim 1 is incorporated and the combination of Xue in view of Sun further in view of Hong, further in view of Hoffmeister discloses the recognition method of claim 1, wherein the reference speech signal is a speech signal input to the user terminal in response to the user using the user terminal before the target speech signal is input to the user terminal (Xue: [0007] — an audio capturing device that records audio to be used as training data (the training data being input before the target/test data)).
For claim 11, claim 1 is incorporated and the combination of Xue in view of Sun further in view of Hong, further in view of Hoffmeister discloses the recognition method, further comprising:
transmitting the target speech signal and the characteristic parameter to a server (Xue: [0007] — transmitting the training data to one or more server, or another device that receives the one or more audio for speech recognition from another capturing device (teaching of transmitting the target speech signal to a server)); and
receiving the recognition result of the target speech signal from the server, wherein the recognition result of the target speech signal is generated in the server (Xue: [0015] — speech recognition steps being performed at the server).
For claim 12, claim 1 is incorporated and the combination of Xue in view of Sun further in view of Hong, further in view of Hoffmeister discloses the recognition method, further comprising generating the recognition result of the target speech signal in the user terminal (Xue: [0015] — the device performing the speech recognition alone).
As for claim 13, computer program product claim 13 and method claim 1 are related as computer program product storing executable instructions required for performing the claimed method steps on a computer. Xue in [0094] provides such a computer-readable medium to read upon this claim. Accordingly, claim 13 is similarly rejected under the same rationale as applied above with respect to method claim 1.
For claim 14, Xue discloses a recognition method performed in a server that recognizes a target speech signal input to a user terminal (Xue: [0015] — speech recognition steps being performed at the server; [0007] — “[a]n audio capturing device records audios”), the recognition method comprising:
receiving, from the user terminal, a characteristic parameter personalized to a speech of a user and determined based on a reference speech signal (Xue: [0075] — feature training Deep Neural Network which involves obtaining voiceprint features (which are characteristic parameters personalised to a user); [0079], FIG. 2 Step 202 — obtaining voice training data; [0007] — “[a]n audio capturing device records audios, such as a few sentences narrated by a user, as training data”; [0007] — transmitting the training data to one or more servers),
wherein the characteristic parameter personalized to the user includes [[normalization information to be used for normalizing the target speech signal,]] identification information indicating a speech characteristic of the user, [[and class information to be used for classifying in a model]] (Xue: [0075] — for speech recognition, feature vectors of the data to be recognised as well as the voiceprint feature vectors are used for decoding recognition);
receiving, from the user terminal, a target speech signal of the user to be recognized (Xue: FIG. 2 Step 214, [0085] — receiving voice data to be recognised; [0015] — speech recognition steps being performed at the server);
transmitting a recognition result of the target speech signal to the user terminal (Xue: [0007], [0015] — a distribution environment whereby some speech recognition tasks are performed at the server (indicating performing speech recognition at the server and transmitting the results to the user device); [0093] — the user device having output and network interfaces).
The reference of Xue fails to teach updating the characteristic parameter personalized to a speech of a user. The reference of Sun is now introduced to teach this as:
wherein the determining includes updating the characteristic parameter selectively using other reference speech signals, input by the user, and satisfying a predetermined condition associated with at least one of a length of and an intensity of a speech signal, of the other reference speech signals (Sun: [0156] — a preset condition for generating voiceprint data being that the registration voice data (user speech input) reaches a preset time period/duration; [0176] — obtaining duration information of input speech signal that is to be used for verification; Claim 8 — updating voiceprint based on input speech signal and reference speech signal).
The same motivation as applied to claim 1 is applicable here still.
The combination of Xue in view of Sun fails to disclose the further limitation of this claim, for which Hong is now introduced to teach as:
wherein the characteristic parameter personalized to the user includes normalization information to be used for normalizing the target speech signal, [[identification information indicating a speech characteristic of the user,]] and class information to be used for classifying in a model (Hong: [0045] — normalisation information for normalising input speech (recorded speech signal); [0017] — a classifier coupled to a model selector);
performing speech recognition personalized to speech characteristic of the user on the target speech signal user by normalizing the target speech signal based on the normalization information, inputting the normalized target speech signal and the identification information to a model of general speech recognition for a plurality of users, and comparing a value estimated in the model from the inputting of the normalized target speech signal with the class information (Hong: [0008] — performing speech recognition through selecting a model based on the determined classification of the received information of input speech; [0045] — performing normalisation of the input speech; [0046] — obtaining diagonal covariance matrices to obtain likelihood probabilities (as the value) in order to determine class information).
The same motivation as applied to claim 1 for incorporating the reference of Hong is applicable here still.
The combination of Xue in view of Sun further in view of Hong teaches obtaining training characteristic parameters for a user to be used in obtaining a target speech recognition result. This combination fails to teach the determination of a recognition result of a target speech signal through inputting the identification information to a model of general speech recognition for a plurality of users.
This is not new to the art as the reference of Hoffmeister teaches this:
(Hoffmeister: [0017] — speech recognition through a base model (a model for general speech recognition) as well as a model specific to a user; [0049], Fig. 3 Steps 308-314 — making use of a base model as well as a (user) specific model for recognition of the target speech).
The same motivation as applied to claim 1 for incorporating the reference of Hoffmeister is applicable here still.
As for claim 20, user terminal claim 20 and method claim 1 are related as apparatus and the method of using same, with each claimed element’s function corresponding to the claimed method step. Xue in [0093] provides a processor as well as computer memory able to store the required instructions. Accordingly, claim 20 is similarly rejected under the same rationale as applied above with respect to method claim 1.
For claim 24, Xue discloses a speech recognition method comprising:
determining, in a user terminal, a parameter based on a reference speech signal input by the individual user to the user terminal, [[wherein the determining includes updating the characteristic parameter selectively using other reference speech signals, input by the user, satisfying a predetermined condition associated with at least one of a length of and an intensity of a speech signal, of the other reference speech signals]] (Xue: [0075] — feature training Deep Neural Network which involves obtaining voiceprint features (which are characteristic parameters personalised to a user); [0079], FIG. 2 Step 202 — obtaining voice training data; [0007] — “[a]n audio capturing device records audios, such as a few sentences narrated by a user, as training data”), wherein the parameter personalized to the individual user includes [[normalization information to be used for normalizing the target speech signal,]] identification information indicating a speech characteristic of the individual user, [[and class information to be used for classifying in a model]] (Xue: [0075] — for speech recognition, feature vectors of the data to be recognised as well as the voiceprint feature vectors are used for decoding recognition);
transmitting, from the user terminal to a server, the parameter based on the reference speech signal and a target speech signal of the individual user to be recognized (Xue: [0007] — transmitting the training data to one or more server, or another device that receives the one or more audio for speech recognition from another capturing device (teaching of transmitting the target speech signal to a server); [0093] — the user device having output and network interfaces); and
receiving, in the user terminal from the server, a recognition result of the target speech signal (Xue: [0007], [0015] — a distribution environment whereby some speech recognition tasks are performed at the server (indicating performing speech recognition at the server and transmitting the results to the user device); [0093] — the user device having output and network interfaces),
wherein the recognition speech result of the target speech signal is determined in the server through speech recognition personalized to speech characteristics of the user by normalizing the target speech signal based on the normalization information, inputting the normalized target speech signal and the identification information to a model of general speech recognition for a plurality of users, and comparing a value estimated in the model from the inputting of the normalized target speech signal with class information (Xue: [0083]-[0088], FIG. 2 Steps 214 → 216 → 218 → 212 — obtaining feature vectors of test speech data to be recognised and applying them to the speech model in order to obtain an adaptive speech model for the particular user (indicating a speech recognition personalised to the speech characteristic of the individual user) and then apply that model to recognise the speech; [0075] — for speech recognition, feature vectors of the data to be recognised (the target speech signal) as well as the voiceprint feature vectors (identification information) are used for decoding recognition; [0087] — inputting voiceprint features to a model related to a speaker).
The reference of Xue fails to teach updating the characteristic parameter personalized to a speech of a user. The reference of Sun is now introduced to teach this as:
determining, in a user terminal, a parameter based on a reference speech signal input by the individual user to the user terminal, wherein the determining includes updating the characteristic parameter selectively using other reference speech signals, input by the user, satisfying a predetermined condition associated with at least one of a length of and an intensity of a speech signal, of the other reference speech signals, wherein the parameter personalized to the individual user includes normalization information to be used for normalizing the target speech signal, identification information indicating a speech characteristic of the individual user, and class information to be used for classifying in a model (Sun: [0156] — a preset condition for generating voiceprint data being that the registration voice data (user speech input) reaches a preset time period/duration; [0176] — obtaining duration information of input speech signal that is to be used for verification; Claim 8 — updating voiceprint based on input speech signal and reference speech signal).
The same motivation as applied to claim 1 is applicable here still.
The combination of Xue in view of Sun fails to disclose the further limitation of this claim, for which Hong is now introduced to teach as:
wherein the parameter personalized to the individual user includes normalization information to be used for normalizing the target speech signal, [[identification information indicating a speech characteristic of the individual user,]] and class information to be used for classifying in a model (Hong: [0045] — normalisation information for normalising input speech (recorded speech signal); [0017] — a classifier coupled to a model selector),
wherein the recognition speech result of the target speech signal is determined in the server through speech recognition personalized to speech characteristics of the user by normalizing the target speech signal based on the normalization information, inputting the normalized target speech signal and the identification information to a model of general speech recognition for a plurality of users, and comparing a value estimated in the model from the inputting of the normalized target speech signal with class information (Hong: [0008] — performing speech recognition through selecting a model based on the determined classification of the received information of input speech; [0045] — performing normalisation of the input speech; [0046] — obtaining diagonal covariance matrices to obtain likelihood probabilities (as the value) in order to determine class information).
The same motivation as applied to claim 1 for incorporating the reference of Hong is applicable here still.
The combination of Xue in view of Sun further in view of Hong fails to teach the further limitation of this claim, for which Hoffmeister is now introduced to teach as:
wherein the recognition speech result of the target speech signal is determined in the server through speech recognition personalized to speech characteristics of the user by normalizing the target speech signal based on the normalization information, inputting the normalized target speech signal and the identification information to a model of general speech recognition for a plurality of users, [[and comparing a value estimated in the model from the inputting of the normalized target speech signal with class information]] (Hoffmeister: [0003] — speech recognition being performed at a server; [0017] — speech recognition through a base model as well as a model specific to a user; [0049], Fig. 3 Steps 308-314 — making use of a base model as well as a (user) specific model for recognition of the target speech).
The same motivation as applied to claim 1 is applicable here still.
Claims 6, 7, 8 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Xue (US 2018/0005628 A1) in view of Sun (US 2017/0092276 A1) further in view of Hong (US 2003/0233233 A1) and further in view of Hoffmeister (US 2014/0163977 A1) as applied to claim 1, and further in view of Bellegarda (US 2014/0088964 A1).
For claim 6, claim 1 is incorporated but the combination of Xue in view of Sun further in view of Hong and further in view of Hoffmeister fails to teach the limitation of this claim, for which Bellegarda is now introduced to teach as the recognition method, wherein the determining of the characteristic parameter comprises determining different types of characteristic parameters based on environment information obtained when the reference speech signal is input to the user terminal (Bellegarda: [0073] — obtaining training samples caused by variations due to ambient noise levels, acoustic properties of the local environment, noise levels (these qualify as different characteristic parameters which are based on environment information at the user terminal); [0046] — having exemplars as subsets of training data that are specifically selected for an input signal in order to recognise speech using a focused acoustic model).
The combination of Xue in view of Sun further in view of Hong and further in view of Hoffmeister provides teaching for determining a characteristic parameter personalized to a speech of a user, but differs from the claimed invention in that the claimed invention further provides for the determination of different types of characteristic parameters based on environment information. This, however, is not new to the art, as the reference of Bellegarda is seen to teach above. Hence, at the time the application was effectively filed, one of ordinary skill in the art would have found it obvious to incorporate the teaching of Bellegarda into that of the combination, given the predictable result of performing recognition using speech models that are most closely related to characteristic parameters present in the speech signal being recognised.
For claim 7, claim 6 is incorporated and the combination of Xue in view of Sun further in view of Hong, further in view of Hoffmeister and further in view of Bellegarda discloses the recognition method, wherein the environment information comprises either one or both of noise information about noise included in the reference speech signal and distance information indicating a distance from the user uttering the reference speech signal to the user terminal (Bellegarda: [0073] — obtaining training samples caused by variations due to ambient noise levels, acoustic properties of the local environment, noise levels (all of which are based on environment information at the user terminal collected on the reference speech)).
For claim 8, claim 6 is incorporated and the combination of Xue in view of Sun further in view of Hong, further in view of Hoffmeister and further in view of Bellegarda discloses the recognition method of claim 1, wherein the recognition result of the target recognition signal is additionally determined using a characteristic parameter selected based on environment information obtained when the target speech signal is input from different types of characteristic parameters determined in advance based on environment information obtained when the reference speech signal is input (Bellegarda: [0073] — obtaining training samples caused by variations due to acoustic properties of the local environment, noise levels (all which are based on environment information at the user terminal collected on the reference speech); [0045]-[0046]).
For claim 19, claim 14 is incorporated and the combination of Xue in view of Sun further in view of Hong and further in view of Hoffmeister and further in view of Bellegarda discloses the recognition method, wherein the characteristic parameter is a characteristic parameter selected based on environment information obtained when the target speech signal is input from different types of characteristic parameters determined in advance based on environment information obtained when the reference speech signal is input (Bellegarda: [0073] — obtaining training samples caused by variations due to ambient noise levels, acoustic properties of the local environment, noise levels (all which are based on environment information at the user terminal)).
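For illustration only, the environment-based selection recited in claims 8 and 19 (choosing, for the target speech signal, one of several characteristic parameters determined in advance under different environment conditions) may be sketched as follows. This is not drawn from Bellegarda or the application. The function name `select_parameter`, the use of a noise level in decibels as the environment information, and the nearest-level rule are hypothetical assumptions.

```python
from typing import Dict, TypeVar

P = TypeVar("P")


def select_parameter(params_by_noise_db: Dict[float, P],
                     target_noise_db: float) -> P:
    """Pick the stored parameter whose reference-time noise level is
    closest to the noise level observed with the target signal.

    params_by_noise_db maps a noise level (dB) measured when each
    reference signal was input to the parameter derived from it.
    """
    if not params_by_noise_db:
        raise ValueError("no characteristic parameters available")
    nearest_level = min(params_by_noise_db,
                        key=lambda level: abs(level - target_noise_db))
    return params_by_noise_db[nearest_level]
```

Distance information, which claim 7 also names as environment information, could serve as the lookup key in the same way.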
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Xue (US 2018/0005628 A1) in view of Sun (US 2017/0092276 A1) further in view of Hong (US 2003/0233233 A1) and further in view of Hoffmeister (US 2014/0163977 A1) as applied to claim 1, and further in view of Goesnar et al (US 2015/0162004 A1: hereafter – Goesnar).
For claim 9, claim 1 is incorporated but the combination of Xue in view of Sun further in view of Hong, further in view of Hoffmeister fails to teach the limitations of this claim for which Goesnar is now introduced to teach as the recognition method, wherein the determining of the characteristic parameter comprises determining the characteristic parameter by applying a personal parameter acquired from the reference speech signal to a basic parameter determined based on the plurality of users (Goesnar: [0037] — employing individualised acoustic speech models to recognise user speech in user voice inputs based at least in part on the user identification acoustically determined).
The combination of Xue in view of Sun further in view of Hong, further in view of Hoffmeister provides teaching for determining characteristic parameters employed in speech recognition. It differs from the claimed invention in that the claimed invention provides that a personal characteristic parameter, acquired from the reference speech signal and applied to a basic parameter, is also used. This is, however, not new to the art, as the reference of Goesnar is seen to show above. Hence, at the time the application was effectively filed, one of ordinary skill in the art would have found it obvious to incorporate the teaching of Goesnar into that of the combination of Xue in view of Sun further in view of Hong, further in view of Hoffmeister, given the predictable result of having a system that addresses and responds only to commands issued by particular users/speakers, thereby authorising certain speakers to make use of the user device, over others.
Claims 21 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Xue (US 2018/0005628 A1) in view of Sun (US 2017/0092276 A1) further in view of Hong (US 2003/0233233 A1) and further in view of Schroeter et al (US 2016/0125876 A1: hereafter – Schroeter).
For claim 21, Xue discloses a speech recognition method comprising:
determining a characteristic parameter among a plurality of characteristic parameters personalized to a speech of an individual user based on a reference speech signal of the individual user (Xue: [0075] — voiceprint feature vectors are extracted from the speech input of a speaker which get used for training (the voiceprints being personalised characteristic parameters and that they’re used for training indicates the presence of reference speech)), [[wherein the determining includes updating the characteristic parameter selectively using other reference speech signals, input by the individual user, satisfying a predetermined condition associated with at least one of a length of and an intensity of a speech signal, of the other reference speech signals,]] wherein the characteristic parameter personalized to the individual user includes [[normalization information to be used for normalizing the target speech signal,]] identification information indicating a speech characteristic of the individual user, [[and class information to be used for classifying in a model]] (Xue: [0075] — for speech recognition, feature vectors of the data to be recognised as well as the voiceprint feature vectors are used for decoding recognition);
applying the characteristic parameter of the individual user to a basic speech recognition model determined for a plurality of users to obtain a personalized speech recognition model personalized to the individual user (Xue: FIG. 1 Step S106, [0082] — adapting voiceprint feature vectors personalised to a speaker, with a speech recognition acoustic model, in order to obtain a personalised model for the speaker; [0013] — different characteristic parameters being voiceprint feature, noise feature, dialect feature, scene information feature); and
applying a target speech signal of the individual user to the personalized speech recognition model to obtain a recognition result of the target speech signal through speech recognition personalized to speech characteristic of the individual user by normalizing the target speech signal based on the normalization information, inputting the normalized target speech signal and the identification information to the personalized speech recognition model, and comparing a value estimated in the personalized speech recognition model from the inputting of the normalized target speech signal with class information (Xue: [0083]-[0088], FIG. 2 Steps 214 → 216 → 218 → 212 — obtaining feature vectors of test speech data to be recognised and applying them to the speech model in order to obtain an adaptive speech model for the particular user (indicating a speech recognition personalised to the speech characteristic of the individual user) and then applying that model to recognise the speech; [0075] — for speech recognition, feature vectors of the data to be recognised (the target speech signal) as well as the voiceprint feature vectors (identification information) are used for decoding recognition; [0087] — inputting voiceprint features to a model related to a speaker).
The reference of Xue fails to teach updating the characteristic parameter personalized to a speech of a user. The reference of Sun is now introduced to teach this as:
determining a characteristic parameter among a plurality of characteristic parameters personalized to a speech of an individual user based on a reference speech signal of the individual user wherein the determining includes updating the characteristic parameter selectively using other reference speech signals, input by the individual user, satisfying a predetermined condition associated with at least one of a length of and an intensity of a speech signal, of the other reference speech signals, [[wherein the characteristic parameter personalized to the individual user includes normalization information to be used for normalizing the target speech signal, identification information indicating a speech characteristic of the individual user, and class information to be used for classifying in a model]] (Sun: [0156] — a preset condition for generating voiceprint data being that the registration voice data (user speech input) reaches a preset time period/duration; [0176] — obtaining duration information of the input speech signal that is to be used for verification; Claim 8 — updating voiceprint based on input speech signal and reference speech signal).
The same motivation as applied to claim 1 is applicable here still.
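By way of illustration only, the selective-updating limitation discussed above may be sketched as follows. The sketch is not the method of Sun or of the Applicant; the names ReferenceSignal, satisfies_condition and update_parameter, the threshold values, and the running-mean update rule are all assumptions chosen solely for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ReferenceSignal:
    """Hypothetical container for one reference speech signal."""
    duration_s: float      # length of the signal in seconds
    intensity: float       # e.g. an RMS level; the unit is an assumption
    feature: List[float]   # feature vector extracted from the signal


def satisfies_condition(sig: ReferenceSignal,
                        min_duration_s: float = 2.0,
                        min_intensity: float = 0.1) -> bool:
    """Assumed predetermined condition: the signal is long enough and loud enough."""
    return sig.duration_s >= min_duration_s and sig.intensity >= min_intensity


def update_parameter(current: List[float], count: int,
                     signals: Sequence[ReferenceSignal]) -> Tuple[List[float], int]:
    """Update the parameter using only signals that satisfy the condition.

    The parameter is kept as a running mean over `count` accepted signals
    (an assumed update rule, used here only to make the sketch concrete).
    """
    param = list(current)
    for sig in signals:
        if not satisfies_condition(sig):
            continue  # signals failing the condition are not used for updating
        param = [(p * count + f) / (count + 1) for p, f in zip(param, sig.feature)]
        count += 1
    return param, count
```

In this sketch a short or quiet reference signal leaves the parameter unchanged, which is the sense in which the updating is selective.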
The combination of Xue in view of Sun fails to disclose the further limitation of this claim, for which Hong is now introduced to teach as:
wherein the characteristic parameter personalized to the individual user includes normalization information to be used for normalizing the target speech signal, [[identification information indicating a speech characteristic of the individual user,]] and class information to be used for classifying in a model (Hong: [0045] — normalisation information for normalising input speech (recorded speech signal); [0017] — a classifier coupled to a model selector),
applying a target speech signal of the individual user to the personalized speech recognition model to obtain a recognition result of the target speech signal through speech recognition personalized to speech characteristic of the individual user by normalizing the target speech signal based on the normalization information, inputting the normalized target speech signal and the identification information to the personalized speech recognition model, and comparing a value estimated in the personalized speech recognition model from the inputting of the normalized target speech signal with class information (Hong: [0008] — performing speech recognition through selecting a model based on the determined classification of the received information of input speech; [0045] — performing normalisation of the input speech; [0046] — obtaining diagonal covariance matrices to obtain likelihood probabilities (as the value) in order to determine class information).
The same motivation as applied to claim 1 for incorporating the reference of Hong is applicable here still.
The combination of Xue in view of Sun further in view of Hong fails to teach the further limitation of this claim, for which Schroeter is now introduced to teach:
wherein the characteristic parameter is selected from the plurality of characteristic parameters based on target environment information (Schroeter: Abstract — obtaining metadata information (characteristic parameter) associated with ambient noise of input speech (target environment information) obtained from various acoustic environments, such that the metadata information is compared to other stored metadata information in order to match it to the metadata information for the ambient noise profile).
The combination of Xue in view of Sun further in view of Hong provides teaching for obtaining a characteristic parameter from a plurality of characteristic parameters, but differs from the claimed invention in that the claimed invention further provides that the characteristic parameter is selected based on target environment information. This is not new to the art, as it is taught by the reference of Schroeter above. Hence, at the time the application was effectively filed, one of ordinary skill in the art would have found it obvious to incorporate the teaching of Schroeter into that of the combination of Xue in view of Sun further in view of Hong, given the predictable result of an adaptive filtering based on a detected noise profile for speech enhancement and thereby providing an easier speech recognition.
For claim 23, claim 21 is incorporated and the combination of Xue in view of Sun further in view of Hong and further in view of Schroeter discloses the speech recognition method of claim 21, wherein the reference speech signal and the target speech signal are input by the individual user to a user terminal (Xue: [0007] — an audio capturing device which receives the audio for training as well as that for performing the speech recognition), and
the determining of the characteristic parameter comprises accumulatively determining the characteristic parameter each time a reference speech signal is input by the individual user to the user terminal (Xue: [0007] — recording audios as training data (indicating an accumulated amount of reference speech signals); [0008] — obtaining feature vectors from obtained training data and inputting the feature vectors into a speech recognition model).
Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Xue (US 2018/0005628 A1) in view of Sun (US 2017/0092276 A1) further in view of Hong (US 2003/0233233 A1) and further in view of Schroeter (US 2016/0125876 A1) as applied to claim 21, and further in view of Kiss et al (U.S. 9,190,055 B1: hereafter – Kiss).
For claim 22, claim 21 is incorporated and the combination of Xue in view of Sun further in view of Hong and further in view of Schroeter discloses the speech recognition method, wherein the determining of the characteristic parameter comprises:
acquiring a personal parameter determined for the individual user from the reference speech signal (Xue: [0013] — “the feature vectors of training data include at least one of a voiceprint feature vector” (voiceprints being personal parameters of a user)).
The combination of Xue in view of Sun further in view of Hong and further in view of Schroeter fails to explicitly disclose the further limitations of this claim, for which Kiss is now introduced to teach as:
applying a first weight to the personal parameter to obtain a weighted personal parameter (Kiss: Claim 24 — a first score for a parameter vector of a personal model);
applying a second weight to a basic parameter determined for the plurality of users to obtain a weighted basic parameter (Kiss: Claim 24 — a second score for a parameter vector of a general model); and
adding the weighted personal parameter to the weighted basic parameter to obtain the characteristic parameter (Kiss: Claim 24 — adding both scores together in order to obtain the parameter for recognising a named entity).
The combination of Xue in view of Sun further in view of Hong and further in view of Schroeter provides teaching for obtaining a personal parameter determined for an individual from a reference speech signal for the purpose of determining a characteristic parameter associated with the individual. It differs from the claimed invention in that the claimed invention further provides that a first weight and a second weight are applied respectively to a personal parameter and a basic parameter, and that the weighted parameters are added to obtain the characteristic parameter. This is not new to the art, as it is taught by the reference of Kiss above. Hence, at the time the application was effectively filed, one of ordinary skill in the art would have found it obvious to incorporate the teaching of Kiss into the combination of Xue in view of Sun further in view of Hong and further in view of Schroeter, given the predictable result of interpolating a general model and a personal model to generate a single composite model that may be used for easily performing speech recognition (Kiss: Col 8 lines 50-56).
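By way of illustration only, the weighting-and-adding limitation of claim 22 may be sketched as below. The sketch is not taken from Kiss or from the Applicant's disclosure; the function name combine_parameters, the representation of each parameter as a list of numbers, and the weight values are assumptions made solely for this example.

```python
from typing import List, Sequence


def combine_parameters(personal: Sequence[float],
                       basic: Sequence[float],
                       w_personal: float,
                       w_basic: float) -> List[float]:
    """Weight a personal parameter and a basic parameter, then add them.

    `personal` stands for a parameter determined for one user and `basic`
    for a parameter determined for a plurality of users (both hypothetical
    vectors of equal length).
    """
    if len(personal) != len(basic):
        raise ValueError("personal and basic parameters must have the same length")
    # Element-wise weighted sum: w_personal * personal + w_basic * basic
    return [w_personal * p + w_basic * b for p, b in zip(personal, basic)]


# Example with assumed weights of 0.25 (personal) and 0.75 (basic).
characteristic = combine_parameters([1.0, 2.0], [3.0, 4.0], 0.25, 0.75)
```

With weights that sum to one, the result is a linear interpolation between the two parameters, which corresponds to the interpolation of a general model and a personal model referred to in the rationale above.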
Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Xue (US 2018/0005628 A1) in view of Sun (US 2017/0092276 A1) further in view of Hong (US 2003/0233233 A1) and further in view of Hoffmeister (US 2014/0163977 A1) as applied to claim 24, further in view of Goesnar (US 2015/0162004 A1) and further in view of Kiss (U.S. 9,190,055 B1).
For claim 25, claim 24 is incorporated and the combination of Xue in view of Sun further in view of Hong and further in view of Hoffmeister discloses the speech recognition method wherein
the transmitting comprises transmitting, from the user terminal to the server, the personal parameter and the target speech signal (Xue: [0007] — transmitting the training data to one or more servers, or another device that receives the one or more audio for speech recognition from another capturing device (teaching of transmitting the target speech signal to a server); [0093] — the user device having output and network interfaces).
The combination of Xue in view of Sun further in view of Hong and further in view of Hoffmeister fails to disclose the further limitations of this claim, for which Goesnar is now introduced to teach as the speech recognition method,
wherein the determining of the parameter based on the reference speech signal comprises acquiring a personal parameter determined for the individual user from the reference speech signal (Goesnar: [0037] — employing individualised acoustic speech models to recognise user speech in user voice inputs based at least in part on the user identification acoustically determined).
The combination of Xue in view of Sun further in view of Hong and further in view of Hoffmeister provides teaching for determining characteristic parameters employed in speech recognition. It differs from the claimed invention in that the claimed invention further provides that a personal characteristic parameter, acquired from the reference speech signal and applied to a basic parameter, is also used. This is, however, not new to the art, as shown by the reference of Goesnar above. Hence, at the time the application was effectively filed, one of ordinary skill in the art would have found it obvious to incorporate the teaching of Goesnar into that of the combination of Xue in view of Sun further in view of Hong and further in view of Hoffmeister, given the predictable result of having a system that addresses and responds only to commands issued by particular users/speakers, thereby authorising certain speakers, over others, to make use of the user device.
The combination of Xue in view of Sun further in view of Hong, further in view of Hoffmeister and further in view of Goesnar fails to provide teaching for the further limitations of this claim for which Kiss is now introduced to teach as:
the parameter based on the reference signal is determined in the server (Kiss: Col 10 lines 62-64 — implementation at a server) by
applying a first weight to the personal parameter to obtain a weighted personal parameter (Kiss: Claim 24 — a first score for a parameter vector of a personal model),
applying a second weight to a basic parameter to obtain a weighted basic parameter (Kiss: Claim 24 — a second score for a parameter vector of a general model), and
adding the weighted personal parameter to the weighted basic parameter to obtain the parameter based on the reference speech signal (Kiss: Claim 24 — adding both scores together in order to obtain the parameter for recognising a named entity).
The same motivation for the incorporation of Kiss as applied to claim 22 is applicable here still.
The combination of Xue in view of Sun further in view of Hong, further in view of Hoffmeister and further in view of Goesnar provides teaching for obtaining a personal parameter determined for an individual from a reference speech signal for the purpose of determining a characteristic parameter associated with the individual. This combination differs from the claimed invention in that the claimed invention further provides that a first weight and a second weight are applied respectively to a personal parameter and a basic parameter, and that the weighted parameters are added to obtain the parameter based on the reference speech signal. This is not new to the art, as it is taught by the reference of Kiss above. Hence, at the time the application was effectively filed, one of ordinary skill in the art would have found it obvious to incorporate the teaching of Kiss into that of the combination, given the predictable result of interpolating a general model and a personal model to generate a single composite model that may be used for easily performing speech recognition (Kiss: Col 8 lines 50-56).
Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure.
The reference of Sharma et al (US 2003/0009333 A1) provides teaching for testing an output value of a received speech signal by comparing the value to a threshold value and determining if the voice closely matches a particular model [0108].
Any inquiry concerning this communication or earlier communications from the Examiner should be directed to OLUWADAMILOLA M. OGUNBIYI whose telephone number is (571)272-4708. The Examiner can normally be reached Monday - Thursday (8:00 AM - 5:30 PM Eastern Standard Time).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s Supervisor, DANIEL C WASHBURN can be reached on (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/OLUWADAMILOLA M OGUNBIYI/Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657