Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on October 30, 2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-12 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.
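Claims 1-12 are directed to an abstract idea, namely mathematical concepts (mathematical calculations) that can also be performed in the human mind or with pen and paper (mental processes).
Independent claims 1, 11, and 12 recite the steps of “calculating a selection rate of each of a plurality of quantization points included in a quantization table, based on quantization data obtained by quantizing features of a plurality of utterance data” and “updating the quantization table by updating the plurality of quantization points based on the selection rate”. These steps amount to computing a statistic (the selection rate of each quantization point) from quantized data and then adjusting the values stored in a table based on that statistic. Both are mathematical calculations that could also be carried out mentally or with pen and paper, and they therefore fall within the mathematical concepts and mental processes groupings of abstract ideas.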
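For illustration only, the two recited steps can be sketched in Python as a single codebook-style iteration. The sketch is not drawn from the applicant's disclosure or from any cited reference. It assumes (the claims do not specify) that each quantization data vector "selects" its nearest quantization point under Euclidean distance, that the selection rate is the fraction of vectors selecting each point, and that updating moves each selected point to the mean of the vectors that selected it, as in one step of Lloyd's (k-means) algorithm. All function names are hypothetical. Under these assumptions the steps consist only of distance, counting, and averaging calculations.

```python
import math
from typing import List, Sequence

Table = List[List[float]]

def nearest_index(x: Sequence[float], table: Table) -> int:
    """Index of the quantization point closest to x (Euclidean distance)."""
    return min(range(len(table)), key=lambda k: math.dist(x, table[k]))

def selection_rates(features: List[Sequence[float]], table: Table) -> List[float]:
    """Fraction of the quantization data that selects each quantization point."""
    counts = [0] * len(table)
    for x in features:
        counts[nearest_index(x, table)] += 1
    return [c / len(features) for c in counts]

def update_table(features: List[Sequence[float]], table: Table) -> Table:
    """Move every selected point to the mean of the data that selected it;
    a point that nothing selected is left unchanged."""
    sums = [[0.0] * len(p) for p in table]
    counts = [0] * len(table)
    for x in features:
        k = nearest_index(x, table)
        counts[k] += 1
        for d, value in enumerate(x):
            sums[k][d] += value
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(table[k])
        for k in range(len(table))
    ]

features = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [0.9, 1.1]]
table = [[0.0, 0.1], [1.0, 1.0], [5.0, 5.0]]
print(selection_rates(features, table))  # [0.5, 0.5, 0.0]
```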
This judicial exception is not integrated into a practical application. Claim 1 recites “A non-transitory computer-readable recording medium having stored therein an update program that causes a computer to execute a procedure, the procedure comprising”. Claim 11 recites “An update method comprising”. Claim 12 recites “An information processing apparatus comprising: a memory; and a processor coupled to the memory and configured to”. These limitations merely recite a generic computer (or a generic memory and processor) on which the method is performed, and they do not impose any meaningful limits on practicing the abstract idea.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The generic computer components recited above with regard to claims 1, 11, and 12 amount to no more than mere instructions to apply the exception using a generic computer component, and mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Claims 1, 11, and 12 do not recite any additional limitations beyond those addressed above. As drafted, the claims are not patent eligible.
Dependent claims 2-10 are likewise rejected. Each depends from claim 1 and does not recite additional limitations that integrate the abstract idea into a practical application or amount to significantly more than the abstract idea. Claim 10 recites that the features are to be used for speech recognition, but this is a statement of intended use; the claim does not require that speech recognition actually be performed.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 7-8, and 10-12 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Stoimenov (U.S. Publication No. 20200349927).
Regarding claim 1, Stoimenov discloses a non-transitory computer-readable recording medium having stored therein an update program that causes a computer to execute a procedure, the procedure comprising ([0119] - The software may include computer executable instructions stored on computer or other machine-readable media or storage medium, such as one or more non-transitory memories (e.g., a non-transitory machine-readable medium) or other type of hardware-based storage devices, either local or networked):
calculating a selection rate of each of a plurality of quantization points included in a quantization table, based on quantization data obtained by quantizing features of a plurality of utterance data ([0042] - The confidence classifier can be used to assign a score that, based on a score value relative to a threshold, determines whether the audio includes an utterance of the wake word. NN layer quantization, SVD, frame skipping, frame stacking, and frame batching are discussed with regard to FIGS. and elsewhere herein);
and updating the quantization table by updating the plurality of quantization points based on the selection rate ([0004] - a lookup table (LUT) indicating a hidden vector to be generated in response to a phoneme of a user-specified wake word. [0074] - Then audio is passed through the decoding graph 332 with either the AM 330 and the LM 334, or the RNNT model with the LUT 444. The output can be provided to a beam search decoder 338 that produces a confidence score which shows how likely the audio contains the wake word. The confidence score is then compared with a predefined threshold to determine whether the wake word is present at operation 340).
Regarding claim 2, Stoimenov discloses the non-transitory computer-readable recording medium, wherein the updating of the quantization table includes:
excluding, from the quantization table, quantization points whose selection rate is equal to or less than a predetermined reference, among the plurality of quantization points ([0065] - If an entry in the beam search decoder 338 corresponding to the wake word is greater than a threshold, the keyword can be detected at operation 346. If the entry in the beam search decoder 338 corresponding to the wake word is not greater than the threshold, the keyword is not detected at operation 348),
adding, to the quantization table, quantization points different from the excluded quantization points ([0065] - If an entry in the beam search decoder 338 corresponding to the wake word is greater than a threshold, the keyword can be detected at operation 346. If the entry in the beam search decoder 338 corresponding to the wake word is not greater than the threshold, the keyword is not detected at operation 348),
and updating quantization points other than the quantization points whose selection rate is equal to or less than the predetermined reference, based on a result of a selection of the quantization points ([0071] - After the user decides the wake word 344, hidden vectors of the prediction network can be computed in and stored in the LUT 444. Instead of operating a prediction network, as previously done by others, the hidden vectors of a prediction network operating on the wake word 344 can be stored in the LUT 444).
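One possible way of carrying out the excluding and adding steps of claim 2 is sketched below in Python. Claim 2 does not specify how the added points are generated; the rule used here (each added point is a small offset of the most frequently selected surviving point, in the manner of the splitting step of the Linde-Buzo-Gray algorithm) and the function name are assumptions for illustration only. The subsequent update of the remaining points is not shown.

```python
from typing import List

Table = List[List[float]]

def prune_and_replace(table: Table, rates: List[float], reference: float,
                      eps: float = 1e-3) -> Table:
    """Exclude points whose selection rate is <= reference and add the same number
    of new points, each a small offset of the most frequently selected survivor
    (an LBG-style split; assumed here, not specified by the claim)."""
    keep = [k for k, rate in enumerate(rates) if rate > reference]
    if not keep:
        raise ValueError("no quantization point exceeds the reference")
    busiest = max(keep, key=lambda k: rates[k])
    survivors = [list(table[k]) for k in keep]
    added = [
        [v + eps * (i + 1) for v in table[busiest]]  # distinct offset per new point
        for i in range(len(table) - len(keep))
    ]
    return survivors + added
```

Replacing each excluded point with a new one keeps the table size constant, so the table can be reused directly in the next selection-rate calculation.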
Regarding claim 7, Stoimenov discloses the non-transitory computer-readable recording medium, the procedure further comprising:
generating a quantization result associated with a quantization point corresponding to a feature of the voice information, based on vector quantization on input voice information and the quantization table after update that includes the updated plurality of quantization points ([0087] - operating on the converted input vectors using a quantized weight matrix to generate quantized result, at operation 840; and dequantizing the quantized result and removing biases realized from using quantization, at operation 860);
and performing learning of a model to which a neural network is applied, when the quantization result is input into the model so that output information output from the model approaches correct answer information for indicating whether the voice information corresponding to the quantization result includes a predetermined conversation situation ([0089] - At operation 902, weights for a first layer of an NN can be loaded from the memory 160 into a cache memory 802. At operation 903, the weights for the first layer of the NN can be provided to the processing circuitry 804 for execution. At operation 904 first audio features from a first audio frame of an audio sample can be provided to the cache memory 802. At operation 905, the audio features from the first audio frame can be provided to the processing circuitry 804. The processing circuitry 804 can operate on the audio features using the NN configured using the layer weights provided at operation 903. The processing circuitry 804 can provide a corresponding output of the first layer based on the audio features from the first audio frame to the cache memory 802 at operation 906).
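To illustrate the general kind of processing referred to in Stoimenov [0087] (operating on converted inputs with a quantized weight matrix, then dequantizing and removing biases introduced by quantization), a generic 8-bit scheme is sketched below in Python. The symmetric weight quantization, the zero-point input quantization, and all function names are assumptions chosen for illustration; the sketch is not asserted to be Stoimenov's implementation.

```python
from typing import List, Tuple

def quantize_weights(values: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Symmetric quantization: w ~= scale * q, with q in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax or 1.0
    return [round(v / scale) for v in values], scale

def quantize_inputs(values: List[float], bits: int = 8) -> Tuple[List[int], float, int]:
    """Asymmetric quantization: x ~= scale * (q - zero_point), with q in [0, 2^bits - 1]."""
    qmax = 2 ** bits - 1
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # keep 0.0 representable
    scale = (hi - lo) / qmax or 1.0
    zero_point = round(-lo / scale)
    q = [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def quantized_matvec(W: List[List[float]], x: List[float], bits: int = 8) -> List[float]:
    """Approximate W @ x with integer products, then dequantize the result."""
    n = len(x)
    qw_flat, w_scale = quantize_weights([w for row in W for w in row], bits)
    qW = [qw_flat[i * n:(i + 1) * n] for i in range(len(W))]
    qx, x_scale, zero_point = quantize_inputs(x, bits)
    out = []
    for row in qW:
        acc = sum(a * b for a, b in zip(row, qx))  # integer accumulation
        acc -= zero_point * sum(row)               # remove the zero-point offset (bias)
        out.append(acc * w_scale * x_scale)        # dequantize to floating point
    return out
```

Subtracting `zero_point * sum(row)` removes the offset that the zero point adds to every integer product, because each input is represented as `scale * (q - zero_point)`.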
Regarding claim 8, Stoimenov discloses the non-transitory computer-readable recording medium, the procedure further comprising:
determining whether the predetermined conversation situation is included in utterance data to be determined, based on the output information acquired by inputting quantization data obtained by quantizing the feature of the utterance data to be determined, into the model that has been subjected to the learning ([0019] - Advances in wake word detection focus on training low-complexity models (e.g., models that consume small amounts of memory or processing circuitry bandwidth) that detect an utterance of a predetermined static wake word. [0023] - One or more embodiments can employ a general acoustic model (AM) that does not need wake word-specific training. To run on low-power hardware, this model can be compressed by SVD (singular value decomposition) and quantized (e.g., to 8 bits, 16 bits, or the like, per weight). The CTC can be used in conjunction with an adaptable background language model (BLM) and keyword graph to build a high correct acceptance (CA) and low false acceptance (FA) custom keyword detection system).
Regarding claim 10, Stoimenov discloses the non-transitory computer-readable recording medium,
wherein the features are information extracted from utterance data and to be used for speech recognition ([0047] - The AM 330 is used in SR to represent a relationship between an audio signal (features of an audio signal) and phonemes or other linguistic units that make up speech).
Regarding claim 11, Stoimenov discloses an update method comprising:
calculating a selection rate of each of a plurality of quantization points included in a quantization table, based on quantization data obtained by quantizing features of a plurality of utterance data ([0042] - The confidence classifier can be used to assign a score that, based on a score value relative to a threshold, determines whether the audio includes an utterance of the wake word. NN layer quantization, SVD, frame skipping, frame stacking, and frame batching are discussed with regard to FIGS. and elsewhere herein);
updating the quantization table by updating the plurality of quantization points based on the selection rate, by a processor ([0004] - a lookup table (LUT) indicating a hidden vector to be generated in response to a phoneme of a user-specified wake word. [0028] - One or more processor(s) of device 110 (e.g., processing unit 1102 as depicted in FIG. 12 and described below) can execute such application(s). [0074] - Then audio is passed through the decoding graph 332 with either the AM 330 and the LM 334, or the RNNT model with the LUT 444. The output can be provided to a beam search decoder 338 that produces a confidence score which shows how likely the audio contains the wake word. The confidence score is then compared with a predefined threshold to determine whether the wake word is present at operation 340).
Regarding claim 12, Stoimenov discloses an information processing apparatus comprising:
a memory ([0098] - a machine is configured to carry out a method by having software code for that method stored in a memory that is accessible to the processor(s) of the machine);
and a processor coupled to the memory and configured to ([0098] - a machine is configured to carry out a method by having software code for that method stored in a memory that is accessible to the processor(s) of the machine):
calculate a selection rate of each of a plurality of quantization points included in a quantization table, based on quantization data obtained by quantizing features of a plurality of utterance data ([0042] - The confidence classifier can be used to assign a score that, based on a score value relative to a threshold, determines whether the audio includes an utterance of the wake word. NN layer quantization, SVD, frame skipping, frame stacking, and frame batching are discussed with regard to FIGS. and elsewhere herein);
and update the quantization table by updating the plurality of quantization points based on the selection rate ([0004] - a lookup table (LUT) indicating a hidden vector to be generated in response to a phoneme of a user-specified wake word. [0074] - Then audio is passed through the decoding graph 332 with either the AM 330 and the LM 334, or the RNNT model with the LUT 444. The output can be provided to a beam search decoder 338 that produces a confidence score which shows how likely the audio contains the wake word. The confidence score is then compared with a predefined threshold to determine whether the wake word is present at operation 340).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically taught as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over Stoimenov (U.S. Publication No. 20200349927) in view of Hayakawa (U.S. Publication No. 20160111112).
Regarding claim 3, Stoimenov discloses all of the limitations as in claim 2, above.
However, Stoimenov does not disclose the non-transitory computer-readable recording medium, wherein the calculating of the selection rate includes:
calculating a distance between each of the quantization data based on the feature of each of the plurality of utterance data and each of the plurality of quantization points included in the quantization table,
and selecting a quantization point of the plurality of quantization points that minimizes the distance, as the result of the selection of each of the quantization point.
Hayakawa does teach the non-transitory computer-readable recording medium, wherein the calculating of the selection rate includes:
calculating a distance between each of the quantization data based on the feature of each of the plurality of utterance data and each of the plurality of quantization points included in the quantization table ([0069] - The average quantization distortion D(ab) can be calculated, for example, as the average of distances from the features included in the feature vector a to the closest one of the average values of features of the clusters included in a codebook which is a speaker model),
and selecting a quantization point of the plurality of quantization points that minimizes the distance, as the result of the selection of each of the quantization point ([0074] - As can be seen from plots 900 to 903, the matching scores are local minimum at speaker change points regardless of decimation).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Stoimenov to incorporate the teachings of Hayakawa in order to implement the non-transitory computer-readable recording medium, wherein the calculating of the selection rate includes: calculating a distance between each of the quantization data based on the feature of each of the plurality of utterance data and each of the plurality of quantization points included in the quantization table, and selecting a quantization point of the plurality of quantization points that minimizes the distance, as the result of the selection of each of the quantization point. Doing so allows for the amount of computation for generating speaker models and calculating matching scores to be reduced (Hayakawa [0071]).
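The distance calculation and minimum-distance selection recited in claim 3, together with the average quantization distortion described in Hayakawa [0069], can be illustrated with the Python sketch below. It assumes Euclidean distance (Hayakawa [0069] refers only to "distances"), and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def nearest_point(x: Sequence[float], codebook: List[Sequence[float]]) -> Tuple[int, float]:
    """Return (index, distance) of the codebook entry that minimizes the distance to x."""
    distances = [math.dist(x, c) for c in codebook]
    k = min(range(len(codebook)), key=distances.__getitem__)
    return k, distances[k]

def average_quantization_distortion(features: List[Sequence[float]],
                                    codebook: List[Sequence[float]]) -> float:
    """Mean distance from each feature vector to its closest codebook entry."""
    return sum(nearest_point(x, codebook)[1] for x in features) / len(features)
```

For example, with codebook `[[0.0, 0.0], [10.0, 10.0]]`, the vector `[3.0, 4.0]` selects entry 0 at distance 5.0, and the average distortion of `[[0.0, 0.0], [3.0, 4.0]]` is 2.5.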
Regarding claim 4, Stoimenov in view of Hayakawa teaches all of the limitations as in claim 3, above.
Stoimenov discloses the non-transitory computer-readable recording medium, wherein the updating of the quantization table includes:
excluding, from the quantization table, the quantization points whose the selection rate is equal to or less than the predetermined reference ([0065] - If an entry in the beam search decoder 338 corresponding to the wake word is greater than a threshold, the keyword can be detected at operation 346. If the entry in the beam search decoder 338 corresponding to the wake word is not greater than the threshold, the keyword is not detected at operation 348),
adding, to the quantization table, quantization points before update, whose selection rate is equal to or more than the predetermined reference ([0065] - If an entry in the beam search decoder 338 corresponding to the wake word is greater than a threshold, the keyword can be detected at operation 346. If the entry in the beam search decoder 338 corresponding to the wake word is not greater than the threshold, the keyword is not detected at operation 348),
and updating each of the quantization points whose selection rate is equal to or less than the predetermined reference to an average value of selected quantization data ([0042] - The confidence classifier can be used to assign a score that, based on a score value relative to a threshold, determines whether the audio includes an utterance of the wake word). 
Claims 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Stoimenov (U.S. Publication No. 20200349927) in view of Hayakawa (U.S. Publication No. 20160111112), and further in view of Hoffberg (U.S. Patent No. 10943273).
Regarding claim 5, Stoimenov in view of Hayakawa teaches all of the limitations as in claim 4, above.
Hayakawa teaches the non-transitory computer-readable recording medium, the procedure further comprising:
specifying the quantization point that includes a highest selection rate among the plurality of quantization points, as a quantization point equivalent to silence ([0032] - the maximum value of autocorrelation in a frame that corresponds to a silent or background noise is small or clear peaks do not appear in such a frame),
wherein the updating of the quantization table includes:
updating the quantization point equivalent to silence to an average value of the selected quantization data ([0032] - The feature extracting unit 21 compares the maximum one of the peak values with a predetermined threshold and, when the maximum peak value is greater than the predetermined threshold, the feature extracting unit 21 determines that the frame includes voiced sound of a speaker. The feature extracting unit 21 then obtains the reciprocal of the time difference that is equivalent to the maximum peak value as a pitch frequency. [0068] - In this case, a speaker model includes the average value of features for each cluster obtained by clustering of features in the frames included in an analysis period),
and updating quantization points other than the quantization point equivalent to silence, based on the selection rate ([0032] - The feature extracting unit 21 compares the maximum one of the peak values with a predetermined threshold and, when the maximum peak value is greater than the predetermined threshold, the feature extracting unit 21 determines that the frame includes voiced sound of a speaker. The feature extracting unit 21 then obtains the reciprocal of the time difference that is equivalent to the maximum peak value as a pitch frequency).
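The voiced/silent decision described in Hayakawa [0032] (compare the largest autocorrelation peak of a frame with a threshold, and take the reciprocal of the corresponding time difference as the pitch frequency) can be sketched in Python as follows. The energy normalization, the lag search range, and the function names are assumptions for illustration and are not taken from Hayakawa.

```python
import math
from typing import List, Optional, Tuple

def autocorrelation(frame: List[float], lag: int) -> float:
    """Autocorrelation at `lag`, normalized by frame energy (0.0 for an all-zero frame)."""
    energy = sum(v * v for v in frame)
    if energy == 0.0:
        return 0.0
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag)) / energy

def voicing_and_pitch(frame: List[float], sample_rate: int, min_lag: int,
                      max_lag: int, threshold: float) -> Tuple[bool, Optional[float]]:
    """Treat the frame as voiced when its largest autocorrelation value in
    [min_lag, max_lag] exceeds `threshold`; the pitch is then the reciprocal
    of that lag expressed in seconds (sample_rate / lag)."""
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(frame, lag))
    if autocorrelation(frame, best_lag) > threshold:
        return True, sample_rate / best_lag
    return False, None
```

For example, a 400-sample frame of a 200 Hz sine sampled at 8 kHz has its largest autocorrelation at a lag of 40 samples and yields a pitch of 200 Hz, while an all-zero (silent) frame is classified as not voiced.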
However, Stoimenov in view of Hayakawa does not teach the non-transitory computer-readable recording medium, the procedure further comprising: calculating the selection rate of each of the plurality of quantization points according to white noise included in the quantization table as an initial value, based on each of quantization data generated from a first utterance data of the plurality of utterance data.
Hoffberg does teach the non-transitory computer-readable recording medium, the procedure further comprising: calculating the selection rate of each of the plurality of quantization points according to white noise included in the quantization table as an initial value, based on each of quantization data generated from a first utterance data of the plurality of utterance data (col. 51, lines 50-53 - The random variables w and v represent the processor and measurement noise (respectively). They are assumed to be independent (of each other), white, and with normal probability distributions).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Stoimenov in view of Hayakawa to incorporate the teachings of Hoffberg in order to implement the non-transitory computer-readable recording medium, the procedure further comprising: calculating the selection rate of each of the plurality of quantization points according to white noise included in the quantization table as an initial value, based on each of quantization data generated from a first utterance data of the plurality of utterance data. Doing so allows for a “high discriminative ability” to determine correct parameters of inputs in the recognition phase (Hoffberg col. 30, lines 40-65).
Regarding claim 6, Stoimenov in view of Hayakawa and further in view of Hoffberg teaches all of the limitations as in claim 5, above.
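The claim 5 steps of starting from a quantization table initialized with white noise, treating the most frequently selected point as the quantization point equivalent to silence, and moving that point to the average of the quantization data that selected it can be sketched in Python as follows. The Gaussian noise distribution, the Euclidean distance, and the function names are assumptions; the update of the other quantization points is not shown, and the sketch is not the applicant's or any cited reference's implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Table = List[List[float]]

def white_noise_table(num_points: int, dim: int, seed: int = 0) -> Table:
    """Initial quantization table whose points are i.i.d. Gaussian (white) noise samples."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_points)]

def update_silence_point(features: List[Sequence[float]], table: Table) -> Tuple[int, Table]:
    """Return the index of the most frequently selected point (taken as silence)
    and a copy of the table with that point moved to the mean of its data."""
    labels = [min(range(len(table)), key=lambda k: math.dist(x, table[k])) for x in features]
    rates = [labels.count(k) / len(features) for k in range(len(table))]
    silence = max(range(len(table)), key=rates.__getitem__)
    members = [x for x, k in zip(features, labels) if k == silence]
    new_table = [list(p) for p in table]
    new_table[silence] = [sum(col) / len(members) for col in zip(*members)]
    return silence, new_table
```

The assumption behind treating the highest-rate point as silence is that, in conversational recordings, low-energy frames between utterances tend to be the most common and therefore select the same point most often.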
Stoimenov in view of Hayakawa teaches the non-transitory computer-readable recording medium, wherein the updating of the quantization table includes:
calculating a quantization error based on quantization points after update excluding the quantization point equivalent to silence (Stoimenov [0122] - Training data is fed into the NN and results are compared to an objective function that provides an indication of error. The error indication is a measure of how wrong the NN's result is compared to an expected result. This error is then used to correct the weights),
when the quantization error is equal to or larger than a threshold value (Stoimenov [0121] - At each traversal between neurons, the corresponding weight modifies the input and is tested against a threshold at the destination neuron. If the weighted value exceeds the threshold, the value is again weighted, or transformed through a nonlinear function, and transmitted to another neuron further down the NN graph — if the threshold is not exceeded then, generally, the value is not transmitted to a down-graph neuron and the synaptic connection remains inactive. The process of weighting and testing continues until an output neuron is reached; the pattern and values of the output neurons constituting the result of the NN processing),
updating the quantization point equivalent to silence by using a second utterance data of the plurality of utterance data (Hayakawa [0034] - It is preferable to adaptively set the noise determination threshold Thn in accordance with the level of background noise in a voice signal in a telephone conversation. For this purpose, the feature extracting unit 21 determines that a frame in which no speaker is speaking is a silent frame that contains only background noise),
and updating quantization points other than the quantization point equivalent to silence, based on the selection rate (Hayakawa [0032] - The feature extracting unit 21 compares the maximum one of the peak values with a predetermined threshold and, when the maximum peak value is greater than the predetermined threshold, the feature extracting unit 21 determines that the frame includes voiced sound of a speaker. The feature extracting unit 21 then obtains the reciprocal of the time difference that is equivalent to the maximum peak value as a pitch frequency. [0068] - In this case, a speaker model includes the average value of features for each cluster obtained by clustering of features in the frames included in an analysis period),
and when the quantization error is smaller than the threshold value (Stoimenov [0121], quoted above),
outputting the quantization table after update (Stoimenov [0121], quoted above).
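The iterative procedure recited in claim 6 can be sketched in Python as a loop over successive utterances: update the table, compute a quantization error that ignores the silence point, and output the table once the error falls below the threshold. The sketch assumes Euclidean distance, defines the quantization error as the mean distance between each feature vector not quantized to the silence point and its nearest point, identifies the silence point as the most frequently selected point, and updates all selected points to the means of their data. These choices and all function names are hypothetical and are not taken from the applicant's disclosure or the cited references.

```python
import math
from typing import List, Sequence, Tuple

Table = List[List[float]]

def nearest(x: Sequence[float], table: Table) -> int:
    """Index of the closest quantization point (Euclidean distance)."""
    return min(range(len(table)), key=lambda k: math.dist(x, table[k]))

def mean_update(features: List[Sequence[float]], table: Table) -> Tuple[Table, List[int]]:
    """Move each selected point to the mean of the vectors that selected it."""
    labels = [nearest(x, table) for x in features]
    new_table = [list(p) for p in table]
    for k in range(len(table)):
        members = [x for x, lab in zip(features, labels) if lab == k]
        if members:
            new_table[k] = [sum(col) / len(members) for col in zip(*members)]
    return new_table, labels

def quantization_error(features: List[Sequence[float]], table: Table, silence: int) -> float:
    """Mean distance to the nearest point, skipping vectors quantized to the silence point."""
    dists = []
    for x in features:
        k = nearest(x, table)
        if k != silence:
            dists.append(math.dist(x, table[k]))
    return sum(dists) / len(dists) if dists else 0.0

def update_until_converged(utterances: List[List[Sequence[float]]], table: Table,
                           threshold: float) -> Table:
    """Refine the table one utterance at a time; output it once the error is below threshold."""
    for features in utterances:
        table, labels = mean_update(features, table)
        silence = max(range(len(table)), key=labels.count)  # most frequently selected point
        if quantization_error(features, table, silence) < threshold:
            break
    return table
```

If the error never drops below the threshold, the sketch simply returns the table produced by the last available utterance.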
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Stoimenov (U.S. Publication No. 20200349927) in view of Hoffberg (U.S. Patent No. 10943273).
Regarding claim 9, Stoimenov discloses all of the limitations as in claim 1, above.
However, Stoimenov does not disclose the non-transitory computer-readable recording medium,
wherein the selection rate is a ratio of a total of selected features to a number of selection of the quantization point.
Hoffberg does teach the non-transitory computer-readable recording medium, 
wherein the selection rate is a ratio of a total of selected features to a number of selection of the quantization point (col. 38, lines 15-18 - The recognition rate in this case can be calculated as the ratio between number of correctly recognized speech units and total number of speech units (observation sequences) to be recognized).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Stoimenov to incorporate the teachings of Hoffberg in order to implement the non-transitory computer-readable recording medium, wherein the selection rate is a ratio of a total of selected features to a number of selection of the quantization point. Doing so allows for a “high discriminative ability” to determine correct parameters of inputs in the recognition phase (Hoffberg col. 30, lines 40-65).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Covell (U.S. Publication No. 20200234126) teaches look-up table based neural networks. Hofer (U.S. Publication No. 20160300566) teaches the method and system of random access compression of transducer data for automatic speech recognition decoding. Jin (U.S. Publication No. 20190318726) teaches a real-time speaker-dependent neural vocoder. Printz (U.S. Patent No. 8185390) teaches zero-search, zero memory vector quantization. Shahid (U.S. Publication No. 20200349925) teaches online verification of custom wake words.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to whose telephone number is (571) 272-1405.  The examiner can normally be reached on Monday - Friday 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ETHAN DANIEL KIM/Examiner, Art Unit 2658

/RICHEMOND DORVIL/            Supervisory Patent Examiner, Art Unit 2658