DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments (7/12/22 Remarks: page 8, line 12 – page 11, line 5) with respect to the rejection of claims 1-4 under 35 USC §102 and the rejection of claims 5-19 under 35 USC §103 have been fully considered and are persuasive. Therefore, these rejections have been withdrawn. However, upon further consideration, new grounds of rejection are made in view of Wang (US 20200027462, cited in 4/12/22 Office Action) and Wood (US 20190066687, cited in 4/12/22 Office Action).
Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-3, 5, 9, 12-14, 16, & 19 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (US 20200027462, cited in 4/12/22 Office Action) in view of Wood (US 20190066687, cited in 4/12/22 Office Action) and Gupta (“Automatic speech recognition technique for voice command”).
With respect to claim 1, Wang discloses:
Claim 1: A speech recognition device comprising:
a microphone configured to receive an audio signal (Wang paragraph 0118 and Figure 4, voice collecting MIC array);
a speech detection unit configured to detect whether the audio signal is a speech signal spoken by a user (Wang paragraph 0118 and Figure 4, voice recognition);
a memory configured to store the audio signal (Wang paragraph 0138, voice information collected over time; Wang paragraphs 0039-0041, storage); 
a processor (Wang paragraph 0118 and Figure 4, processor) …; and
an audio processor (Wang paragraphs 0065 & 0118 and Figure 4, co-processor) … and a second program for recognizing an activation word (Wang paragraph 0090, load wakeup word voice model and copy to buffer), wherein …,
wherein the audio processor is configured to:
receive a speech detection signal from the speech detection unit (Wang paragraph 0065, receive voice signal),
… for preprocessing the audio signal (Wang paragraph 0118, language processing),
…,
load the second program for recognizing the activation word (Wang paragraph 0090, load wakeup word voice model and copy to buffer),
determine whether the preprocessed audio signal contains an activation word using the second program (Wang paragraph 0065, analyze and recognize voice information to detect wakeup word),
generate a signal for activating the processor, based on the audio signal containing the activation word (Wang paragraph 0065, wake voice recognition processor) and
transmit, to the processor, the audio signal that is input after the audio signal containing the activation word (Wang paragraph 0065, voice recognition processor operates to recognize voice (receiving audio signal of voice in order to do so) after being awakened in response to audio signal containing wakeup word).
Wang does not expressly disclose an internal memory of insufficient size to simultaneously load the first and second programs.
The selection of a particular memory size would be an example of a selection among choices known to one of ordinary skill in the art:
…the first program for preprocessing the audio signal and the second program for recognizing an activation word cannot be simultaneously loaded on the internal memory due to an insufficient size of the internal memory…
The motivation for selecting a memory size insufficient to simultaneously store both first and second programs would be to provide a memory sufficient for the necessary function (of storing the first or the second program when each program is used) without the added resources required to provide unnecessary additional memory to store both when only one is in use.
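For illustration only, and not as a characterization of Wang, Wood, or the claimed device, the memory-size rationale above can be sketched in a few lines of Python. The sketch assumes a hypothetical internal memory of 128 KB and hypothetical program sizes of 96 KB and 80 KB, so that each program fits alone but the two do not fit together. The class and function names are likewise assumptions introduced for this example.

```python
class InternalMemory:
    """Hypothetical model of an audio processor's internal memory."""

    def __init__(self, capacity_kb):
        self.capacity_kb = capacity_kb
        self.loaded = {}  # program name -> size in KB

    def used_kb(self):
        return sum(self.loaded.values())

    def load(self, name, size_kb):
        # Refuse the load if the program does not fit in the remaining space.
        if self.used_kb() + size_kb > self.capacity_kb:
            raise MemoryError(f"cannot load {name}: insufficient internal memory")
        self.loaded[name] = size_kb

    def unload(self, name):
        self.loaded.pop(name, None)


def run_sequentially(memory, programs):
    """Load, run, and unload each program in turn; return the order run."""
    executed = []
    for name, size_kb in programs:
        memory.load(name, size_kb)
        executed.append(name)  # stand-in for actually running the program
        memory.unload(name)
    return executed


# Assumed sizes: each program fits alone, but not both together.
PROGRAMS = [("preprocess", 96), ("activation_word", 80)]
memory = InternalMemory(capacity_kb=128)
order = run_sequentially(memory, PROGRAMS)
```

In this sketch the memory never holds more than one program at a time, which is the resource saving identified in the motivation above. An attempt to load the second program while the first is still resident raises MemoryError.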
Wang does not expressly disclose the storage of a program for audio signal processing in an external memory and the loading of that program from the external memory.
Wood discloses:
… having an internal memory configured to load a first program for preprocessing the audio signal … (Wood paragraph 0065, voice audio preprocessing; Wood paragraphs 0269-0272, loading of software stored in memory)…
…preprocess the audio signal stored in the memory using the first program (Wood paragraph 0065, voice audio preprocessing; Wood paragraphs 0269-0272, loading of software stored in memory)…
Wang and Wood are combinable because they are from the field of speech recognition used for device control.
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to load the Wang audio processing software in an external memory as disclosed by Wood.
The suggestion/motivation for doing so would have been to enable the use of software obtained from the Internet (Wood paragraph 0200, Internet cloud) or otherwise stored outside the operating memory (Wood paragraph 0271, DVD, optical storage disk, etc).
Wang discloses the use of voice recognition to convert a spoken command to an operating command.
Wang does not disclose the application of natural language processing to convert a spoken command to an operating command.
Gupta discloses:
…configured to perform natural language processing (Gupta Section I, “Speech recognition comes under the branch of science known as Natural Language Processing (NLP) which is the field of computer science, artificial intelligence and linguistics which are concerned with the interaction between natural language (spoken by humans) and computer systems. NLP enables the computer system to drive, meaning out of the input language. It makes much simpler for the user to operate the device.”)…
Wang and Gupta are combinable because they are from the field of speech recognition used for device control.
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to implement the Wang arrangement for recognizing spoken commands using natural language processing for the recognition of spoken commands as taught by Gupta.
The suggestion/motivation for doing so would have been to enable the device to understand the meaning of a spoken command (Gupta Abstract and Section I).
Therefore, it would have been obvious to combine Wang with Wood and Gupta to obtain the invention as specified in claim 1.
Applying these teachings, as applied to claim 1 above, to claims 2-3, 5, 9, 12-14, 16, & 19:
Claim 2: The speech recognition device of claim 1 (see above), wherein the microphone, the speech detection unit, the memory, and the audio processor are provided in a first power domain (Wang paragraph 0065, low power operation of voice recognition to detect wakeup word), and the processor is provided in a second power domain that is different from the first power domain (Wang paragraph 0065, high power consumption by voice recognition processor once awakened), and
when the audio processor determines that the audio signal contains the activation word, the audio processor is further configured to generate a signal for supplying power to the second power domain so as to activate the processor (Wang paragraph 0065, voice recognition processor only operates (with high power consumption) when awakened).
Claim 3: The speech recognition device of claim 2 (see above), wherein when the audio processor determines that the audio signal contains the activation word, the audio processor is further configured to transmit, to the processor, a notification signal notifying that the activation word is recognized (Wang paragraph 0065, wake voice recognition processor in response to detection of wakeup word).
Claim 5: The speech recognition device of claim 1 (see above), wherein the first program for preprocessing the audio signal (see secondary reference below) and the second program for recognizing the activation word (Wang paragraph 0090, load wakeup word voice model and copy to buffer) are stored in an external memory (Wood paragraph 0200, software in the Internet cloud; Wood paragraphs 0269-0272, software stored in memory), and
the audio processor is further configured to load, from the external memory, the first program for preprocessing the audio signal (Wood paragraph 0065, voice audio preprocessing; Wood paragraphs 0269-0272, loading of software stored in memory) and the second program for recognizing the activation word (Wang paragraph 0090, load wakeup word voice model and copy to buffer).
Claim 9: The speech recognition device of claim 1 (see above), further comprising:
a transceiver (Wood paragraph 0200, communication to Internet cloud),
wherein the processor is configured to transmit the audio signal received from the audio processor, to an external natural language processing server through the transceiver (Wood paragraph 0200, hardware component received audible command, sends it over a network for processing), receive a result of recognition from the external natural language processing server to perform the natural language processing (Wood paragraph 0200, receive a response) and perform an operation corresponding to the result of recognition (Wood paragraph 0200, response directs the performance of an operation such as verbal response to the user).
Claim 12: A method of operating a speech recognition device, the method comprising:
receiving an audio signal (Wang paragraph 0118 and Figure 4, voice collecting MIC array);
storing the audio signal in a memory (Wang paragraph 0138, voice information collected over time, inherently requiring storage of earlier collected information);
detecting whether the audio signal is a speech signal spoken by a user (Wang paragraph 0118 and Figure 4, voice recognition);
when the audio signal is the speech signal spoken by the user (Wang paragraph 0065, analyze and recognize voice information),
loading, by an audio processor, a first program for preprocessing the audio signal (Wang paragraph 0118, language processing; Wood paragraph 0065, voice audio preprocessing; Wood paragraphs 0269-0272, loading of software stored in memory), wherein the audio processor has an internal memory configured to load the first program for preprocessing the audio signal and a second program for recognizing an activation word (Wang paragraph 0090, load wakeup word voice model and copy to buffer), wherein the first program for preprocessing the audio signal and the second program for recognizing an activation word cannot be simultaneously loaded on the internal memory due to an insufficient size of the internal memory (see rejection of claim 1 above with respect to selection of memory size);
preprocessing, by the audio processor, noise and an echo in the audio signal stored in the memory using the first program (Wood paragraph 0065, echo cancellation and noise cancellation on voice input);
loading, by the audio processor, the second program for recognizing the activation word (Wang paragraph 0090, load wakeup word voice model and copy to buffer);
determining, by the audio processor, whether the preprocessed audio signal contains an activation word using the second program (Wang paragraph 0065, analyze and recognize voice information to detect wakeup word);
activating a processor for natural language processing, when the preprocessed audio signal contains the activation word (Wang paragraph 0065, wake voice recognition processor); and
performing, by the processor, natural language processing on the audio signal that is received after the audio signal containing the activation word (Wang paragraph 0065, voice recognition processor operates to recognize voice (receiving audio signal of voice in order to do so) after being awakened in response to audio signal containing wakeup word).
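For illustration only, the ordering of the steps recited in claim 12 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not an implementation drawn from Wang, Wood, or Gupta: audio frames are modeled as text strings, and the speech detector, the first program, and the second program are hypothetical callables supplied by the caller.

```python
def process_frames(frames, is_speech, preprocess, contains_activation_word):
    """Return (processor_activated, frames_forwarded_to_processor)."""
    stored = []      # stand-in for the memory that stores the audio signal
    forwarded = []   # audio passed to the main processor after activation
    activated = False
    for frame in frames:
        if activated:
            # After activation, later audio goes to the main processor.
            forwarded.append(frame)
            continue
        stored.append(frame)
        if not is_speech(frame):
            continue
        cleaned = preprocess(stored[-1])         # hypothetical first program
        if contains_activation_word(cleaned):    # hypothetical second program
            activated = True                     # stand-in for the wake signal
    return activated, forwarded


# Toy inputs: strings stand in for audio frames (an assumption of this sketch).
frames = ["<noise>", "  hello device ", "turn on the light"]
result = process_frames(
    frames,
    is_speech=lambda f: f != "<noise>",
    preprocess=lambda f: f.strip().lower(),
    contains_activation_word=lambda t: "hello device" in t,
)
```

With these toy inputs, only the frame that follows the frame containing the activation word is forwarded, and no frame is forwarded when the activation word is never detected.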
Claim 13: The method of claim 12 (see above), wherein the activating of the processor comprises:
supplying power to a second power domain in which the processor is provided (Wang paragraph 0065, high power consumption by voice recognition processor once awakened), the second power domain being different from a first power domain in which the audio processor is provided (Wang paragraph 0065, low power operation of voice recognition to detect wakeup word).
Claim 14: The method of claim 13 (see above), wherein the activating of the processor further comprises:
transmitting, by the audio processor, a notification signal notifying that the activation word is recognized, to the processor (Wang paragraph 0065, wake voice recognition processor in response to detection of wakeup word).
Claim 16: The method of claim 12 (see above), wherein the loading of the first program for preprocessing the audio signal comprises:
loading, from an external memory, the first program for preprocessing the audio signal (Wood paragraph 0200, software in the Internet cloud; Wood paragraphs 0269-0271, software stored in memory), and
the loading of the second program for recognizing the activation word comprises:
loading, from the external memory, the second program for recognizing the activation word (Wang paragraph 0090, load wakeup word voice model and copy to buffer).
Claim 19: The method of claim 12 (see above), wherein the performing of the natural language processing comprises:
transmitting, to an external natural language processing server, the audio signal that is received after the audio signal containing the activation word (Wood paragraph 0200, hardware component received audible command, sends it over a network for processing);
receiving a result of recognition from the external natural language processing server (Wood paragraph 0200, receive a response); and
performing an operation corresponding to the result of recognition (Wood paragraph 0200, response directs the performance of an operation such as verbal response to the user).
Claims 6-7 & 17 are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Wood and Gupta as applied to claim 5 above, and further in view of Li (US 20170278513, cited in 4/12/22 Office Action).
With respect to claim 6, Wang in view of Wood and Gupta teaches the invention of claim 5:
Claim 6: The speech recognition device of claim 5 (see above).
Wang in view of Wood and Gupta does not teach the use of an artificial neural network having a learned learning model and filter coefficient.
Li ’513 discloses:
…wherein the first program for preprocessing the audio signal and the second program for recognizing the activation word are programs based on an artificial neural network in which a learning model and a filter coefficient are determined by learning in advance (Li ’513 paragraphs 0015 & 0023, speech recognition neural network with training and adaptive filter parameters).
Wang in view of Wood and Gupta and Li ’513 are combinable because they are from the field of speech recognition.
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to implement the speech recognition of Wang in view of Wood and Gupta using the trained neural network arrangement of Li ’513.
The suggestion/motivation for doing so would have been to enable speech recognition with improved accuracy (Li ’513 paragraph 0015, improved accuracy for audio recognition).
Therefore, it would have been obvious to combine Wang in view of Wood and Gupta with Li ’513 to obtain the invention as specified in claim 6.
Applying these teachings, as applied to claim 6 above, to claims 7 & 17:
Claim 7: The speech recognition device of claim 6 (see above), wherein the audio processor has a built-in command random-access memory (RAM) (Li ’513 paragraph 0100, random access memory) storing an activation word recognition application code (Wood paragraphs 0269-0271, software stored in memory) and a built-in data RAM (Li ’513 paragraph 0100, random access memory) storing activation word recognition application data (Wang paragraph 0090, load wakeup word voice model and copy to buffer), and
the audio processor is further configured to:
load, from the external memory, the learning model and the filter coefficient of the artificial neural network for the first program for preprocessing the audio signal and the second program for recognizing the activation word (Li ’513 paragraphs 0015 & 0023, speech recognition neural network with training and adaptive filter parameters),
store the learning model and the filter coefficient in the memory (Wood paragraphs 0269-0271, software stored in memory), and
execute the programs (Wood paragraph 0253, execute software).
Claim 17: The method of claim 16 (see above), wherein the first program for preprocessing the audio signal and the second program for recognizing the activation word are programs based on an artificial neural network in which a learning model and a filter coefficient are determined by learning in advance (Li ’513 paragraphs 0015 & 0023, speech recognition neural network with training and adaptive filter parameters).
Claims 8, 10-11, & 18 are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Wood, Gupta, and Li ’513 as applied to the invention of claims 7 & 17 above, and further in view of Li (US 10409357, cited in 4/12/22 Office Action).
With respect to claim 8, Wang in view of Wood, Gupta, and Li ’513 teaches:
Claim 8: The speech recognition device of claim 7 (see above), wherein in order to load the learning model and the filter coefficient of the artificial neural network, the audio processor is further configured to:
…
…read the learning model and the filter coefficient of the artificial neural network from … ,
store, in the memory, the learning model and the filter coefficient of the artificial neural network (Li ’513 paragraphs 0015 & 0023, speech recognition neural network with training and adaptive filter parameters), set the self-refresh mode of … and
….
Wang in view of Wood, Gupta and Li ’513 does not teach the particular recited memory device and mode control arrangement.
Li ’357 discloses:
…stop a low-power mode (Li ’357 column 6, lines 34-37, start and stop low-power modes; Li ’357 column 10, lines 62-65, PHY control and self-refresh mode) of a PHY controlling a DDR DRAM (Li ’357 column 12, lines 1-7, use of any computer readable medium type; Li ’357 column 1, lines 13-14 & 30, high bandwidth (e.g. double data rate) DRAM as an example of known computer readable medium type) which is the external memory,
stop a self-refresh mode (Li ’357 column 6, lines 34-37, start and stop low-power modes; Li ’357 column 10, lines 62-65, PHY control and self-refresh mode) of the DDR DRAM (Li ’357 column 12, lines 1-7, use of any computer readable medium type; Li ’357 column 1, lines 13-14 & 30, high bandwidth (e.g. double data rate) DRAM as an example of known computer readable medium type)…
…set the PHY to be in the low-power mode (Li ’357 column 6, lines 34-37, start and stop low-power modes).
Wang in view of Wood, Gupta, and Li ’513 and Li ’357 are combinable because they are from the field of processors using memory devices for data storage.
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to apply the Li ’357 memory device and mode control arrangement to the speech recognition system of Wang in view of Wood, Gupta, and Li ’513.
The suggestion/motivation for doing so would have been to implement a memory with reduced power consumption (Li ’357 column 2, lines 8-11, reduced power consumption).
Therefore, it would have been obvious to combine Wang in view of Wood, Gupta and Li ’513 with Li ’357 to obtain the invention as specified in claim 8.
Applying these teachings, as applied to claim 8 above, to claims 10-11 & 18:
Claim 10: An electronic device comprising:
a user interface configured to receive a command from a user and providing operation information to the user (Wood paragraph 0053, audio responsive device receives user command and provides output to user);
a speech recognition device configured to recognize a command from speech of the user (Wood paragraph 0053, audio responsive device carries out user command);
a driving unit configured to perform mechanical and electrical operations to operate the electronic device (Wood paragraph 0053, play speakers (electrical and mechanical operation));
a processor operatively connected to the user interface, the speech recognition device, and the driving unit (Wang paragraph 0118 and Figure 4, processor; Wood paragraph 0059, processor); and
a memory operatively connected to the processor and the speech recognition device comprising,
a microphone configured to receive an audio signal (Wang paragraph 0118 and Figure 4, voice collecting MIC array);
a speech detection unit configured to detect whether the audio signal is a speech signal spoken by a user (Wang paragraph 0118 and Figure 4, voice recognition);
the memory configured to store the audio signal (Wang paragraph 0138, voice information collected over time, inherently requiring storage of earlier collected information);
the processor configured to perform natural language processing (Wang paragraph 0118 and Figure 4, processor); and
an audio processor (Wang paragraphs 0065 & 0118 and Figure 4, co-processor) having an internal memory configured to load a first program for preprocessing the audio signal (Wang paragraph 0118, language processing; Wood paragraph 0065, voice audio preprocessing; Wood paragraphs 0269-0272, loading of software stored in memory) and a second program for recognizing an activation word (Wang paragraph 0090, load wakeup word voice model and copy to buffer), wherein the first program for preprocessing the audio signal and the second program for recognizing an activation word cannot be simultaneously loaded on the internal memory due to an insufficient size of the internal memory (see rejection of claim 1 above with respect to selection of memory size),
wherein the audio processor is configured to:
receive a speech detection signal from the speech detection unit (Wang paragraph 0065, receive voice signal),
load the first program for preprocessing the audio signal (Wang paragraph 0118, language processing; Wood paragraph 0065, voice audio preprocessing; Wood paragraphs 0269-0272, loading of software stored in memory),
preprocess the audio signal stored in the memory using the first program (Wang paragraph 0065, analyze and recognize voice information),
load the second program for recognizing the activation word (Wang paragraph 0090, load wakeup word voice model and copy to buffer),
determine whether the preprocessed audio signal contains an activation word using the second program (Wang paragraph 0065, analyze and recognize voice information to detect wakeup word),
generate a signal for activating the processor, based on the audio signal containing the activation word (Wang paragraph 0065, wake voice recognition processor) and
transmit, to the processor, the audio signal that is input after the audio signal containing the activation word (Wang paragraph 0065, voice recognition processor operates to recognize voice (receiving audio signal of voice in order to do so) after being awakened in response to audio signal containing wakeup word),
wherein the speech recognition device is a speech recognition device of any one of claims 1 to 8 (see above), and
the memory is further configured to store the first program for preprocessing an audio signal (Wood paragraph 0200, software in the Internet cloud; Wood paragraphs 0269-0271, software stored in memory) and the second program for recognizing an activation word (Wang paragraph 0090, load wakeup word voice model and copy to buffer), the first and second programs being used in the speech recognition device.
Claim 11: The electronic device of claim 10 (see above), wherein the processor is further configured to set an operation of the electronic device (Wang paragraph 0118 and Figure 4, control processor; Wood paragraph 0059, control processor) and/or (Note: This is a recitation in the alternative, readable on either option) controls an operation of the driving unit (Wood paragraph 0053, play speakers (electrical and mechanical operation)), based on the command received from the user interface (Wood paragraph 0053, audio responsive device carries out user command) or (Note: This is a recitation in the alternative, readable on either option) the speech recognition device (Wood paragraph 0053, audio responsive device carries out user command).
Claim 18: The method of claim 17 (see above), wherein the loading of the first program for preprocessing the audio signal or (Note: This is a recitation in the alternative, readable upon either option) the loading of the second program for recognizing the activation word comprises:
stopping a low-power mode of a PHY controlling a DDR DRAM which is the external memory (Li ’357 column 6, lines 34-37, start and stop low-power modes; Li ’357 column 10, lines 62-65, PHY control and self-refresh mode; Li ’357 column 12, lines 1-7, use of any computer readable medium type; Li ’357 column 1, lines 13-14 & 30, high bandwidth (e.g. double data rate) DRAM as an example of known computer readable medium type),
stopping a self-refresh mode of the DDR DRAM (Li ’357 column 6, lines 34-37, start and stop low-power modes; Li ’357 column 10, lines 62-65, PHY control and self-refresh mode; Li ’357 column 12, lines 1-7, use of any computer readable medium type; Li ’357 column 1, lines 13-14 & 30, high bandwidth (e.g. double data rate) DRAM as an example of known computer readable medium type);
reading, from the DDR DRAM (Li ’357 column 12, lines 1-7, use of any computer readable medium type; Li ’357 column 1, lines 13-14 & 30, high bandwidth (e.g. double data rate) DRAM as an example of known computer readable medium type), the learning model and the filter coefficient of the artificial neural network (Li ’513 paragraphs 0015 & 0023, speech recognition neural network with training and adaptive filter parameters) for the first program for preprocessing the audio signal or (Note: This is a recitation in the alternative, readable upon either option) the second program for recognizing the activation word (Wang paragraph 0090, load wakeup word voice model and copy to buffer);
storing, in the memory, the learning model and the filter coefficient of the artificial neural network (Li ’513 paragraphs 0015 & 0023, speech recognition neural network with training and adaptive filter parameters);
setting the self-refresh mode of the DDR DRAM (Li ’357 column 6, lines 34-37, start and stop low-power modes; Li ’357 column 10, lines 62-65, PHY control and self-refresh mode; Li ’357 column 12, lines 1-7, use of any computer readable medium type; Li ’357 column 1, lines 13-14 & 30, high bandwidth (e.g. double data rate) DRAM as an example of known computer readable medium type); and
setting the PHY to be in the low-power mode (Li ’357 column 6, lines 34-37, start and stop low-power modes; Li ’357 column 10, lines 62-65, PHY control and self-refresh mode).
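For illustration only, the sequence of steps recited in claim 18 can be sketched as follows. This is a minimal Python sketch, not a description of the Li ’357 arrangement: MockDdr is a hypothetical stand-in for a DDR DRAM and its PHY, and the rule that the PHY must leave its low-power mode before the self-refresh mode can be changed is a modeling assumption of this example.

```python
class MockDdr:
    """Hypothetical stand-in for a DDR DRAM and its PHY; records each step."""

    def __init__(self, contents):
        self.contents = contents
        self.phy_low_power = True
        self.self_refresh = True
        self.log = []

    def set_phy_low_power(self, enabled):
        self.phy_low_power = enabled
        self.log.append(("phy_low_power", enabled))

    def set_self_refresh(self, enabled):
        # Modeling assumption: the PHY must be awake to change the DRAM mode.
        if self.phy_low_power:
            raise RuntimeError("PHY is in low-power mode")
        self.self_refresh = enabled
        self.log.append(("self_refresh", enabled))

    def read(self, key):
        if self.phy_low_power or self.self_refresh:
            raise RuntimeError("DRAM is not accessible")
        self.log.append(("read", key))
        return self.contents[key]


def load_model(ddr, local_memory, key):
    """Wake the external memory, copy one item, then return it to low power."""
    ddr.set_phy_low_power(False)        # stop the PHY low-power mode
    ddr.set_self_refresh(False)         # stop the DRAM self-refresh mode
    local_memory[key] = ddr.read(key)   # read, then store in local memory
    ddr.set_self_refresh(True)          # set the self-refresh mode again
    ddr.set_phy_low_power(True)         # set the PHY low-power mode again
    return local_memory[key]


# Hypothetical contents standing in for a learning model and coefficients.
ddr = MockDdr({"wake_model": [0.1, 0.2]})
local = {}
model = load_model(ddr, local, "wake_model")
```

In this sketch the external memory is active only for the duration of the read, and it is back in both low-power states when load_model returns.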
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Zhang (“A Similarity-Based Approach to Recognizing Voice-Based Task Goals in Self-Adaptive Systems”) discloses an additional example of voice-recognition device control using natural language processing.
Any inquiry concerning the contents of this communication or earlier communications from the examiner should be directed to Stephen M. Brinich at 571-272-7430 (voice) or 571-273-7430 (fax).
Any inquiry relating to the status of this application, entry of papers into this application, or any other inquiries of a general nature concerning application processing should be directed to the Tech Center 2600 Customer Service center at 571-272-2600 or to the USPTO Contact Center at 800-786-9199 or 571-272-1000.
The examiner can normally be reached on weekdays 7:30-4:00 Eastern Time.
If attempts to contact the examiner and the Customer Service Center are unsuccessful, supervisor Claire Wang can be contacted at 571-270-1051.
Hand-carried correspondence may be delivered to the Customer Service Window, located at the Randolph Building, 401 Dulany Street, Alexandria, VA 22314.
/Stephen M Brinich/
Examiner, Art Unit 2663