DETAILED ACTION
Introduction
This office action is in response to Applicant's submission filed on 6/14/2022. Claims 1-20 are pending in this application. As such, Claims 1-20 have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The response filed on 6/14/2022 has been correspondingly accepted and considered in this Office Action. As such, Claims 1-20 have been examined. 

Response to Arguments
With respect to the rejection of Claims 1-3, 5-12, and 14-20 under 35 U.S.C. § 102(a)(1) as being anticipated by Watanabe, U.S. Patent No. 9,443,527 B1 (hereinafter referred to as “Watanabe”), and the rejection of Claims 4 and 13 under 35 U.S.C. § 103 as being unpatentable over Watanabe in view of Meaney et al., U.S. Patent No. 9,484,030 B1 (hereinafter referred to as “Meaney”), Applicant appears to present the following position on p. 8 of the Remarks filed 6/14/2022.
	“Watanabe is directed to a system for speech control of a non-speech processing device that performs operations based on communication between the devices. Watanabe discloses a speech processing model that identifies a first speech command or a second speech command and executing one or more instructions by a device. Applicant respectfully submits that there is no disclosure by Watanabe of “. . . analyzing a frequency of use of the smart object...”

Instead, Watanabe discloses a device which incorporates grammar for processing speech commands for multiple devices and a weighting system that is utilized to adjust internal weights to accurately to capture a user's speech and intended commands. Watanabe also discloses that based on a user's previous behavior, where a speech command is received, the most recent user command, etc. an automatic speech recognition device may weigh the incoming speech and interpret the received command in different ways to control a non-automatic speech recognition device. Watanabe also discloses that based on a user’s previous behavior, where a speech command is received, a most recent command may influence the weight of the incoming speech and the interpretation of the command. However, there is no disclosure nor any suggestion by Watanabe of analyzing a frequency of use of a smart object by a user.

Applicant respectfully submits that the disclosure of weighting of the incoming speech and adjusting internal weights applied during speech processing is not tantamount to analyzing a frequency of use of the smart object. The Office Action makes a contention that frequency of use of the smart objects is considered when taken spoken commands in certain locations. However, Applicant respectfully submits that there is no disclosure nor any suggestion of analyzing at least one function of a smart object, analyzing a frequency of use of the smart object, and analyzing at least one statement spoken by a user to the smart object.”

	In response, the Examiner respectfully notes that Watanabe does disclose “a frequency of use”. This response applies to original and amended independent and dependent claims 1, 8, 10, 17, and 19. The disclosure can be seen in Watanabe column 17 lines 27-31, “the ASR device may also apply the user's history and/or spoken preferences for other devices to the new device if appropriate, based on the overlapping functionality of the new device, the location of the new device, or other factors”, and in Watanabe column 18 lines 26-36, “For example, when processing a speech input to determine whether the speech included the word “bake” or “take”, certain paths for ASR processing may be weighted depending on whether the user is likely to be entering commands for an oven under the present conditions. For example, words for commands directed to an oven may be weighted lower than words for commands for a music player when a spoken command is received from a microphone located in a family room (while the reverse may be true for a spoken command coming from a kitchen). Weighting may also be applied to NLP portions of processing. For example, a user may say “turn it down” when intending to lower a volume of a music player, when intending to dim lights in a certain room, or when intending to adjust a temperature. Based on a user's previous behavior, where the speech command is received, the most recent user command, etc. the ASR device may weigh the incoming speech and interpret the received command in different ways. Similarly, a spoken command of “call up the police” may be interpreted to initiate a telephone call with law enforcement or to play songs by the band The Police.”
	The Examiner respectfully notes that applying the user’s history for other devices, as described in Watanabe column 17, acknowledges the frequency of use of the smart object. One function that the ASR system allows for when replacing a device is that the new device is able to understand the frequency of use of the old device that it is replacing in order to have a more seamless transition. This history is used later, when the device is considering the words the user is uttering and the actions it should take, as seen in Watanabe column 18: based on a user's previous behavior interacting with the smart objects around them, where the speech command is received, etc., the ASR device may weigh the incoming speech and interpret the received command in different ways. Similarly, a spoken command of “call up the police” may be interpreted to initiate a telephone call with law enforcement or to play songs by the band The Police. One factor that weighs into this decision is how frequently the user utilizes the speech-to-call function of one device versus the device which is designed to play music (mentioned before as the user history of each device). The Examiner respectfully disagrees with the Applicant’s position, and as such, Applicant’s arguments are found not persuasive.

With respect to the art rejections of the remaining dependent claims 2-7, 9, 11-16, 18, and 20 under 35 U.S.C. 102 and 35 U.S.C. 103, to the extent said claims are discussed and/or argued for at least the same rationale presented in the Remarks filed 6/14/2022, the Examiner respectfully notes as follows. For completeness, to the extent the mentioned claims are traversed for reasons similar to those presented for independent claims 1, 10, and 19, the Examiner respectfully directs Applicant to the reasons provided above, which apply correspondingly to claims 2-7, 9, 11-16, 18, and 20. For at least the same reasons, the Examiner likewise respectfully disagrees, and as such, Applicant’s arguments are also found not persuasive.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-3, 5-12, and 14-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Watanabe et al. (US 9443527 B1).

	Regarding Claim 1, Watanabe teaches a computer-implemented method for providing smart object virtual communication, comprising: analyzing at least one function of a smart object; (Watanabe - Column 2 Lines 1-4. To perform speech control of non-ASR devices (smart objects), one or more ASR devices (the method) may be configured to learn the capabilities (functions) of one or more non-ASR devices (smart objects) and how those capabilities (functions) are controlled in each non-ASR device (smart object).) analyzing a frequency of use of the smart object; (Watanabe - Column 18 Lines 11 - 40. As the ASR device incorporates grammars for processing speech commands for multiple devices (including itself), a weighting system may be incorporated to adjust the internal weights applied during ASR and NLP processing in an attempt to more accurately capture a user's speech and intended command. For example, when processing a speech input to determine whether the speech included the word “bake” or “take”, certain paths for ASR processing may be weighted depending on whether the user is likely to be entering commands for an oven under the present conditions. For example, words for commands directed to an oven may be weighted lower than words for commands for a music player when a spoken command is received from a microphone located in a family room (while the reverse may be true for a spoken command coming from a kitchen). Weighting may also be applied to NLP portions of processing. For example, a user may say “turn it down” when intending to lower a volume of a music player, when intending to dim lights in a certain room, or when intending to adjust a temperature. Based on a user's previous behavior, where the speech command is received, the most recent user command, etc. the ASR device may weigh the incoming speech and interpret the received command in different ways. 
Similarly, a spoken command of “call up the police” may be interpreted to initiate a telephone call with law enforcement or to play songs by the band The Police. Various conditions may be evaluated to push ASR/NLP results in one direction or another depending on what the ASR device deems the most likely command intended by the user. In other words, frequency of use of the smart objects is considered when taking spoken commands in certain locations.) analyzing at least one statement spoken by a user to the smart object; (Watanabe - Column 6 Lines 9-16. The ASR module 214 includes an acoustic front end (AFE) 216, a speech recognition engine 218, and speech storage 220. The AFE 216 transforms audio data into data for processing by the speech recognition engine 218. The speech recognition engine 218 compares the speech recognition data with the acoustic, language, and other data models and information stored in the speech storage 220 for recognizing the speech contained in the original audio data.) and controlling the smart object to virtually communicate with the user based on at least one of: the at least one statement spoken by the user, the at least one function of the smart object, and the frequency of use of the smart object. (Watanabe - Column 1 Lines 65-67. As described here, by connecting ASR devices with non-ASR devices, a system may be configured to control non-ASR devices with speech commands. For frequency of use Watanabe - Column 18 Lines 11 – 40 is considered. As the ASR device incorporates grammars for processing speech commands for multiple devices (including itself), a weighting system may be incorporated to adjust the internal weights applied during ASR and NLP processing in an attempt to more accurately capture a user's speech and intended command. 
For example, when processing a speech input to determine whether the speech included the word “bake” or “take”, certain paths for ASR processing may be weighted depending on whether the user is likely to be entering commands for an oven under the present conditions. For example, words for commands directed to an oven may be weighted lower than words for commands for a music player when a spoken command is received from a microphone located in a family room (while the reverse may be true for a spoken command coming from a kitchen). Weighting may also be applied to NLP portions of processing. For example, a user may say “turn it down” when intending to lower a volume of a music player, when intending to dim lights in a certain room, or when intending to adjust a temperature. Based on a user's previous behavior, where the speech command is received, the most recent user command, etc. the ASR device may weigh the incoming speech and interpret the received command in different ways. Similarly, a spoken command of “call up the police” may be interpreted to initiate a telephone call with law enforcement or to play songs by the band The Police. Various conditions may be evaluated to push ASR/NLP results in one direction or another depending on what the ASR device deems the most likely command intended by the user. In other words, frequency of use of the smart objects is considered when taking spoken commands in certain locations.).

	Regarding Claim 2, Watanabe teaches all of the limitations of Claim 1. Watanabe also teaches the computer-implemented method of claim 1, further including training a neural network on at least one instance of virtual communication between the user and the smart object, (Watanabe - Column 6 Lines 45-51. A number of approaches may be used by the AFE 216 to process the audio data. Such approaches may include using mel-frequency cepstral coefficients (MFCCs), perceptual linear predictive (PLP) techniques, neural network feature vector techniques, linear discriminant analysis, semi-tied covariance matrices, deep belief networks or other approaches known to those of skill in the art.) at least one function of the smart object, (Watanabe - Column 2 Lines 1-4. To perform speech control of non-ASR devices (smart objects), one or more ASR devices (the method) may be configured to learn the capabilities (functions) of one or more non-ASR devices (smart objects) and how those capabilities (functions) are controlled in each non-ASR device (smart object).) the frequency of use of the smart object, (Watanabe - Column 18 Lines 11 - 40. As the ASR device incorporates grammars for processing speech commands for multiple devices (including itself), a weighting system may be incorporated to adjust the internal weights applied during ASR and NLP processing in an attempt to more accurately capture a user's speech and intended command. For example, when processing a speech input to determine whether the speech included the word “bake” or “take”, certain paths for ASR processing may be weighted depending on whether the user is likely to be entering commands for an oven under the present conditions. For example, words for commands directed to an oven may be weighted lower than words for commands for a music player when a spoken command is received from a microphone located in a family room (while the reverse may be true for a spoken command coming from a kitchen). 
Weighting may also be applied to NLP portions of processing. For example, a user may say “turn it down” when intending to lower a volume of a music player, when intending to dim lights in a certain room, or when intending to adjust a temperature. Based on a user's previous behavior, where the speech command is received, the most recent user command, etc. the ASR device may weigh the incoming speech and interpret the received command in different ways. Similarly, a spoken command of “call up the police” may be interpreted to initiate a telephone call with law enforcement or to play songs by the band The Police. Various conditions may be evaluated to push ASR/NLP results in one direction or another depending on what the ASR device deems the most likely command intended by the user. In other words, frequency of use of the smart objects is considered when taking spoken commands in certain locations.) and user customization information with respect to the smart object (Watanabe - Column 16 Lines 61-67 and Column 17 Lines 1-7. In one aspect the non-ASR device (smart object) may also communicate an available catalog of media (user customization information) to the ASR device for purposes of refining speech control. For example, a non-ASR media player may indicate to the ASR device the names of songs, artists, movies, television shows, etc. available to the media player, thus enabling the ASR device to configure a grammar/NLP settings (command phrases) package based on the catalog of available media. In another aspect, the grammar used by the ASR device to control the functionality of the media player may be separate from the grammar used by the ASR device to refer to specific media. In this manner separate grammars may be constructed/updated and shared across ASR devices allowing for customized, and possibly more easily updated, speech controls for the non-ASR media player.).

Regarding Claim 3, Watanabe teaches all of the limitations of Claim 2. Watanabe also teaches the computer-implemented method of claim 2, wherein the user customization information includes at least one of: an object name selected by the user that pertains to the smart object, a trigger phrase, or a command phrase that is pre-programmed to command the smart object to enable a particular function (Watanabe - Column 17 Lines 17-21. After the new device is added to the network, as the ASR device (computer implemented method) incorporates (or creates) a grammar (user customization info) for the new device, the ASR device may request that the user provide spoken examples of commands (trigger phrase) that will be used to operate the new device.).

Regarding Claim 5, Watanabe teaches all of the limitations of Claim 2. Watanabe also teaches the computer-implemented method of claim 2, wherein analyzing the at least one function includes accessing the neural network and determining at least one function code that is associated with the smart object, wherein the at least one function code is analyzed to determine at least one function of the smart object (Watanabe - Column 16 Lines 61-67 and Column 17 Line 1. In one aspect the non-ASR device (smart object) may also communicate an available catalog of media (user customization information) to the ASR device for purposes of refining speech control. For example, a non-ASR media player may indicate to the ASR device the names of songs, artists, movies, television shows, etc. (function codes) available to the media player, thus enabling the ASR device to configure a grammar/NLP settings (command phrases/ function) package based on the catalog of available media.).

Regarding Claim 6, Watanabe teaches all of the limitations of Claim 5. Watanabe also teaches the computer-implemented method of claim 5, wherein analyzing the frequency of use includes accessing the neural network and analyzing at least one time stamp that is associated with utilization of at least one function of the smart object by the user (Watanabe - Column 6 Lines 36-43. The feature vector may represent different qualities of the audio data within the frame. FIG. 4 shows a digitized audio data waveform 402, with multiple points 406 of the first word 404 as the first word 404 is being processed. The audio qualities of those points may be stored into feature vectors. Feature vectors may be streamed or combined into a matrix that represents a time period of the spoken utterance.).

Regarding Claim 7, Watanabe teaches all of the limitations of Claim 1. Watanabe also teaches the computer-implemented method of claim 1, wherein analyzing the at least one statement spoken by the user includes sensing a voice and determining human speech patterns to be analyzed to determine one or more statements that are communicated to the smart object by the user (Watanabe - Column 6 Lines 9-16. The ASR module 214 includes an acoustic front end (AFE) [to sense the voice] 216, a speech recognition engine [to analyze and determine] 218, and speech storage 220. The AFE 216 transforms audio data into data for processing by the speech recognition engine 218. The speech recognition engine 218 compares the speech recognition data with the acoustic, language, and other data models and information stored in the speech storage 220 for recognizing the speech contained in the original audio data. This analysis of the statement spoken by the user can also be seen in Figures 4-6 of Watanabe.).

Regarding Claim 8, Watanabe teaches all of the limitations of Claim 1. Watanabe also teaches the computer-implemented method of claim 1, wherein analyzing the at least one statement spoken includes determining (Watanabe - Column 18 Lines 11 - 40. As the ASR device incorporates grammars for processing speech commands for multiple devices (including itself), a weighting system may be incorporated to adjust the internal weights applied during ASR and NLP processing in an attempt to more accurately capture a user's speech and intended command. For example, when processing a speech input to determine whether the speech included the word “bake” or “take”, certain paths for ASR processing may be weighted depending on whether the user is likely to be entering commands for an oven under the present conditions. For example, words for commands directed to an oven may be weighted lower than words for commands for a music player when a spoken command is received from a microphone located in a family room (while the reverse may be true for a spoken command coming from a kitchen). Weighting may also be applied to NLP portions of processing. For example, a user may say “turn it down” when intending to lower a volume of a music player, when intending to dim lights in a certain room, or when intending to adjust a temperature. Based on a user's previous behavior, where the speech command is received, the most recent user command, etc. the ASR device may weigh the incoming speech and interpret the received command in different ways. Similarly, a spoken command of “call up the police” may be interpreted to initiate a telephone call with law enforcement or to play songs by the band The Police. Various conditions may be evaluated to push ASR/NLP results in one direction or another depending on what the ASR device deems the most likely command intended by the user. In other words, frequency of use of the smart objects is considered when taking spoken commands in certain locations.).

Regarding Claim 9, Watanabe teaches all of the limitations of Claim 1. Watanabe also teaches the computer-implemented method of claim 1, wherein controlling the smart object to virtually communicate with the user includes providing a synthesized voice of the smart object that is based on at least one factor that includes at least one of: an age of the smart object, (Watanabe - Column 15 Lines 39 - 67. The ASR device may then take the identity/functionality information and communicate with a central storage device that tracks non-ASR device types/functionality and stores corresponding ASR grammars and NLP settings which may be used by ASR devices to receive and process speech commands for the respective non-ASR devices. The ASR device may send the central storage device an identity (such as model number, serial number, or other identifier) for the central storage device to cross reference when looking up to see if the non-ASR device type is recognized. If the device identity is recognized (906, yes), the central storage device then sends the ASR device one or more grammars and/or NLP settings packages (908) associated with the non-ASR device(s) to the ASR device to store in storage 228 and incorporate into its grammar configuration module 230, as shown in block 910. The grammar and/or NLP settings package may also be obtained from the non-ASR device itself, should the non-ASR device have such information available. The grammar and/or NLP settings package may also be obtained from a different source. With that additional information incorporated, the ASR device may be configured to receive and process speech commands for the new device. 
When commands for the new device are received, the ASR device may perform ASR/NLP processing and communicate the results of that processing using commands recognized by the new non-ASR device to the new device through the appropriate communication protocol, such as over a network, using IR commands (with an appropriate IR transmitter), etc. In other words, the computer implemented method can take the function of the smart object with the command spoken by the user to communicate the results with a synthesized voice for the smart object.).

Regarding Claim 10, Watanabe teaches a system for providing smart object virtual communication, comprising: a memory storing instructions when executed by a processor (Watanabe - Column 3 Lines 52-67 and Column 4 Lines 1-3. The device 100 includes one or more controllers/processors 204 for processing data and computer-readable instructions, and a memory 206 for storing data and processor-executable instructions. The memory 206 may include volatile random access memory (RAM), non-volatile read only memory (ROM) or flash memory, and/or other types of memory. Also included is a non-volatile data storage component 208, for storing data and instructions. The data storage component 208 may include one or more storage types of non-volatile storage such as magnetic storage, optical storage, solid-state storage, etc. The ASR device 100 may also be connected to removable or external memory and/or storage (such as a removable memory card, memory key drive, networked storage, etc.) through the input/output device interfaces 202. Data and instructions may be loaded selectively loaded into memory 206 from storage 208 at runtime, although instructions may also be embedded as firmware such as instructions stored the non-volatile flash or ROM.) to cause the processor to: analyze at least one function of a smart object (Watanabe - Column 2 Lines 1-4. To perform speech control of non-ASR devices (smart objects), one or more ASR devices (the method) may be configured to learn the capabilities (functions) of one or more non-ASR devices (smart objects) and how those capabilities (functions) are controlled in each non-ASR device (smart object).); analyze a frequency of use of the smart object (Watanabe - Column 18 Lines 11 - 40. 
As the ASR device incorporates grammars for processing speech commands for multiple devices (including itself), a weighting system may be incorporated to adjust the internal weights applied during ASR and NLP processing in an attempt to more accurately capture a user's speech and intended command. For example, when processing a speech input to determine whether the speech included the word “bake” or “take”, certain paths for ASR processing may be weighted depending on whether the user is likely to be entering commands for an oven under the present conditions. For example, words for commands directed to an oven may be weighted lower than words for commands for a music player when a spoken command is received from a microphone located in a family room (while the reverse may be true for a spoken command coming from a kitchen). Weighting may also be applied to NLP portions of processing. For example, a user may say “turn it down” when intending to lower a volume of a music player, when intending to dim lights in a certain room, or when intending to adjust a temperature. Based on a user's previous behavior, where the speech command is received, the most recent user command, etc. the ASR device may weigh the incoming speech and interpret the received command in different ways. Similarly, a spoken command of “call up the police” may be interpreted to initiate a telephone call with law enforcement or to play songs by the band The Police. Various conditions may be evaluated to push ASR/NLP results in one direction or another depending on what the ASR device deems the most likely command intended by the user. In other words, frequency of use of the smart objects is considered when taking spoken commands in certain locations.); analyze at least one statement spoken by a user to the smart object (Watanabe - Column 6 Lines 9-16. The ASR module 214 includes an acoustic front end (AFE) 216, a speech recognition engine 218, and speech storage 220. 
The AFE 216 transforms audio data into data for processing by the speech recognition engine 218. The speech recognition engine 218 compares the speech recognition data with the acoustic, language, and other data models and information stored in the speech storage 220 for recognizing the speech contained in the original audio data.); and control the smart object to virtually communicate with the user based on at least one of: (Watanabe - Column 1 Lines 65-67. As described here, by connecting ASR devices with non-ASR devices, a system may be configured to control non-ASR devices with speech commands. For frequency of use Watanabe - Column 18 Lines 11 – 40 is considered. As the ASR device incorporates grammars for processing speech commands for multiple devices (including itself), a weighting system may be incorporated to adjust the internal weights applied during ASR and NLP processing in an attempt to more accurately capture a user's speech and intended command. For example, when processing a speech input to determine whether the speech included the word “bake” or “take”, certain paths for ASR processing may be weighted depending on whether the user is likely to be entering commands for an oven under the present conditions. For example, words for commands directed to an oven may be weighted lower than words for commands for a music player when a spoken command is received from a microphone located in a family room (while the reverse may be true for a spoken command coming from a kitchen). Weighting may also be applied to NLP portions of processing. For example, a user may say “turn it down” when intending to lower a volume of a music player, when intending to dim lights in a certain room, or when intending to adjust a temperature. Based on a user's previous behavior, where the speech command is received, the most recent user command, etc. the ASR device may weigh the incoming speech and interpret the received command in different ways. 
Similarly, a spoken command of “call up the police” may be interpreted to initiate a telephone call with law enforcement or to play songs by the band The Police. Various conditions may be evaluated to push ASR/NLP results in one direction or another depending on what the ASR device deems the most likely command intended by the user. In other words, frequency of use of the smart objects is considered when taking spoken commands in certain locations.).

Regarding Claim 11, Watanabe teaches all of the limitations of Claim 10. Watanabe also teaches the system of claim 10, further including training a neural network on at least one instance of virtual communication between the user and the smart object (Watanabe - Column 6 Lines 45-51. A number of approaches may be used by the AFE 216 to process the audio data. Such approaches may include using mel-frequency cepstral coefficients (MFCCs), perceptual linear predictive (PLP) techniques, neural network feature vector techniques, linear discriminant analysis, semi-tied covariance matrices, deep belief networks or other approaches known to those of skill in the art.), at least one function of the smart object (Watanabe - Column 2 Lines 1-4. To perform speech control of non-ASR devices (smart objects), one or more ASR devices (the method) may be configured to learn the capabilities (functions) of one or more non-ASR devices (smart objects) and how those capabilities (functions) are controlled in each non-ASR device (smart object).), the frequency of use of the smart object (Watanabe - Column 18 Lines 11 - 40. As the ASR device incorporates grammars for processing speech commands for multiple devices (including itself), a weighting system may be incorporated to adjust the internal weights applied during ASR and NLP processing in an attempt to more accurately capture a user's speech and intended command. For example, when processing a speech input to determine whether the speech included the word “bake” or “take”, certain paths for ASR processing may be weighted depending on whether the user is likely to be entering commands for an oven under the present conditions. For example, words for commands directed to an oven may be weighted lower than words for commands for a music player when a spoken command is received from a microphone located in a family room (while the reverse may be true for a spoken command coming from a kitchen). 
Weighting may also be applied to NLP portions of processing. For example, a user may say “turn it down” when intending to lower a volume of a music player, when intending to dim lights in a certain room, or when intending to adjust a temperature. Based on a user's previous behavior, where the speech command is received, the most recent user command, etc. the ASR device may weigh the incoming speech and interpret the received command in different ways. Similarly, a spoken command of “call up the police” may be interpreted to initiate a telephone call with law enforcement or to play songs by the band The Police. Various conditions may be evaluated to push ASR/NLP results in one direction or another depending on what the ASR device deems the most likely command intended by the user. In other words, frequency of use of the smart objects is considered when taking spoken commands in certain locations.), and user customization information with respect to the smart object (Watanabe - Column 16 Lines 61-67 and Column 17 Lines 1-7. In one aspect the non-ASR device (smart object) may also communicate an available catalog of media (user customization information) to the ASR device for purposes of refining speech control. For example, a non-ASR media player may indicate to the ASR device the names of songs, artists, movies, television shows, etc. available to the media player, thus enabling the ASR device to configure a grammar/NLP settings (command phrases) package based on the catalog of available media. In another aspect, the grammar used by the ASR device to control the functionality of the media player may be separate from the grammar used by the ASR device to refer to specific media. In this manner separate grammars may be constructed/updated and shared across ASR devices allowing for customized, and possibly more easily updated, speech controls for the non-ASR media player.).

Regarding Claim 12, Watanabe teaches all of the limitations of Claim 11. Watanabe also teaches the system of claim 11, wherein the user customization information includes at least one of: an object name selected by the user that pertains to the smart object, a trigger phrase, or a command phrase that is pre-programmed to command the smart object to enable a particular function (Watanabe - Column 17 Lines 17-21. After the new device is added to the network, as the ASR device (computer implemented method) incorporates (or creates) a grammar (user customization info) for the new device, the ASR device may request that the user provide spoken examples of commands (trigger phrase) that will be used to operate the new device.).

Regarding Claim 14, Watanabe teaches all of the limitations of Claim 11. Watanabe also teaches the system of claim 11, wherein analyzing the at least one function includes accessing the neural network and determining at least one function code that is associated with the smart object, wherein the at least one function code is analyzed to determine at least one function of the smart object (Watanabe - Column 16 Lines 61-67 and Column 17 Line 1. In one aspect the non-ASR device (smart object) may also communicate an available catalog of media (user customization information) to the ASR device for purposes of refining speech control. For example, a non-ASR media player may indicate to the ASR device the names of songs, artists, movies, television shows, etc. (function codes) available to the media player, thus enabling the ASR device to configure a grammar/NLP settings (command phrases/function) package based on the catalog of available media.).

Regarding Claim 15, Watanabe teaches all of the limitations of Claim 14. Watanabe also teaches the system of claim 14, wherein analyzing the frequency of use includes accessing the neural network and analyzing at least one time stamp that is associated with utilization of at least one function of the smart object by the user (Watanabe - Column 6 Lines 36-43. The feature vector may represent different qualities of the audio data within the frame. FIG. 4 shows a digitized audio data waveform 402, with multiple points 406 of the first word 404 as the first word 404 is being processed. The audio qualities of those points may be stored into feature vectors. Feature vectors may be streamed or combined into a matrix that represents a time period of the spoken utterance.).

Regarding Claim 16, Watanabe teaches all of the limitations of Claim 10. Watanabe also teaches the system of claim 10, wherein analyzing the at least one statement spoken by the user includes sensing a voice and determining human speech patterns to be analyzed to determine one or more statements that are communicated to the smart object by the user (Watanabe - Column 6 Lines 9-16. The ASR module 214 includes an acoustic front end (AFE) [to sense the voice] 216, a speech recognition engine [to analyze and determine] 218, and speech storage 220. The AFE 216 transforms audio data into data for processing by the speech recognition engine 218. The speech recognition engine 218 compares the speech recognition data with the acoustic, language, and other data models and information stored in the speech storage 220 for recognizing the speech contained in the original audio data. This analysis of the statement spoken by the user can also be seen in Figures 4-6 of Watanabe.).

Regarding Claim 17, Watanabe teaches all of the limitations of Claim 10. Watanabe also teaches the system of claim 10, wherein analyzing the at least one statement spoken includes determining at least one of: words, commands, requests, and questions that are spoken by the user that relate to (Watanabe - Column 18 Lines 11 - 40. As the ASR device incorporates grammars for processing speech commands for multiple devices (including itself), a weighting system may be incorporated to adjust the internal weights applied during ASR and NLP processing in an attempt to more accurately capture a user's speech and intended command. For example, when processing a speech input to determine whether the speech included the word “bake” or “take”, certain paths for ASR processing may be weighted depending on whether the user is likely to be entering commands for an oven under the present conditions. For example, words for commands directed to an oven may be weighted lower than words for commands for a music player when a spoken command is received from a microphone located in a family room (while the reverse may be true for a spoken command coming from a kitchen). Weighting may also be applied to NLP portions of processing. For example, a user may say “turn it down” when intending to lower a volume of a music player, when intending to dim lights in a certain room, or when intending to adjust a temperature. Based on a user's previous behavior, where the speech command is received, the most recent user command, etc. the ASR device may weigh the incoming speech and interpret the received command in different ways. Similarly, a spoken command of “call up the police” may be interpreted to initiate a telephone call with law enforcement or to play songs by the band The Police. Various conditions may be evaluated to push ASR/NLP results in one direction or another depending on what the ASR device deems the most likely command intended by the user.
In other words, frequency of use of the smart objects is considered when taking spoken commands in certain locations.).

Regarding Claim 18, Watanabe teaches all of the limitations of Claim 10. Watanabe also teaches the system of claim 10, wherein controlling the smart object to virtually communicate with the user includes providing a synthesized voice of the smart object that is based on at least one factor that includes at least one of: an age of the smart object, (Watanabe - Column 15 Lines 39 - 67. The ASR device may then take the identity/functionality information and communicate with a central storage device that tracks non-ASR device types/functionality and stores corresponding ASR grammars and NLP settings which may be used by ASR devices to receive and process speech commands for the respective non-ASR devices. The ASR device may send the central storage device an identity (such as model number, serial number, or other identifier) for the central storage device to cross reference when looking up to see if the non-ASR device type is recognized. If the device identity is recognized (906, yes), the central storage device then sends the ASR device one or more grammars and/or NLP settings packages (908) associated with the non-ASR device(s) to the ASR device to store in storage 228 and incorporate into its grammar configuration module 230, as shown in block 910. The grammar and/or NLP settings package may also be obtained from the non-ASR device itself, should the non-ASR device have such information available. The grammar and/or NLP settings package may also be obtained from a different source. With that additional information incorporated, the ASR device may be configured to receive and process speech commands for the new device. When commands for the new device are received, the ASR device may perform ASR/NLP processing and communicate the results of that processing using commands recognized by the new non-ASR device to the new device through the appropriate communication protocol, such as over a network, using IR commands (with an appropriate IR transmitter), etc.
In other words, the computer implemented method can take the function of the smart object with the command spoken by the user to communicate the results with a synthesized voice for the smart object.).

Regarding Claim 19, Watanabe teaches a non-transitory computer readable storage medium storing instruction that, when executed by a computer, which includes a processor performing a method, the method comprising: analyzing at least one function of a smart object (Watanabe - Column 3 Lines 52-67 and Column 4 Lines 1-3. The device 100 includes one or more controllers/processors 204 for processing data and computer-readable instructions, and a memory 206 for storing data and processor-executable instructions. The memory 206 may include volatile random access memory (RAM), non-volatile read only memory (ROM) or flash memory, and/or other types of memory. Also included is a non-volatile data storage component 208, for storing data and instructions. The data storage component 208 may include one or more storage types of non-volatile storage such as magnetic storage, optical storage, solid-state storage, etc. The ASR device 100 may also be connected to removable or external memory and/or storage (such as a removable memory card, memory key drive, networked storage, etc.) through the input/output device interfaces 202. Data and instructions may be loaded selectively loaded into memory 206 from storage 208 at runtime, although instructions may also be embedded as firmware such as instructions stored the non-volatile flash or ROM.); analyzing a frequency of use of the smart object; (Watanabe - Column 18 Lines 11 - 40. As the ASR device incorporates grammars for processing speech commands for multiple devices (including itself), a weighting system may be incorporated to adjust the internal weights applied during ASR and NLP processing in an attempt to more accurately capture a user's speech and intended command. For example, when processing a speech input to determine whether the speech included the word “bake” or “take”, certain paths for ASR processing may be weighted depending on whether the user is likely to be entering commands for an oven under the present conditions. 
For example, words for commands directed to an oven may be weighted lower than words for commands for a music player when a spoken command is received from a microphone located in a family room (while the reverse may be true for a spoken command coming from a kitchen). Weighting may also be applied to NLP portions of processing. For example, a user may say “turn it down” when intending to lower a volume of a music player, when intending to dim lights in a certain room, or when intending to adjust a temperature. Based on a user's previous behavior, where the speech command is received, the most recent user command, etc. the ASR device may weigh the incoming speech and interpret the received command in different ways. Similarly, a spoken command of “call up the police” may be interpreted to initiate a telephone call with law enforcement or to play songs by the band The Police. Various conditions may be evaluated to push ASR/NLP results in one direction or another depending on what the ASR device deems the most likely command intended by the user. In other words, frequency of use of the smart objects is considered when taking spoken commands in certain locations.); analyzing at least one statement spoken by a user to the smart object; (Watanabe - Column 6 Lines 9-16. The ASR module 214 includes an acoustic front end (AFE) 216, a speech recognition engine 218, and speech storage 220. The AFE 216 transforms audio data into data for processing by the speech recognition engine 218. The speech recognition engine 218 compares the speech recognition data with the acoustic, language, and other data models and information stored in the speech storage 220 for recognizing the speech contained in the original audio data.); and controlling the smart object to virtually communicate with the user based on at least one of: or the frequency of use of the smart object. (Watanabe - Column 1 Lines 65-67.
As described here, by connecting ASR devices with non-ASR devices, a system may be configured to control non-ASR devices with speech commands. For frequency of use Watanabe - Column 18 Lines 11 – 40 is considered. As the ASR device incorporates grammars for processing speech commands for multiple devices (including itself), a weighting system may be incorporated to adjust the internal weights applied during ASR and NLP processing in an attempt to more accurately capture a user's speech and intended command. For example, when processing a speech input to determine whether the speech included the word “bake” or “take”, certain paths for ASR processing may be weighted depending on whether the user is likely to be entering commands for an oven under the present conditions. For example, words for commands directed to an oven may be weighted lower than words for commands for a music player when a spoken command is received from a microphone located in a family room (while the reverse may be true for a spoken command coming from a kitchen). Weighting may also be applied to NLP portions of processing. For example, a user may say “turn it down” when intending to lower a volume of a music player, when intending to dim lights in a certain room, or when intending to adjust a temperature. Based on a user's previous behavior, where the speech command is received, the most recent user command, etc. the ASR device may weigh the incoming speech and interpret the received command in different ways. Similarly, a spoken command of “call up the police” may be interpreted to initiate a telephone call with law enforcement or to play songs by the band The Police. Various conditions may be evaluated to push ASR/NLP results in one direction or another depending on what the ASR device deems the most likely command intended by the user. In other words, frequency of use of the smart objects is considered when taking spoken commands in certain locations.).

Regarding Claim 20, Watanabe teaches all of the limitations of Claim 19. Watanabe also teaches the non-transitory computer readable storage medium of claim 19, wherein controlling the smart object to virtually communicate with the user includes providing a synthesized voice of the smart object that is based on at least one factor that includes at least one of: an age of the smart object, or a priority of communication assigned to the smart object (Watanabe - Column 15 Lines 39 - 67. The ASR device may then take the identity/functionality information and communicate with a central storage device that tracks non-ASR device types/functionality and stores corresponding ASR grammars and NLP settings which may be used by ASR devices to receive and process speech commands for the respective non-ASR devices. The ASR device may send the central storage device an identity (such as model number, serial number, or other identifier) for the central storage device to cross reference when looking up to see if the non-ASR device type is recognized. If the device identity is recognized (906, yes), the central storage device then sends the ASR device one or more grammars and/or NLP settings packages (908) associated with the non-ASR device(s) to the ASR device to store in storage 228 and incorporate into its grammar configuration module 230, as shown in block 910. The grammar and/or NLP settings package may also be obtained from the non-ASR device itself, should the non-ASR device have such information available. The grammar and/or NLP settings package may also be obtained from a different source. With that additional information incorporated, the ASR device may be configured to receive and process speech commands for the new device. 
When commands for the new device are received, the ASR device may perform ASR/NLP processing and communicate the results of that processing using commands recognized by the new non-ASR device to the new device through the appropriate communication protocol, such as over a network, using IR commands (with an appropriate IR transmitter), etc. In other words, the computer implemented method can take the function of the smart object with the command spoken by the user to communicate the results with a synthesized voice for the smart object.).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 4 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Watanabe et al. (US 9443527 B1) in view of Meaney et al. (US 9484030 B1).

	Regarding Claim 4, Watanabe teaches all of the limitations of Claim 2, but fails to teach further including determining that the user is within a predetermined proximity of the smart object, wherein vocal data associated with the at least one statement spoken by the user is analyzed in comparison with stored voice identification data associated with the user to determine that the user is within the predetermined proximity of the smart object. However, Watanabe does teach the importance of taking proximity to the devices into account when interpreting a spoken command, as shown by the "bake" or "take", "turn it down", and "call up the police" examples in Column 18 Lines 11-40. Adding proximity data would help the computer implemented method better understand the user statement being analyzed. Additionally, Meaney teaches a way for the microphone array on the smart object or method to determine proximity and to compare the audio data with the stored voice identification data associated with the user. (Meaney - Column 23 Lines 49-59. Approximate distance to a sound's point of origin may be performed acoustic localization based on time and amplitude differences between sounds captured by different microphones of the array. The microphone 104 may be configured to capture audio, including speech including an utterance. The device 110 (using microphone 104, audio processing component 220, etc.) may be configured to determine audio data corresponding to the captured audio. The device 110 (using input/output device interfaces 1102, antenna 1114, etc.) may also be configured to transmit the audio data to server 120 for further processing.)
	Watanabe and Meaney are both considered to be analogous to the claimed invention because both relate to interpreting user intent to engage in natural dialogue to accomplish complex tasks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Watanabe regarding the importance of the proximity of the user to the smart devices with the teachings of Meaney, in order to improve the approximate location data and then compare that data to the stored voice identification data. (Meaney - Column 23 Lines 49-59. Approximate distance to a sound's point of origin may be performed acoustic localization based on time and amplitude differences between sounds captured by different microphones of the array. The microphone 104 may be configured to capture audio, including speech including an utterance. The device 110 (using microphone 104, audio processing component 220, etc.) may be configured to determine audio data corresponding to the captured audio. The device 110 (using input/output device interfaces 1102, antenna 1114, etc.) may also be configured to transmit the audio data to server 120 for further processing.)

Regarding Claim 13, Watanabe teaches all of the limitations of Claim 11, but fails to teach further including determining that the user is within a predetermined proximity of the smart object, wherein vocal data associated with the at least one statement spoken by the user is analyzed in comparison with stored voice identification data associated with the user to determine that the user is within the predetermined proximity of the smart object. However, Watanabe does teach the importance of taking proximity to the devices into account when interpreting a spoken command, as shown by the "bake" or "take", "turn it down", and "call up the police" examples in Column 18 Lines 11-40. Adding proximity data would help the computer implemented method better understand the user statement being analyzed. Additionally, Meaney teaches a way for the microphone array on the smart object or method to determine proximity and to compare the audio data with the stored voice identification data associated with the user. (Meaney - Column 23 Lines 49-59. Approximate distance to a sound's point of origin may be performed acoustic localization based on time and amplitude differences between sounds captured by different microphones of the array. The microphone 104 may be configured to capture audio, including speech including an utterance. The device 110 (using microphone 104, audio processing component 220, etc.) may be configured to determine audio data corresponding to the captured audio. The device 110 (using input/output device interfaces 1102, antenna 1114, etc.) may also be configured to transmit the audio data to server 120 for further processing.)
	Watanabe and Meaney are both considered to be analogous to the claimed invention because both relate to interpreting user intent to engage in natural dialogue to accomplish complex tasks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Watanabe regarding the importance of the proximity of the user to the smart devices with the teachings of Meaney, in order to improve the approximate location data and then compare that data to the stored voice identification data. (Meaney - Column 23 Lines 49-59. Approximate distance to a sound's point of origin may be performed acoustic localization based on time and amplitude differences between sounds captured by different microphones of the array. The microphone 104 may be configured to capture audio, including speech including an utterance. The device 110 (using microphone 104, audio processing component 220, etc.) may be configured to determine audio data corresponding to the captured audio. The device 110 (using input/output device interfaces 1102, antenna 1114, etc.) may also be configured to transmit the audio data to server 120 for further processing.)

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Proctor et al. (US-9554061-B1), Chen et al. (US-9547980-B2), NPL - A. S. Sharma and R. Bhalley, "ASR — A real-time speech recognition on portable devices," 2016 2nd International Conference on Advances in Computing, Communication, & Automation (ICACCA) (Fall), 2016, pp. 1-4, doi: 10.1109/ICACCAF.2016.7749004. (Year: 2016), and NPL - Z. Kozhirbayev, B. A. Erol, A. Sharipbay and M. Jamshidi, "Speaker Recognition for Robotic Control via an IoT Device," 2018 World Automation Congress (WAC), 2018, pp. 1-5, doi: 10.23919/WAC.2018.8430295. (Year: 2018).
Proctor et al. (US-9554061-B1) discloses an invention that features “an intelligent hub for interfacing with other devices.” (Proctor – Abstract).
Chen et al. (US-9547980-B2) discloses an invention that features “a smart controlling method applied to a smart home system for controlling a number of home appliances.” (Chen – Abstract).
A. S. Sharma and R. Bhalley, "ASR — A real-time speech recognition on portable devices," 2016 2nd International Conference on Advances in Computing, Communication, & Automation (ICACCA) (Fall), 2016, pp. 1-4, doi: 10.1109/ICACCAF.2016.7749004. (Year: 2016) discloses “the implementation of real-time automatic speech recognition (ASR) for portable devices.” (Sharma – Abstract).
	Z. Kozhirbayev, B. A. Erol, A. Sharipbay and M. Jamshidi, "Speaker Recognition for Robotic Control via an IoT Device," 2018 World Automation Congress (WAC), 2018, pp. 1-5, doi: 10.23919/WAC.2018.8430295. (Year: 2018) discloses that “the speaker identification method has been extensively appealing for its broad application in many fields, such as smart environments, … [and the paper] present[s] a novel model to increase the recognition accuracy of the short utterance speaker recognition system. We developed a technique to train a Neural Network (NN) on the extracted Mel-Frequency Cepstral Coefficient (MFCC) features from audio samples. Therefore, the recognition system gains the significant accuracy.” (Kozhirbayev – Abstract).
	Please see additional references in form PTO-892 for more details.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to UTHEJ KUNAMNENI whose telephone number is (571)272-5428. The examiner can normally be reached M-F 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at (571) 272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/UTHEJ KUNAMNENI/Examiner, Art Unit 2656
/EDGAR X GUERRA-ERAZO/Primary Examiner, Art Unit 2656