Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION


Response to Arguments
Applicant's arguments with respect to claims 1, 10, and 19 have been considered but are moot in view of the new ground(s) of rejection. Shams has been introduced, which teaches external services, including a dedicated music playing system, using examples of services such as Siri, Cortana, etc.


Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7, 10-16, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Phillips et al. US 20110066634 A1 (hereinafter Phillips) in view of Sharifi US 20160104480 A1 (hereinafter Sharifi) and Shams.
Re claims 1, 10, and 19, Phillips teaches
1. A network microphone device comprising: one or more microphones; a network interface; one or more processors; memory comprising tangible, non-transitory computer-readable media storing instructions executable by the one or more processors to cause the network microphone device to perform operations comprising: 
receiving, via the one or more microphones, voice data indicating a voice input; (voice input 0091, 0093, 0097, 0099, 0172, with fig. 1, 7a, and 7b)
identifying… from among a plurality of voice services…(voice services e.g. SMS, browser, media playback, video playback, GPS, etc. all of which are on a media playback capable device… voice input 0091, 0093, 0097, 0099, 0172, with fig. 1, 7a, and 7b) 

However, Phillips while teaching speech recognition and performing commands with wakeup words thereof, fails to teach
wherein the received voice data includes a first portion representing an activation word and a second portion representing a voice command; (Sharifi “ok computer” = activation and “remind me to buy milk” = command; one device shows processing and the others cease; update = improve, wherein data must be used for an improvement/adaptation, etc., 0035)
… a voice service to process the voice input, wherein the identifying comprises determining a closest match of the first portion of the received voice data representing the activation word (Sharifi best score; listening for voice input to determine which voice service to select to process the audio with ASR such that other devices cease operation or selection per se as in fig. 1 and 0033, 0042, 0035 servers, 0045; music analogously 0047; “ok computer” = activation and “remind me to buy milk” = command; one device shows processing and the others cease; update = improve, wherein data must be used for an improvement/adaptation)
selecting, based on the determined closest match (Sharifi other devices cease operation or selection per se as in fig. 1 and 0033, 0042, 0035 servers, and 0045)
transmitting, via the network interface, (Sharifi “ok computer” = activation and “remind me to buy milk” = command; one device shows processing and the others cease; update = improve, wherein data must be used for an improvement/adaptation, etc., 0035)
receiving, from the identified voice service, an indication whether the first portion of the received voice data representing the activation (Sharifi indication: the device shows an active or processing state; “ok computer” = activation and “remind me to buy milk” = command; one device shows processing and the others cease; update = improve, wherein data must be used for an improvement/adaptation, etc., 0035 and e.g. 0034)
in response to the received indication whether the first portion of the received voice data representing the activation word was recognized by the identified voice service, updating the activation word data in the recognition dataset. (Sharifi “ok computer” = activation and “remind me to buy milk” = command; one device shows processing and the others cease; update = improve, wherein data must be used for an improvement/adaptation, etc., 0035)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the invention to modify the system of Phillips to incorporate the above claim limitations as taught by Sharifi to allow for more efficient processing of wakeup words such that devices can be accessed simultaneously for faster retrieval but without processing commands until a hotword is matched, thereby speeding up the delivery of content, and comparing input audio to the best matching device when multiple devices are present to avoid simultaneous activation of an unwanted device, such as, e.g., turning a thermostat off versus turning a tv off, wherein a keyword/hotword is used to identify the best voice service by comparing match probabilities which reduces system resource usage (only the service needed processes), processing time (discards unneeded device processing), and error e.g. incorrect device, and further for the use of physically present devices wherein these devices can operate in the same capacity as hardware remote servers, while still utilizing further servers tied to physical devices to process the audio data, and additionally Sharifi improves Phillips such that it can process both audio as well as ASR results at different devices.

However the combination while teaching wake words with voice commands for the ability to control different devices or applications external to a user’s mobile device fails to teach
corresponding to one of a plurality of voice services (Shams external voice services, examples given with varying types of systems 0042, 0061, 0065, 0088, and fig. 4b)
wherein the plurality of voice services are externally registered to a media playback system associated with the networked microphone device. (Shams external voice services, media playback or music service is the platform to implement into one of said external services such as that embedded in an AI appliance or be a remotely located service 0042, 0061, 0065, 0088, and fig. 4b)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the invention to modify the system of Phillips in view of Sharifi to incorporate the above claim limitations as taught by Shams to allow for a device which is specific to trademarked or dedicated voice command services, e.g. Siri or Cortana, which can only be performed on those specific devices, thereby improving the combination's external servers to process pre-existing commands for an entire service, thereby expanding the external resources of Phillips to include application services as in Shams, as well as


Re claims 2, 11, and 20, Phillips teaches 
2. The network microphone device of claim 1, wherein identifying the voice service further comprises:60 PA TENT Attorney Docket No. 17-0303 determining a confidence score of the closest match of the voice data with the corresponding activation word data.  (best match… media, video, GPS, etc.) as well as external or internal device (i.e. best score to match a result to process the information as well as disambiguation if unclear… voice services e.g. SMS, browser, media playback, video playback, GPS, etc. all of which are on a media playback capable device… voice input including wakeup words or activation words as in “send SMS”… 0091, 0093, 0097, 0099, 0142, 0172, with fig. 1, 7a, and 7b)
Analogous to claim 1, Phillips while teaching speech recognition and performing commands thereof, fails to teach
identifying, prior to performing speech recognition (Sharifi listening for voice input to determine which voice service to select to process the audio with ASR such that other devices cease operation or selection per se as in fig. 1 and 0033, 0042, 0035 servers, and 0045)
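Therefore, it would have been obvious to one of ordinary skill in the art at the time of the invention to modify the system of Phillips to incorporate the above claim limitations as taught by Sharifi to allow for comparing input audio to the best matching device when multiple devices are present to avoid simultaneous activation of an unwanted device, such as, e.g., turning a thermostat off versus turning a tv off, wherein a keyword/hotword is used to identify the best voice service by comparing match probabilities which reduces system resource usage (only the service needed processes), processing time (discards unneeded device processing), and error e.g. incorrect device, and further for the use of physically present devices wherein these devices can operate in the same capacity as hardware remote servers, while still utilizing further servers tied to physical devices to process the audio data, and additionally Sharifi improves Phillips such that it can process both audio as well as ASR results at different devices.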


Re claims 3 and 12, Phillips teaches 
3. The network microphone device of claim 2, wherein the instructions stored on the memory further include instructions for: comparing the confidence score with a predetermined threshold score; and (confidence score threshold 0097… media, video, GPS, etc.) as well as external or internal device (i.e. best score to match a result to process the information as well as disambiguation if unclear… voice services e.g. SMS, browser, media playback, video playback, GPS, etc. all of which are on a media playback capable device… voice input including wakeup words or activation words as in “send SMS”… 0091, 0093, 0097, 0099, 0142, 0172, with fig. 1, 7a, and 7)
transmitting, via the network interface, at least the portion of the received voice input data to the identified voice service only if the confidence score is greater than or (must meet threshold for service at different models for instance 0097 and 0093… media, video, GPS, etc.) as well as external or internal device (i.e. best score to match a result to process the information as well as disambiguation if unclear… voice services e.g. SMS, browser, media playback, video playback, GPS, etc. all of which are on a media playback capable device… voice input including wakeup words or activation words as in “send SMS”… 0091, 0093, 0097, 0099, 0142, 0172, with fig. 1, 7a, and 7)
Analogous to claim 1, Phillips while teaching speech recognition and performing commands thereof, fails to teach
identifying, prior to performing speech recognition (Sharifi listening for voice input to determine which voice service to select to process the audio with ASR such that other devices cease operation or selection per se as in fig. 1 and 0033, 0042, 0035 servers, and 0045)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the invention to modify the system of Phillips to incorporate the above claim limitations as taught by Sharifi to allow for comparing input audio to the best matching device when multiple devices are present to avoid simultaneous activation of an unwanted device, such as, e.g., turning a thermostat off versus turning a tv off, wherein a keyword/hotword is used to identify the best voice service by comparing match probabilities which reduces system resource usage (only the service needed processes), processing time (discards unneeded device processing), and error e.g. incorrect device, and further for the use of physically present devices wherein these devices can operate in the same capacity as hardware remote servers, while still utilizing further servers tied to physical devices to process the audio data, and additionally Sharifi improves Phillips such that it can process both audio as well as ASR results at different devices.


Re claims 4 and 13, Phillips teaches 
4. The network microphone device of claim 3, wherein the predetermined threshold score has a first value if the closest match of the voice data with the corresponding activation word data is associated with a first voice service, and the predetermined threshold score has a second, different value if the closest match of the voice data with the corresponding activation word data is associated with a second voice service.  (comparing to each model will have a different score e.g. “play music” will have a higher score in the  music model than the GPS model… must meet threshold for service at different models for instance 0097 and 0093… media, video, GPS, etc.) as well as external or internal device (i.e. best score to match a result to process the information as well as disambiguation if unclear… voice services e.g. SMS, browser, media playback, video playback, GPS, etc. all of which are on a media playback capable device… voice input including wakeup words or activation words as in “send SMS”… 0091, 0093, 0097, 0099, 0142, 0172, with fig. 1, 7a, and 7)
Analogous to claim 1, Phillips while teaching speech recognition and performing commands thereof, fails to teach
identifying, prior to performing speech recognition (Sharifi listening for voice input to determine which voice service to select to process the audio with ASR such that other devices cease operation or selection per se as in fig. 1 and 0033, 0042, 0035 servers, and 0045)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the invention to modify the system of Phillips to incorporate the above claim limitations as taught by Sharifi to allow for comparing input audio to the best matching device when multiple devices are present to avoid simultaneous activation of an unwanted device, such as, e.g., turning a thermostat off versus turning a tv off, wherein a keyword/hotword is used to identify the best voice service by comparing match probabilities which reduces system resource usage (only the service needed processes), processing time (discards unneeded device processing), and error e.g. incorrect device, and further for the use of physically present devices wherein these devices can operate in the same capacity as hardware remote servers, while still utilizing further servers tied to physical devices to process the audio data, and additionally Sharifi improves Phillips such that it can process both audio as well as ASR results at different devices.
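For illustration only, the closest-match and per-service threshold limitations of claims 2-4 and 11-13 may be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Phillips, Sharifi, or the application; the service names, score values, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Match:
    service: str       # hypothetical voice-service name
    confidence: float  # assumed score in [0, 1] for the activation-word portion


def closest_match(scores: Dict[str, float]) -> Match:
    """Pick the service whose activation-word data best matches the input."""
    service = max(scores, key=scores.get)
    return Match(service, scores[service])


def should_transmit(match: Match, thresholds: Dict[str, float]) -> bool:
    """Transmit only if the score meets the matched service's own threshold."""
    return match.confidence >= thresholds[match.service]


# Hypothetical thresholds: a first value for one service, a second,
# different value for another (the arrangement recited in claim 4).
THRESHOLDS = {"service_a": 0.80, "service_b": 0.60}

match = closest_match({"service_a": 0.72, "service_b": 0.65})
```

In this sketch the closest match is "service_a", but its score of 0.72 falls below that service's 0.80 threshold, so nothing would be transmitted.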


Re claims 5 and 14, Phillips teaches 
5. The network microphone device of claim 3, wherein the network microphone device is a first device of the media playback system, and wherein the instructions stored on the memory further include instructions for: 
transmitting, via the network interface, the input voice data to a second device of the media playback system, wherein the second device is configured to further analyze (if a second device is a server with hardware: if the internal score is low it transmits it to an external device with ASR at a server AND/OR sends to different model e.g. “play music” will have a higher score in the  music model than the GPS model … media, video, GPS, etc.) as well as external or internal device (i.e. best score to match a result to process the information as well as disambiguation if unclear… voice services e.g. SMS, browser, media playback, video playback, GPS, etc. all of which are on a media playback capable device… voice input including wakeup words or activation words as in “send SMS”… 0091, 0093, 0097, 0099, 0142, 0172, with fig. 1, 7a, and 7)
However, in lieu of official notice, and when Phillips fails to teach in the instance that a second device is not a server, Sharifi has been incorporated to read upon a second non-server physical device present to a user (Sharifi multiple second devices with multiple scores for threshold comparison 0031-0033 with fig. 1)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the invention to modify the system of Phillips to incorporate the above claim limitations as taught by Sharifi to allow for the use of physically present devices wherein these devices can operate in the same capacity as hardware remote servers, and additionally Sharifi improves Phillips such that it can process both audio as well as ASR results at different devices.


Re claims 6 and 15, Phillips teaches 

receiving, via the network interface from the second device, an indication of a second confidence score, wherein the second confidence score is greater than the first confidence score; (“play music” will have a higher score in the  music model than the GPS model … media, video, GPS, etc.) as well as external or internal device (i.e. best score to match a result to process the information as well as disambiguation if unclear… voice services e.g. SMS, browser, media playback, video playback, GPS, etc. all of which are on a media playback capable device… voice input including wakeup words or activation words as in “send SMS”… 0091, 0093, 0097, 0099, 0142, 0172, with fig. 1, 7a, and 7)
comparing the second confidence score with the predetermined threshold score; and (comparison thereof, “play music” will have a higher score in the  music model than the GPS model … media, video, GPS, etc.) as well as external or internal device (i.e. best score to match a result to process the information as well as disambiguation if unclear… voice services e.g. SMS, browser, media playback, video playback, GPS, etc. all of which are on a media playback capable device… voice input including wakeup words or activation words as in “send SMS”… 0091, 0093, 0097, 0099, 0142, 0172, with fig. 1, 7a, and 7)
transmitting, via the network interface, at least the portion of the received voice input data to the identified voice service only if the second confidence score is greater than or equal to the predetermined threshold score.  (sending results back to user as in complex fig. 7a or 7b and variants thereof… if the internal score is low it transmits it to an external AND/OR sends to different model e.g. “play music” will have a higher score in the  music model than the GPS model … media, video, GPS, etc.) as well as external or internal device (i.e. best score to match a result to process the information as well as disambiguation if unclear… voice services e.g. SMS, browser, media playback, video playback, GPS, etc. all of which are on a media playback capable device… voice input including wakeup words or activation words as in “send SMS”… 0091, 0093, 0097, 0099, 0142, 0172, with fig. 1, 7a, and 7)
However, in lieu of official notice, and when Phillips fails to teach in the instance that a second device is not a server, Sharifi has been incorporated to read upon a second non-server physical device present to a user (Sharifi multiple second devices with multiple scores for threshold comparison 0031-0033 with fig. 1)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the invention to modify the system of Phillips to incorporate the above claim limitations as taught by Sharifi to allow for the use of physically present devices wherein these devices can operate in the same capacity as hardware remote servers, and additionally Sharifi improves Phillips such that it can process both audio as well as ASR results at different devices.


Re claims 7 and 16, Phillips teaches 

receiving, via the network interface from the second device, an indication of a second confidence score, wherein the second confidence score is greater than the first confidence score; (“play music” will have a higher score in the  music model than the GPS model … media, video, GPS, etc.) as well as external or internal device (i.e. best score to match a result to process the information as well as disambiguation if unclear… voice services e.g. SMS, browser, media playback, video playback, GPS, etc. all of which are on a media playback capable device… voice input including wakeup words or activation words as in “send SMS”… 0091, 0093, 0097, 0099, 0142, 0172, with fig. 1, 7a, and 7)
comparing the second confidence score with the predetermined threshold score; and (comparison thereof, “play music” will have a higher score in the  music model than the GPS model … media, video, GPS, etc.) as well as external or internal device (i.e. best score to match a result to process the information as well as disambiguation if unclear… voice services e.g. SMS, browser, media playback, video playback, GPS, etc. all of which are on a media playback capable device… voice input including wakeup words or activation words as in “send SMS”… 0091, 0093, 0097, 0099, 0142, 0172, with fig. 1, 7a, and 7)
outputting, via the transducer, a request for additional user voice input if the second confidence score is less than the predetermined threshold score.  (disambiguation as in complex fig. 7b then sending results back to user… if the internal score is low it transmits it to an external AND/OR sends to different model e.g. “play music” will have a higher score in the  music model than the GPS model … media, video, GPS, etc.) as well as external or internal device (i.e. best score to match a result to process the information as well as disambiguation if unclear… voice services e.g. SMS, browser, media playback, video playback, GPS, etc. all of which are on a media playback capable device… voice input including wakeup words or activation words as in “send SMS”… 0091, 0093, 0097, 0099, 0142, 0172, with fig. 1, 7a, and 7)
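For illustration only, the second-device limitations of claims 5-7 and 14-16 may be sketched as follows. This is a minimal sketch, not the method of Phillips or Sharifi; the function and parameter names are hypothetical, and the sketch assumes the second device is consulted only when the first confidence score falls below the threshold, which the claim text quoted above does not state.

```python
from typing import Callable


def route(first_score: float,
          threshold: float,
          ask_second_device: Callable[[], float]) -> str:
    """Return 'transmit' or 'prompt_user' for one voice input."""
    # Assumption: a first score that already meets the threshold needs no help.
    if first_score >= threshold:
        return "transmit"
    # Otherwise a second device further analyzes the same voice data.
    second_score = ask_second_device()
    # Claim 6: transmit only if the higher second score meets the threshold.
    if second_score > first_score and second_score >= threshold:
        return "transmit"
    # Claim 7: otherwise request additional user voice input.
    return "prompt_user"
```

With a threshold of 0.7 and a first score of 0.5, a second score of 0.8 leads to transmission, while a second score of 0.6 leads to a request for additional voice input.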


Claims 9, 18, 21, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Phillips et al. US 20110066634 A1 (hereinafter Phillips) in view of Sharifi US 20160104480 A1 (hereinafter Sharifi) and Shams, and further in view of Sharifi, Matthew et al. US 20170110144 A1 (hereinafter Sharifi2).
Re claims 9, 18, 21, and 22, Phillips in view of Sharifi fails to teach
22. (New) The tangible, non-transitory computer-readable medium of claim 13, wherein updating the activation word data in the recognition dataset comprises adjusting the first value of the predetermined threshold score (Sharifi2 adjusting threshold i.e. threshold score based on input data to balance sensitivity of recognition for instance 0055-0056)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the invention to modify the system of Phillips in view of Sharifi to incorporate the above claim limitations as taught by Sharifi2 to allow for a margin of error depending .
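For illustration only, the threshold adjustment recited in claim 22 may be sketched as follows. This is a minimal sketch and not the method of Sharifi2; the step size, the bounds, and the direction of adjustment (relaxing the threshold after a confirmed recognition, tightening it after a miss) are assumptions chosen for the example.

```python
def adjust_threshold(threshold: float,
                     recognized: bool,
                     step: float = 0.02,
                     lower: float = 0.50,
                     upper: float = 0.95) -> float:
    """Nudge a per-service threshold after the voice service reports back."""
    # Assumption: a confirmed recognition relaxes the threshold slightly,
    # while a reported miss tightens it.
    candidate = threshold - step if recognized else threshold + step
    # Keep the value inside hypothetical bounds so it cannot drift unchecked.
    return min(upper, max(lower, candidate))
```

Bounding the value is a design choice of the sketch: it keeps repeated indications from pushing the threshold to a point where every input, or no input, would pass.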



Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 




Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL COLUCCI whose telephone number is (571)270-1847.  The examiner can normally be reached on M-F 9 AM - 5 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on (571)-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/MICHAEL COLUCCI/Primary Examiner, Art Unit 2658
(571)-270-1847
Examiner FAX:  (571)-270-2847
Michael.Colucci@uspto.gov