Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This office action is in response to correspondence filed 02/04/22 regarding application 16/633,792, in which claims 1-8 and 10-16 were amended. Claims 1-16 are pending and have been considered.

Response to Arguments
The amended title of the invention overcomes the objection for not being descriptive, so the objection is withdrawn.
Amended claims 4 and 7 overcome the objections for minor informalities, so those objections are withdrawn.
Applicant’s arguments on pages 7-8 regarding the 35 U.S.C. 103 rejections based on Lovitt, Yu and Mutagi have been considered but are moot in view of the new grounds for rejection, necessitated by Applicant’s amendments. However, the examiner disagrees with two of Applicant’s assertions in the arguments and believes it may be helpful to address them. 
First, Applicant argues that “claim 16 is patentable for similar reasons” to those for claim 1, but claim 16 has not been amended to include the “voice quality model including a voice waveform or a frequency spectrum”. The amendments to independent claim 16 merely delete instances of “talking identification unit” and “voice operation recognition unit” and change instances of “a talker” to “the talker”. Claim 16 remains unpatentable over Lovitt in view of Yu for the same reasons as before.
Second, with regard to claim 1, Applicant argues that “Yu just describes… a learning model for a neural network that does not include a voice waveform or a frequency spectrum”. The examiner disagrees. As Yu describes at Col 4 lines 52-63, the Yu model computes Mel-Frequency Cepstrum Coefficients by extraction from 26 filter-bank channels. As an artisan in the field of speech processing would have understood, the Mel-frequency Cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear Mel scale of frequency. As Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC, computation of a “frequency spectrum” is inherently part of any MFCC production. Nonetheless, the rejection of claim 1 has been modified to include the newly discovered reference to Mitsufuji, which explicitly describes the frequency spectrum.
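For illustration only, the point that MFCC production passes through a frequency spectrum can be sketched as a conventional MFCC pipeline in Python. This is a generic textbook formulation, not code taken from Yu or Mitsufuji. The 16 kHz sample rate, 512-point transform, 13 output coefficients, and the helper names (power_spectrum, mel_filterbank, mfcc) are assumptions chosen for the sketch; only the 26 filter-bank channels follow the Yu citation above. The short-term power spectrum is computed first, and the mel filter bank, logarithm, and cosine transform are all applied to it.

```python
import math

# Illustrative MFCC pipeline (generic textbook formulation, not code from any cited reference).
# All parameter values are assumptions except n_filters=26, which mirrors the Yu citation.

def power_spectrum(frame, n_fft):
    """Short-term power spectrum |X[k]|^2 / n_fft via a naive DFT, bins 0..n_fft//2."""
    padded = list(frame) + [0.0] * (n_fft - len(frame))
    spectrum = []
    for k in range(n_fft // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * n / n_fft) for n, x in enumerate(padded))
        im = -sum(x * math.sin(2 * math.pi * k * n / n_fft) for n, x in enumerate(padded))
        spectrum.append((re * re + im * im) / n_fft)
    return spectrum


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the nonlinear mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2)
    mels = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    filters = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):  # rising edge of the triangle
            weights[k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            weights[k] = (right - k) / max(right - center, 1)
        filters.append(weights)
    return filters


def mfcc(frame, sample_rate=16000, n_fft=512, n_filters=26, n_coeffs=13):
    """Return (power spectrum, MFCCs); the spectrum is a required intermediate."""
    spectrum = power_spectrum(frame, n_fft)
    energies = [sum(w * p for w, p in zip(f, spectrum))
                for f in mel_filterbank(n_filters, n_fft, sample_rate)]
    log_energies = [math.log(e + 1e-10) for e in energies]
    # Unnormalized DCT-II (a linear cosine transform) of the log mel energies.
    coeffs = [sum(le * math.cos(math.pi * c * (j + 0.5) / n_filters)
                  for j, le in enumerate(log_energies))
              for c in range(n_coeffs)]
    return spectrum, coeffs


# Usage: one 25 ms frame (400 samples at 16 kHz) of a 440 Hz tone.
frame = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(400)]
spectrum, coeffs = mfcc(frame)
print(len(spectrum), len(coeffs))  # 257 13
```

Windowing and pre-emphasis, common in practice, are omitted from this sketch for brevity; neither changes the fact that the filter-bank energies are computed from the power spectrum.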



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1-8 and 11-14 are rejected under 35 U.S.C. 103 as being unpatentable over Lovitt (2016/0125879) in view of Mitsufuji et al. (2015/0058015).

Consider claim 1, Lovitt discloses a voice operation system comprising: a memory storing instructions (memory, [0041]), and a processor configured to execute the instructions (processors executing instructions, [0039]) to: identify the user as a talker of a voice operation, based on a voice information, a voice model of a user registered in advance and an auxiliary information (first user 108 speaks “Hey Assistant” followed by the command “please book a meeting for today at 3pm for me and Joh for one hour.”, Computing system 102 detects the key phrase from acoustic data and uses the key phrase to recognize the identity of the speaker making command, also using contextual environmental data, [0013], template matching of audio streams by comparison against a model for the user, i.e. a “user registered in advance”, [0028]); and perform voice recognition on the voice information and generate voice operation information, wherein the auxiliary information are at least one of the voice operation information, position information on a voice operation system, direction information on a talker, distance information on a talker, and time information (recognizing and performing the action command if the user is identified, above the probability threshold, [0025], the computing system performing the voice identification may collect additional acoustic data from before and/or after the utterance of the key phrase, i.e. “voice operation information”. The additional acoustic data may be analyzed to determine if the speaker identified by the voice identification was also speaking (or not speaking) before or after the key phrase was uttered. If the analysis indicates the speaker was speaking before or after the utterance of the key phrase, then it may be determined to be more probable that the identified speaker did speak the key phrase. 
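It will be understood that any environmental information that may be detected via sensing devices and that may be indicative of an identity of a person who utters a key phrase may be utilized. Examples include, but are not limited to, acoustic data, image data (such as infrared image data, visible image data, depth image data, i.e. “distance information on a talker”, etc.), location data, motion data, and olfactory data, [0009]).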
Lovitt does not specifically mention a voice quality model, the voice quality model including a voice waveform or a frequency spectrum.
Mitsufuji discloses a voice quality model, the voice quality model including a voice waveform or a frequency spectrum (voice quality model uses spectral envelope provided by time frequency conversion units, i.e. “frequency spectrum”, [0128-0130]). 
Hence the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference.
In combination, Lovitt performs the same function of user recognition via speech as it does separately. Mitsufuji performs the same function of evaluating voice quality using a voice quality model including a voice waveform or a frequency spectrum as it does separately. 
Therefore, one of ordinary skill in the art could have combined the elements as claimed by known methods, and in combination, each element merely performs the same function as it does separately.
The results of the combination would have been predictable and resulted in modifying the invention of Lovitt to include evaluating voice quality using a voice quality model including a voice waveform or a frequency spectrum, as disclosed by Mitsufuji, thereby improving the user recognition of Lovitt by additionally evaluating the quality of the voice signal, overcoming the difficulty of accurately identifying the speaker of the key phrase using the key phrase data alone, as discussed by Lovitt ([0014]). Therefore, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention. 




Consider claim 2, Lovitt discloses: a microphone that converts voice vibration into the voice information (microphone 310, [0045]); wherein the voice model of the user is registered in advance in registered user information (user profile, [0024]). 
Lovitt does not specifically mention a voice quality model.
Mitsufuji discloses a voice quality model (voice quality model, [0128-0130]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lovitt by using Mitsufuji’s voice quality model as or in addition to the voice model in Lovitt for reasons similar to those for claim 1. 

Consider claim 3, Lovitt discloses the voice model is one of a plurality of voice models on a per-user basis in the registered user information (probability for a selected user of the one or more possible users, [0049], associated with a particular user profile, [0024]), and wherein the processor is further configured to execute the instructions to select the voice model in accordance with the auxiliary information (based on other environmental sensor data collected, [0009]). 
Lovitt does not specifically mention a voice quality model.
Mitsufuji discloses a voice quality model (voice quality model, [0128-0130]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lovitt by using Mitsufuji’s voice quality model as or in addition to the voice model in Lovitt for reasons similar to those for claim 1. 


Lovitt does not specifically mention a voice quality model.
Mitsufuji discloses a voice quality model (voice quality model, [0128-0130]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lovitt by using Mitsufuji’s voice quality model as or in addition to the voice model in Lovitt for reasons similar to those for claim 1. 

Consider claim 5, Lovitt discloses the processor is further configured to execute the instructions to calculate a similarity between the voice information and the voice model and identify the talker based on the similarity (comparing the audio streams, [0028]). 
Lovitt does not specifically mention a voice quality model.
Mitsufuji discloses a voice quality model (voice quality model, [0128-0130]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lovitt by using Mitsufuji’s voice quality model as or in addition to the voice model in Lovitt for reasons similar to those for claim 1. 

Consider claim 6, Lovitt discloses the processor is further configured to execute the instructions to identify the user having a highest similarity as the talker of the voice operation (the identified user has the highest probability of all possible users, [0024]). 


Consider claim 8, Lovitt discloses the registered user information has a correction value of the similarity on a per-user basis, and wherein the processor is further configured to execute the instructions to reflect a result of identifying the talker in the correction value and learn a correlation between the auxiliary information and the correction value (learning adjustments to probability in real time based on the audio history, [0031]). 


Consider claim 11, Lovitt discloses a ranging sensor that acquires a distance to the talker of the voice operation as the distance information, wherein the processor is further configured to execute the instructions to identify the talker by using the distance information (using a depth camera, [0011]). 
Consider claim 12, Lovitt discloses a clock that acquires utterance time of the voice operation as the time information, wherein the processor is further configured to execute the instructions to identify the talker by using the time information (determining the identified user was speaking within a window 

Consider claim 13, Lovitt discloses a GPS device that acquires a position of the voice operation system as the position information, wherein the processor is further configured to execute the instructions to select the voice model in accordance with the position information (location information via GPS, [0021]). 
Lovitt does not specifically mention a voice quality model.
Mitsufuji discloses a voice quality model (voice quality model, [0128-0130]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lovitt by using Mitsufuji’s voice quality model as or in addition to the voice model in Lovitt for reasons similar to those for claim 1. 

Consider claim 14, Lovitt discloses the registered user information further includes schedule information on the user, and wherein the processor is further configured to execute the instructions to identify the talker by further using the schedule information (schedule information, [0022]). 

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Lovitt (2016/0125879) in view of Mitsufuji et al. (2015/0058015), in further view of Decanne (2016/0072915).

Consider claim 9, Lovitt discloses the registered user information includes information unique to the user (user profile, [0024]), and wherein the processor is further configured to execute the instructions to correct the similarity in accordance with the information included in the voice operation information (if the analysis of the additional acoustic data indicates the user was not speaking before or 
Lovitt and Mitsufuji do not specifically mention a keyword unique to the user.
Decanne discloses a keyword unique to the user (user-specific unique keyword, [0046]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lovitt and Mitsufuji by including a keyword unique to the user in order to better individualize content, as suggested by Decanne ([0003]).


Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Lovitt (2016/0125879) in view of Mitsufuji et al. (2015/0058015), in further view of Mohideen et al. (2015/0058003).

Consider claim 10, Lovitt discloses the microphone is an array microphone (microphone array, [0011]) that acquires a location of the talker of the voice operation as the location information (location data, [0009]), and wherein the processor is further configured to execute the instructions to identify the talker by using the location information ([0009]). 
Lovitt and Mitsufuji do not specifically mention an array microphone that acquires a direction of a talker.
Mohideen discloses an array microphone that acquires a direction of a talker (the microphone array determines a direction from the array 120 to the speaker 110, [0010]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lovitt and Mitsufuji such that an array microphone acquires a direction of a talker in order to avoid noise interference, as suggested by Mohideen ([0006]).

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Lovitt (2016/0125879) in view of Mitsufuji et al. (2015/0058015), in further view of Mutagi et al. (10,418,033).

Consider claim 15, Lovitt and Mitsufuji do not, but Mutagi discloses the registered user information has information customized in accordance with user preference on a per-user basis, wherein the processor is further configured to execute the instructions to perform an operation process in accordance with the user preference corresponding to the voice operation information, and wherein the voice operation system further comprises a speaker that provides a notification of a performance result by voice (custom output of weather results via TTS according to user preferences, Col 3 lines 4-26).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lovitt and Mitsufuji such that the registered user information has information customized in accordance with the user's preference on a per-user basis, wherein the processor is further configured to execute the instructions to perform an operation process in accordance with the user's preference corresponding to the voice operation information, and wherein the voice operation system further comprises a speaker that provides a notification of a performance result by voice, in order to improve human computer interactions, as suggested by Mutagi (Col 1 lines 18-20).


Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Lovitt (2016/0125879) in view of Yu et al. (10,283,142).


Consider claim 16, Lovitt discloses a control method of a voice operation apparatus that based on voice information and a voice model of a user registered in advance, identifies the user as a talker of a voice operation and performs voice recognition on the voice information, and generates voice operation information (first user 108 speaks “Hey Assistant” followed by the command “please book a meeting for today at 3pm for me and Joh for one hour.”, Computing system 102 detects the key phrase from acoustic data and uses the key phrase to recognize the identity of the speaker making command, also using contextual environmental data, [0013], template matching of audio streams by comparison against a model for the user, i.e. a “user registered in advance”, [0028]), the control method comprising: identifying the talker by using, as auxiliary information, at least one of the voice operation information, position information on a voice operation apparatus, direction information on the talker, distance information on the talker, and time information (recognizing and performing the action command if the user is identified, above the probability threshold, [0025], the computing system performing the voice identification may collect additional acoustic data from before and/or after the utterance of the key phrase, i.e. “voice operation information”. The additional acoustic data may be analyzed to determine if the speaker identified by the voice identification was also speaking (or not speaking) before or after the key phrase was uttered. If the analysis indicates the speaker was speaking before or after the utterance of the key phrase, then it may be determined to be more probable that the identified speaker did speak the key phrase. It will be understood that any environmental information that may be detected via sensing devices and that may be indicative of an identity of a person who utters a key phrase may be utilized. 
Examples include, but are not limited to, acoustic data, image data (such as infrared image data, visible image data, depth image data, i.e. “distance information on a talker”, etc.), location data, motion data, and olfactory data, [0009]).
Lovitt does not specifically mention a voice quality model.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lovitt by using Yu’s voice quality model as or in addition to the voice model in Lovitt in order to leverage contextual information, as suggested by Yu (Col 1 lines 25-27). 


Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jesse Pullias whose telephone number is 571/270-5135. The examiner can normally be reached on M-F 8:00 AM - 4:30 PM. The examiner’s fax number is 571/270-6135.

Examiner interviews are available via telephone, in-person, and video conferencing using a 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders can be reached on 571/272-7516. 

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).


/Jesse S Pullias/
Primary Examiner, Art Unit 2655                                                  02/17/22