DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed.
Applicant’s title should be updated to reflect an invention as now claimed. 
The following title is suggested: Securely Executing Voice Actions with Speaker Identification and Authorization Code
The disclosure is objected to because of the following informalities:
In ¶[0001], Application Serial No. 16/308,570 should be updated as “now U.S. Patent No. 10,770,093 issued on 08 September 2020” and Application Serial No. 15/178,895 should be updated as “now U.S. Patent No. 10,127,926 issued on 13 November 2018”.
In ¶[0005], “(i) one or more values” should be “(ii) one or more values” because there is already “(i) a request”. 
In ¶[0006], “(i) one or more values” should be “(ii) one or more values” because there is already “(i) a request”.  
Appropriate correction is required.



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 to 2, 4 to 6, 9 to 12, 14 to 16, and 19 to 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (U.S. Patent Publication 2015/0302856) in view of Wishne et al. (U.S. Patent Publication 2017/0330277).
(Note: Wishne et al. has support for its subject matter in the provisional application for an effective filing date of 11 December 2015.)
Concerning independent claims 1 and 11, Kim et al. discloses a method and apparatus for performing speech recognition, comprising:
“receiving[, at a voice action server,] from a user computing device, audio data representing a voice command spoken by a speaker” – a speaker 110 may speak a speech command (“a voice command spoken by a speaker”) associated with a function which may be performed by activated voice assistant application 130; mobile device 120 (“a user computing device”) may receive an input sound stream (“audio data”) which includes a speech command spoken by speaker 110 (¶[0025] - ¶[0026]: Figure 1); 
“obtaining[, by the voice action server,] based on the audio data received from the user computing device, a speaker identification result indicating that the voice 
“selecting[, by the voice action server,] a voice action based on a transcription of the audio data” – mobile device 120 may receive an input sound stream which includes the speech command, and voice assistant application 130 may recognize the speech command; once the speech command is recognized, voice assistant application 130 may perform a function associated with the speech command, e.g., activating a banking application 140, a photo application 150, or a web browser application 160 (¶[0026] - ¶[0027]: Figure 1); upon receiving an input sound stream, speech recognition unit 410 may recognize the speech command from the received portion of the input stream; upon recognizing the speech command, speech recognition unit 410 may identify the function associated with the speech command to activate an associated application (¶[0050] - ¶[0051]: Figure 4); implicitly, speech recognition generates “a transcription of the audio data”; here, “a voice action” is a function associated with the speech command, e.g., a banking application 140, a photo application 150, or a web browser application 160;
“selecting[, by the voice action server,] from among a plurality of different service providers, a service provider that can perform the selected voice action” – speaker 110 may speak a speech command associated with a function which may be performed by activated voice assistant application 130, which is configured to perform any suitable number of functions, e.g., functions associated with accessing, controlling, or managing various applications, e.g., banking application 140, photo application 150, or web browser application 160 (¶[0025]: Figure 1); upon recognizing the speech command, speech recognition unit 410 may identify the function associated with the speech command to activate an associated application (¶[0051]: Figure 4); broadly, a banking application 140, a photo application 150, and a web browser 160 are being construed as ‘service providers’, so that “a service provider” is selected based on a function of a speech command (“the selected voice action”);
e.g., “I want to check my bank account”, “Please show me my photos”, or “Open web browser” (¶[0038]: Figure 2); storage unit 260 may store a lookup table which maps one or more words in a speech command to a specified function (¶[0062]: Figure 5: Step 550); here, “I want to check my bank account” is “a request to perform the selected voice action” with a banking application, “Please show me my photos” is “a request to perform the selected voice action” with a photo application, and “Open web browser” is “a request to perform the selected voice action” with a web browser application; voice assistant unit 242 may perform a function based on a security level; a security level may indicate whether or not the security level requests speaker verification for performing this function; when the security level requests speaker verification, voice assistant unit 242 may perform the associated function when a speaker of the speech command is verified as a user authorized to perform the function (¶[0053] - ¶[0054]: Figure 4); Figure 11 illustrates a plurality of functions including a function for performing a call application, a function for performing a web search, a function for taking a photo, where lookup tables 1110, 1120, and 1130 map a plurality of functions to a plurality of security levels for each application (¶[0083] - ¶[0085]: Figure 11); each application has functions, then, that may require speaker verification (“providing . . . to the selected service provider . . . the speaker verification result”);

“after providing the additional authorization request to the user computing device, providing[, by the voice action server,] to the user computing device, an indication that the selected service provider performed the selected voice action” – if a verification score exceeds a verification threshold for a verification keyword, voice assistant unit 242 or function control unit 440 may perform the function associated with the speech command (¶[0060]: Figure 4); broadly, if a function is performed associated with a request for a bank account balance, displaying photos, or opening a web browser, then there is “an indication that the selected service provider performed the selected voice action”; that is, “an indication” is displaying a bank account balance, displaying photos, or displaying a web browser on a voice assistant unit. 
Kim et al. generally discloses all of the limitations of these independent claims, but does not provide that these steps are performed “at a voice action server”.  Still, even if performing these steps “at a voice action server” is not disclosed by Kim et al., it is well known in speech recognition applications to distribute functionality between a client and a server.  Here, Kim et al. appears to perform all of these steps at mobile device 120 with voice assistant application 130, but a voice assistant application is commonly implemented through a remote server.
Concerning independent claims 1 and 11, Wishne et al. teaches any of the limitations omitted by Kim et al. as directed to performing these steps “at a voice action server”.  Generally, Wishne et al. teaches voice-controlled account servicing, where a voice command is determined to be directed to a banking-related inquiry, and a request is transmitted for user authentication information.  Data indicative of the requested information is outputted (“providing . . . an indication that the selected service provider performed the selected voice action”), and responsive to a request to initiate payment from a banking account of the user to a third party, an electronic payment to the third party is initiated.  (Abstract)  Depending on a nature of a request, a server may output a response to a user device, which the user device can provide to the user as a verbal response or on a display associated with the user device.  (¶[0016])  Specifically, computing device 210 may be operatively connected to one or more remote servers including voice recognition application server 215, authentication server 220, and a third party server 225 through network 201.  Voice recognition application server 215 can be configured to receive audio files from computing device 210 (“receiving, at a voice e.g., application 306 (“selecting, by the voice action server, a voice action based on a transcription of the audio data”).  VR APP 304 may transmit 313 at least a portion of the data file to a proper application for further processing the command, e.g., application 306 (“selecting, by the voice action server, from among a plurality of different service providers, a service provider that can perform the selected voice action”).  After receiving at least a portion of the data file, application 306 may transmit the at least a portion of the data file to an associated application server 220, which can be a server associated with the bank account of computing device user 205.  
(¶[0033] - ¶[0039]: Figures 2 to 3)  In some implementations, application server 220 may determine that based on the nature of the voice command, e.g., that the voice command relates to sensitive financial information, additional Wishne et al., then, teaches these limitations directed to “a voice action server” that receives a voice command, selects a voice action based on a transcription of audio data, selects a service provider, provides to a service provider that can perform the voice action a request to perform the voice action, provides to the user computing device a request for an explicit authorization code, and provides an indication to the user computing device that the selected voice action is performed.  An objective is to provide an improved experience when accessing sensitive content of account information that enables users to interact with an account using natural language.  (¶[0004])  It would have been obvious to one having ordinary skill in the art to perform functionality at a voice action server as taught by Wishne et al. to verify if a user is authorized to perform a function based on a security level in Kim et al. for a purpose of improving an experience when accessing sensitive account information using natural language. 

Concerning claims 2 and 12, Kim et al. discloses that a speaker model database 264 includes one or more speaker models for use in verifying whether a speaker is an authorized user (¶[0033]: Figure 2); if voice activation unit 252 verifies that speaker as an authorized user based on the speaker model, voice activation unit 252 may activate voice assistant unit 242, and may generate an activation signal (¶[0037]: Figure 2); a verification score for an activation keyword may be compared with a verification threshold associated with the activation keyword; if the verification score exceeds a verification threshold, a speaker of the activation keyword may be verified as the authorized user (¶[0048]: Figure 3); here, a verification score is “a likelihood that the audio data representing the voice command matches a stored voice print associated with the speaker”; a speaker model is equivalent to “a voice print associated with the speaker”.
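For illustration only, the threshold comparison described above for claims 2 and 12 can be sketched as follows. This is a minimal sketch and not the implementation of Kim et al.: the function names are hypothetical, and the use of cosine similarity between an extracted feature vector and a stored speaker model as the verification score is an assumption made solely for the example.

```python
import math
from typing import Sequence


def verification_score(features: Sequence[float], speaker_model: Sequence[float]) -> float:
    """Hypothetical score: cosine similarity between features and a stored model."""
    dot = sum(f * m for f, m in zip(features, speaker_model))
    norm = math.sqrt(sum(f * f for f in features)) * math.sqrt(
        sum(m * m for m in speaker_model)
    )
    # Guard against zero-length vectors to avoid division by zero.
    return dot / norm if norm else 0.0


def verify_speaker(
    features: Sequence[float], speaker_model: Sequence[float], threshold: float
) -> bool:
    """The speaker is treated as authorized only if the score exceeds the threshold."""
    return verification_score(features, speaker_model) > threshold
```

Under these assumptions, the score plays the role of a likelihood that the audio data matches the stored voice print, and the threshold decides whether the speaker is verified.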
Concerning claims 4 and 14, Kim et al. discloses that upon recognizing a speech command, speech recognition unit 410 may identify the function associated with the speech command, e.g., activating an associated application, e.g., a banking application, a photo application, a web browser application (“in response to determining that the mapping of voice actions indicates that the service provider can perform the selected voice action, selecting the service provider”) (¶[0051]: Figure 4); words in a speech command are mapped to a specified function using a lookup table (¶[0062]: Figure 5); Figure 11 illustrates a plurality of lookup tables 1110, 1120, and 1130, in which a plurality of security levels associated with a plurality of functions are associated with an email application, a contact application, a call application, a web search application, and 
Concerning claims 5 and 15, Kim et al. discloses that upon recognizing a speech command, speech recognition unit 410 may identify the function associated with the speech command, e.g., activating an associated application, e.g., a banking application, a photo application, a web browser application (“determining that the one or more terms in the transcription match the one or more terms that correspond to the voice action”) (¶[0051]: Figure 4); words in a speech command are mapped to a specified function using a lookup table (“obtaining a set of voice actions, wherein each voice action in the set of voice actions identifies one or more terms that correspond to that voice action”) (¶[0062]: Figure 5); Figure 11 illustrates a plurality of lookup tables 1110, 1120, and 1130, in which a plurality of security levels associated with a plurality of functions are associated with an email application, a contact application, a call application, a web search application, and a photo application (“in response to determining that the one or more terms in the transcription match the one or more terms that correspond to the voice action, selecting the voice action from among the set of voice actions”) (¶[0083] - ¶[0084]: Figure 11).
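For illustration only, the term-matching construction applied above to claims 5 and 15 can be sketched as a lookup table that maps terms to a voice action. This is a minimal sketch under stated assumptions: the table contents, the action names, and the rule that every listed term must appear in the transcription are hypothetical and are not taken from Kim et al.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical lookup table: each voice action identifies the terms that correspond to it.
VOICE_ACTIONS: Dict[str, Set[str]] = {
    "open_banking_app": {"bank", "account"},
    "show_photos": {"photos"},
    "open_web_browser": {"web", "browser"},
}


def select_voice_action(transcription: str) -> Optional[str]:
    """Return the first voice action whose terms all appear in the transcription."""
    words = set(re.findall(r"[a-z]+", transcription.lower()))
    for action, terms in VOICE_ACTIONS.items():
        if terms <= words:  # every term for this action matched
            return action
    return None  # no voice action matched the transcription
```

With this table, the example commands quoted from ¶[0038] of Kim et al. would resolve to the banking, photo, and web browser actions respectively, and an unrelated utterance would select no action.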
Concerning claims 6 and 16, Kim et al. discloses that activated voice assistant application 130 may recognize a speech command (¶[0026]: Figure 1); speech Wishne et al. teaches that voice recognition application server 235 converts an audio file into a text file.  (¶[0034]: Figure 2)  A text file is “a transcription of the audio data”.   
Concerning claims 9 and 19, Kim et al. discloses various “input data types” for authorizing a user (“to perform authentication for the selected voice action”) that include verification keywords of a name, a birthday, or a personal identification number (PIN).  (¶[0058]: Figure 4)  Similarly, Wishne et al. teaches that additional security information could be necessary as a PIN number or a Social Security Number as account-verifying information that can be provided verbally or manually.  (¶[0041]: Figure 3)  Broadly, verbally or manually entering additional security information as a PIN, a name, a birthday, or a social security number is “one or more input data types that the selected service provider uses to perform authentication for the selected voice action.”
Concerning claims 10 and 20, Kim et al. discloses that mobile device 120 includes a voice assistant application 130 (“wherein the user computing device comprises a voice automation device”).  (¶[0023]: Figure 1) 

Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (U.S. Patent Publication 2015/0302856) in view of Wishne et al. (U.S. Patent Publication 2017/0330277) as applied to claims 1 and 11 above, and further in view of Warford et al. (U.S. Patent No. 9,837,079).
Kim et al. discloses that speaker verification unit 320 may determine a verification score based on extracted sound features and a speaker model in a speaker model database 264.  If a verification score exceeds a verification threshold, a speaker is verified as an authorized user.  (¶[0048]: Figure 3)  Here, a speaker model in a speaker model database can be construed as equivalent to ‘a stored voice print’, and a verification score is “a corresponding likelihood that the audio data representing the voice command matches one of a plurality of stored voice prints associated with different speakers”.  Implicitly, speaker model database 264 stores speaker models with a plurality of speakers that can be verified.  However, Kim et al. does not expressly disclose “selecting, from among the plurality of speaker identification results, the speaker identification result having the highest corresponding likelihood as the speaker identification result indicating that the voice command was spoken by the speaker.”  That is, Kim et al. is only expressly directed to determining if a verification score exceeds a threshold for a given speaker model.  Still, it would appear implicit that there must be a determination of which speaker model to use from speaker models stored for a plurality of different speakers.   
Anyway, Warford et al. teaches a method and apparatus for identifying fraudulent callers, where known voice prints are stored in a database or library.  Control system 142 may perform one-to-few or one-to-many comparisons of customer voice prints with known voice prints to identify any matches between them, or at least a probability of a match (“a likelihood”), meaning that the customer voice print and one of the known voice prints are likely from the same person.  The identification of any matches can be used to authorize a transaction requested by a customer, by comparing the customer Warford et al., then, teaches “selecting, from among a plurality of speaker identification results, the speaker identification result having the highest corresponding likelihood as the speaker identification result indicating that the voice command was spoken by the speaker” by comparing a customer voice print with a plurality of known voice prints to identify a probability of a match.  Here, a ‘highest’ scoring voice print to a plurality of known voice prints identifies a customer.  An objective is to provide more secure transactions and to prevent fraud in communications.  (Column 1, Lines 20 to 38)  It would have been obvious to one having ordinary skill in the art to determine a speaker identification result according to a highest scoring voice print of a plurality of voice prints that are compared to a customer’s voice as taught by Warford et al. to determine a verification of a speaker by comparison to a speaker model to perform speaker verification of Kim et al. for a purpose of providing more secure transactions and preventing fraud.
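For illustration only, selecting the speaker identification result having the highest corresponding likelihood from a one-to-many comparison can be sketched as follows. This is a minimal sketch and not the method of Warford et al.: the function name, the dictionary of per-speaker likelihoods, and the speaker labels are hypothetical.

```python
from typing import Dict, Tuple


def identify_speaker(likelihoods: Dict[str, float]) -> Tuple[str, float]:
    """Pick the stored voice print with the highest match likelihood.

    `likelihoods` maps a (hypothetical) speaker label to the probability that
    the received audio matches that speaker's stored voice print.
    """
    if not likelihoods:
        raise ValueError("no stored voice prints to compare against")
    best = max(likelihoods, key=lambda speaker: likelihoods[speaker])
    return best, likelihoods[best]
```

Under these assumptions, a call such as identify_speaker({"alice": 0.2, "bob": 0.9, "carol": 0.5}) selects "bob" as the identified speaker, which corresponds to the ‘highest’ scoring voice print discussed above.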

Claims 7 to 8 and 17 to 18 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (U.S. Patent Publication 2015/0302856) in view of Wishne et al. (U.S. Patent Publication 2017/0330277) as applied to claims 1 and 11 above, and further in view of Himmelstein (U.S. Patent Publication 2005/0275505).
Wishne et al. teaches that a request for information relating to a banking application can provide data indicative of the requested information.  (Abstract)  Additionally, Wishne et al. teaches that application server 220 can provide 323 requested information to application 306, and can generate an appropriate response so that application server 220 can output 323 the account balance information for display at computing device 210 or application server 220 can output 323 an account balance in an audio format via sound interface 116, e.g., as a spoken response to the inquiry.  If the voice command asked, “How much did I spend last evening”, application server 220 may output a response of, “You made three purchases totaling $124” to be output via sound interface 116.  (¶[0044]: Figure 3)  Arguably, then, Wishne et al. teaches these limitations of “output the indication that the selected service provider performed the selected voice action provided from the voice action server” and “receiving, at the voice action server, from the selected service provider, the indication that the selected service provider performed the selected voice action” simply because outputting the response includes “an indication” that the response was performed.  Moreover, Wishne et al. only states that a response can be provided in an audio format, but does not expressly teach that the voice action is provided “as synthesized speech”.  Still, a most common way of outputting audio information in an application of this nature is by synthesized speech due to the large number of alternative responses precluding the use of recorded speech.  
Anyway, Himmelstein teaches whatever limitations might be omitted by Wishne et al.  Generally, Himmelstein teaches a voice-controlled security system that analyzes an audible signal to determine whether it matches a voiceprint stored in memory.  Himmelstein, then, teaches generating an audible instruction signal that a voice command is being performed when a user is authorized so as “to output the indication that the selected service provider performed the selected voice action provided . . . as synthesized speech” and “receiving . . . the indication that the selected service provider performed the selected voice action.”  An objective is to automatically log in a user and set a level of access being provided to a particular user of a computer system so that security levels may restrict access to a computer.  (¶[0047])  It would have been obvious to one having ordinary skill in the art to provide an indication that a service provider performed an action by synthesized speech as taught by Himmelstein to output a response in an audio format of Wishne et al. for a purpose of setting a level of access to a particular user for restricting access to a computer.

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure.
James (U.S. Patent No. 10,770,093) is Applicant’s parent patent.
Bruckert et al., Kanevsky et al., Ben-David et al., Broman et al., Thatiparthi et al., and Johnson et al. disclose related prior art. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 272-7608.  The examiner can normally be reached on Monday-Thursday 8:30 AM-6:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn can be reached on (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair.  Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access 






/MARTIN LERNER/Primary Examiner
Art Unit 2657
January 25, 2022