DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This Office Action is in response to correspondence filed 03 November 2020 in reference to application 17/052,736.  Claims 1-15 are pending and have been examined.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3 and 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Bocklet et al. (US PAP 2017/0200451) in view of Khoury et al. (US PAP 2018/0254046).

Consider claim 1, Bocklet teaches a device for authenticating a voice input provided from a user (abstract), the device comprising: 
a microphone configured to receive the voice input (0030-31, receiving utterance using microphone 201); 

a processor configured to execute the one or more instructions (0018, processor), 
wherein the processor is further configured to execute the one or more instructions to obtain, from the voice input, signal characteristic data representing signal characteristics of the voice input (0032-33, feature extraction of input utterance, i.e. MFCCs), and authenticate the voice input by applying the obtained signal characteristic data to a first model configured to determine an attribute of the voice input (0034-35, features fed to classifier module, which classifies signal as live or replay), and 
wherein the first learning model is trained to determine the attribute of the voice input based on a voice uttered by a person and a voice output by an apparatus (0035-36, 0020-22, detecting whether the utterance is live or a recording being replayed from a device).
Bocklet does not specifically teach applying the obtained signal characteristic data to a first learning model.
In the same field of detecting replay attacks, Khoury teaches applying the obtained signal characteristic data to a first learning model (0045-47, applying features to a Deep Neural Network to determine if voice is a playback spoof or live).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use a DNN for classification, as taught by Khoury, in the system of Bocklet in order to increase accuracy and thus provide more secure authentication (Khoury 0002).

Consider claim 2, Bocklet teaches the device of claim 1, wherein the signal characteristic data comprises information about a per-frequency cumulative power of the voice input (0032-33, using Mel-frequency cepstral coefficients, which are a measure of per-frequency cumulative power).

Consider claim 3, Bocklet and Khoury teach the device of claim 2, wherein the first learning model is trained to determine the attribute of the voice input differently according to per-frequency cumulative powers of the voice uttered by the person and the voice output by the apparatus (Bocklet 0039-40, figure 3, training the model based on extracted MFCCs in order to recognize the difference between live and replay; Khoury 0045-47, applying features to a Deep Neural Network trained to determine if voice is a playback spoof or live).

Consider claim 8, Bocklet teaches a method of authenticating a voice input provided from a user (abstract), the method comprising: 
receiving the voice input (0030-31, receiving utterance using microphone 201); 
obtaining, from the voice input, signal characteristic data representing signal characteristics of the voice input (0032-33, feature extraction of input utterance, i.e. MFCCs), and 
authenticating the voice input by applying the obtained signal characteristic data to a first model configured to determine an attribute of the voice input (0034-35, features fed to classifier module, which classifies signal as live or replay), and 

Bocklet does not specifically teach applying the obtained signal characteristic data to a first learning model.
In the same field of detecting replay attacks, Khoury teaches applying the obtained signal characteristic data to a first learning model (0045-47, applying features to a Deep Neural Network to determine if voice is a playback spoof or live).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use a DNN for classification, as taught by Khoury, in the system of Bocklet in order to increase accuracy and thus provide more secure authentication (Khoury 0002).

Claim 9 contains similar limitations as claim 2 and is therefore rejected for the same reasons.

Claim 10 contains similar limitations as claim 3 and is therefore rejected for the same reasons.

Claims 4, 5, 11, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Bocklet and Khoury as applied to claims 1 and 8 above, and further in view of Korjani (US Patent 10,706,856).

Consider claim 4, Bocklet teaches the device of claim 1, wherein the processor is further configured to execute the one or more instructions to authenticate the user by applying the voice input to a second model configured to authenticate the user who utters the voice input (0025, If the utterance is an original utterance, it may be further evaluated for user identification).
Bocklet and Khoury do not specifically teach applying the voice input to a second learning model configured to authenticate the user who utters the voice input based on a voice input pattern of the user.
In the same field of voice authentication, Korjani teaches applying the voice input to a learning model configured to authenticate the user who utters the voice input based on a voice input pattern of the user (col 1 line 50 - col 2 line 25, MFCCs from speech may be applied to a deep neural network tuned to each speaker to authenticate the user).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use a DNN for conventional voice matching, as taught by Korjani, in the system of Bocklet and Khoury in order to allow for accurate and efficient voice verification (Korjani col 1 lines 40-47).

Consider claim 5, Bocklet teaches the device of claim 4, wherein the processor is further configured to execute the one or more instructions to apply the obtained signal characteristic data to the first learning model to determine whether a first user authentication is required, and selectively apply the voice input to the second learning 

Claim 11 contains similar limitations as claim 4 and is therefore rejected for the same reasons.

Claim 12 contains similar limitations as claim 5 and is therefore rejected for the same reasons.

Claims 6 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Bocklet, Khoury, and Korjani as applied to claims 4 and 11 above, and further in view of Foerster et al. (US PAP 2017/0345430).

Consider claim 6, Bocklet, Khoury, and Korjani teach the device of claim 4, but do not specifically teach:
wherein the processor is further configured to execute the one or more instructions to obtain context information including at least one of surrounding environment information of the device, state information of the device, a user's device usage history information, and user schedule information, and 
the authenticating of the user comprises inputting the context information to the second learning model together with the voice input.
In the same field of user authentication, Foerster teaches 
at least one of surrounding environment information of the device, state information of the device, a user's device usage history information, and user schedule information (0005-07, using environmental context), and 
the authenticating of the user comprises inputting the context information to the second learning model together with the voice input (0005-07, using environmental context to adjust speaker authentication).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use environmental context, as taught by Foerster, in the system of Bocklet, Khoury, and Korjani in order to allow accurate speaker identification in noisy environments (Foerster 0013).

Claim 13 contains similar limitations as claim 6 and is therefore rejected for the same reasons.

Claims 7, 14, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Bocklet, Khoury, and Korjani as applied to claims 4 and 11 above, and further in view of Bhimanaik et al. (US Patent 10,079,024).

Consider claim 7, Bocklet, Khoury, and Korjani teach the device of claim 4, but do not specifically teach wherein the processor is further configured to execute the one or more instructions to apply the voice input to the second learning model to 
In the same field of user authentication, Bhimanaik teaches wherein the processor is further configured to execute the one or more instructions to apply the voice input to the second learning model to determine whether a second user authentication is required, and selectively additionally authenticate the user who utters the voice input based on a result of the determining (Figure 3B, Col 11 lines 32-44, if voice prints do not match, the user may be prompted to perform another type of authentication, such as answering the knowledge-based questions described at Col 4 lines 9-31).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to provide a fallback authentication, as taught by Bhimanaik, in the system of Bocklet, Khoury, and Korjani in order to allow for access to the system in the event of a false rejection.

Claim 14 contains similar limitations as claim 7 and is therefore rejected for the same reasons.

Consider claim 15, Bhimanaik teaches the method of claim 14, further comprising obtaining context information including at least one of surrounding environment information of the device, state information of the device, a user's device usage history information, and user schedule information (Col 4 lines 9-31, knowledge such as the user’s purchase history on the device), 
.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Publications to Lesso (2019/0149932, 2019/0147888) teach a similar method of detecting playback attacks. “Audio replay attack detection with deep learning frameworks” also teaches a similar method of detecting replay attacks with neural networks.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DOUGLAS C GODBOLD whose telephone number is (571)270-1451. The examiner can normally be reached 6:30am-5pm Monday-Thursday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


DOUGLAS GODBOLD
Examiner
Art Unit 2655



/DOUGLAS GODBOLD/Primary Examiner, Art Unit 2655