DETAILED ACTION

Notice of AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

The present office action is responsive to communications received on 5/23/2022. Applicant cancelled claims 6, 7, and 35-38. Claims 1, 2, 4, 5, 9-11, 16, 17, 23-25, 33, and 34 are pending.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 7/20/2021 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Examiner’s Notes
Claims 23 and 33 are not rejected under 35 U.S.C. 101 because the claimed invention is directed to statutory subject matter. The claims are not considered software per se. Specification [page 6, line 5-7; page 3, line 34-35; page 4, line 5-6] recites “FIG. 3 … The device 50 may for example be the audio processor 14 shown in FIG. 1, the audio processor 36 shown in FIG. 2, or the first device 30 shown in FIG. 2.”; “Signals generated by the microphone 12 are passed to a first integrated circuit, in the form of a first processing device 14, which is referred to herein as an audio processor”; “The audio processor 14 is connected to a first bus 16. The first bus 16 is connected by a bridge circuit 18 to a second bus 20”. Therefore, the claims can be interpreted as comprising hardware.

Election/Restrictions
Applicant’s election without traverse of Group I, claims 1, 2, 4, 5, 9-11, 16, 17, 23-25, 33, and 34 in the reply filed on 5/23/2022 is acknowledged.

Claim Objections
Claims 2, 4, 5, 9-11, 16, 17, 24, 25, and 34 are objected to because of the following informalities: 
Claims 2, 4, 5, 9-11, 16, 17, 24, 25, and 34 recite “A method according to claim …” or “A device according to claim …”, which should be “The method according to claim …” or “The device according to claim …”.
Claim 24 recites “wherein the output is configured for transmitting the received signal and the certificate to a separate second device …”, which should be “the separate second device”.
Appropriate correction is required.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 9-11, 16-17, 23-25, and 33 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Page (US 20180130475 A1).

Regarding claim 1, Page teaches a method of authenticating a speech signal in a first device, the method comprising: ([0001] methods and apparatus for authenticating the voice of a user of an electronic device.)
receiving a speech signal; ([0026] FIG. 2 shows an example of an electronic device 100, … The device comprises one or more microphones 112 for receiving voice input from the user, a speaker recognition processor (SRP) 120 connected to the microphones 112, and an application processor (AP) 150 connected to the SRP 120.)
performing a live speech detection process to determine whether the received signal represents live speech, wherein the live speech detection process generates a live speech detection output; ([0075] The biometric authentication module 130 may also initiate an algorithm to determine whether or not the voice data signal is a spoof signal. For example, it is known to attack biometric authentication algorithms by recording the user's voice, or synthesizing an audio signal to correspond to the user's voice, and playing that recorded or synthesized signal back to the authentication module in an attempt to “spoof” the biometric authentication algorithm. The biometric authentication module 130 may thus perform an algorithm to determine whether the voice data signal is a spoof signal, and generate a corresponding score indicating the likelihood that the voice data signal is a genuine signal (i.e. not a spoof signal).) Here Page also discloses anti-spoofing score (for determining whether the voice data signal is genuine or recorded/synthesized) in ¶85.
forming a certificate by encrypting at least the live speech detection output; and ([0063] The biometric authentication result may be authenticated (i.e. with a digital signature) to further protect against man-in-the-middle attacks attempting to spoof the result, including protection against replay attacks. For example, this may be performed by the AP 150 sending to the SRP 120 a biometric verification result request containing a random number. The SRP 120 may then append the authentication result to this message, sign the whole message with a private key, and send it back to the AP. The AP 150 can then validate the signature with a public key, ensure that the returned random number matches that transmitted, and only then use the biometric authentication result.) Here the result can be the anti-spoofing score disclosed in ¶75 and ¶85. In addition, digitally signing with a private key is a form of encryption.
transmitting the received signal and the certificate to a separate second device. ([0063] The SRP 120 may then append the authentication result to this message, sign the whole message with a private key, and send it back to the AP.) Here Page discloses SRP 120 sending the audio data to the AP 150 (e.g. over the AIF 128) in ¶76 as well.
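
For illustration only, the request/response exchange quoted from Page [0063] can be sketched as follows. This is a minimal sketch under stated assumptions, not Page's implementation: the function names, the JSON payload layout, and the key are hypothetical, and an HMAC tag from the Python standard library stands in for the private-key signature that Page describes (an HMAC is symmetric, so both sides hold the same key here).

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical shared key. Page [0063] describes a private/public key pair;
# HMAC is a symmetric stand-in used only to keep this sketch stdlib-only.
DEMO_KEY = b"demo-key-not-for-production"


def make_request() -> bytes:
    """Second device (AP role): issue a fresh random number for this request."""
    return secrets.token_bytes(16)


def make_certificate(nonce: bytes, liveness_score: float, key: bytes = DEMO_KEY) -> dict:
    """First device (SRP role): append the result to the request and tag the whole message."""
    payload = json.dumps(
        {"nonce": nonce.hex(), "liveness_score": liveness_score}, sort_keys=True
    ).encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_certificate(cert: dict, expected_nonce: bytes, key: bytes = DEMO_KEY):
    """Second device: check the tag, then the nonce, and only then use the result."""
    expected_tag = hmac.new(key, cert["payload"], hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected_tag, cert["tag"]):
        return None  # payload or tag was altered in transit
    fields = json.loads(cert["payload"])
    if fields["nonce"] != expected_nonce.hex():
        return None  # response does not answer this request, e.g. a replay
    return fields["liveness_score"]
```

In this sketch a certificate is accepted only when both the tag and the returned random number match, which mirrors the order of checks described in the quoted passage.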

Regarding claim 9, Page teaches all the features with respect to claim 1, as outlined above. Page further teaches wherein the live speech detection process generates the live speech detection output in the form of a score value having at least two possible values. ([0075] The biometric authentication module 130 may also initiate an algorithm to determine whether or not the voice data signal is a spoof signal. For example, it is known to attack biometric authentication algorithms by recording the user's voice, or synthesizing an audio signal to correspond to the user's voice, and playing that recorded or synthesized signal back to the authentication module in an attempt to “spoof” the biometric authentication algorithm. The biometric authentication module 130 may thus perform an algorithm to determine whether the voice data signal is a spoof signal, and generate a corresponding score indicating the likelihood that the voice data signal is a genuine signal (i.e. not a spoof signal). [0085] for comparison with the anti-spoofing score (for determining whether the voice data signal is genuine or recorded/synthesized).) Here, “score” in “corresponding score” and “anti-spoofing score” implies at least two possible values.

Regarding claim 10, Page teaches all the features with respect to claim 1, as outlined above. Page further teaches forming the certificate by encrypting a quality metric of the received signal with the live speech detection output. ([0062] Once generated, the biometric authentication result is output from the SRP 120 via CIF 136 and provided to AP 150… The biometric authentication result may be appended with the indication of the threshold values used by the comparison circuitry to generate the result. Thus, where the control signal received on the control interface 136 specifies a particular FAR/FRR value or a label, the biometric authentication result may be appended with that same FAR/FRR value or label. [0063] The biometric authentication result may be authenticated (i.e. with a digital signature) to further protect against man-in-the-middle attacks attempting to spoof the result, including protection against replay attacks. For example, this may be performed by the AP 150 sending to the SRP 120 a biometric verification result request containing a random number. The SRP 120 may then append the authentication result to this message, sign the whole message with a private key, and send it back to the AP. The AP 150 can then validate the signature with a public key, ensure that the returned random number matches that transmitted, and only then use the biometric authentication result.) Here FAR/FRR value can be “quality metric” under broadest reasonable interpretation because “the FAR/FRR values may be based on a context in which the voice data signal was acquired. For example, the AP 150 may be able to determine one or more of: a location of the electronic device 100; a velocity of the electronic device 100; an acceleration of the electronic device 100; a level of noise in the voice data signal; one or more peripheral devices to which the electronic device 100 is connected; and one or more networks to which the electronic device 100 is connected” (¶79).

Regarding claim 11, Page teaches all the features with respect to claim 10, as outlined above. Page further teaches wherein the quality metric of the received signal comprises a signal-noise ratio of the received signal, or a fundamental frequency of speech in the received signal. ([0062] Once generated, the biometric authentication result is output from the SRP 120 via CIF 136 and provided to AP 150… The biometric authentication result may be appended with the indication of the threshold values used by the comparison circuitry to generate the result. Thus, where the control signal received on the control interface 136 specifies a particular FAR/FRR value or a label, the biometric authentication result may be appended with that same FAR/FRR value or label.) Here FAR/FRR value can be “quality metric” under broadest reasonable interpretation because “the FAR/FRR values may be based on a context in which the voice data signal was acquired. For example, the AP 150 may be able to determine one or more of: a location of the electronic device 100; a velocity of the electronic device 100; an acceleration of the electronic device 100; a level of noise in the voice data signal; one or more peripheral devices to which the electronic device 100 is connected; and one or more networks to which the electronic device 100 is connected” (¶79).

Regarding claim 16, Page teaches all the features with respect to claim 1, as outlined above. Page further teaches transmitting a device specific identifier to the separate second device. ([0062] Once generated, the biometric authentication result is output from the SRP 120 via CIF 136 and provided to AP 150… The biometric authentication result may be appended with the indication of the threshold values used by the comparison circuitry to generate the result. Thus, where the control signal received on the control interface 136 specifies a particular FAR/FRR value or a label, the biometric authentication result may be appended with that same FAR/FRR value or label.) Here device specific information is implied through FAR/FRR value transmission because “the FAR/FRR values may be based on a context in which the voice data signal was acquired. For example, the AP 150 may be able to determine one or more of: a location of the electronic device 100; …” (¶79).

Regarding claim 17, Page teaches all the features with respect to claim 16, as outlined above. Page further teaches forming the certificate by encrypting said device specific identifier with the live speech detection output. ([0062] Once generated, the biometric authentication result is output from the SRP 120 via CIF 136 and provided to AP 150… The biometric authentication result may be appended with the indication of the threshold values used by the comparison circuitry to generate the result. Thus, where the control signal received on the control interface 136 specifies a particular FAR/FRR value or a label, the biometric authentication result may be appended with that same FAR/FRR value or label.) Here device specific information is implied through FAR/FRR value because “the FAR/FRR values may be based on a context in which the voice data signal was acquired. For example, the AP 150 may be able to determine one or more of: a location of the electronic device 100; …” (¶79) and “encrypting” is disclosed in ¶63.

Regarding claim 23, the scope of the claim is similar to that of claim 1. Accordingly, the claim is rejected using a similar rationale.

Regarding claim 24, Page teaches all the features with respect to claim 23, as outlined above. Page further teaches wherein the output is configured for transmitting the received signal and the certificate to a separate second device, wherein the device (SRP 120) and the separate second device (AP 150) are located in a single host device (electronic device 100). ([0026] FIG. 2 shows an example of an electronic device 100, which may for example be a mobile telephone or a mobile computing device such as laptop or tablet computer. The device comprises one or more microphones 112 for receiving voice input from the user, a speaker recognition processor (SRP) 120 connected to the microphones 112, and an application processor (AP) 150 connected to the SRP 120. The SRP 120 may be provided on a separate integrated circuit.)

Regarding claim 25, Page teaches all the features with respect to claim 23, as outlined above. Page further teaches wherein the device (SRP 120) comprises a first integrated circuit, and the separate second device (AP 150) comprises a second integrated circuit located within the same host device (electronic device 100) as the first integrated circuit. ([0026] FIG. 2 shows an example of an electronic device 100, which may for example be a mobile telephone or a mobile computing device such as laptop or tablet computer. The device comprises one or more microphones 112 for receiving voice input from the user, a speaker recognition processor (SRP) 120 connected to the microphones 112, and an application processor (AP) 150 connected to the SRP 120. The SRP 120 may be provided on a separate integrated circuit.)

Regarding claim 33, Page teaches a device comprising:
an input, for receiving an audio signal; ([0029] The SRP 120 comprises one or more inputs 122 for receiving audio data from the microphones 112.)
a first output, for supplying the received audio signal as an output of the device; ([0031] the routing module 124 comprises two routing module outputs. A first output is coupled to an audio interface (AIF) 128, which provides an audio output interface for SRP 120, and is coupled to the AP 150.)
a check processor, for determining whether the received audio signal represents live speech, and generating an output signal based on said determination; and ([0037, 0043] The voice biometric authentication module 130 may be implemented for example as a DSP. The voice authentication module 130 carries out biometric authentication on the pre-processed audio data in order to generate an authentication score. an indication of one or more threshold values to be used in determining whether the voice contained within the audio signal is an authorised user or not. This indication may be passed to a threshold interpretation module 138, which generates the threshold value(s) specified within the control signal, and the threshold value(s) are then input to comparison circuitry 140. Comparison circuitry 140 compares the threshold value(s) to the biometric score stored in the buffer 134, and generates a biometric authentication result to indicate whether the voice contained within the audio signal is that of an authorised user or not.)
a second output, for supplying an output of the check processor as an output of the device. ([0043, 0062] The SRP 120 further comprises a control interface (CIF) 136 for receiving control signals (e.g. from AP 150) and outputting control signals (e.g. to AP 150). Once generated, the biometric authentication result is output from the SRP 120 via CIF 136 and provided to AP 150.)

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2 and 4-5 are rejected under 35 U.S.C. 103 as being unpatentable over Page (US 20180130475 A1) in view of Lesso (WO 2019145708 A1, listed in IDS).

Regarding claim 2, Page teaches all the features with respect to claim 1, as outlined above. But Page does not teach forming a hash of the received signal; and comprising: forming the certificate by encrypting the hash of the received signal with the live speech detection output. This aspect of the claim is identified as a difference.
However, Lesso in an analogous art explicitly teaches 
forming a hash of the received signal; and comprising: ([page 42, line 19-23] One example of how a suitable Message Authentication Certificate (MAC) may be generated and verified by the first voice biometric block 82 and the second voice biometric block 84 is to pass the received digital data to a hash module which performs a hash of the data in appropriate frames. The hash module may determine a hash value, H, for example according to the known SHA-256 algorithm.)
forming the certificate by encrypting the hash of the received signal with the live speech detection output. ([page 42, line 26-29] In the first voice biometric block 82, the hash value may be digitally signed using a signing module. The signing module may apply a known cryptographic signing protocol, e.g. based on the RSA algorithm or Elliptic-curve-cryptography (ECC) using a private key Kprivate that is known to the first voice biometric block 82.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the “biometric authentication for authenticating voice” concept of Page, and the “analyzing speech signals” approach of Lesso. One of ordinary skill in the art would have been motivated to perform such a modification to improve data integrity and to enable detection of a system that has been subjected to an attack by injection of invalid data. A suitable Message Authentication Certificate (MAC) may be generated and verified using a hash function and cryptography (Lesso [page 42, line 16-29]).
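
For illustration only, the hash-then-sign arrangement discussed above can be sketched as follows. This is a sketch under stated assumptions rather than either reference's implementation: the frame size, key, field names, and function names are hypothetical; the frames are streamed into a single SHA-256 digest, which is only one reading of hashing "in appropriate frames"; and an HMAC tag stands in for the RSA or ECC signature named in Lesso.

```python
import hashlib
import hmac

DEMO_KEY = b"demo-key-not-for-production"  # hypothetical; stands in for a private signing key
FRAME_BYTES = 320  # hypothetical: 10 ms of 16-bit mono audio at 16 kHz


def hash_audio(pcm: bytes, frame_bytes: int = FRAME_BYTES) -> str:
    """Feed the received signal to SHA-256 frame by frame and return the hex digest H."""
    digest = hashlib.sha256()
    for start in range(0, len(pcm), frame_bytes):
        digest.update(pcm[start:start + frame_bytes])
    return digest.hexdigest()


def form_certificate(pcm: bytes, liveness_output: str, key: bytes = DEMO_KEY) -> dict:
    """Bind the audio hash and the live speech detection output under one tag."""
    audio_hash = hash_audio(pcm)
    message = f"{audio_hash}|{liveness_output}".encode()
    tag = hmac.new(key, message, hashlib.sha256).hexdigest()
    return {"audio_hash": audio_hash, "liveness_output": liveness_output, "tag": tag}


def check_certificate(pcm: bytes, cert: dict, key: bytes = DEMO_KEY) -> bool:
    """Receiver: re-hash the audio it actually received and compare tags."""
    message = f"{hash_audio(pcm)}|{cert['liveness_output']}".encode()
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cert["tag"])
```

Because the tag covers both the audio hash and the detection output, changing either the transmitted audio or the reported output makes the check fail in this sketch.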

Regarding claim 4, Page teaches all the features with respect to claim 1, as outlined above. But Page does not teach receiving the speech signal with a first sample rate; performing the live speech detection process using the received speech signal with the first sample rate; decimating the received signal to a second sample rate; and transmitting the received signal to the separate second device with the second sample rate. This aspect of the claim is identified as a difference.
However, Lesso in an analogous art explicitly teaches 
receiving the speech signal with a first sample rate; ([page 64, line 20-22] Figure 19 shows a speaker verification system 401, which receives audio comprising speech from an input 402 such as a high-resolution microphone that is capable of generating signals with a high sample rate such as 192kHz)
performing the live speech detection process using the received speech signal with the first sample rate; ([page 65, line 10-14] The audio validation module 410 is configured to determine if the received audio is valid or invalid. In particular, the audio validation module 410 is configured to detect if the received audio is all from a single speaker, and/or to determine if the received audio is genuine audio, or is the product of a spoof or replay attack, wherein a hacker or other malicious actor is trying to deceive the speaker verification system 401.)
decimating the received signal to a second sample rate; and ([page 67, line 10-12] Figure 19 shows that the input signal is applied to a downsampler 422, and it is the decimated or downsampled version of the input signal that is passed to the speaker validation block 408, and also to the audio buffer 420.)
transmitting the received signal to the separate second device with the second sample rate. ([page 67, line 10-12] Figure 19 shows that the input signal is applied to a downsampler 422, and it is the decimated or downsampled version of the input signal that is passed to the speaker validation block 408, and also to the audio buffer 420.) In summary, Lesso discloses that “the system may be configured to provide for different bandwidths or sample rates between the first and second devices”; as well as “the first device may perform a decimation of the sample rate of the received audio, wherein the second device is configured to process the decimated version of the received audio” [page 64, line 5-6; page 67, line 33 - page 68, line 1].
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the “biometric authentication for authenticating voice” concept of Page, and the “decimated/downsampled input signal” approach of Lesso. One of ordinary skill in the art would have been motivated to perform such a modification for bandwidth and power consumption management. As mentioned above, the first device may be configured to receive an input signal having a relatively high sample rate, and therefore a relatively high bandwidth. Such a high bandwidth signal may be required by an anti-spoofing module, as an indication as to whether the audio signal results from a replay attack. However, a reduced bandwidth or sample rate between modules or devices can provide improved overall system efficiency due to a reduction in power consumption of the system, for example when the first device is located separate to the second device, and connected using e.g. a wireless data link, a reduction in the bandwidth of data to be communicated via said link can provide improvements to the power consumption and battery life of such devices (Lesso [page 67, line 2-7; page 68, line 3-8]).
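
For illustration only, decimation from the 192 kHz example rate quoted from Lesso to the 16 kHz rate recited in claim 5 corresponds to an integer factor of 12. The sketch below is a minimal pure-Python illustration, not the downsampler 422 of Lesso: the function name is hypothetical, and block averaging is used as a crude stand-in for a proper anti-aliasing filter.

```python
def decimate(samples, factor):
    """Average each block of `factor` samples, then keep one value per block."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    usable = len(samples) - len(samples) % factor  # drop a trailing partial block
    return [
        sum(samples[start:start + factor]) / factor
        for start in range(0, usable, factor)
    ]


FIRST_RATE_HZ = 192_000   # example high sample rate quoted from Lesso
SECOND_RATE_HZ = 16_000   # second sample rate recited in claim 5
FACTOR = FIRST_RATE_HZ // SECOND_RATE_HZ  # integer decimation factor of 12

# One millisecond of audio: 192 samples in, 16 samples out.
one_ms_at_first_rate = [1.0] * 192
one_ms_at_second_rate = decimate(one_ms_at_first_rate, FACTOR)
```

In such an arrangement the full-rate samples would remain available to the live speech detection step, while only the shorter decimated list would be transmitted.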

Regarding claim 5, Page in view of Lesso teaches all the features with respect to claim 4, as outlined above. The combination further teaches wherein the first sample rate is higher than 16 kHz ([Lesso page 64, line 20-22] Figure 19 shows a speaker verification system 401, which receives audio comprising speech from an input 402 such as a high-resolution microphone that is capable of generating signals with a high sample rate such as 192kHz) and the second sample rate is 16 kHz. ([Lesso page 83, line 9-10] downsampling the received audio signal to a sample rate below 20kHz.) Indeed, it would have been obvious to change the sample rate if desired; see MPEP 2144.04(IV)(A).

Claim 34 is rejected under 35 U.S.C. 103 as being unpatentable over Page (US 20180130475 A1) in view of Boyadjiev (US 20200035247 A1).

Regarding claim 34, Page teaches all the features with respect to claim 33, as outlined above. Page further teaches wherein the check processor is configured for generating said output signal based on said determination such that the output signal may take one of at least three [score ranges], namely: ([0075] The biometric authentication module 130 may also initiate an algorithm to determine whether or not the voice data signal is a spoof signal. For example, it is known to attack biometric authentication algorithms by recording the user's voice, or synthesizing an audio signal to correspond to the user's voice, and playing that recorded or synthesized signal back to the authentication module in an attempt to “spoof” the biometric authentication algorithm. The biometric authentication module 130 may thus perform an algorithm to determine whether the voice data signal is a spoof signal, and generate a corresponding score indicating the likelihood that the voice data signal is a genuine signal (i.e. not a spoof signal).) Here Page discloses anti-spoofing score (for determining whether the voice data signal is genuine or recorded/synthesized) in ¶85. Page also discloses that “It will be noted that more than one threshold value may also be specified and utilized in the following manner. For example, the control signal may specify an upper FAR/FRR value and a lower FAR/FRR value (corresponding to upper and lower threshold values)” in ¶86.
a first [score range], indicating that the received audio signal has a high probability of representing live speech, ([0086] If the biometric score exceeds the upper threshold value, the voice within the voice data signal may be authenticated as that of an authorised user.) This biometric score corresponds to “a genuine signal” in ¶75.
a second [score range], indicating that the received audio signal has a low probability of representing live speech, and ([0086] If the biometric score is less than the lower threshold, a negative authentication result may be provided, i.e. the SRP 120 is confident that the voice within the audio signal is not that of an authorised user.) This biometric score corresponds to “a spoof signal” in ¶75.
a third [score range], indicating that the received audio signal has an intermediate probability of representing live speech. ([0086] If the biometric score is between the upper and lower thresholds, however, this is an indication that the SRP 120 is unsure as to whether or not the voice is that of an authorised user.)

Page teaches generating said output signal based on said determination such that the output signal may take one of at least three score ranges, but does not explicitly teach that the three score ranges are three values. This aspect of the claim is identified as a difference.
However, Boyadjiev in an analogous art explicitly teaches 
a first value, indicating that the received audio signal has a high probability of representing live speech, ([0052] FIG. 6 shows an example of the probability of detecting spoofing using the output of the plurality of multi-dimensional acoustic feature vector CNNs 121. For example, the FIG. 6 shows the probability of detecting a spoofed voice from the output…The notations Excellent, Good, Acceptable and Low indicate the probability of detecting a voice spoofed using the spoofing algorithms.)
a second value, indicating that the received audio signal has a low probability of representing live speech, and ([0052] The notations Excellent, Good, Acceptable and Low indicate the probability of detecting a voice spoofed using the spoofing algorithms.)
a third value, indicating that the received audio signal has an intermediate probability of representing live speech. ([0052] The notations Excellent, Good, Acceptable and Low indicate the probability of detecting a voice spoofed using the spoofing algorithms.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the “biometric authentication for authenticating voice” concept of Page, and the “indicator for detecting spoofed voice” approach of Boyadjiev. One of ordinary skill in the art would have been motivated to perform such a modification so that the probabilities of spoofing detection are quantified into discrete levels, which are straightforward and easy to understand.
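
For illustration only, mapping a score onto three discrete values using an upper and a lower threshold, in the manner of the ranges quoted from Page [0086], can be sketched as follows. The threshold numbers, the enum, and the function name are hypothetical and do not come from either reference.

```python
from enum import Enum


class LivenessValue(Enum):
    HIGH = "high probability of live speech"
    INTERMEDIATE = "intermediate probability of live speech"
    LOW = "low probability of live speech"


def classify(score: float, lower: float = 0.3, upper: float = 0.7) -> LivenessValue:
    """Map a score onto one of three values using two thresholds (values are assumptions)."""
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if score > upper:   # score exceeds the upper threshold
        return LivenessValue.HIGH
    if score < lower:   # score is less than the lower threshold
        return LivenessValue.LOW
    return LivenessValue.INTERMEDIATE  # score lies between the two thresholds
```

A score exactly equal to either threshold falls in the intermediate value in this sketch; that boundary choice is an assumption.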

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20190214022 A1, "Voice user interface", by Vaquero Avilés-Casco, teaches speaker authentication comprises: receiving a speech signal; dividing the speech signal into segments; and, following each segment, obtaining an authentication score based on said segment and previously received segments, wherein the authentication score represents a probability that the speech signal comes from a specific registered speaker. In response to an authentication request, an authentication result is output based on the authentication score. Weightings given to the authentication scores can depend on some quality measure associated with each segment, such as a respective signal to noise ratio measured during the segment.
US 11011178 B2, "Detecting replay attacks in voice-based authentication", by Bhimanaik, teaches detecting replay attacks in voice-based authentication systems. In one embodiment, audio is captured via an audio input device. It is then verified that the audio includes a voice authentication factor spoken by a user. If it is determined that the audio includes unexpected environmental audio in addition to the voice authentication factor that has been verified, one or more actions may be performed.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAN YANG whose telephone number is (408) 918-7638. The examiner can normally be reached Monday to Friday, 9:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Carl Colin, can be reached at 571-272-3862. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HAN YANG/Examiner, Art Unit 2493