DETAILED ACTION

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, or 365(c) is acknowledged.   

Information Disclosure Statement
The references listed in the Information Disclosure Statement submitted on 10/29/2021 and 01/26/2022 have been considered by the examiner (see attached PTO-1449).  

Response to Amendment
This communication is responsive to the applicant's amendment dated 01/26/2022.  The applicant(s) amended claims 19, 29 and 31 (see the amendment: pages 2-3).
The examiner has withdrawn the previous claim rejection under 35 USC 112(a) because the applicant amended the corresponding claim(s).


Response to Arguments
Applicant's arguments filed on 01/26/2022 with respect to the claim rejection under 35 USC 103 have been fully considered but are moot in view of the new ground(s) of rejection, since the amended claims introduce new issues and/or change the scope of the claims. Accordingly, the response to the applicant’s arguments based on the newly amended claims (see Remarks: pages 5-7) is directed to the new claim rejection necessitated by the amendment (see below).  It is also noted that the previously cited references remain applicable to the amended claims under the new ground(s) of rejection (which may include newly combined teachings and/or interpretations) (see the detailed rejection below).

In addition, applicant's arguments (filed on 01/26/2022) with respect to claim rejection under 35 USC 103 have been fully considered but they are not persuasive.
In response to applicant's arguments with respect to claim 19 (also related to claims 29 and 31) that “the cited references cannot render obvious the claims of the present application at least because the cited references, whether taken alone or in combination, fail to teach, disclose, or suggest each and every feature of the recited claims” (Remarks: page 5, paragraph 3); “Valenti also does not teach using a trained neural network to determine a score indicative of a likelihood that the speech content is live speech (emphasized by the applicant)”, “Valenti does not disclose: 1. A method of detecting a replay attack. 2. That the method comprises identifying speech content present in at least a portion of the audio signal. 3. That the information obtained for each portion of the audio signal for which speech content is identified. 4. That a trained neural network is used determine a score indicative of a likelihood that the speech content is live speech 5. That a score is determined for each portion of the audio signal for which speech content is identified” (Remarks: page 5, paragraph 4 to page 6, paragraph 3); and “Khoury does not” teach “identifying speech content present in a portion of an audio signal”, “obtaining information about a frequency spectrum for each portion of the received audio signal for which speech content is identified”, “using either the DNN 210 or the second DNN to determine a likelihood” and “there is no teaching in Khoury of determining a score for each portion of the audio signal for which speech content is identified” (Remarks: pages 6-7, bridge paragraph), examiner respectfully disagrees with applicant’s arguments and has a different view of prior art teachings and claim interpretations.  
It is noted that the argued limitation of “identifying speech content present in at least a portion of the audio signal” (i.e. argued issue 2 for reference of VALENTI) is claimed in a broad manner, wherein “speech content” could be broadly and properly read on any speech/voice/utterance-related features/characteristics and “a portion of the audio signal” could be broadly and properly read on any segment, part, section, block, sequence, number, or unit (such as a phoneme, phone, spoken word or syllable) of/in audio/speech/voice signal/data/samples (including spoken pass-phrase) related to the identifying content.  Further, it should be pointed out that the argument(s) failed to treat the prior art teachings as a whole.  Clearly, in this case, VALENTI not only discloses identifying the speech/utterance content in at least a portion of the audio signal (such as audio/linguistic features/content/characteristics regarding phonemes, phones, spoken words or syllables) for enrollment, but also discloses determining/authenticating (also read on identifying) the content for ‘authentication’ (p41).  Thus, the rejection based on teachings of 
Further, it is noted that the above arguments regarding limitations for reference of VALENTI (i.e. argued issues 1 and 3-5) and for reference of KHOURY (see Remarks: pages 6-7, bridge paragraph) are not persuasive because the rejection of the argued limitations is based on the combined teachings of both references.  In response to the applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).  In this case, the rejection clearly and articulately addresses what the primary reference (VALENTI) teaches and how the claimed limitations are interpreted in comparison with the primary reference teachings, what the primary reference lacks, which teachings of the second reference (KHOURY) make up for the lacking limitations, and how the teachings of both references are combined to achieve the claimed feature, with an obviousness/motivation statement/explanation, in which all claimed/argued limitations are properly covered/addressed based on the broadest reasonable interpretation in light of the specification (see detail below).  In addition, response to those arguments based on newly amended limitation(s) 
For the above reason(s), the applicant’s arguments are not persuasive and the amended limitations do not overcome the combined teachings of the prior art rejection as stated below.

Claim Rejections - 35 USC § 103
Claims 19-23 and 27-31 are rejected under 35 U.S.C. 103 as being unpatentable over VALENTI et al. (IDS: US 2018/0060557, hereinafter referenced as VALENTI) in view of KHOURY et al. (US 2018/0254046, hereinafter referenced as KHOURY).
As per claim 19,  as best understood in view of claim rejection under 35 USC 112 (a), see above, VALENTI discloses ‘spoken pass-phrase suitability determination’ (title) for indicating ‘a spoofing attack’ and ‘anti-spoofing check’ (p(paragraph)58-p59, p70, for authenticating ‘the user’), comprising:
receiving an audio signal (read on ‘at least one utterance of the pass-phrase’, or ‘audio of the utterance’) representing speech (‘utterance’ or ‘spoken pass-phrase’) (Fig.2, ‘200’, p42, p66); 
identifying (or ‘determining’, ‘authenticated’ and/or ‘measure of’) speech content (read on various ‘glottal voice source’/‘articulatory’/‘linguistic’/‘audio’ ‘features’/‘characteristics’ regarding ‘linguistic elements’ including ‘phonetic part’, ‘phones or phonetics’, ‘phonemes’, ‘spoken pass-phrase’ or ‘spoken words or syllables’, or ‘linguistic content’ related ‘phonetic or prosodic content or patterns thereof’) present in at least a portion (read on ‘segmentation’, ‘duration’, or ‘frames (i.e. temporal sub-sections)’ of ‘the utterance’ corresponding to ‘one or more’, ‘individual’ or ‘groups’ of ‘phonemes’, ‘phones’, ‘spoken words or syllables’, ‘prosodic 
obtaining information (read on ‘spoken linguistic features’/‘audio features’ of ‘linguistic element(s)’ regarding the ‘phoneme’, ‘phone’, ‘syllable’, ‘word’, including, but not limited to, analyzed/determined ‘score(s)’, ‘characteristics’, ‘patterns’, and/or their ‘changes’, used in ‘the different types of linguistic analysis’, such as ‘variation of fundamental frequency’, ‘changes in or rates of change in …frequency’, ‘frequency fading’, ‘cut off’ in ‘frequency’, ‘pitch variation or the formant distances’) about a frequency spectrum (read on ‘spectral components’ or ‘formant’) of each portion (read on ‘segmentation’, ‘duration’, or ‘frames (i.e. temporal sub-sections)’ and/or related ‘each of the phonemes’, ‘phones’, or ‘other linguistic (phonetic) elements’ including ‘spoken words or syllables’, or ‘prosodic units’, in a broad sense, in light of the specification: p101) of the audio signal for which speech content is identified (same as stated above) (Fig.2, p18, p41, p46, p51-p61); 
for each portion (such as each phoneme or phone) of the audio signal for which speech content is identified (same as stated above), using a [trained] neural network to determine a score (read on ‘a measure’, one of ‘scores’, ‘pass-phrase score’, ‘log-likelihood ratio’, or ‘linguistic-element-score’ of ‘each’ of ‘phonemes and/or phones’, in a broad sense, Note: the specification does not specifically recite/define the claimed term of “score”) indicative of a likelihood (read on ‘spoken pass-phrase suitability’, or ‘log-likelihood’ in a broad sense) that the speech content is live speech (read on ‘spoken pass-phrase’ provided by a ‘user’ after ‘enrolment of the user’, not by ‘someone other than the user’ such as ‘a voice synthesiser’ used in ‘a spoofing attack’, in a broad sense, Note: the specification does not specifically recite/define the claimed term of “live 
It is noted that VALENTI does not expressly disclose the spoofing attack as being or including a “replay” attack (recited in the preamble, which, in the alternative, would also be interpreted as intended use or field of use without being given patentable weight) and the neural network being “trained” for detecting a replay attack.  However, the same/similar concept/feature is well known in the art as evidenced by KHOURY who, in the same field of endeavor, discloses ‘method and apparatus for detecting spoofing conditions’ (title) providing ‘detecting voice spoofing attacks’ including ‘replay attack’ (p2-p4, p34), comprising extracting ‘acoustic features’ including ‘deep constant Q cepstral coefficients (CQCC) features’ and/or ‘sub-band cepstral coefficients (SBCC) features’ (either or both are also read on the claimed “obtained information about the frequency spectrum” in a broad sense) and using ‘deep neural network (DNN)(s)’ being ‘pre-trained’/‘trained’ for ‘extraction’ and/or for detecting (determining/discriminating/classifying/identifying) ‘the voice sample’ (“audio signal” or “speech content”) as being ‘genuine’ (read on claimed “live speech”) or ‘spoofed’/‘artifacts’ (including ‘a replay attack’) based on a ‘measure’/‘likelihood’ from ‘trained’ DNN(s) (Figs. 2-7, p5-p11, p36-p38, p46-p47, p50-p56, p60-p63).  Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to combine teachings of VALENTI and KHOURY together by providing a mechanism of extracting/identifying audio/acoustic features including frequency/spectrum information/feature for each of linguistic elements (including phonemes and/or phones, as portions) of received spoken pass-phrase and detecting (determining/discriminating/classifying/identifying) a score/measure indicative of a likelihood of a voice/speech/utterance (such as received spoken pass-phrase) as being genuine 
As per claim 20 (depending on claim 19), VALENTI in view of KHOURY further discloses “wherein the score indicative of the likelihood that the speech content is live speech is based on the obtained information about the frequency spectrum (same as stated for claim 19), and on an identified acoustic class (read on at least one of ‘spoken words’, ‘syllables’, ‘phones’, ‘phonemes’ such as ‘nasal vowel phonemes’) of the speech content” (VALENTI: p51-p53, p61).
As per claim 21 (depending on claim 19), VALENTI in view of KHOURY further discloses “removing effects of a channel and/or noise from the received audio signal; and using the audio signal after removing the effects of the channel and/or noise when obtaining the information about the frequency spectrum of each portion of the audio signal for which speech content is identified” (VALENTI: p63, ‘noise removal’).
As per claim 22 (depending on claim 19), VALENTI in view of KHOURY further discloses “…identifying at least one test acoustic class (read on at least one of ‘spoken words’, ‘syllables’, ‘phones’, ‘phonemes’ such as ‘nasal vowel phonemes’)” (VALENTI: p51-p53, p61).
As per claim 23 (depending on claim 22), the rejection is based on the same reason described for claim 22, because it also reads on the limitation(s) of claim 23.
As per claim 27 (depending on claim 21), VALENTI in view of KHOURY further discloses “… identifying a location (one of ‘locations’, or ‘positioning’) of occurrences of the test acoustic class in known speech content (such as the ‘utterance’ of ‘pass phrase’)”, (VALENTI: p64, p66).
As per claim 28 (depending on claim 27), the rejection is based on the same reason described for claim 27, because it also reads on the limitation(s) of claim 28.
As per claim 29, it recites a system.  The rejection is based on the same reason described for claim 19, because the claim recites/includes the same/similar limitation(s) as claim 19, wherein VALENTI also discloses claimed “input” and “processor” (VALENTI: Fig.1, p35).
As per claim 30 (depending on claim 29), VALENTI in view of KHOURY further discloses “wherein the device comprises one of: a smartphone, a tablet or laptop computer, a games console, a home control system, a home entertainment system, an in-vehicle entertainment system, or a domestic appliance”, (VALENTI: p21).
As per claim 31, it recites a computer program product comprising a tangible, non-transitory computer-readable medium. The rejection is based on the same reason described for claim 19, because the claim recites/includes the same/similar limitations as claim 19.

Claims 24-26 are rejected under 35 U.S.C. 103 as being unpatentable over VALENTI in view of KHOURY as applied to claim 23, and further in view of KRISHNASWAMY et al. (US 2018/0146370, hereinafter referenced as KRISHNASWAMY). 
As per claims 24-26 (depending on claim 23), VALENTI in view of KHOURY does not expressly disclose that the at least one test acoustic class comprises “fricatives”, “sibilants” and/or “plosives” respectively.  However, the same/similar concept/feature is well known in the art as evidenced by KRISHNASWAMY who, in the same field of endeavor, discloses ‘method and apparatus for secured authentication using voice biometrics and watermarking’ (title) concerning ‘speaker recognition to prevent spoofing or mimicry attempts of authorized user/customer’s voice’ (p1), comprising: speech recognition/verification/identification (p17 and p58), measuring .
	 
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
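For illustration only, the reply-period arithmetic described in the preceding paragraph can be sketched in Python as shown below. The function names, the month-end clamping rule, and the dates used in the usage note are assumptions of this sketch and do not appear in this action or in the cited rule. The sketch does not account for weekends, federal holidays, or extension fees, and it is not a substitute for MPEP § 706.07(a) or 37 CFR 1.136(a).

```python
import calendar
from datetime import date
from typing import Optional, Tuple


def add_months(d: date, months: int) -> date:
    """Same day-of-month `months` later; clamped to the month's last day
    (the clamping rule is an assumption of this sketch)."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def reply_periods(
    mailing: date,
    first_reply: Optional[date] = None,
    advisory_mailed: Optional[date] = None,
) -> Tuple[date, date]:
    """Return (shortened_period_expiry, six_month_limit).

    mailing         -- mailing date of the final action
    first_reply     -- date a first reply was filed, if any
    advisory_mailed -- mailing date of the advisory action, if any
    """
    shortened = add_months(mailing, 3)   # THREE-MONTH shortened statutory period
    six_month_limit = add_months(mailing, 6)  # period never expires later than this

    # If a first reply is filed within TWO MONTHS and the advisory action is
    # mailed after the three-month date, the shortened period expires on the
    # advisory action's mailing date.
    if (
        first_reply is not None
        and advisory_mailed is not None
        and first_reply <= add_months(mailing, 2)
        and advisory_mailed > shortened
    ):
        shortened = advisory_mailed

    # In no event later than SIX MONTHS from the date of the final action.
    return min(shortened, six_month_limit), six_month_limit
```

As a usage example with a hypothetical mailing date of 2022-03-21, `reply_periods(date(2022, 3, 21))` gives a shortened period ending 2022-06-21 and a six-month limit of 2022-09-21. With a first reply filed 2022-05-10 and an advisory action mailed 2022-07-01, the shortened period instead ends on 2022-07-01.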


Any inquiry concerning this communication or earlier communications from the examiner should be directed to QI HAN whose telephone number is (571)272-7604.  The examiner can normally be reached on 9-19:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at 571-272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

QH/qh
March 21, 2022 
/QI HAN/Primary Examiner, Art Unit 2659