DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on June 9, 2021. 
Claims 1-20 are pending in the application. As such, claims 1-20 have been examined. 
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings were received on June 9, 2021. These drawings have been accepted and considered by the Examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6-11, 13-18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Fer et al. (US Patent Pub. No. 2016/0352708), hereinafter Fer, in view of Chari et al. (US Patent Pub. No. 2018/0205726), hereinafter Chari.
Regarding claim 1, Fer teaches a method comprising: 
establishing a conference session with a plurality of participant user devices (Fer [0031] FIG. 2 is a schematic diagram of an example of a system 200 for conducting a conference call among a plurality of participant devices 202, 204, 206. The plurality of participant devices 202, 204, 206 are each connected to an associated accessory device 212, 214, 216. Each accessory device 212, 214, 216 includes a participant secure element 222, 224, 226, an audio input 203, 205, 207 and a data network interface. Each participant secure element 222, 224, 226 maintains key information not accessible to the associated participant device 202, 204, 206, and each participant secure element 222, 224, 226 is configured to perform scrambling and unscrambling of the media signals (audio and/or video) communicated during the conference call. The data network interface in each accessory device 212, 214, 216 is configured to maintain a secure media session with each of the other accessory devices 212, 214, 216); 
receiving, via the conference session, a digitized audio signal from a participant user device of the plurality of participant user devices (Fer [0031] FIG. 2 is a schematic diagram of an example of a system 200 for conducting a conference call among a plurality of participant devices 202, 204, 206. The plurality of participant devices 202, 204, 206 are each connected to an associated accessory device 212, 214, 216. Each accessory device 212, 214, 216 includes a participant secure element 222, 224, 226, an audio input 203, 205, 207 and a data network interface. Each participant secure element 222, 224, 226 maintains key information not accessible to the associated participant device 202, 204, 206, and each participant secure element 222, 224, 226 is configured to perform scrambling and unscrambling of the media signals (audio and/or video) communicated during the conference call. The data network interface in each accessory device 212, 214, 216 is configured to maintain a secure media session with each of the other accessory devices 212, 214, 216).
Fer does not teach the following limitations; however, Chari teaches:
establishing a user account identity associated with the participant user device (Chari [0030] The cognitive system further builds behavior and context based profiles for callers (herein referred to as users), that indicate the speaking behavior (e.g., intervals between words, inflections, or any other audio characteristics of the users speaking patterns) and context information (e.g., word or terminology usage) for the user); 
determining reference speech mannerism features associated with the user account identity (Chari [0070] Marked portions of the text data, and its corresponding speech pattern metadata, are provided to the user profile comparison engine 126 which compares the terms/phrases used in the marked portion of text, as well as the speech pattern features of the marked portion of text, to term/phrase usage information and speech pattern features associated with the user profile of the user of the communication device 140, as retrieved from the user profiles database 127. A degree of matching between the user profile's term/phrase usage information and speech pattern features and those extracted from the marked portions of the communication is calculated to generate a confidence score indicative of how confident the engine 120 is that the marked portion is fraudulent, or not fraudulent based on the particular implementation); 
converting the digitized audio signal to text (Chari [0055] Thus, as a conversation is being conducted between the caller and the callee, the textual or audio (converted to text) input may be input to the QA pipeline for processing); 
generating, based on the text, observed speech mannerism features that are exhibited by the digitized audio signal (Chari [0065] The voice to text conversion preferably retains voice characteristic information about the original voice input as metadata. For example, the voice metadata may include information including length of pauses between spoken words, inflections, accents, or any other characteristics indicative of the speech patterns or manner by which the user speaks); 
determining a similarity measure between the reference speech mannerism features and the observed speech mannerism features (Chari [0090] The user profile comparison engine 396 compares user profile speech patterns, obtained from the user profile retrieved from the user profiles database 397 for the purported user providing the communication input, to the speech pattern data for the marked portions of the communication. The user profile comparison engine 396 calculates a degree of matching between the speech pattern of the user and the speech pattern present in the portions of the communications to generate an initial indication as to which of the marked portions of the communications are likely to be fraudulent and thereby flag those portions of the communications for further evaluation); 
validating an integrity of the digitized audio signal based on the similarity measure (Chari [0093] The use of cognitive speech pattern analysis on the targeted portions of the communications allows for a more accurate indication of whether or not the user is in fact the person that the user alleges they are so that a more accurate determination of whether the communication is fraudulent or not can be performed); 
and selectively maintaining the participant user device in the conference session based on the validating (Chari [0071] In other cases, the operation may be to log the communications for future action. In still further cases, the operation may be to disable access to an account or data attempting to be accessed as part of the communication. Of course any combination of operations for responding to fraudulent communications may be used without departing from the spirit and scope of the illustrative embodiments).
Chari is considered to be analogous to the claimed invention because it is in the same field of caller voice authentication. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Fer further in view of Chari to allow for authenticating via speech patterns and mannerisms. Doing so would allow for targeted analysis of speech patterns of the user with regard to trigger terms/phrases which identify portions of communications directed to areas where potential security issues may be present.
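By way of illustration only, the last three limitations mapped above (determining a similarity measure, validating integrity, and selectively maintaining the participant user device) can be sketched as follows. This is a minimal sketch under stated assumptions: cosine similarity, the 0.8 threshold, the representation of a session as a set of device identifiers, and all function names are hypothetical choices made for illustration. Neither Fer nor Chari discloses this code, and Chari describes only a "degree of matching" without naming a specific measure.

```python
import math


def cosine_similarity(reference, observed):
    """Cosine similarity between two equal-length numeric feature vectors."""
    dot = sum(r * o for r, o in zip(reference, observed))
    norm_ref = math.sqrt(sum(r * r for r in reference))
    norm_obs = math.sqrt(sum(o * o for o in observed))
    if norm_ref == 0 or norm_obs == 0:
        return 0.0  # no usable features, treat as no match
    return dot / (norm_ref * norm_obs)


def validate_integrity(reference, observed, threshold=0.8):
    """Assumed rule: the signal is valid when similarity meets the threshold."""
    return cosine_similarity(reference, observed) >= threshold


def selectively_maintain(session, device_id, reference, observed, threshold=0.8):
    """Keep the device in the session if validated; otherwise remove it.

    `session` is a hypothetical set of participant device identifiers.
    Returns True when the device is maintained in the session.
    """
    if validate_integrity(reference, observed, threshold):
        return True
    session.discard(device_id)
    return False
```

In this sketch, removing the device from the session stands in for the responsive operations Chari [0071] lists, such as disabling access.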

Regarding claim 8, Fer teaches an apparatus, comprising: 
a network interface configured to enable network communications; one or more processors; and one or more memories storing instructions that when executed configure the one or more processors to perform operations (Fer [0009] In another example implementation, a conference call server is provided. The conference call server comprises a communication interface configured to communicate over a data network with a plurality of participant devices. Each participant device is connected to an accessory device having audio input and output devices and a participant secure element for maintaining participant key information. The communication interface communicates scrambled audio signals with each accessory device on a corresponding secure media session relayed by the associated participant device. A cryptographic interface is connected to a plurality of server secure elements configured to scramble and unscramble audio signals communicated with the accessory devices connected to corresponding participant devices using server key information stored therein. An audio mixer mixes audio signals from incoming scrambled audio signals unscrambled by the server secure element corresponding to the accessory devices connected to the plurality of participant devices. The audio mixer mixes the audio signals to generate conference call data to be communicated to the users in the conference call as a mixed audio signal. The mixed audio signal is provided to the cryptographic interface. Each server secure element generates a scrambled audio signal to provide to the cryptographic interface to communicate via the communication interface to each participant device to relay to its associated accessory device)
comprising: 
establishing a conference session with a plurality of participant user devices (Fer [0031] FIG. 2 is a schematic diagram of an example of a system 200 for conducting a conference call among a plurality of participant devices 202, 204, 206. The plurality of participant devices 202, 204, 206 are each connected to an associated accessory device 212, 214, 216. Each accessory device 212, 214, 216 includes a participant secure element 222, 224, 226, an audio input 203, 205, 207 and a data network interface. Each participant secure element 222, 224, 226 maintains key information not accessible to the associated participant device 202, 204, 206, and each participant secure element 222, 224, 226 is configured to perform scrambling and unscrambling of the media signals (audio and/or video) communicated during the conference call. The data network interface in each accessory device 212, 214, 216 is configured to maintain a secure media session with each of the other accessory devices 212, 214, 216); 
receiving, via the conference session, a digitized audio signal from a participant user device of the plurality of participant user devices (Fer [0031] FIG. 2 is a schematic diagram of an example of a system 200 for conducting a conference call among a plurality of participant devices 202, 204, 206. The plurality of participant devices 202, 204, 206 are each connected to an associated accessory device 212, 214, 216. Each accessory device 212, 214, 216 includes a participant secure element 222, 224, 226, an audio input 203, 205, 207 and a data network interface. Each participant secure element 222, 224, 226 maintains key information not accessible to the associated participant device 202, 204, 206, and each participant secure element 222, 224, 226 is configured to perform scrambling and unscrambling of the media signals (audio and/or video) communicated during the conference call. The data network interface in each accessory device 212, 214, 216 is configured to maintain a secure media session with each of the other accessory devices 212, 214, 216).
Fer does not teach the following limitations; however, Chari teaches:
establishing a user account identity associated with the participant user device (Chari [0030] The cognitive system further builds behavior and context based profiles for callers (herein referred to as users), that indicate the speaking behavior (e.g., intervals between words, inflections, or any other audio characteristics of the users speaking patterns) and context information (e.g., word or terminology usage) for the user); 
determining reference speech mannerism features associated with the user account identity (Chari [0070] Marked portions of the text data, and its corresponding speech pattern metadata, are provided to the user profile comparison engine 126 which compares the terms/phrases used in the marked portion of text, as well as the speech pattern features of the marked portion of text, to term/phrase usage information and speech pattern features associated with the user profile of the user of the communication device 140, as retrieved from the user profiles database 127. A degree of matching between the user profile's term/phrase usage information and speech pattern features and those extracted from the marked portions of the communication is calculated to generate a confidence score indicative of how confident the engine 120 is that the marked portion is fraudulent, or not fraudulent based on the particular implementation); 
converting the digitized audio signal to text (Chari [0055] Thus, as a conversation is being conducted between the caller and the callee, the textual or audio (converted to text) input may be input to the QA pipeline for processing); 
generating, based on the text, observed speech mannerism features that are exhibited by the digitized audio signal (Chari [0065] The voice to text conversion preferably retains voice characteristic information about the original voice input as metadata. For example, the voice metadata may include information including length of pauses between spoken words, inflections, accents, or any other characteristics indicative of the speech patterns or manner by which the user speaks); 
determining a similarity measure between the reference speech mannerism features and the observed speech mannerism features (Chari [0090] The user profile comparison engine 396 compares user profile speech patterns, obtained from the user profile retrieved from the user profiles database 397 for the purported user providing the communication input, to the speech pattern data for the marked portions of the communication. The user profile comparison engine 396 calculates a degree of matching between the speech pattern of the user and the speech pattern present in the portions of the communications to generate an initial indication as to which of the marked portions of the communications are likely to be fraudulent and thereby flag those portions of the communications for further evaluation); 
validating an integrity of the digitized audio signal based on the similarity measure (Chari [0093] The use of cognitive speech pattern analysis on the targeted portions of the communications allows for a more accurate indication of whether or not the user is in fact the person that the user alleges they are so that a more accurate determination of whether the communication is fraudulent or not can be performed); 
and selectively maintaining the participant user device in the conference session based on the validating (Chari [0071] In other cases, the operation may be to log the communications for future action. In still further cases, the operation may be to disable access to an account or data attempting to be accessed as part of the communication. Of course any combination of operations for responding to fraudulent communications may be used without departing from the spirit and scope of the illustrative embodiments).
Chari is considered to be analogous to the claimed invention because it is in the same field of caller voice authentication. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Fer further in view of Chari to allow for authenticating via speech patterns and mannerisms. Doing so would allow for targeted analysis of speech patterns of the user with regard to trigger terms/phrases which identify portions of communications directed to areas where potential security issues may be present.

Regarding claim 15, Fer teaches 
a non-transitory computer readable storage medium comprising instructions that when executed configure one or more processors to perform operations (Fer [0009] In another example implementation, a conference call server is provided. The conference call server comprises a communication interface configured to communicate over a data network with a plurality of participant devices. Each participant device is connected to an accessory device having audio input and output devices and a participant secure element for maintaining participant key information. The communication interface communicates scrambled audio signals with each accessory device on a corresponding secure media session relayed by the associated participant device. A cryptographic interface is connected to a plurality of server secure elements configured to scramble and unscramble audio signals communicated with the accessory devices connected to corresponding participant devices using server key information stored therein. An audio mixer mixes audio signals from incoming scrambled audio signals unscrambled by the server secure element corresponding to the accessory devices connected to the plurality of participant devices. The audio mixer mixes the audio signals to generate conference call data to be communicated to the users in the conference call as a mixed audio signal. The mixed audio signal is provided to the cryptographic interface. Each server secure element generates a scrambled audio signal to provide to the cryptographic interface to communicate via the communication interface to each participant device to relay to its associated accessory device)
comprising: 
establishing a conference session with a plurality of participant user devices (Fer [0031] FIG. 2 is a schematic diagram of an example of a system 200 for conducting a conference call among a plurality of participant devices 202, 204, 206. The plurality of participant devices 202, 204, 206 are each connected to an associated accessory device 212, 214, 216. Each accessory device 212, 214, 216 includes a participant secure element 222, 224, 226, an audio input 203, 205, 207 and a data network interface. Each participant secure element 222, 224, 226 maintains key information not accessible to the associated participant device 202, 204, 206, and each participant secure element 222, 224, 226 is configured to perform scrambling and unscrambling of the media signals (audio and/or video) communicated during the conference call. The data network interface in each accessory device 212, 214, 216 is configured to maintain a secure media session with each of the other accessory devices 212, 214, 216); 
receiving, via the conference session, a digitized audio signal from a participant user device of the plurality of participant user devices (Fer [0031] FIG. 2 is a schematic diagram of an example of a system 200 for conducting a conference call among a plurality of participant devices 202, 204, 206. The plurality of participant devices 202, 204, 206 are each connected to an associated accessory device 212, 214, 216. Each accessory device 212, 214, 216 includes a participant secure element 222, 224, 226, an audio input 203, 205, 207 and a data network interface. Each participant secure element 222, 224, 226 maintains key information not accessible to the associated participant device 202, 204, 206, and each participant secure element 222, 224, 226 is configured to perform scrambling and unscrambling of the media signals (audio and/or video) communicated during the conference call. The data network interface in each accessory device 212, 214, 216 is configured to maintain a secure media session with each of the other accessory devices 212, 214, 216).
Fer does not teach the following limitations; however, Chari teaches:
establishing a user account identity associated with the participant user device (Chari [0030] The cognitive system further builds behavior and context based profiles for callers (herein referred to as users), that indicate the speaking behavior (e.g., intervals between words, inflections, or any other audio characteristics of the users speaking patterns) and context information (e.g., word or terminology usage) for the user); 
determining reference speech mannerism features associated with the user account identity (Chari [0070] Marked portions of the text data, and its corresponding speech pattern metadata, are provided to the user profile comparison engine 126 which compares the terms/phrases used in the marked portion of text, as well as the speech pattern features of the marked portion of text, to term/phrase usage information and speech pattern features associated with the user profile of the user of the communication device 140, as retrieved from the user profiles database 127. A degree of matching between the user profile's term/phrase usage information and speech pattern features and those extracted from the marked portions of the communication is calculated to generate a confidence score indicative of how confident the engine 120 is that the marked portion is fraudulent, or not fraudulent based on the particular implementation); 
converting the digitized audio signal to text (Chari [0055] Thus, as a conversation is being conducted between the caller and the callee, the textual or audio (converted to text) input may be input to the QA pipeline for processing); 
generating, based on the text, observed speech mannerism features that are exhibited by the digitized audio signal (Chari [0065] The voice to text conversion preferably retains voice characteristic information about the original voice input as metadata. For example, the voice metadata may include information including length of pauses between spoken words, inflections, accents, or any other characteristics indicative of the speech patterns or manner by which the user speaks); 
determining a similarity measure between the reference speech mannerism features and the observed speech mannerism features (Chari [0090] The user profile comparison engine 396 compares user profile speech patterns, obtained from the user profile retrieved from the user profiles database 397 for the purported user providing the communication input, to the speech pattern data for the marked portions of the communication. The user profile comparison engine 396 calculates a degree of matching between the speech pattern of the user and the speech pattern present in the portions of the communications to generate an initial indication as to which of the marked portions of the communications are likely to be fraudulent and thereby flag those portions of the communications for further evaluation); 
validating an integrity of the digitized audio signal based on the similarity measure (Chari [0093] The use of cognitive speech pattern analysis on the targeted portions of the communications allows for a more accurate indication of whether or not the user is in fact the person that the user alleges they are so that a more accurate determination of whether the communication is fraudulent or not can be performed); 
and selectively maintaining the participant user device in the conference session based on the validating (Chari [0071] In other cases, the operation may be to log the communications for future action. In still further cases, the operation may be to disable access to an account or data attempting to be accessed as part of the communication. Of course any combination of operations for responding to fraudulent communications may be used without departing from the spirit and scope of the illustrative embodiments).
Chari is considered to be analogous to the claimed invention because it is in the same field of caller voice authentication. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Fer further in view of Chari to allow for authenticating via speech patterns and mannerisms. Doing so would allow for targeted analysis of speech patterns of the user with regard to trigger terms/phrases which identify portions of communications directed to areas where potential security issues may be present.


Regarding claims 2, 9 and 16, Fer in view of Chari teaches the method, apparatus, and non-transitory computer readable storage medium of claims 1, 8 and 15, respectively.
Fer does not teach the following limitations; however, Chari teaches:
wherein the generating of the observed speech mannerism features comprises 
determining, based on the text, one or more features of speech pauses, grammatical errors, use of idioms, use of phrases, use of filler words, or word choices (Chari [0065] The voice to text conversion preferably retains voice characteristic information about the original voice input as metadata. For example, the voice metadata may include information including length of pauses between spoken words, inflections, accents, or any other characteristics indicative of the speech patterns or manner by which the user speaks), 
wherein the determining of the similarity measure is based on the one or more features (Chari [0090] The user profile comparison engine 396 compares user profile speech patterns, obtained from the user profile retrieved from the user profiles database 397 for the purported user providing the communication input, to the speech pattern data for the marked portions of the communication. The user profile comparison engine 396 calculates a degree of matching between the speech pattern of the user and the speech pattern present in the portions of the communications to generate an initial indication as to which of the marked portions of the communications are likely to be fraudulent and thereby flag those portions of the communications for further evaluation).
Chari is considered to be analogous to the claimed invention because it is in the same field of caller voice authentication. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Fer further in view of Chari to allow for authenticating via speech patterns and mannerisms. Doing so would allow for targeted analysis of speech patterns of the user with regard to trigger terms/phrases which identify portions of communications directed to areas where potential security issues may be present.
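For illustration, the "based on the text" feature determination recited in claims 2, 9 and 16 can be sketched as below. This is a hedged sketch, not a method disclosed by either reference: the filler-word list, the two feature names, and the function name are hypothetical. The sketch covers only text-derived features (filler-word use and word choice); pause lengths, which Chari [0065] retains as voice metadata, are assumed to be unavailable from text alone and are omitted.

```python
import re

# Hypothetical filler-word list chosen for illustration only.
FILLER_WORDS = {"um", "uh", "like", "basically"}


def observed_features(text):
    """Derive simple speech mannerism features from transcribed text.

    Returns the fraction of words that are filler words and the ratio of
    distinct words to total words (a rough proxy for word choice).
    """
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {"filler_rate": 0.0, "distinct_word_ratio": 0.0}
    fillers = sum(1 for word in words if word in FILLER_WORDS)
    return {
        "filler_rate": fillers / total,
        "distinct_word_ratio": len(set(words)) / total,
    }
```

A feature dictionary of this kind could then be flattened into a vector and compared against the stored profile features to produce the similarity measure.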

Regarding claims 3, 10 and 17, Fer in view of Chari teaches the method, apparatus, and non-transitory computer readable storage medium of claims 1, 8 and 15, respectively.
Fer does not teach the following limitations; however, Chari teaches:
further comprising: 
validating the integrity of the digitized audio signal (Chari [0093] The use of cognitive speech pattern analysis on the targeted portions of the communications allows for a more accurate indication of whether or not the user is in fact the person that the user alleges they are so that a more accurate determination of whether the communication is fraudulent or not can be performed); 
associating, based on the validating, the observed speech mannerism features with the user account identity in a data store (Chari [0030] The cognitive system further builds behavior and context based profiles for callers (herein referred to as users), that indicate the speaking behavior (e.g., intervals between words, inflections, or any other audio characteristics of the users speaking patterns) and context information (e.g., word or terminology usage) for the user); 
and second validating a second integrity of a second digitized audio signal based on the observed speech mannerism features (Chari [0031] This score may be compared to a threshold score which, if met or exceeded, indicates the call to be most likely fraudulent and initiates an operation, such as sending an alert message to an authorized individual, such as a human operator of the callee engaged in the communication, logging call and caller information, rejecting an access attempt by the caller, initiating a secondary authentication operation, such as requiring the user/caller to perform an operation via another device or communication system on a registered device, or the like. In this way, cognitive intelligence based targeted identification of potentially fraudulent communications with a callee may be performed).
Chari is considered to be analogous to the claimed invention because it is in the same field of caller voice authentication. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Fer further in view of Chari to allow for authenticating via speech patterns and mannerisms. Doing so would allow for targeted analysis of speech patterns of the user with regard to trigger terms/phrases which identify portions of communications directed to areas where potential security issues may be present.
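The sequence recited in claims 3, 10 and 17 (validate, associate the observed features with the user account identity in a data store, then rely on those features for a later validation) can be sketched as follows. The sketch assumes an in-memory dictionary as the data store and uses hypothetical class and method names; it is offered only to make the mapped limitations concrete and does not reproduce any implementation described in Chari.

```python
class ProfileStore:
    """Hypothetical in-memory data store keyed by user account identity."""

    def __init__(self):
        self._profiles = {}

    def associate(self, account_id, features, validated):
        """Store observed features for the account only if validation passed.

        Returns True when the features were stored.
        """
        if not validated:
            return False
        self._profiles[account_id] = list(features)
        return True

    def reference_for(self, account_id):
        """Return stored features for use in a later validation, or None."""
        return self._profiles.get(account_id)
```

Under this assumption, features from a first validated audio signal become the reference against which a second digitized audio signal is checked.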

Regarding claims 4, 11 and 18, Fer in view of Chari teaches the method, apparatus, and non-transitory computer readable storage medium of claims 1, 8 and 15, respectively.
Fer does not teach the following limitation; however, Chari teaches:
wherein the determining of the similarity measure is provided by a machine learning model (Chari [0032] A cognitive system comprises artificial intelligence logic, such as natural language processing (NLP) based logic, for example, and machine learning logic, which may be provided as specialized hardware, software executed on hardware, or any combination of specialized hardware and software executed on hardware).
Chari is considered to be analogous to the claimed invention because it is in the same field of caller voice authentication. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Fer further in view of Chari to allow for authenticating via speech patterns and mannerisms. Doing so would allow for targeted analysis of speech patterns of the user with regard to trigger terms/phrases which identify portions of communications directed to areas where potential security issues may be present.

Regarding claims 6, 13 and 20, Fer in view of Chari teaches the method, apparatus, and non-transitory computer readable storage medium of claims 1, 8 and 15, respectively.
Fer does not teach the following limitations; however, Chari teaches:
further comprising: 
establishing a training session with the participant user device, the training session based on the user account identity (Chari [0066] The textual data and speech pattern metadata generated from the conversion is provided as input to the cognitive system 100 and associated request processing pipeline 108 for processing. It should be noted that when the user of communication device 140 initiates the communication with communication device 150 via computing device 105, the user submits an identifier of who the user purports to be, e.g., a name, user identifier, phone number, or other identifier. This information may be passed to the cognitive system 100 as part of the information associated with the communication session between the communication device 140 and communication device 150. The cognitive system 100 may use this user identifier information to retrieve a corresponding user profile for the user, or generate a new user profile for the user based on analysis of the current voice communications); 
generating, over the training session, an audio prompt (Chari [0013] The illustrative embodiments provide mechanisms for implementing cognitive intelligence to detect voice fraud and authenticate a caller. As noted above, in known voice authentication systems, the caller is typically authenticated by the callee, whether it be a human being or automated system, by requesting that the caller provide the particular security information for the account or information that the caller is attempting to access. For example, the callee may request that the caller provide the caller's home address, social security number (SSN), personal identifier number (PIN), password, etc. which may then be verified with stored information associated with the account or information attempting to be accessed. Thus, if the caller provides the correct security information, it is assumed that the caller's identity is the identity of the person whose account or information is being accessed); 
receiving, over the training session, a training audio signal (Chari [0066] The textual data and speech pattern metadata generated from the conversion is provided as input to the cognitive system 100 and associated request processing pipeline 108 for processing. It should be noted that when the user of communication device 140 initiates the communication with communication device 150 via computing device 105, the user submits an identifier of who the user purports to be, e.g., a name, user identifier, phone number, or other identifier. This information may be passed to the cognitive system 100 as part of the information associated with the communication session between the communication device 140 and communication device 150. The cognitive system 100 may use this user identifier information to retrieve a corresponding user profile for the user, or generate a new user profile for the user based on analysis of the current voice communications); 
converting the training audio signal to training text (Chari [0066] The textual data and speech pattern metadata generated from the conversion is provided as input to the cognitive system 100 and associated request processing pipeline 108 for processing. It should be noted that when the user of communication device 140 initiates the communication with communication device 150 via computing device 105, the user submits an identifier of who the user purports to be, e.g., a name, user identifier, phone number, or other identifier. This information may be passed to the cognitive system 100 as part of the information associated with the communication session between the communication device 140 and communication device 150. The cognitive system 100 may use this user identifier information to retrieve a corresponding user profile for the user, or generate a new user profile for the user based on analysis of the current voice communications); 
generating, based on the training text, training speech mannerism features exhibited by the training audio signal (Chari [0067] The request processing pipeline 108 of the cognitive system 100 performs initial processing of the unstructured text data to extract natural language features indicative of content of the communication, e.g., the terms and phrases used, and the like. Thus, through analysis of the text itself, the terms and phrases used by the user are identifiable, where each term may be associated with a count of the usage of each term indicating the terms that are most used by the user. The speech pattern metadata generated from the conversion of voice to text provides speech pattern features which, along with the term/phrase usage information, represents the user profile data for characterizing the way in which the user speaks. This information may be compiled and associated with the user identifier of the user to generate a user profile which is stored in a user profile database 127 in response to a user profile not already being present in the user profiles database); 
and associating the training speech mannerism features with the user account identity (Chari [0067] The request processing pipeline 108 of the cognitive system 100 performs initial processing of the unstructured text data to extract natural language features indicative of content of the communication, e.g., the terms and phrases used, and the like. Thus, through analysis of the text itself, the terms and phrases used by the user are identifiable, where each term may be associated with a count of the usage of each term indicating the terms that are most used by the user. The speech pattern metadata generated from the conversion of voice to text provides speech pattern features which, along with the term/phrase usage information, represents the user profile data for characterizing the way in which the user speaks. This information may be compiled and associated with the user identifier of the user to generate a user profile which is stored in a user profile database 127 in response to a user profile not already being present in the user profiles database).
Chari is considered to be analogous to the claimed invention because it is in the same field of caller voice authentication. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Fer further in view of Chari to allow for authenticating via speech patterns and mannerisms. Doing so would allow for targeted analysis of speech patterns of the user with regard to trigger terms/phrases which identify portions of communications directed to areas where potential security issues may be present.
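
For illustration only, the kind of processing described in the cited passage of Chari [0067] (counting the terms a user employs and associating the result with a user identifier when no profile exists yet) can be sketched as follows. This is a minimal sketch under stated assumptions, not Chari's implementation and not the claimed method: the names user_profiles, extract_term_counts and associate_profile are hypothetical, the profile store is a plain in-memory dictionary, and the speech pattern metadata mentioned in Chari is omitted.

```python
import re
from collections import Counter

# Hypothetical in-memory stand-in for a user profile database.
user_profiles = {}


def extract_term_counts(training_text):
    """Lowercase the text, split it into word tokens, and count each term."""
    tokens = re.findall(r"[a-z']+", training_text.lower())
    return Counter(tokens)


def associate_profile(user_id, training_text):
    """Store term counts under user_id only if no profile is present yet."""
    if user_id not in user_profiles:
        user_profiles[user_id] = extract_term_counts(training_text)
    return user_profiles[user_id]


# Example: converted training text for a hypothetical user identifier.
profile = associate_profile("user-001", "Well, you know, I think, you know, it works")
print(profile.most_common(2))
```

In this sketch the term counts stand in for the speech mannerism features; a second call with the same identifier returns the stored profile rather than overwriting it, mirroring the "not already being present" condition in the quoted passage.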

Regarding claims 7 and 14, Fer in view of Chari teaches the method and apparatus of claims 1 and 8, respectively.
Fer does not teach; however, Chari teaches
further comprising validating an integrity of a respective digitized audio signal of each of the plurality of participant user devices (Chari [0072] It should be appreciated that while the depicted example shows the cognitive system 100 and voice authentication and fraud detection engine 120 being associated with or deployed in the protected or protective computing system 104, the illustrative embodiments are not limited to such. Rather, in other illustrative embodiments, the computing device 104 may operate as a centralized processing system which receives inputs from agent software modules executing on one or more other computing systems 104 that handle communications from users. In such embodiments, the agents deployed at these other computing systems 104 may convert communications to unstructured textual content and provide that content to the cognitive system 100 and voice authentication and fraud detection engine 120 for processing, with corresponding results being returned to the agents indicating whether or not the communications are likely to be fraudulent. Based on the results of such processing, each individual computing system 104 may have its own fraud response systems which may be invoked for handling the results of detection of a fraudulent communication, similar to that described above with regard to fraud response system).
Chari is considered to be analogous to the claimed invention because it is in the same field of caller voice authentication. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Fer further in view of Chari to allow for authenticating via speech patterns and mannerisms. Doing so would allow for targeted analysis of speech patterns of the user with regard to trigger terms/phrases which identify portions of communications directed to areas where potential security issues may be present.

Claims 5, 12 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Fer in view of Chari, and further in view of Nguyen et al. (US Patent Pub. No. 2021/0049700), hereinafter Nguyen.

Regarding claims 5, 12 and 19, Fer in view of Chari teaches the method, apparatus, and non-transitory computer readable storage medium of claims 4, 11 and 18, respectively.
Fer in view of Chari does not teach; however, Nguyen teaches
wherein the machine learning model is a stochastic classifier (Nguyen [0088] Specifically, some embodiments described herein relate to architectures for machine learning and natural language processing, including, the use of recurrent neural networks (RNN) operating or long short-term memory architecture (LSTM). Validation of the results is provided in the figures, indicating an improved level of accuracy and an improved f1 score. Machine learning or neural network approaches, such as a stochastic gradient descent classifier, and the one class support vector machine and isolation forest mechanisms are described in further embodiments).
Nguyen and Fer are analogous art because they are from the same field of endeavor. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of a machine learning model incorporating a stochastic classifier in order to improve caller voice authentication. One of ordinary skill in the art would have recognized that the results of the combination were predictable since the use of that known technique provides the rationale to arrive at a conclusion of obviousness. See KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385 (U.S. 2007).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL J. MUELLER whose telephone number is (571)272-1875. The examiner can normally be reached M-F 8:30am-5:30pm (Eastern).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached at 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/PAUL J. MUELLER/Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657