DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on 8/20/2020. Claims 1-20 are pending in the application and have been examined.
	
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of pre-AIA 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.

3.	Claims 1, 2, 5-9, 12-16 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over J. H. L. Hansen and T. Hasan, "Speaker Recognition by Machines and Humans: A tutorial review," in IEEE Signal Processing Magazine, vol. 32, no. 6, pp. 74-99, Nov. 2015 in view of Lu, L., Liu, L., Hussain, M. J., & Liu, Y. (2017). I sense you by breath: Speaker recognition via breath biometrics. IEEE Transactions on Dependable and Secure Computing, 17(2), 306-319.

Regarding claim 1, Hansen teaches a method, comprising: obtaining, by an electronic device, an audio segment comprising one or more audio events of a target subject (see Hansen, pg. 83, fig.4, col 2 In automatic speaker recognition, computer programs designed to operate independently with minimum human intervention identify a speaker's voice. The system user may adjust the design parameters, but to make the comparison between speech segments, all the user needs to do is provide the system with the audio recordings); extracting, by the electronic device, audio embeddings from the one or more audio events using an embedding model, the embedding model comprising a trained machine learning model (see Hansen, fig. 4 and pg. 84, fig.4, col 1 Predefined feature parameters are first extracted from the audio recordings that are designed to capture the idiosyncratic characteristics of a person's speech in mathematical parameters. These features obtained from an enrollment speaker are used to build/train mathematical models that summarize their speaker-dependent properties; mathematical models are interpreted as the embedding model); comparing, by the electronic device, the extracted audio embeddings with a match profile of the target subject, the match profile generated during an enrollment stage (see Hansen, pg. 84, col 1 For an unknown test segment, the same features are then extracted, and they are compared against the model of the enrollment/claimed speaker. The models are designed so that such a comparison provides a score (a scalar value) indicating whether the two utterances are from the same speaker. If this score is higher (or lower) than a predefined threshold then the system accepts (or rejects) the test speaker).
However, Hansen fails to teach generating, by the electronic device, a label for the audio segment based on whether or not the extracted audio embeddings match the match profile, wherein the label enables correlation of the audio segment with the target subject for monitoring a health condition of the target subject. Lu teaches generating, by the electronic device, a label for the audio segment based on whether or not the extracted audio embeddings match the match profile, wherein the label enables correlation of the audio segment with the target subject for monitoring a health condition of the target subject (see Lu, pg. 313 sect 4.3 During verification step, breath samples from unknown speaker are matched to stored and reference models, and a similarity score is calculated. In analogy to training phase, we evaluate GMM, HMM, SVM, ANN and KNN algorithms, and a simple similarity based scheme (we term it as “Decision Maker”, labelled as BreathID in Figs. 7c and 7d)).
Hansen and Lu are considered to be analogous to the claimed invention because they relate to speaker recognition using neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Hansen on using a trained model to identify and process audio recording with the breath fingerprint processing teachings of Lu to process and verify a user’s identity based on the breath features (see Lu, pg. 306, sect. 1).
Regarding claim 2, Hansen and Lu teach the method of Claim 1. Hansen further teaches performing the enrollment stage by the electronic device, comprising: obtaining enrollment audio data of the target subject, the enrollment audio data comprising samples of one or more enrollment audio events of the target subject (see Hansen, pg. 84, col 1 Predefined feature parameters are first extracted from the audio recordings that are designed to capture the idiosyncratic characteristics of a person's speech in mathematical parameters); extracting enrollment audio embeddings associated with the target subject from the one or more enrollment audio events using the embedding model (see Hansen, pg. 84, col 1, These features obtained from an enrollment speaker are used to build/train mathematical models that summarize their speaker-dependent properties; mathematical models are interpreted as the embedding model); and creating the match profile of the target subject using the extracted enrollment audio embeddings (see Hansen, pg. 84, col. 1, The models are designed so that such a comparison provides a score (a scalar value) indicating whether the two utterances are from the same speaker).
Regarding claim 5, Hansen and Lu teach the method of Claim 1. Hansen further teaches, wherein comparing the extracted audio embeddings with the match profile of the target subject comprises evaluating one or more distancing metrics (see Hansen, pg. 91, col. 2, In [79], the cosine similarity measure-based scoring was proposed for speaker verification. In this measure, the match score between a target and test i-vector w_target and w_test is computed as their normalized dot product).
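For illustration only, the cosine scoring quoted from Hansen above (a match score computed as the normalized dot product of a target vector and a test vector) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `cosine_score` and `accept` and the threshold value are hypothetical and are not taken from Hansen, Lu, or the application.

```python
import math


def cosine_score(w_target, w_test):
    """Return the normalized dot product of two equal-length vectors."""
    if len(w_target) != len(w_test):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(w_target, w_test))
    norm_target = math.sqrt(sum(a * a for a in w_target))
    norm_test = math.sqrt(sum(b * b for b in w_test))
    if norm_target == 0.0 or norm_test == 0.0:
        raise ValueError("cosine score is undefined for a zero vector")
    return dot / (norm_target * norm_test)


def accept(score, threshold=0.5):
    """Accept the test speaker when the score reaches the threshold.

    The default threshold is an arbitrary placeholder (assumption).
    """
    return score >= threshold
```

A score near 1 indicates vectors pointing in nearly the same direction; comparing the score with a predefined threshold corresponds to the accept/reject decision described in the passage of Hansen cited for claim 1.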
Regarding claim 6, Hansen and Lu teach the method of Claim 1. Hansen further teaches, wherein the audio segment comprises audio data received from multiple devices (see Hansen, Fig.1, pg. 76, to consider variability, Figure 1 highlights a range of factors that can contribute to mismatch for speaker recognition. These can be partitioned based on three broad classes: 1) speaker based, 2) conversation based, and 3) technology based. Technology- or external-based variability sources: these include how and where the audio is captured; Fig. 1 depicts audio captured from various devices).
Regarding claim 7, Hansen and Lu teach the method of Claim 1. Hansen further teaches, wherein the one or more audio events comprise at least one of a cough, a sneeze, or a speech of the target subject (see Hansen, Fig.1, pg. 76, to consider variability, Figure 1 highlights a range of factors that can contribute to mismatch for speaker recognition. These can be partitioned based on three broad classes: 1) speaker based, 2) conversation based, and 3) technology based. Speaker-based variability sources: these reflect a range of changes in how a speaker produces speech and will affect system performance for speaker recognition. These can be thought of as intrinsic or within-speaker variability and include the following factors including Physiological—the subject has some illness or is intoxicated or under the influence of medication; this can include aging as well).
Regarding claim 8, Hansen teaches an electronic device comprising: at least one memory configured to store instructions; and a processor configured when executing the instructions (see Hansen, pg. 83, col. 1, in automatic speaker recognition, computer programs designed to operate independently with minimum human intervention identify a speaker's voice; computer program is interpreted as the processor configured to execute instructions) to: obtain an audio segment comprising one or more audio events of a target subject (see Hansen, pg. 83, fig.4, col 2 In automatic speaker recognition, computer programs designed to operate independently with minimum human intervention identify a speaker's voice. The system user may adjust the design parameters, but to make the comparison between speech segments, all the user needs to do is provide the system with the audio recordings); extract audio embeddings from the one or more audio events using an embedding model, the embedding model comprising 
a trained machine learning model (see Hansen, fig. 4 and pg. 84, fig.4, col 1 Predefined feature parameters are first extracted from the audio recordings that are designed to capture the idiosyncratic characteristics of a person's speech in mathematical parameters. These features obtained from an enrollment speaker are used to build/train mathematical models that summarize their speaker-dependent properties; mathematical models are interpreted as the embedding model); compare the extracted audio embeddings with a match profile of the target subject, the match profile generated during an enrollment stage (see Hansen, pg. 84, col 1 For an unknown test segment, the same features are then extracted, and they are compared against the model of the enrollment/claimed speaker. The models are designed so that such a comparison provides a score (a scalar value) indicating whether the two utterances are from the same speaker. If this score is higher (or lower) than a predefined threshold then the system accepts (or rejects) the test speaker). However, Hansen fails to teach generate a label for the audio segment based on whether or not the extracted audio embeddings match the match profile, wherein the label enables correlation of the audio segment with the target subject for monitoring a health condition of the target subject. Lu teaches generate a label for the audio segment based on whether or not the extracted audio embeddings match the match profile, wherein the label enables correlation of the audio segment with the target subject for monitoring a health condition of the target subject (see Lu, pg. 313 sect 4.3 During verification step, breath samples from unknown speaker are matched to stored and reference models, and a similarity score is calculated. In analogy to training phase, we evaluate GMM, HMM, SVM, ANN and KNN algorithms, and a simple similarity based scheme (we term it as “Decision Maker”, labelled as BreathID in Figs. 7c and 7d)).
Hansen and Lu are considered to be analogous to the claimed invention because they relate to speaker recognition using neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Hansen on using a trained model to identify and process audio recording with the breath fingerprint processing teachings of Lu to process and verify a user’s identity based on the breath features (see Lu, pg. 306, sect. 1).
Regarding claim 9, Hansen and Lu teach the electronic device of Claim 8. Hansen further teaches wherein: the processor is further configured to perform the enrollment stage; and to perform the enrollment stage, the processor is further configured to: obtain enrollment audio data of the target subject, the enrollment audio data comprising samples of one or more enrollment audio events of the target subject(see Hansen, pg. 84, col 1 Predefined feature parameters are first extracted from the audio recordings that are designed to capture the idiosyncratic characteristics of a person's speech in mathematical parameters); extract enrollment audio embeddings associated with the target subject from the one or more enrollment audio events using the embedding model(see Hansen, pg. 84, col 1, These features obtained from an enrollment speaker are used to build/train mathematical models that summarize their speaker-dependent properties; mathematical models are interpreted as the embedding model); and create the match profile of the target subject using the extracted enrollment audio embeddings(see Hansen, pg. 84, col. 1, The models are designed so that such a comparison provides a score (a scalar value) indicating whether the two utterances are from the same speaker).
Regarding claim 12, Hansen and Lu teach the electronic device of Claim 8. Hansen further teaches, wherein to compare the extracted audio embeddings with the match profile of the target subject, the processor is further configured to evaluate one or more distancing metrics (see Hansen, pg. 91, col. 2, In [79], the cosine similarity measure-based scoring was proposed for speaker verification. In this measure, the match score between a target and test i-vector w_target and w_test is computed as their normalized dot product).

Regarding claim 13, Hansen and Lu teach the electronic device of Claim 8. Hansen further teaches, wherein the audio segment comprises audio data received from multiple devices (see Hansen, Fig.1, pg. 76, to consider variability, Figure 1 highlights a range of factors that can contribute to mismatch for speaker recognition. These can be partitioned based on three broad classes: 1) speaker based, 2) conversation based, and 3) technology based. Technology-or external-based variability sources: these include how and where the audio is captured; Fig. 1 depicts audio captured from various devices).
Regarding claim 14, Hansen and Lu teach the electronic device of Claim 8. Hansen further teaches, wherein the one or more audio events comprise at least one of a cough, a sneeze, or a speech of the target subject (see Hansen, Fig.1, pg. 76, to consider variability, Figure 1 highlights a range of factors that can contribute to mismatch for speaker recognition. These can be partitioned based on three broad classes: 1) speaker based, 2) conversation based, and 3) technology based. Speaker-based variability sources: these reflect a range of changes in how a speaker produces speech and will affect system performance for speaker recognition. These can be thought of as intrinsic or within-speaker variability and include the following factors including Physiological—the subject has some illness or is intoxicated or under the influence of medication; this can include aging as well).
Regarding claim 15, Hansen teaches a non-transitory computer readable medium containing computer readable program code that, when executed, causes at least one processor of an electronic device(see Hansen, pg. 83, col. 1, in automatic speaker recognition, computer programs designed to operate independently with minimum human intervention identify a speaker's voice; computer program is interpreted as being stored in a computer readable medium) to: obtain an audio segment comprising one or more audio events of a target subject (see Hansen, pg. 83, fig.4, col 2 In automatic speaker recognition, computer programs designed to operate independently with minimum human intervention identify a speaker's voice. The system user may adjust the design parameters, but to make the comparison between speech segments, all the user needs to do is provide the system with the audio recordings); extract audio embeddings from the one or more audio events using an 
embedding model, the embedding model comprising a trained machine learning model (see Hansen, fig. 4 and pg. 84, fig.4, col 1 Predefined feature parameters are first extracted from the audio recordings that are designed to capture the idiosyncratic characteristics of a person's speech in mathematical parameters. These features obtained from an enrollment speaker are used to build/train mathematical models that summarize their speaker-dependent properties; mathematical models are interpreted as the embedding model); compare the extracted audio embeddings with a match profile of the target subject, the match profile generated during an enrollment stage (see Hansen, pg. 84, col 1 For an unknown test segment, the same features are then extracted, and they are compared against the model of the enrollment/claimed speaker. The models are designed so that such a comparison provides a score (a scalar value) indicating whether the two utterances are from the same speaker. If this score is higher (or lower) than a predefined threshold then the system accepts (or rejects) the test speaker). However, Hansen fails to teach generate a label for the audio segment based on whether or not the extracted audio embeddings match the match profile, wherein the label enables correlation of the audio segment with the target subject for monitoring a health condition of the target subject. Lu teaches generate a label for the audio segment based on whether or not the extracted audio embeddings match the match profile, wherein the label enables correlation of the audio segment with the target subject for monitoring a health condition of the target subject (see Lu, pg. 313 sect 4.3 During verification step, breath samples from unknown speaker are matched to stored and reference models, and a similarity score is calculated. In analogy to training phase, we evaluate GMM, HMM, SVM, ANN and KNN algorithms, and a simple similarity based scheme (we term it as “Decision Maker”, labelled as BreathID in Figs. 7c and 7d)).
Hansen and Lu are considered to be analogous to the claimed invention because they relate to speaker recognition using neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Hansen on using a trained model to identify and process audio recording with the breath fingerprint processing teachings of Lu to process and verify a user’s identity based on the breath features (see Lu, pg. 306, sect. 1).
Regarding claim 16, Hansen and Lu teach the non-transitory computer readable medium of Claim 15. Hansen further teaches wherein the computer readable program code further causes the at least one processor to: perform the enrollment stage, comprising: obtain enrollment audio data of the target subject, the enrollment audio data comprising samples of one or more enrollment audio events of the target subject (see Hansen, pg. 84, col 1 Predefined feature parameters are first extracted from the audio recordings that are designed to capture the idiosyncratic characteristics of a person's speech in mathematical parameters); extract enrollment audio embeddings associated with the target subject from the one or more enrollment audio events using the embedding model (see Hansen, pg. 84, col 1, These features obtained from an enrollment speaker are used to build/train mathematical models that summarize their speaker-dependent properties; mathematical models are interpreted as the embedding model); and create the match profile of the target subject using the extracted enrollment audio embeddings (see Hansen, pg. 84, col. 1, The models are designed so that such a comparison provides a score (a scalar value) indicating whether the two utterances are from the same speaker).
Regarding claim 19, Hansen and Lu teach the non-transitory computer readable medium of Claim 15. Hansen further teaches, wherein to compare the extracted audio embeddings with the match profile of the target subject, the processor is further configured to evaluate one or more distancing metrics (see Hansen, pg. 91, col. 2, In [79], the cosine similarity measure-based scoring was proposed for speaker verification. In this measure, the match score between a target and test i-vector w_target and w_test is computed as their normalized dot product).
	Regarding claim 20, Hansen and Lu teach the non-transitory computer readable medium of Claim 15. Hansen further teaches, wherein the one or more audio events comprise at least one of a cough, a sneeze, or a speech of the target subject (see Hansen, Fig.1, pg. 76, to consider variability, Figure 1 highlights a range of factors that can contribute to mismatch for speaker recognition. These can be partitioned based on three broad classes: 1) speaker based, 2) conversation based, and 3) technology based. Speaker-based variability sources: these reflect a range of changes in how a speaker produces speech and will affect system performance for speaker recognition. These can be thought of as intrinsic or within-speaker variability and include the following factors including Physiological—the subject has some illness or is intoxicated or under the influence of medication; this can include aging as well).
4.	Claims 3, 10 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over J. H. L. Hansen and T. Hasan, "Speaker Recognition by Machines and Humans: A tutorial review," in IEEE Signal Processing Magazine, vol. 32, no. 6, pp. 74-99, Nov. 2015 in view of Lu, L., Liu, L., Hussain, M. J., & Liu, Y. (2017). I sense you by breath: Speaker recognition via breath biometrics. IEEE Transactions on Dependable and Secure Computing, 17(2), 306-319, further in view of Patel, et al., US Patent Application Publication 2020/0029929 (referenced in Applicant’s IDS of 8/20/2020).
Regarding claim 3, Hansen and Lu teach the method of Claim 2 but fail to teach wherein: the samples are processed to extract one or more spectral audio features; and the embedding model transforms the spectral audio features to the enrollment audio embeddings in order to correlate the enrollment audio data with a physiological structure of the target subject. However, Patel teaches wherein: the samples are processed to extract one or more spectral audio features (see Patel, [0056] FIG. 3 is a flowchart of an example of cough reconstruction in accordance with an example of the present invention. In box 310, extracted features from an audio signal may be received. The extracted features may correspond to those, for example, stored in box 250 of FIG. 2. The extracted features may include the mean, normalization constant, and/or phase of the spectrogram in addition to projection scores, which may be used in reconstruction); and the embedding model transforms the spectral audio features to the enrollment audio embeddings in order to correlate the enrollment audio data with a physiological structure of the target subject (see Patel, [0051] So, for example, the frequency-based representation of the audio signal or portion of audio signal may be compared with the cough model including principal components indicative of coughs. The lesser-dimensional matrix provided in box 230 may include a score for each of the principal components of the audio signal or portion thereof based on the vectors of the cough model. A plurality of scores (e.g. one score per eigenvalue) may be obtained in box 230 for use in determining whether or not the audio signal or portion thereof corresponds with a cough).
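Hansen, Lu and Patel are considered to be analogous to the claimed invention because they relate to speech recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Hansen and Lu on using a trained model to identify and process audio recording with the classification of cough in an audio stream teachings of Patel to process a user’s cough without compromising user privacy (see Patel, [0013]).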
Regarding claim 10, Hansen and Lu teach the electronic device of Claim 9 but fail to teach wherein the processor is further configured to: process the samples to extract one or more spectral audio features; and use the embedding model to transform the spectral audio features to the enrollment audio embeddings in order to correlate the enrollment audio data with a physiological structure of the target subject. However, Patel teaches wherein the processor is further configured to: process the samples to extract one or more spectral audio features(see Patel, [0056] FIG. 3 is a flowchart of an example of cough reconstruction in accordance with an example of the present invention. In box 310, extracted features from an audio signal may be received. The extracted features may correspond to those, for example, stored in box 250 of FIG. 2. The extracted features may include the mean, normalization constant, and/or phase of the spectrogram in addition to projection scores, which may be used in reconstruction); and use the embedding model to transform the spectral audio features to the enrollment audio embeddings in order to correlate the enrollment audio data with a physiological structure of the target subject(see Patel, [0051] So, for example, the frequency-based representation of the audio signal or portion of audio signal may be compared with the cough model including principal components indicative of coughs. The lesser-dimensional matrix provided in box 230 may include a score for each of the principal components of the audio signal or portion thereof based on the vectors of the cough model. A plurality of scores (e.g. one score per eigenvalue) may be obtained in box 230 for use in determining whether or not the audio signal or portion thereof corresponds with a cough).
Hansen, Lu and Patel are considered to be analogous to the claimed invention because they relate to speech recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Hansen and Lu on using a trained model to identify and process audio recording with the classification of cough in an audio stream teachings of Patel to process a user’s cough without compromising user privacy (see Patel, [0013]).
Regarding claim 17, Hansen and Lu teach the non-transitory computer readable medium of Claim 16 but fail to teach wherein computer readable program code further causes the at least one processor to: process the samples to extract one or more spectral audio features; and use the embedding model to transform the spectral audio features to the enrollment audio embeddings in order to correlate the enrollment audio data with a physiological structure of the target subject. However, Patel teaches wherein the computer readable program code further causes the at least one processor to: process the samples to extract one or more spectral audio features (see Patel, [0056] FIG. 3 is a flowchart of an example of cough reconstruction in accordance with an example of the present invention. In box 310, extracted features from an audio signal may be received. The extracted features may correspond to those, for example, stored in box 250 of FIG. 2. The extracted features may include the mean, normalization constant, and/or phase of the spectrogram in addition to projection scores, which may be used in reconstruction); and use the embedding model to transform the spectral audio features to the enrollment audio embeddings in order to correlate the enrollment audio data with a physiological structure of the target subject(see Patel, [0051] So, for example, the frequency-based representation of the audio signal or portion of audio signal may be compared with the cough model including principal components indicative of coughs. The lesser-dimensional matrix provided in box 230 may include a score for each of the principal components of the audio signal or portion thereof based on the vectors of the cough model. A plurality of scores (e.g. one score per eigenvalue) may be obtained in box 230 for use in determining whether or not the audio signal or portion thereof corresponds with a cough).
Hansen, Lu and Patel are considered to be analogous to the claimed invention because they relate to speech recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Hansen and Lu on using a trained model to identify and process audio recording with the classification of cough in an audio stream teachings of Patel to process a user’s cough without compromising user privacy (see Patel, [0013]).

5.	Claims 4, 11 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over J. H. L. Hansen and T. Hasan, "Speaker Recognition by Machines and Humans: A tutorial review," in IEEE Signal Processing Magazine, vol. 32, no. 6, pp. 74-99, Nov. 2015 in view of Lu, L., Liu, L., Hussain, M. J., & Liu, Y. (2017). I sense you by breath: Speaker recognition via breath biometrics. IEEE Transactions on Dependable and Secure Computing, 17(2), 306-319, further in view of Patel, et al., US Patent Application Publication 2020/0029929 (referenced in Applicant’s IDS of 8/20/2020), further in view of Gerl, et al., US Patent Application Publication 2009/0119103.
Regarding claim 4, Hansen, Lu and Patel teach the method of Claim 3 but fail to teach the match profile of the target subject comprises a first match profile; and performing the enrollment stage further comprises: creating a second match profile by transforming the first match profile, wherein the first match profile corresponds to a first audio event of the target subject and the second match profile corresponds to a second audio event of the target subject. However Gerl teaches the match profile of the target subject comprises a first match profile (see Gerl, [0063] At 908, a speaker identification component 810 may identify a speaker. In this method, the segment of the current received utterance is processed to determine likelihood functions with respect to each speaker model within the speaker model set. At start-up, the speaker model set may include the UBM. In time, additional speaker models will be created and used to identify speech; speaker model interpreted as first match profile); and performing the enrollment stage further comprises: creating a second match profile by transforming the first match profile, wherein the first match profile corresponds to a first audio event of the target subject and the second match profile corresponds to a second audio event of the target subject (see Gerl,[0037, 0069] a model adaptation may compare a speaker model that is a member of the speaker model set before and after a potential change. The comparison may determine the divergence or distances between each of the speaker models prior to or after the adaptation. Some systems may determine a Kullback-Leibler entropy. Other systems may execute a cross-correlation. By these exemplary analyses additional processes may be processed with the predetermined criterion to identify a match; speaker model adaptation is interpreted as creating second match profile).
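Hansen, Lu, Patel and Gerl are considered to be analogous to the claimed invention because they relate to speech processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Hansen, Lu and Patel on using a trained model to identify and process audio recording with the automatic retrain speaker model teachings of Gerl to reduce the training and storing of numerous voice files (see Gerl, [0006]).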
Regarding claim 11, Hansen, Lu and Patel teach the electronic device of Claim 10 but fail to teach the match profile of the target subject comprises a first match profile; and to perform the enrollment stage, the processor is further configured to: create a second match profile by transforming the first match profile, wherein the first match profile corresponds to a first audio event of the target subject and the second match profile corresponds to a second audio event of the target subject. However, Gerl teaches the match profile of the target subject comprises a first match profile (see Gerl, [0063]: At 908, a speaker identification component 810 may identify a speaker. In this method, the segment of the current received utterance is processed to determine likelihood functions with respect to each speaker model within the speaker model set. At start-up, the speaker model set may include the UBM. In time, additional speaker models will be created and used to identify speech; the speaker model is interpreted as the first match profile); and to perform the enrollment stage, the processor is further configured to: create a second match profile by transforming the first match profile, wherein the first match profile corresponds to a first audio event of the target subject and the second match profile corresponds to a second audio event of the target subject (see Gerl, [0037, 0069]: a model adaptation may compare a speaker model that is a member of the speaker model set before and after a potential change. The comparison may determine the divergence or distances between each of the speaker models prior to or after the adaptation. Some systems may determine a Kullback-Leibler entropy. Other systems may execute a cross-correlation. By these exemplary analyses additional processes may be processed with the predetermined criterion to identify a match; speaker model adaptation is interpreted as creating the second match profile).
Hansen, Lu, Patel and Gerl are considered to be analogous to the claimed invention because they relate to speech processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Hansen, Lu and Patel on using a trained model to identify and process audio recordings with the automatic speaker model retraining teachings of Gerl to reduce the training and storing of numerous voice files (see Gerl, [0006]).
Regarding claim 18, Hansen, Lu and Patel teach the non-transitory computer readable medium of Claim 17 but fail to teach the match profile of the target subject comprises a first match profile; and to perform the enrollment stage, the processor is further configured to: create a second match profile by transforming the first match profile, wherein the first match profile corresponds to a first audio event of the target subject and the second match profile corresponds to a second audio event of the target subject. However, Gerl teaches the match profile of the target subject comprises a first match profile (see Gerl, [0063]: At 908, a speaker identification component 810 may identify a speaker. In this method, the segment of the current received utterance is processed to determine likelihood functions with respect to each speaker model within the speaker model set. At start-up, the speaker model set may include the UBM. In time, additional speaker models will be created and used to identify speech; the speaker model is interpreted as the first match profile); and to perform the enrollment stage, the processor is further configured to: create a second match profile by transforming the first match profile, wherein the first match profile corresponds to a first audio event of the target subject and the second match profile corresponds to a second audio event of the target subject (see Gerl, [0037, 0069]: a model adaptation may compare a speaker model that is a member of the speaker model set before and after a potential change. The comparison may determine the divergence or distances between each of the speaker models prior to or after the adaptation. Some systems may determine a Kullback-Leibler entropy. Other systems may execute a cross-correlation. By these exemplary analyses additional processes may be processed with the predetermined criterion to identify a match; speaker model adaptation is interpreted as creating the second match profile).
Hansen, Lu, Patel and Gerl are considered to be analogous to the claimed invention because they relate to speech processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Hansen, Lu and Patel on using a trained model to identify and process audio recordings with the automatic speaker model retraining teachings of Gerl to reduce the training and storing of numerous voice files (see Gerl, [0006]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Baughman et al., US Patent 8,589,167, teaches comparison of different models to determine speaker authentication (see Baughman, col. 12, lines 32-47).
M. Zhang, Y. Chen, L. Li and D. Wang, "Speaker recognition with cough, laugh and "Wei"," 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2017, pp. 497-501, teaches a speaker recognition task with speech events such as cough and laugh (see Zhang, abstract).

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M Mehta, can be reached at (571) 272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NANDINI SUBRAMANI/Examiner, Art Unit 2656
/EDGAR X GUERRA-ERAZO/Primary Examiner, Art Unit 2656