Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Compact Prosecution
To further prosecution, Examiner proposes incorporating into the claims limitations reciting the use of a classifying model trained with features, such as reverberation-based features, spectral-based features and noise-based features, extracted from phone calls, and reciting that the call data may include an audio signal containing a recording of some or all of the audio that was streamed or otherwise received from the caller device, together with various types of metadata fields such as phone number, call timestamps, associated protocols and carrier information.

Response to Arguments
Applicant’s arguments with respect to claims 1-9 and 11-25 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Regarding the added limitation wherein a classification model is applied against the call data of an incoming call (an ongoing or recent call), the limitation overcame the previously cited art; however, a new reference was found after Examiner conducted a thorough search, and a detailed explanation is provided below.
The combination of Balasubramaniyan in view of Kolbegger discloses wherein computer applies a classifier/classification model on call data of a plurality of calls or the 
Kolbegger also addresses classification of audio calls taking into account call audio information and media metadata. Section 0062, lines 1-6: “Additional data (besides the audio itself) may be taken into account when performing call classification, such as call and media metadata.” Please see the screenshot below.

[Image: media_image1.png]

Figure 1: Fig. 5 of Kolbegger clearly shows how calls are classified taking into account both metadata and features of the audio data.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-9 and 11-25 are rejected under 35 U.S.C. 103 as being unpatentable over Balasubramaniyan et al. (US 2017/0126884) in view of Kolbegger et al. (US 2015/0365530).
Claim 1, Balasubramaniyan discloses a computer-implemented method for detecting certain types of caller devices, (Abstract lines 2-3- “detecting call provenance” ) the method comprising receiving, by a computer, (Computer 900 in fig. 2) call data for an incoming call, the call data comprising an audio signal (Call audio, Section 0026, lines 1-3) and metadata; (Section 0029, lines 3-5 “metadata associated with call audio”)
extracting, by a computer, one or more feature-types from the audio signal of the call data for the incoming call; (Section 0031, lines 7-14- thus based on the extracted features the sources (phone device) of the call stream is identified) 
generating, by the computer using a classifier model applied on the call data of the incoming call, (Section 0042, line 8: machine learning classifier 810) a device-type score for the incoming call based upon the one or more feature-types and the metadata (Section 0064, lines 10-12) of the call data for the incoming call, (Section 0042, lines 9-12: based on the feature vector (score), the classifier can identify and verify a call audio source (participating phone device, Section 0013, lines 4-6)) wherein the classifier model is trained according to the call data for a plurality of calls; (Section 0042, lines 10-11: the feature vector is used to train the classifier so that it can consistently identify the call source; “consistently” means that a plurality of calls is used for the training) and
(Regarding a classifier model being applied on the call data of the incoming call based upon the one or more feature-types and the metadata of the call data for the incoming call, the secondary reference, Kolbegger (US 2015/0365530), also addresses this limitation in Section 0029, lines 21-23, “the speech classifier module group together certain call features by monitoring the start times and stop times of each respective call feature”, and in Section 0062, lines 1-4, “Additional data (besides the audio itself) may be taken into account when performing call classification, such as call and media metadata”. This means that calls are classified based on feature types and metadata of the calls.)
determining, by the computer, that the incoming call originated from a voice assistant device (Section 0036, lines 5-7: “Google Voice”) in response to determining that the device-type score (feature vector) for the incoming call. (Section 0036, lines 12-18 teaches that, based on codecs such as G.729 (feature vector), it can be determined that the call originated from Skype or Google Voice (voice assistant device).)
(Note: each corresponding codec has its matching feature vector (score); see Section 0061, lines 22-27.)

[Image: media_image2.png]

Balasubramaniyan does not disclose that the determination is made based on the satisfaction of a threshold value.
Kolbegger discloses that, based on a score meeting a threshold, the system can determine the source of the media. (Section 0044, lines 9-12: a threshold is established to determine whether the signal meets the standard of what is considered speech or a call feature, which is used to classify the call as either a robocall or any other call. See Section 0059, lines 4-6.)
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the teaching of using a threshold value to determine the source of the media (where the audio media originated). The motivation is that it helps the user decide on the authenticity of the audio call or signal.
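For illustration only, and not as a characterization of either reference, the threshold-based determination discussed above can be sketched as follows. The names CallDecision and decide_device_type and the default threshold of 0.5 are hypothetical and do not appear in the claims or in the cited art; the sketch assumes the device-type score is a single number where a higher value indicates a voice assistant device.

```python
from dataclasses import dataclass


@dataclass
class CallDecision:
    """Outcome of comparing a device-type score with a threshold (hypothetical type)."""
    score: float
    threshold: float
    is_voice_assistant: bool


def decide_device_type(score: float, threshold: float = 0.5) -> CallDecision:
    """Flag the call as voice-assistant-originated when the score satisfies the threshold.

    Assumption: "satisfies" is read here as score >= threshold.
    """
    return CallDecision(score, threshold, score >= threshold)
```

Under these assumptions, decide_device_type(0.8) flags the call and decide_device_type(0.2) does not.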
Claim 2, Balasubramaniyan in view of Kolbegger discloses wherein extracting the one or more feature-types comprises (Balasubramaniyan: Section 0042, lines 1-3- features are aggregated into loss, noise and quality which are the feature types) for at least one feature-type generating by the computer, a plurality of audio segments parsed from the audio signal of the incoming call; (Balasubramaniyan Section 0042, lines 1-5 the features were extracted from the call audio) and extracting, by the computer, one or more features of the at least one feature-type from the plurality of audio segments.( Balasubramaniyan: Section 0034, lines 5 “20ms audio frames” reads on audio samples which can represent a feature also see Section 0036, lines 3-5) 
Claim 3, Balasubramaniyan in view of Kolbegger discloses wherein a feature-type of the at least one feature-type is selected from the group consisting of a reverberation-based feature or features, (Balasubramaniyan: Section 0010, lines 6-8 thus the noise profile is determined by the characteristic unit shown in fig. 1) one or more acoustic features, and sound-to-noise ratio features (Balasubramaniyan “Noise features” Correlation between the noise and the audio signal reads on the sound to noise ratio Section 0058, lines 4-7).
Claim 4, Balasubramaniyan in view of Kolbegger discloses wherein the computer generates the plurality of segments by executing a voice activity detector program configured to generate the plurality of segments parsed from the audio signal, (Balasubramaniyan: Section 0054, lines 1-4: the received audio is split into packets each containing 30 ms of audio) wherein each segment of the plurality of segments generated by the voice activity detector program contains a (Balasubramaniyan: Section 0046, lines 7-8: voice activity detection in VoIP (call from voice assistant device))
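For illustration only, a voice activity detector that parses an audio signal into segments can be sketched with a simple frame-energy test. This is an assumed, generic technique and is not asserted to be the detector of the application or of either reference; split_frames, voiced_segments, frame_len and energy_threshold are hypothetical names.

```python
def split_frames(samples, frame_len):
    """Split a list of samples into consecutive, non-overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def voiced_segments(samples, frame_len, energy_threshold):
    """Group consecutive frames whose mean energy meets the threshold into segments."""
    segments = []
    current = []
    for frame in split_frames(samples, frame_len):
        energy = sum(s * s for s in frame) / frame_len
        if energy >= energy_threshold:
            current.extend(frame)      # still inside an active region
        elif current:
            segments.append(current)   # active region ended at a quiet frame
            current = []
    if current:
        segments.append(current)
    return segments
```

Each returned segment contains only frames judged active under the assumed energy test; quiet frames are discarded.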
 Claim 5, Balasubramaniyan in view of Kolbegger discloses wherein extracting the one or more feature-types comprises (Balasubramaniyan: Section 0042, lines 1-3- as explained above) extracting, by the computer, one or more features of the at least one feature-type based upon an entire audio signal. (Balasubramaniyan: Section 0042, lines 5-7 “provenance fingerprint” is a feature for the entire audio signal since the source of the audio signal is for the entire signal) 
Claim 6, Balasubramaniyan in view of Kolbegger  discloses wherein a feature-type of the at least one feature-type is selected from the group consisting of a spectral feature,  (Balasubramaniyan: Section 0044, lines 11-12- thus the short term energy analysis of the signal reads on the spectral feature) custom mel frequency cepstral coefficients (MFCC) spectral features, a harmonic mean spectral feature, and a short-window correlation acoustic feature. (Kolbegger: Section 0065, lines 9-10- thus Channel waveform 420A shows a shorter time in speech reads on the short window correlation acoustic feature- see Fig. 4A) 
Claim 7, Balasubramaniyan in view of Kolbegger discloses wherein the classifier model is associated with one or more standardization parameters, and wherein generating the device-type score for the incoming call using the classifier model (Balasubramaniyan: Section 0061, lines 1-7: “classifier 810”; based on the labels, the system can predict the provenance (source) of the call) comprises standardizing, by the computer, values of each respective feature-type according to a standardization parameter corresponding respectively to each feature type. (Balasubramaniyan: Section 0062: the feature vector (score) is used to determine the provenance of the call (source or device of the call).)
(Regarding the standardization parameter, Balasubramaniyan describes how the multi-label classifier can use a set of standard reduction techniques to convert the multi-label data into a single label, which is used to determine the phone type of the call. See Section 0062, lines 18-21.)
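For illustration only, one common reading of "standardizing values of each feature-type according to a corresponding standardization parameter" is per-feature mean and standard deviation scaling. The sketch below assumes that reading; it is not asserted to be the technique of Balasubramaniyan Section 0062, and fit_standardization, standardize and the feature names are hypothetical.

```python
from statistics import mean, pstdev


def fit_standardization(training_values):
    """Compute a (mean, standard deviation) pair per feature-type from training calls.

    training_values maps a feature-type name to a list of values observed in training.
    """
    params = {}
    for name, values in training_values.items():
        mu = mean(values)
        sigma = pstdev(values)
        # Assumption: a constant feature is left unscaled to avoid dividing by zero.
        params[name] = (mu, sigma if sigma > 0 else 1.0)
    return params


def standardize(features, params):
    """Standardize each feature of an incoming call with its own parameter pair."""
    return {name: (value - params[name][0]) / params[name][1]
            for name, value in features.items()}
```

The parameters are computed once at training time and then reused for every incoming call, so that the classifier sees feature values on the same scale it was trained on.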
Claim 8, Balasubramaniyan in view of Kolbegger  discloses wherein receiving the call data for the incoming call comprises storing into a database the call data for each respective call of the plurality of calls. (Balasubramaniyan: Section 0069, lines 10-12- “the detection system records the call in a database and analyze the recorded call for fraud prevention”)  
Claim 9, Balasubramaniyan in view of Kolbegger discloses wherein the call data for the plurality of calls used to train the classification model includes at least one call originated from a voice assistant device. (Balasubramaniyan: Section 0062, lines 15-16- thus “re-encoded using iLBC (traversed a VoIP network)”- this means based on the network codec the system is trained to detect call from VoIP-Voice assistant) 
Claim 11, Balasubramaniyan in view of Kolbegger discloses wherein extracting one or more features of a feature-type from each respective call comprises: 
generating, by the computer, one or more statistical parameters based upon a linear predictive coding (Kolbegger: Section 0055: predictive model for call classification) residual of the audio signal and appending, by the computer, the one or more statistical parameters as a reverberation feature of the one or more features of the (Balasubramaniyan: Section 0053, lines 1-5: the linear predictive codec uses the residual and the synthesis filters to encode the original speech into a set of parameters that can be transmitted; “transmitted” means that the linear predictive codec parameters are attached to the audio data)
Claim 12, Balasubramaniyan in view of Kolbegger  discloses wherein extracting one or more features of a feature-type from each respective call comprises generating by the computer one or more statistical parameters  (Balasubramaniyan: Section 0032 lines 11-14- thus the Characteristic of a networks such as the Codec shown in Table 1 reads on the statistical parameters) based upon at least one of a spectral rolloff of the audio signal, a spectral contrast of the audio signal, (Balasubramaniyan: Section 0059, lines 9-10 deviation of the received audio signal) a spectral flatness of the audio signal, a spectral bandwidth of the audio signal,  (Balasubramaniyan: Section 0036, lines 12-13 –thus the system determines that the call came from VoIP because of the use of G.729 which requires very low bandwidth) a spectral centroid of the audio signal, and a Fast Fourier Transform of the audio signal; 
 appending, by the computer, the one or more statistical parameters as one or more spectral based features of the one or more features of the respective call data. (Balasubramaniyan: Section 0058, based on the attached waveform codecs (statistical parameters) the incoming call can be identified as a VoIP call (voice assistant) of  Cellular Telephony- See Table 1) 
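For illustration only, the spectral quantities named in the claim (centroid, bandwidth, rolloff, flatness, each derived from a Fourier transform of the audio signal) can be computed from a single frame as sketched below. The formulas are the commonly used textbook definitions and are an assumption here, not a statement of how the application or the references compute them; the function names and the 0.85 rolloff fraction are hypothetical. A naive discrete Fourier transform is used so that the sketch needs only the standard library.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins of a real frame (naive O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_features(frame, sample_rate, rolloff_fraction=0.85):
    """Return centroid, bandwidth and rolloff (Hz) and flatness (unitless) for one frame."""
    mags = magnitude_spectrum(frame)
    freqs = [k * sample_rate / len(frame) for k in range(len(mags))]
    total = sum(mags)
    if total == 0:
        return {"centroid": 0.0, "bandwidth": 0.0, "rolloff": 0.0, "flatness": 0.0}
    # Centroid: magnitude-weighted mean frequency.
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    # Bandwidth: magnitude-weighted spread around the centroid.
    bandwidth = math.sqrt(
        sum(((f - centroid) ** 2) * m for f, m in zip(freqs, mags)) / total
    )
    # Rolloff: lowest frequency below which rolloff_fraction of the magnitude lies.
    cumulative, rolloff = 0.0, freqs[-1]
    for f, m in zip(freqs, mags):
        cumulative += m
        if cumulative >= rolloff_fraction * total:
            rolloff = f
            break
    # Flatness: geometric mean over arithmetic mean of the power spectrum.
    power = [m * m + 1e-12 for m in mags]  # small floor avoids log(0)
    flatness = math.exp(sum(math.log(p) for p in power) / len(power)) / (
        sum(power) / len(power)
    )
    return {"centroid": centroid, "bandwidth": bandwidth,
            "rolloff": rolloff, "flatness": flatness}
```

Statistics of these per-frame values (for example their mean over a call) could then be appended to the call data as spectral-based features.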
Claim 13, Balasubramaniyan discloses a computer implemented method for detecting certain types of caller devices, (Abstract lines 2-3- “detecting call provenance” ) the method comprising:
(Call audio, Section 0026, lines 1-3) and metadata of the respective call, (Section 0029, lines 3-5 “metadata associated with call audio”)
 wherein the plurality of calls includes one or more voice assistant calls that involved one or more voice assistant devices, (Table 1 shows that at least one of the calls originated from a voice assistant device-those from “VoIP network”) 
 the call data for each voice assistant call indicates a device-type is a voice assistant device; Section 0036, lines 12-18 teaches that based on the codecs such as G.729 (features vector) it can be determined that the call originated from Skype or Google Voice (Voice Assistant device) 
 for each respective call in the plurality of calls, extracting, by the computer, one or more feature-types from the audio signal of the call data; (Section 0031, lines 7-14- thus based on the extracted features the sources (phone device) of the call stream is identified) 
 training, by the computer, a classification model based on the one or more feature-types extracted from the call data of the plurality of calls, (Section 0042, line 8 machine learning classifier 810 is trained based on the feature extracted from the audio data)
 wherein the classifier model is trained to generate a device-type score according to one or more machine-learning algorithms used for the one or more feature-types; (Section 0042, lines 10-11-the feature vector (score) is used to train the classifier that can consistently identify call source, consistently means plurality of calls are used for the training)
 generating, by the computer, one or more standardization parameters (see Section 0062, lines 18-21: standard reduction techniques) for the classification model, wherein each feature-type is normalized according to a corresponding standardization parameter; (Section 0061, lines 1-7: “classifier 810”; based on the labels, the system can predict the provenance (source) of the call)
identifying, by the computer, that an incoming call is a voice assistant call involving a voice assistant device  (Section 0036, lines 5-7 “Google Voice”) based upon the device-type score (Feature vector) for the incoming call generated by applying the classification model on the call data of an incoming call. (Section 0036, lines 12-18 teaches that based on the codecs such as G.729 (features vector) it can be determined that the call originated from Skype or Google Voice (Voice Assistant device) 
(Note: each corresponding codec has its matching  feature vector (score)-see Section 0061, lines 22-27)
Balasubramaniyan does not disclose storing, by the computer, the classification model in a machine-readable storage.
Kolbegger discloses in Fig. 2 that the speech classifier module 260 is stored in memory 235.
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the teaching of having a storage for the trained models. The motivation is that it allows the data to be accessed easily.
Claim 14, Balasubramaniyan in view of Kolbegger discloses wherein extracting the one or more feature types comprises, (Balasubramaniyan: Section 0042, lines 1-3- features are aggregated into loss, noise and quality which are the feature types)  for the call data of each respective call generating by the computer a plurality of audio segments parsed from the audio signal of the respective call data (Balasubramaniyan Section 0042, lines 1-5 the features were extracted from the call audio) and extracting, by the computer, one or more features of a feature-type from the plurality of audio segments. (Balasubramaniyan: Section 0034, lines 5 “20ms audio frames” reads on audio samples which can represent a feature also see Section 0036, lines 3-5)
Claim 15, Balasubramaniyan in view of Kolbegger discloses wherein the feature-type is selected from the group consisting of a reverberation-based feature, (Balasubramaniyan: Section 0010, lines 6-8 thus the noise profile is determined by the characteristic unit shown in fig. 1) one or more cepstral acoustic features, and a sound-to-noise ratio. (Balasubramaniyan “Noise features” Correlation between the noise and the audio signal reads on the sound to noise ratio Section 0058, lines 4-7).
Claim 16, Balasubramaniyan in view of Kolbegger discloses wherein the computer generates the plurality of segments from each respective call audio by executing a voice activity detector program configured to generate the plurality of segments parsed from the audio signal of the respective call data, (Balasubramaniyan: Section 0054, lines 1-4: the received audio is split into packets each containing 30 ms of audio)
 (Balasubramaniyan: Section 0046, lines 7-8: voice activity detection in VoIP (call from voice assistant device))

Claim 17, Balasubramaniyan in view of Kolbegger discloses wherein extracting the one or more feature-types comprises extracting, by the computer, one or more features of a feature-type based upon an entire audio signal. (Balasubramaniyan: Section 0042, lines 5-7 “provenance fingerprint” is a feature for the entire audio signal since the source of the audio signal is for the entire signal) 

Claim 18, Balasubramaniyan in view of Kolbegger discloses wherein a feature-type of the at least one feature-type is selected from the group consisting of a spectral feature, (Balasubramaniyan: Section 0044, lines 11-12- thus the short term energy analysis of the signal reads on the spectral feature) custom mel frequency cepstral coefficients (MFCC) spectral features, a harmonic mean spectral feature, and a short-window correlation acoustic feature. (Kolbegger: Section 0065, lines 9-10- thus Channel waveform 420A shows a shorter time in speech reads on the short window correlation acoustic feature- see Fig. 4A) 

Claim 19, Balasubramaniyan in view of Kolbegger discloses wherein receiving the call data for the plurality of calls comprises storing, into the database, the call data for each respective call of the plurality of calls. (Balasubramaniyan: Section 0069, lines 10-12- “the detection system records the call in a database and analyze the recorded call for fraud prevention”)  
Claim 20, Balasubramaniyan in view of Kolbegger discloses wherein extracting one or more features of a feature type from each respective call comprises: (Balasubramaniyan Section 0042, lines 1-5 the features were extracted from the call audio) generating, by the computer, one or more statistical parameters based upon a linear predictive coding residual of the audio signal; and appending, by the computer, the one or more statistical parameters as a reverberation feature of the one or more features of the respective call data. (Balasubramaniyan: Section 0053, lines 1-5: Linear predictive codec used the residual, the synthesis filters to encode the original speech into a set of parameters that can be transmitted- transmitted means the linear predictive codec is attached to the audio data)

Claim 21, Balasubramaniyan in view of Kolbegger discloses wherein extracting one or more features of a feature type from each respective call comprises generating, by the computer, one or more statistical parameters (Balasubramaniyan: Section 0032 lines 11-14- thus the Characteristic of a networks such as the Codec shown in Table 1 reads on the statistical parameters) based upon at least one of: 
a spectral rolloff of the audio signal, a spectral contrast of the audio signal, (Balasubramaniyan: Section 0059, lines 9-10 deviation of the received audio signal) a spectral flatness of the audio signal, a spectral bandwidth of the audio signal, a spectral centroid of the audio signal, (Balasubramaniyan: Section 0036, lines 12-13 –thus the system determines that the call came from VoIP because of the use of G.729 which requires very low bandwidth) and a Fast Fourier Transform of the audio signal and appending, by the computer, the one or more statistical parameters as one or more spectral based features of the one or more features of the respective call data. (Balasubramaniyan: Section 0058, based on the attached waveform codecs (statistical parameters) the incoming call can be identified as a VoIP call (voice assistant) of  Cellular Telephony- See Table 1) 

Claim 22, Balasubramaniyan in view of Kolbegger discloses wherein extracting one or more features of a feature type from each respective call (Balasubramaniyan Section 0042, lines 1-5 the features were extracted from the call audio) comprises: extracting, by the computer, mel frequency cepstral coefficients of the audio signal; and appending, (Balasubramaniyan: Section 0029, lines 3-5- thus metadata which includes features are associated (appending) with the call audio) 
 by the computer, the mel frequency cepstral coefficients as one or more spectral-based features of the respective call data. (Balasubramaniyan: Section 0068, lines 6- pitch (i.e frequency))
Claim 23, Balasubramaniyan in view of Kolbegger discloses wherein extracting one or more features of a feature type from each respective call  (Balasubramaniyan Section 0042, lines 1-5 the features were extracted from the call audio) comprises:
 generating, by the computer one or more metrics of a signal-to-noise ratio of the audio signal and appending, by the computer, the one or more metrics as a noise-based feature of the one or more features of the respective call data. (Balasubramaniyan “Noise features” Correlation between the noise and the audio signal reads on the sound to noise ratio Section 0058, lines 4-7).
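For illustration only, a signal-to-noise ratio metric of the kind recited in the claim can be sketched as the ratio, in decibels, of mean power in speech frames to mean power in noise-only frames. The split into speech and noise frames is assumed to have been made already (for example by a voice activity detector), both inputs are assumed non-empty, and snr_db is a hypothetical name not taken from the application or the references.

```python
import math


def snr_db(speech_frames, noise_frames):
    """Signal-to-noise ratio in dB from lists of speech frames and noise-only frames."""
    def mean_power(frames):
        samples = [s for frame in frames for s in frame]
        return sum(s * s for s in samples) / len(samples)

    signal_power = mean_power(speech_frames)
    noise_power = mean_power(noise_frames)
    if noise_power == 0:
        return math.inf  # no measurable noise
    return 10 * math.log10(signal_power / noise_power)


# Appending the metric to the call's feature set as a noise-based feature.
features = {}
features["snr_db"] = snr_db([[1.0, -1.0]], [[0.1, -0.1]])
```

In this usage the speech power is 1 and the noise power is 0.01, so the appended value is approximately 20 dB.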

Claim 24, Balasubramaniyan in view of Kolbegger discloses wherein extracting one or more features of a feature type from each respective call (Balasubramaniyan: Section 0042, lines 1-5 the features were extracted from the call audio) comprises: 
generating, by the computer, a short-window correlation measurement from the audio signal and appending by the computer, the short-window correlation measurement as an acoustic feature of the one or more features of the respective call data. (Balasubramaniyan: Section 0058, lines 4-7 during speech activity a correlation between the noise and the audio signal happens based on the spectral range (short window))
Claim 25, Balasubramaniyan in view of Kolbegger discloses for each respective voice assistant call in the plurality of calls receiving, by the computer from a client computer, (Balasubramaniyan: Monitor 991 in fig. 9) an indicator that the voice assistant call originated from a voice assistant device. (Balasubramaniyan: Section 0027, lines 9-13- thus the caller identification device means the monitor is used to indicate the caller identification which is the provenance/source of the call). 

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Balasubramaniyan et al. (US 2017/0126884) in view of Kolbegger et al. (US 2015/0365530) as applied to claims 1-9 and 11-25 above, and further in view of Dowlatkhah (US 2018/0249006).
Claim 10, Balasubramaniyan in view of Kolbegger discloses further comprising generating, by the computer, an indicator via a client computer (Balasubramaniyan: Monitor 991 in fig. 9; see Section 0030, lines 5-6) that the incoming call originated from the voice assistant device in response to the computer determining that the device-type score (Balasubramaniyan: Section 0027, lines 9-13: the caller identification device means that the monitor is used to indicate the caller identification, which is the provenance/source of the call) satisfies the threshold. (Kolbegger: Section 0044, lines 9-12: if the score meets a threshold value, the source of the media is determined)
Balasubramaniyan in view of Kolbegger does not disclose wherein the incoming call is indicated via a GUI of a client computer.
Dowlatkhah discloses a system of processing an automated call based on preferences and conditions where a call from voice assistant device (Robocall)  is indicated via a GUI of a client computer. (Section 0012, See Fig. 9 and 10, also see Section 0039, lines 22-25) 
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the teaching of displaying the call information via a GUI of a client computer. The motivation is that it enables the user to be aware of the caller before answering.



Cited Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Typrin et al. (US9641954) discloses a cloud-based service that receives a user request to bind a cell phone number to a voice-controlled assistant. For example, the user may complete and submit to the cloud-based service, a web-based form through which the cell phone number and an ID associated with the voice-controlled assistant are specified. Other techniques may also be used to request a binding, including, for example, the user submitting the request directly to the cellular carrier through a website, via a phone call, via a text message, or the like, or the user entering a voice command through the VCA, specifying a cell phone number to which the VCA is to be bound.
Dolan et al. (US8325901) discloses a method of providing a called party the ability to screen calls, the method comprising: receiving, at a call processing system, a call from a calling party intended for a called party; placing an outbound call from the call processing system to a first phone address associated with the called party; transmitting, in substantially real-time, at least a portion of a voice communication from the calling party to a first communication device associated with the first phone address so that the called party can screen the call; providing at least the following call handling options to the called party: accept the call on the first communication device.
Gaubitch et al. (2018/0041823) discloses wherein the classifying the call based on the comparison of the feature vector to the model includes predicting, based on the comparison of the feature vector to the model, a relative geographic . 

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Akwasi M Sarpong whose telephone number is (571)270-3438. The examiner can normally be reached Mon-Fri. 8:00am-4:00pm.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KING D POON can be reached on 571-272-7440. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/AKWASI M SARPONG/
Primary Examiner, Art Unit 2675
03/16/2022